Single-channel speech-music separation for robust ASR with mixture of NMF models

Demir, Cemil.

dc.contributor	Ph.D. Program in Electrical and Electronic Engineering.
dc.contributor.advisor	Saraçlar, Murat.
dc.contributor.advisor	Cemgil, Ali Taylan.
dc.contributor.author	Demir, Cemil.
dc.date.accessioned	2023-03-16T10:25:09Z
dc.date.available	2023-03-16T10:25:09Z
dc.date.issued	2014.
dc.description.abstract	In this dissertation, we analyze the single-channel speech-music separation problem for automatic speech recognition (ASR). The motivation of this study is to improve the performance of ASR systems by reducing the effect of background music. We describe a single-channel speech-music separation method based on a mixture of nonnegative matrix factorization (NMF) models. Given a catalog of background music material, we propose a generative model for the superposed speech and music spectrograms. The background music signal is assumed to be generated by a jingle in the catalog, and it is modeled by a scaled conditional mixture model representing that jingle. The speech signal is modeled by an NMF model that is estimated in a semi-supervised manner from the mixed signal. The approach is tested with Poisson and complex Gaussian observation models, which correspond to the Kullback-Leibler (KL) and Itakura-Saito (IS) divergence measures, respectively. Our experiments show that the proposed mixture model outperforms a standard NMF method in both speech-music separation and ASR tasks. We further extend the mixture-of-NMF-based separation method to incorporate prior speech information in order to enhance its separation performance. Finally, we propose using sub-word NMF-based speech models for the separation of speech and music signals, and we demonstrate that this strategy can improve recognition accuracy compared with using a general speech model.
dc.format.extent	30 cm.
dc.format.pages	xx, 166 leaves
dc.identifier.other	EE 2014 D46 PhD
dc.identifier.uri	https://hdl.handle.net/20.500.14908/13116
dc.publisher	Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014.
dc.subject.lcsh	Automatic speech recognition.
dc.title	Single-channel speech-music separation for robust ASR with mixture of NMF models
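The abstract describes modeling a mixture spectrogram as the sum of a music model and a speech NMF model estimated semi-supervised from the mixture, under either KL (Poisson) or IS (complex Gaussian) costs. The sketch below is a simplified, generic illustration of that semi-supervised idea in Python with NumPy. It is not the thesis's mixture-of-NMF model. It assumes a single fixed music dictionary `W_music` (hypothetical, e.g. learned beforehand from one catalog jingle) instead of a scaled conditional mixture over jingles. It uses standard multiplicative updates for the beta-divergence: beta = 1 gives KL and beta = 0 gives IS. It then splits the mixture with Wiener-style masks. The names `separate` and `beta_divergence` and all parameters are illustrative.

```python
import numpy as np


def beta_divergence(V, Vhat, beta):
    """Summed beta-divergence D(V | Vhat); beta=1 is KL, beta=0 is Itakura-Saito."""
    if beta == 1:
        return float(np.sum(V * np.log(V / Vhat) - V + Vhat))
    if beta == 0:
        ratio = V / Vhat
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    return float(np.sum((V**beta + (beta - 1) * Vhat**beta
                         - beta * V * Vhat**(beta - 1)) / (beta * (beta - 1))))


def separate(V, W_music, n_speech, beta=1.0, n_iter=200, seed=0, eps=1e-12):
    """Semi-supervised NMF (illustrative): W_music stays fixed; speech bases
    and all activations are learned from the mixture V.

    Returns (speech, music, history), where speech + music == V and history
    holds the divergence before the first update and after every iteration.
    """
    rng = np.random.default_rng(seed)
    V = np.maximum(np.asarray(V, dtype=float), eps)  # avoid log(0) and division by 0
    F, T = V.shape
    K_m = W_music.shape[1]
    # Dictionary = [fixed music bases | randomly initialised speech bases]
    W = np.hstack([np.maximum(W_music, eps), rng.random((F, n_speech)) + eps])
    H = rng.random((K_m + n_speech, T)) + eps
    # Exponent that keeps each multiplicative update non-increasing (Fevotte & Idier, 2011)
    if beta < 1:
        gamma = 1.0 / (2.0 - beta)
    elif beta <= 2:
        gamma = 1.0
    else:
        gamma = 1.0 / (beta - 1.0)

    history = [beta_divergence(V, W @ H, beta)]
    for _ in range(n_iter):
        # Update all activations (music and speech) with the dictionary fixed
        Vhat = W @ H
        H *= ((W.T @ (V * Vhat**(beta - 2))) / (W.T @ Vhat**(beta - 1))) ** gamma
        # Update only the speech bases; the music bases are never modified
        Vhat = W @ H
        Hs = H[K_m:, :]
        W[:, K_m:] *= (((V * Vhat**(beta - 2)) @ Hs.T) / (Vhat**(beta - 1) @ Hs.T)) ** gamma
        history.append(beta_divergence(V, W @ H, beta))

    # Wiener-style masks: each source receives its share of the observed spectrogram
    Vhat = W @ H
    speech = V * (W[:, K_m:] @ H[K_m:, :]) / Vhat
    music = V * (W[:, :K_m] @ H[:K_m, :]) / Vhat
    return speech, music, history
```

In common practice, the KL setting is applied to a magnitude spectrogram and the IS setting to a power spectrogram. A call such as `separate(V, W_music, n_speech=20, beta=0.0)` would use the IS cost. Here `W_music` would first have to be learned from the catalog material, for example by running ordinary NMF on the jingle's spectrogram.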

Files

Name:: b1792314.021732.001.PDF
Size:: 3.27 MB
Format:: Adobe Portable Document Format