Single-channel speech-music separation for robust ASR with mixture of NMF models

dc.contributor: Ph.D. Program in Electrical and Electronic Engineering.
dc.contributor.advisor: Saraçlar, Murat.
dc.contributor.advisor: Cemgil, Ali Taylan.
dc.contributor.author: Demir, Cemil.
dc.date.accessioned: 2023-03-16T10:25:09Z
dc.date.available: 2023-03-16T10:25:09Z
dc.date.issued: 2014.
dc.description.abstract: In this dissertation, we analyze the single-channel speech-music separation problem for automatic speech recognition (ASR). The motivation of the study is to increase the performance of ASR systems by decreasing the effect of background music. We describe a single-channel speech-music separation method based on a mixture of nonnegative matrix factorization (NMF) models. Given a catalog of background music material, we propose a generative model for the superposed speech and music spectrograms. The background music signal is assumed to be generated by a jingle in the catalog, and it is modeled by a scaled conditional mixture model representing the jingle. The speech signal is modeled by an NMF model that is estimated in a semi-supervised manner from the mixed signal. The approach is tested with Poisson and complex Gaussian observation models, which correspond to the Kullback-Leibler (KL) and Itakura-Saito (IS) divergence measures, respectively. Our experiments show that the proposed mixture model outperforms a standard NMF method in both speech-music separation and ASR tasks. Moreover, we extend the mixture-of-NMF-based single-channel speech-music separation method so that it incorporates prior speech information to enhance separation performance. Finally, we propose to use sub-word NMF-based speech models for the separation of speech and music signals. We demonstrate that this strategy improves recognition accuracy compared to using a general speech model.
dc.format.extent: 30 cm.
dc.format.pages: xx, 166 leaves ;
dc.identifier.other: EE 2014 D46 PhD
dc.identifier.uri: https://hdl.handle.net/20.500.14908/13116
dc.publisher: Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014.
dc.subject.lcsh: Automatic speech recognition.
dc.title: Single-channel speech-music separation for robust ASR with mixture of NMF models
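
The abstract mentions a speech model estimated in a semi-supervised manner from the mixed signal, and a Poisson observation model that corresponds to the KL divergence. As a rough illustration only, the sketch below shows a plain semi-supervised KL-NMF baseline in NumPy. It is not the mixture-of-NMF model proposed in the dissertation: it assumes a single fixed music dictionary `W_music` (hypothetically pre-trained from catalog material), learns only the speech bases and all activations from the mixture magnitude spectrogram `V`, and splits `V` with soft masks. The function name, parameter names, and default values are assumptions made for this example.

```python
import numpy as np


def semi_supervised_kl_nmf(V, W_music, n_speech=8, n_iter=200, seed=0, eps=1e-12):
    """Separate a nonnegative spectrogram V (freq x time) into speech and music.

    W_music (freq x K_m) is held fixed; the speech bases W_s and all
    activations H are fitted with the standard multiplicative updates
    for the generalized KL divergence.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    K_m = W_music.shape[1]

    # Random nonnegative initialization of the free parameters.
    W_s = rng.random((F, n_speech)) + eps
    H = rng.random((K_m + n_speech, T)) + eps

    for _ in range(n_iter):
        W = np.hstack([W_music, W_s])

        # Activation update: H <- H * (W^T (V / WH)) / (W^T 1)
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)

        # Speech-basis update only (music bases stay fixed):
        # W_s <- W_s * ((V / WH) H_s^T) / (1 H_s^T)
        R = V / (W @ H + eps)
        H_s = H[K_m:]
        W_s *= (R @ H_s.T) / (H_s.sum(axis=1)[None, :] + eps)

    # Final model spectrograms for each source.
    music_hat = W_music @ H[:K_m]
    speech_hat = W_s @ H[K_m:]
    V_hat = music_hat + speech_hat + eps

    # Soft (Wiener-like) masks distribute the observed energy between sources.
    speech_est = V * speech_hat / V_hat
    music_est = V * music_hat / V_hat
    return speech_est, music_est, V_hat


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence D(V || V_hat) summed over all bins."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))
```

Because the two masks sum to one in every time-frequency bin, the two estimates add back up to the input spectrogram. An Itakura-Saito variant would use different multiplicative updates and is not shown here.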

Files

Original bundle

Name: b1792314.021732.001.PDF
Size: 3.27 MB
Format: Adobe Portable Document Format