Statistical and discriminative language modeling for Turkish large vocabulary continuous speech recognition

dc.contributor: Ph.D. Program in Electrical and Electronic Engineering.
dc.contributor.advisor: Saraçlar, Murat.
dc.contributor.author: Arısoy, Ebru.
dc.date.accessioned: 2023-03-16T10:25:02Z
dc.date.available: 2023-03-16T10:25:02Z
dc.date.issued: 2009.
dc.description.abstract: Turkish, being an agglutinative language with rich morphology, presents challenges for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. First, the agglutinative nature of Turkish leads to a high number of Out-of-Vocabulary (OOV) words, which in turn lowers Automatic Speech Recognition (ASR) accuracy. Second, Turkish has a relatively free word order, which leads to non-robust language model estimates. These challenges have mostly been handled by using meaningful segmentations of words, called sub-lexical units, in language modeling. However, a shortcoming of sub-lexical units is over-generation, which must be dealt with to reach higher accuracy. This dissertation aims to address the challenges of Turkish in LVCSR. Grammatical and statistical sub-lexical units for language modeling are investigated, and they yield substantial improvements over word language models. Our novel approach, inspired by dynamic vocabulary adaptation, recovers most of the errors caused by over-generation and further improves the accuracy of sub-lexical units. Additionally, discriminative language models (DLMs) with linguistically and statistically motivated features are utilized. DLMs outperform the conventional approaches, partly due to the improved parameter estimates obtained with discriminative training and partly due to integrating the complex language characteristics of Turkish into language modeling. The significance of this dissertation lies in being a comparative study of several sub-lexical units on the same LVCSR system, addressing the over-generation problem of sub-lexical units, and extending sub-lexical-based generative language modeling of Turkish to discriminative language modeling. These approaches can be easily extended to other morphologically rich languages that suffer from similar problems.
dc.format.extent: 30cm.
dc.format.pages: xx, 159 leaves;
dc.identifier.other: EE 2009 A75 PhD
dc.identifier.uri: https://digitalarchive.library.bogazici.edu.tr/handle/123456789/13091
dc.publisher: Thesis (Ph.D.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2009.
dc.relation: Includes appendices.
dc.subject.lcsh: Automatic speech recognition.
dc.subject.lcsh: Turkish language -- Morphology.
dc.title: Statistical and discriminative language modeling for Turkish large vocabulary continuous speech recognition

Files

Original bundle
Name: b1634586.008351.001.PDF
Size: 871.26 KB
Format: Adobe Portable Document Format
