Statistical language models for large vocabulary Turkish speech recognition

dc.contributor: Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor: Arslan, Levent M.
dc.contributor.author: Dutağacı, Helin.
dc.date.accessioned: 2023-03-16T10:16:43Z
dc.date.available: 2023-03-16T10:16:43Z
dc.date.issued: 2002.
dc.description.abstract: In this thesis we have compared four statistical language models for large vocabulary Turkish speech recognition. Turkish is an agglutinative language with productive morphotactics. This property results in a vocabulary explosion and in misestimation of N-gram probabilities when designing speech recognition systems. The solution is to parse the words in order to obtain smaller base units that are capable of covering the language with a relatively small vocabulary. Three different ways of decomposing words into base units are described: the morpheme-based model, the stem-ending-based model and the syllable-based model. These models, together with the word-based model, are compared with respect to vocabulary size, text coverage, bigram perplexity and speech recognition performance. We have constructed a Turkish text corpus of 10 million words, containing various texts collected from the Web. These texts have been parsed into their morphemes, stems, endings and syllables, and the statistics of these base units have been estimated. Finally, we have performed speech recognition experiments with models constructed from these base units.
dc.format.extent: 30 cm.
dc.format.pages: xv, 89 leaves ;
dc.identifier.other: EE 2002 D88
dc.identifier.uri: https://digitalarchive.library.bogazici.edu.tr/handle/123456789/12644
dc.publisher: Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2002.
dc.relation: Includes appendices.
dc.subject.lcsh: Automatic speech recognition -- Statistical methods.
dc.subject.lcsh: Turkish language -- Morphology.
dc.subject.lcsh: Turkish language -- Word formation.
dc.subject.lcsh: Turkish language -- Data processing.
dc.title: Statistical language models for large vocabulary Turkish speech recognition

Files

Original bundle
Name: b1254240.007213.001.PDF
Size: 3.13 MB
Format: Adobe Portable Document Format
