Speech retrieval for Turkish broadcast news

dc.contributor Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor Saraçlar, Murat.
dc.contributor.author Parlak, Sıddıka.
dc.date.accessioned 2023-03-16T10:17:08Z
dc.date.available 2023-03-16T10:17:08Z
dc.date.issued 2008.
dc.description.abstract Speech retrieval is a recently emerging field of information retrieval in which the information is spoken rather than written. So far, spoken information retrieval has been studied in several languages. In this thesis, we concentrate on the retrieval of Turkish broadcast news. We implement two tasks: Spoken Term Detection (STD) and Spoken Document Retrieval (SDR). Although both combine Automatic Speech Recognition (ASR) and Information Retrieval (IR) techniques to retrieve spoken data, their main goals differ: STD retrieves specific occurrences of a term and requires an exact match, while SDR retrieves related documents and depends more on context. Automatic transcription and retrieval of speech are more complicated in agglutinative languages because a recognition vocabulary of standard size covers only a limited portion of the language. A common solution is to segment words into subwords and use subword units in recognition. We employed grammatical and statistical subword units in recognition and indexing for STD. The best scores were obtained by combining word-based and statistical subword-based approaches. Word segmentation algorithms are also useful in SDR, since stems carry the meaning and provide a better representation of context. Experiments showed that stemming improves SDR performance, but the different segmentation methods do not differ significantly. We also studied language-independent ASR errors. Indexing the alternative ASR hypotheses, as well as the best one, was shown to be effective for the STD task. Results are presented on our Turkish Broadcast News Corpus.
dc.format.extent 30cm.
dc.format.pages xix, 100 leaves;
dc.identifier.other EE 2008 P37
dc.identifier.uri https://digitalarchive.library.bogazici.edu.tr/handle/123456789/12709
dc.publisher Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2008.
dc.relation Includes appendices.
dc.subject.lcsh Speech perception.
dc.subject.lcsh Information retrieval.
dc.title Speech retrieval for Turkish broadcast news
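The abstract's point that indexing alternative ASR hypotheses (not only the best one) helps STD, and that recognition output may be in subword units, can be illustrated with a minimal sketch. It assumes N-best lists with a posterior score per hypothesis rather than full recognition lattices, and a hand-made, hypothetical subword segmentation in which a leading "+" marks a suffix. The names `build_index`, `detect` and `NBEST` and the example data are illustrative only and are not taken from the thesis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed input format (hypothetical): utterance id -> N-best list of
# (hypothesis tokens, posterior probability of that hypothesis).
NBestLists = Dict[str, List[Tuple[List[str], float]]]
# Posting: (utterance id, hypothesis rank, token position, hypothesis posterior).
Posting = Tuple[str, int, int, float]


def build_index(nbest: NBestLists) -> Dict[str, List[Posting]]:
    """Inverted index over every unit of every hypothesis, not only the 1-best."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for utt_id, hyps in nbest.items():
        for rank, (tokens, posterior) in enumerate(hyps):
            for pos, token in enumerate(tokens):
                index[token].append((utt_id, rank, pos, posterior))
    return index


def detect(index: Dict[str, List[Posting]], nbest: NBestLists,
           query: List[str]) -> Dict[str, float]:
    """Exact-match term detection over the indexed hypotheses.

    Returns, per utterance, the summed posterior of the hypotheses that contain
    the query unit sequence, i.e. an N-best estimate of the probability that
    the term was spoken in that utterance.
    """
    if not query:
        return {}
    scores: Dict[str, float] = defaultdict(float)
    counted = set()  # count each hypothesis once, even if the term repeats in it
    for utt_id, rank, pos, posterior in index.get(query[0], []):
        tokens = nbest[utt_id][rank][0]
        if tokens[pos:pos + len(query)] == query and (utt_id, rank) not in counted:
            counted.add((utt_id, rank))
            scores[utt_id] += posterior
    return dict(scores)


# Hypothetical N-best output with a toy morph segmentation ("+" = suffix).
NBEST: NBestLists = {
    "news_01": [
        (["meclis", "bugün", "topla", "+n", "+dı"], 0.6),
        (["meclis", "bugün", "topla", "+dı"], 0.3),
        (["mecli", "+s", "bugün", "topla", "+n", "+dı"], 0.1),
    ],
    "news_02": [
        (["ekonomi", "haber", "+ler", "+i"], 0.8),
        (["ekonomi", "haber", "+leri"], 0.2),
    ],
}

if __name__ == "__main__":
    idx = build_index(NBEST)
    print(detect(idx, NBEST, ["meclis"]))
    print(detect(idx, NBEST, ["topla", "+n", "+dı"]))
```

Here the word query "meclis" scores news_01 at roughly 0.9, because two of its three hypotheses contain it even though the third misrecognizes it; a 1-best index would only see the top hypothesis. The subword query for "toplandı" matches the first and third hypotheses (roughly 0.7) but not the second, which has a different segmentation. Lattice-based indexing generalizes the same idea to far more alternatives than an N-best list holds.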

Files

Original bundle
Name: b1540448.003850.001.PDF
Size: 1.15 MB
Format: Adobe Portable Document Format
