Identifying event nuggets in Turkish news texts using natural language processing and machine learning methods

dc.contributor: Graduate Program in Computer Engineering.
dc.contributor.advisor: Özgür, Arzucan.
dc.contributor.author: Durna, Mehmet.
dc.date.accessioned: 2023-03-16T10:04:09Z
dc.date.available: 2023-03-16T10:04:09Z
dc.date.issued: 2019.
dc.description.abstract: An event nugget is the smallest textual instance that marks the existence of an event. Detecting event nuggets in a text opens the door to further research and to many practical applications, such as automatically classifying the events mentioned in that text. The task has therefore been studied extensively for several languages, including English, Spanish and Chinese. In this thesis, event nugget detection and event type classification are studied for Turkish for the first time. Due to the lack of annotated data for event nugget detection in Turkish, we developed a new annotated data set for this task. We describe how we manually annotated the data set, as well as our system for identifying event nuggets in Turkish news texts. The data set consists of words from Turkish news texts. Each word is manually annotated with its sequence type, nugget type and realis value, and with whether the event nugget is the main event. This makes it possible to analyze the data set for event nugget detection, event type classification, realis classification and main event detection. We used Turkish-specific features, such as morphological features and dependency parser features, together with other features, and we aimed to measure the effect of these language-specific features on the tasks. We also experimented with different machine learning algorithms to find the best-fitting model for our tasks. Our experiments show that Turkish-specific morphological features, dependency tree features and word embeddings enabled us to achieve better results.
dc.format.extent: 30 cm.
dc.format.pages: xiii, 56 leaves
dc.identifier.other: CMPE 2019 D87
dc.identifier.uri: https://digitalarchive.library.bogazici.edu.tr/handle/123456789/12395
dc.publisher: Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh: Natural language processing (Computer science)
dc.subject.lcsh: Machine learning.
dc.subject.lcsh: News Web sites -- Turkey.
dc.title: Identifying event nuggets in Turkish news texts using natural language processing and machine learning methods

Files

Original bundle
Name: b2035013.034204.001.PDF
Size: 272.34 KB
Format: Adobe Portable Document Format

Name: b2035013.034205.001.zip
Size: 102.69 MB
Format: Unknown data format
