BOĞAZİÇİ UNIVERSITY
LIBRARY DIGITAL ARCHIVE

A semantic sentence similarity estimation approach for the biomedical domain

dc.contributor: Graduate Program in Computer Engineering.
dc.contributor.advisor: Özgür, Arzucan.
dc.contributor.author: Soğancıoğlu, Gizem.
dc.date.accessioned: 2023-03-16T10:02:36Z
dc.date.available: 2023-03-16T10:02:36Z
dc.date.issued: 2016.
dc.description.abstract: Over the last decades, semantic text similarity has been adopted as a major component in many Natural Language Processing tasks, including text retrieval, summarization, and document categorization. Integrating semantic information is a powerful tool for better understanding and structuring text. Among the many domains that benefit from text mining studies, biomedical literature is one of the most challenging because of its domain-specific language. Given the complex nature of the biomedical literature, domain-specific adaptations are a crucial requirement. Several semantic text similarity approaches have been applied at the word level. However, to the best of our knowledge, there has not been any research on sentence-level semantic similarity in the biomedical domain. Furthermore, our experimental results revealed that domain-independent state-of-the-art approaches to sentence-level semantic similarity do not effectively cover biomedical knowledge and produce poor results. In this study, we propose several different approaches for domain-specific sentence-level semantic similarity computation, including measures utilizing distributional vector representations of sentences, methods combining general and domain-specific ontologies, and a supervised approach exploiting high-level features. Our proposed methods are evaluated using a manually annotated data set that consists of 100 sentence pairs from biomedical literature. The experiments showed that the supervised semantic similarity computation approach obtained the best performance and improved over the previous domain-independent systems by up to 42.6% in terms of the Pearson correlation metric.
dc.format.extent: 30 cm.
dc.format.pages: xiv, 80 leaves ;
dc.identifier.other: CMPE 2016 S74
dc.identifier.uri: https://hdl.handle.net/20.500.14908/12330
dc.publisher: Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2016.
dc.subject.lcsh: Semantic integration (Computer systems)
dc.subject.lcsh: Natural language processing (Computer science)
dc.title: A semantic sentence similarity estimation approach for the biomedical domain
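
Illustrative note: the abstract mentions similarity measures over distributional vector representations of sentences, evaluated with the Pearson correlation against manual annotations. The following is a minimal sketch of that general idea, not the thesis's implementation. Averaging word vectors to form a sentence vector is an assumption made here for illustration, and the function names (`sentence_vector`, `cosine`, `pearson`) and the toy embedding table are hypothetical.

```python
import math


def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens (an assumed, simple composition)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known tokens: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between system scores and annotator scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical toy embeddings; real ones would be trained on biomedical text.
toy_embeddings = {
    "protein": [1.0, 0.0],
    "gene": [0.8, 0.2],
    "patient": [0.0, 1.0],
}

s1 = sentence_vector(["protein", "gene"], toy_embeddings, dim=2)
s2 = sentence_vector(["patient"], toy_embeddings, dim=2)
similarity = cosine(s1, s2)
```

In an evaluation of the kind the abstract describes, `pearson` would be applied to the list of system similarity scores and the list of human-annotated scores for the same sentence pairs.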

Files

Original bundle

Name: b1834324.027552.001.PDF
Size: 770.76 KB
Format: Adobe Portable Document Format

Name: b1834324.027553.001.rar
Size: 117.35 MB
Format: Unknown data format
