Time efficient spam e-mail filtering for Turkish

dc.contributor: Graduate Program in Computer Engineering.
dc.contributor.advisor: Güngör, Tunga.
dc.contributor.author: Çıltık, Ali.
dc.date.accessioned: 2023-03-16T10:06:06Z
dc.date.available: 2023-03-16T10:06:06Z
dc.date.issued: 2006.
dc.description.abstract: In the present thesis, we propose spam e-mail filtering methods with high accuracy and low time complexity. The methods are based on the n-gram approach and a heuristic referred to as the first n-words heuristic. Although the main concern of the research is the applicability of these methods to Turkish e-mails, they were also applied to English e-mails. A data set for both languages was compiled. Tests were performed with different parameters. Success rates above 95% for Turkish e-mails and around 98% for English e-mails were obtained. In addition, it is shown that the time complexity can be reduced significantly without sacrificing success. We also propose a combined perception refinement (CPR), which improves baseline success rates by around 2%; a development set is used in the first step of the CPR to determine the parameters used in the second step. Free word order is another characteristic of the Turkish language, and we make an attempt to handle this aspect of Turkish.
dc.format.extent: 30 cm.
dc.format.pages: x, 48 leaves;
dc.identifier.other: CMPE 2006 C55
dc.identifier.uri: https://digitalarchive.library.bogazici.edu.tr/handle/123456789/12492
dc.publisher: Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2006.
dc.relation: Includes appendices.
dc.subject.lcsh: Spam filtering (Electronic mail)
dc.title: Time efficient spam e-mail filtering for Turkish
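
Illustrative note: the abstract names two ingredients, an n-gram model and a "first n-words" heuristic, without giving details in this record. The following is a minimal sketch of how the two could fit together, not the thesis's actual method. It assumes whitespace tokenization, word bigrams, add-one smoothing, no class priors, and the labels "spam" and "normal"; the names `first_n_words`, `bigrams`, and `BigramSpamFilter` are hypothetical.

```python
import math
from collections import Counter


def first_n_words(text, n_words):
    """Keep only the first n_words whitespace-separated tokens, lowercased."""
    return text.lower().split()[:n_words]


def bigrams(tokens):
    """Return adjacent word pairs, with a start marker before the first token."""
    padded = ["<s>"] + tokens
    return list(zip(padded, padded[1:]))


class BigramSpamFilter:
    """Word-bigram classifier that reads only the start of each message.

    Assumption: add-one smoothing and equal class priors, chosen for brevity.
    """

    def __init__(self, n_words=50):
        self.n_words = n_words
        self.pair_counts = {"spam": Counter(), "normal": Counter()}
        self.context_counts = {"spam": Counter(), "normal": Counter()}
        self.vocab = set()

    def train(self, text, label):
        tokens = first_n_words(text, self.n_words)
        self.vocab.update(tokens)
        for prev, word in bigrams(tokens):
            self.pair_counts[label][(prev, word)] += 1
            self.context_counts[label][prev] += 1

    def log_score(self, text, label):
        tokens = first_n_words(text, self.n_words)
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for prev, word in bigrams(tokens):
            numerator = self.pair_counts[label][(prev, word)] + 1
            denominator = self.context_counts[label][prev] + vocab_size
            total += math.log(numerator / denominator)
        return total

    def classify(self, text):
        # The class whose bigram model gives the higher log-likelihood wins.
        return max(("spam", "normal"), key=lambda c: self.log_score(text, c))


spam_filter = BigramSpamFilter(n_words=50)
spam_filter.train("win free money now", "spam")
spam_filter.train("free money offer click now", "spam")
spam_filter.train("meeting agenda for monday", "normal")
spam_filter.train("please review the project report", "normal")
print(spam_filter.classify("free money now"))
```

Because training and scoring touch at most `n_words` tokens per message, the cost per e-mail is bounded by that parameter rather than by message length, which is one plausible reading of how such a heuristic lowers running time.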

Files

Original bundle
Name: b1431493.001080.001.PDF
Size: 407.5 KB
Format: Adobe Portable Document Format
