The development of mod_antiCrawl: an anti crawler add-on module for apache web servers

Date

2012.

Publisher

Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2012.

Abstract

A web crawler can be defined as automated software that builds a map of a website by visiting all of its links. This site-mapping process can serve as the basis for a web attack, so crawling plays an important role in automated attacks. Most automated vulnerability scanners crawl a site before running vulnerability tests in order to determine its overall map and attack surface. Besides automated scanning, crawlers can also be used for content theft: by visiting every page in an orderly manner, a crawler can copy all the pages and content of a website. Anti-crawling can be defined as a set of mechanisms that prevent websites from being crawled by automated crawlers. In this thesis, a set of anti-crawling mechanisms is combined into an Apache web server module called mod_antiCrawl. mod_antiCrawl is developed in C using the Apache API, and it has crawler detection and inhibition capabilities to protect servers from malicious crawlers. The performance of mod_antiCrawl has also been studied, and our results show that website map discovery by crawlers decreases by at least 70% after mod_antiCrawl is activated. This reduction reaches 90% when different functionalities of the module are enabled.
