The Method of Keyword Based Crawler Load Balancing

MO-JI WEI; YAN-QING ZHAO; SHI-WEI ZHU; AI-QIN YANG

doi:10.12783/dtcse/ceic2018/24546


Abstract

This paper examines the characteristics of different data sources, such as websites and social media, and proposes a load balancing method for distributed web crawlers that weights the crawling workload from each source. First, a seed link allocation strategy is proposed, based on an analysis of how the data update frequency differs across data sources. Then, using this allocation strategy, a data crawling solution is given that takes domain names and keywords as its task units. Finally, a load balancing method is proposed that adjusts the allocation among the distributed crawlers, using the time expenditure of each domain name and keyword as its weight.
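
The abstract does not spell out the balancing procedure, so the following is only a minimal sketch of the general idea: each task unit (a domain name or a keyword) carries a weight equal to its measured crawl time, and units are distributed so that the total weight per crawler stays roughly even. The greedy "heaviest task to the least-loaded crawler" rule, the function names `balance_tasks` and `crawler_loads`, and the sample task names and timings are all illustrative assumptions, not the authors' method or data.

```python
import heapq
from typing import Dict, List


def balance_tasks(task_costs: Dict[str, float], n_crawlers: int) -> List[List[str]]:
    """Assign task units to crawlers so that summed time costs stay even.

    Assumption: a greedy heuristic (heaviest task first, always to the
    currently least-loaded crawler). The paper may use a different rule.
    """
    if n_crawlers < 1:
        raise ValueError("n_crawlers must be at least 1")

    # Min-heap of (current load, crawler index); the root is the least-loaded crawler.
    heap = [(0.0, i) for i in range(n_crawlers)]
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(n_crawlers)]

    # Place the most expensive task units first.
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        assignment[idx].append(task)
        heapq.heappush(heap, (load + cost, idx))
    return assignment


def crawler_loads(assignment: List[List[str]], task_costs: Dict[str, float]) -> List[float]:
    """Return the total time cost assigned to each crawler."""
    return [sum(task_costs[t] for t in tasks) for tasks in assignment]


# Hypothetical task units with hypothetical time costs in seconds.
task_costs = {
    "example.com": 8.0,
    "example.org": 5.0,
    "keyword:crawler": 4.0,
    "keyword:load balancing": 3.0,
}
assignment = balance_tasks(task_costs, n_crawlers=2)
loads = crawler_loads(assignment, task_costs)
```

In a running system the weights would be re-measured as sources change their update frequency, and the allocation recomputed from the new timings; how often to do so is a design choice the abstract leaves open.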

Keywords

Distributed web crawler, Load balancing strategy, Seeding links allocation strategy



COMPUTER SCIENCE and ENGINEERING
