Using Exclusive Web Crawlers to Store Better Results in Search Engines' Database

Ali Tourani; Amir Seyed Danesh

arXiv:1305.2686·cs.IR·May 14, 2013

Open Access

TL;DR

This paper introduces an exclusive web crawler approach that improves search engine database accuracy and update efficiency by storing site-specific data tables, reducing costs and preventing outdated results.

Contribution

It proposes a novel crawling method that stores site-specific data in dedicated tables, enhancing data accuracy and update speed in search engine databases.

Findings

01. Improved data accuracy in search engine results.

02. Reduced crawling and updating costs.

03. Elimination of 404 errors in search results.

Abstract

Crawler-based search engines, the most widely used search engines among web and Internet users, involve web crawling, storing results in a database, ranking, indexing, and displaying results to the user. However, because web sites change more and more often, search engines incur high time and transfer costs, which are spent checking whether each page already exists in the database during crawling, updating the database, and repeating that existence check in every crawling operation. "Exclusive Web Crawler" proposes guidelines for crawling features, links, media and other elements, and for storing the crawling results in a certain table in its database on the web. In this way, search engines store each site's tables in their databases and apply their ranking to them. Thus, the accuracy of the data in every table (and its being up to date) is ensured, and no 404 result is shown in search results…
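
The idea of keeping one site's results together and refreshing them as a unit can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each site can supply its current list of pages, it models "each site's table" as rows keyed by a `site` column in a single SQLite table, and the names `make_engine_db`, `sync_site_table` and `search` are hypothetical.

```python
import sqlite3


def make_engine_db():
    """Create an in-memory stand-in for the search engine's database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages ("
        "site TEXT, url TEXT, title TEXT, PRIMARY KEY (site, url))"
    )
    return conn


def sync_site_table(conn, site, published_rows):
    """Replace the stored rows for one site with what the site currently lists.

    Assumption: published_rows is the complete, current (url, title) list
    for the site, so any URL missing from it no longer exists.
    """
    with conn:  # one transaction: delete the old copy, insert the new one
        conn.execute("DELETE FROM pages WHERE site = ?", (site,))
        conn.executemany(
            "INSERT INTO pages (site, url, title) VALUES (?, ?, ?)",
            [(site, url, title) for url, title in published_rows],
        )


def search(conn, term):
    """Return URLs whose title contains the term, in a stable order."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE title LIKE ? ORDER BY url",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_engine_db()
    sync_site_table(db, "example.org", [("/a", "Crawler notes"), ("/b", "Crawler FAQ")])
    # The site later removes /b; after the next sync it can no longer be returned.
    sync_site_table(db, "example.org", [("/a", "Crawler notes")])
    print(search(db, "Crawler"))
```

Because a sync replaces the whole per-site copy, a removed page drops out of the results without a separate existence check for each URL, which is the property the abstract attributes to the approach.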

Taxonomy

Topics: Web Data Mining and Analysis · Caching and Content Delivery · Spam and Phishing Detection