EEMCS EPrints Service


Resource Selection for Federated Search on the Web

Nguyen, Dong-Phuong and Demeester, T. and Trieschnigg, R.B. and Hiemstra, D. (2016) Resource Selection for Federated Search on the Web. Technical Report TR-CTIT-16-12, Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands. ISSN 1381-3625

Full text available as: PDF (329 Kb)

Abstract

A publicly available dataset for federated search that reflects a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms in the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real search engines, ranging from large general web search engines such as Google, Bing and Yahoo to small domain-specific engines. First, we experiment with estimating the size of uncooperative search engines on the web using query-based sampling, and propose a new method using the ClueWeb09 dataset. We find the size estimates to be highly effective in resource selection. Second, we show that an optimized federated search system based on smaller web search engines can be an alternative to a system using large web search engines. Third, we provide an empirical comparison of several popular resource selection methods and find that these methods are not readily suitable for resource selection on the web. Challenges include the sparse resource descriptions and the extremely skewed sizes of collections.

Item Type: Internal Report (Technical Report)
Research Group: EWI-HMI: Human Media Interaction, EWI-DB: Databases
Research Program: CTIT-General
Research Project: FACT: Folktales As Classifiable Texts, COMMIT/Infiniti: Information Retrieval for Information Services
ID Code: 27190
Deposited On: 20 September 2016
