EEMCS EPrints Service


Federated Search in the Wild: the combined power of over a hundred search engines

Nguyen, Dong-Phuong and Demeester, T. and Trieschnigg, R.B. and Hiemstra, D. (2012) Federated Search in the Wild: the combined power of over a hundred search engines. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012), 29 Oct - 02 Nov 2012, Maui, Hawaii, USA. pp. 1874-1878. ACM. ISBN 978-1-4503-1156-4

Full text available as: PDF (228 Kb, Open Access)

Official URL: http://dx.doi.org/10.1145/2396761.2398535

Abstract

Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved search results. However, a publicly available dataset for federated search reflecting an actual web environment has been absent. As a result, it has been difficult to assess whether proposed systems are suitable for the web setting. We introduce a new test collection containing the results from more than a hundred actual search engines, ranging from large general web search engines such as Google and Bing to small domain-specific engines. We discuss the design and analyze the effect of several sampling methods. For a set of test queries, we collected relevance judgements for the top 10 results of each search engine. The dataset is publicly available and is useful for researchers interested in resource selection for web search collections, result merging and size estimation of uncooperative resources.

Item Type: Conference or Workshop Paper (Full Paper, Talk, Poster)
Research Group: EWI-HMI: Human Media Interaction, EWI-DB: Databases
Research Program: CTIT-NICE: Natural Interaction in Computer-mediated Environments
Research Project: FACT: Folktales As Classifiable Texts, DIRKA: Distributed Information Retrieval by means of Keyword Auctions
Uncontrolled Keywords: Federated search, distributed information retrieval, evaluation, dataset, test collection, web search
ID Code: 22480
Status: Published
Deposited On: 05 December 2012
Refereed: Yes
International: Yes