EEMCS EPrints Service


Parsimonious Language Models for a Terabyte of Text

Hiemstra, D. and Kamps, J. and Kaptein, R. and Li, R.M. (2008) Parsimonious Language Models for a Terabyte of Text. In: Proceedings of the 16th Text Retrieval Conference (TREC), 5-9 Nov 2007, Gaithersburg, Maryland, USA. 64. NIST Special Publication 500 (274). US National Institute of Standards and Technology (NIST). ISBN not assigned


Official URL: http://trec.nist.gov/pubs/trec16/t16_proceedings.html


Abstract

The aims of this paper are twofold. Our first aim is to compare results of the earlier Terabyte tracks to the Million Query track. We submitted a number of runs using different document representations (such as full-text, title-fields, or incoming anchor-texts) to increase pool diversity. The initial results show broad agreement in system rankings over various measures on topic sets judged at both the Terabyte and Million Query tracks, with runs using the full-text index giving superior results on all measures, but also some noteworthy upsets. Our second aim is to explore the use of parsimonious language models for retrieval on terabyte-scale collections. These models are smaller and thus more efficient than the standard language models when used at indexing time, and they may also improve retrieval performance. We have conducted initial experiments using parsimonious models in combination with pseudo-relevance feedback, for both the Terabyte and Million Query track topic sets, and obtained promising initial results.
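
The abstract does not spell out how a parsimonious model is estimated. As a rough illustration only, the sketch below follows the expectation-maximization formulation commonly described for parsimonious language models: a document model is re-estimated against a background (collection) model so that probability mass shifts away from terms the collection already explains well, and terms falling below a threshold are pruned. The function name, the mixing weight `lam`, the pruning `threshold`, and the iteration count are all assumptions chosen for the example, not values taken from the paper.

```python
def parsimonious_model(doc_tf, background, lam=0.1, threshold=1e-4, iterations=20):
    """Estimate a pruned document language model with EM (illustrative sketch).

    doc_tf:     dict mapping term -> raw term frequency in the document
    background: dict mapping term -> P(term | collection); must be > 0
                for every term in doc_tf
    lam:        assumed weight of the document model in the mixture
    threshold:  assumed cut-off below which a term is dropped
    """
    total = sum(doc_tf.values())
    # Start from the maximum-likelihood document model.
    p_doc = {t: tf / total for t, tf in doc_tf.items()}

    for _ in range(iterations):
        # E-step: expected count of each term attributed to the document
        # model rather than to the background model.
        expected = {}
        for t, p in p_doc.items():
            mixture = lam * p + (1.0 - lam) * background[t]
            expected[t] = doc_tf[t] * (lam * p) / mixture

        # M-step: renormalize the expected counts into probabilities.
        norm = sum(expected.values())
        p_doc = {t: e / norm for t, e in expected.items()}

        # Prune low-probability terms, then renormalize what remains.
        kept = {t: p for t, p in p_doc.items() if p >= threshold}
        if kept:  # never prune the model down to nothing
            kept_norm = sum(kept.values())
            p_doc = {t: p / kept_norm for t, p in kept.items()}

    return p_doc
```

Under these assumptions, terms that are frequent in the collection (such as function words) lose most of their probability or disappear, so the model stores fewer terms per document, which is the sense in which such models are smaller than standard language models.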

Item Type: Conference or Workshop Paper (Full Paper, Poster)
Research Group: EWI-DB: Databases
Research Program: CTIT-NICE: Natural Interaction in Computer-mediated Environments
Research Project: EfFoRT: Effective Focused Retrieval Techniques
ID Code: 12720
Status: Published
Deposited On: 27 May 2008
Refereed: No
International: Yes