EEMCS EPrints Service


Luhn Revisited: Significant Words Language Models

Dehghani, M. and Azarbonyad, H. and Kamps, J. and Hiemstra, D. and Marx, M. (2016) Luhn Revisited: Significant Words Language Models. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM 2016), October 24 - 28, 2016, Indianapolis, Indiana, USA. pp. 1301-1310. ACM. ISBN 978-1-4503-4073-1

Official URL: http://dx.doi.org/10.1145/2983323.2983814

Abstract

Users tend to articulate their complex information needs in only a few keywords, making underspecified statements of request the main bottleneck for retrieval effectiveness. Taking advantage of feedback information is one of the best ways to enrich the query representation, but it can also lead to loss of query focus and harm performance by overfitting to accidental features of the particular observed feedback documents, in particular when the initial query retrieves little relevant information. Inspired by the early work of Hans Peter Luhn, we propose significant words language models of feedback documents that capture all, and only, the significant shared terms from feedback documents. We adjust the weights of common terms that are already well explained by the document collection, as well as the weights of rare terms that are only explained by specific feedback documents, which eventually leaves only the significant terms in the feedback model.
[Figure: Establishing a set of 'Significant Words']

Our main contributions are the following. First, we present significant words language models as effective models that capture the essential terms and their probabilities. Second, we apply the resulting models to the relevance feedback task and observe better performance than state-of-the-art methods. Third, we see that the estimation method is remarkably robust, making the models insensitive to noisy non-relevant terms in feedback documents. Our general observation is that the significant words language models capture relevance more accurately by excluding general terms and feedback-document-specific terms.
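
The idea described in the abstract, discounting terms explained by the collection and terms explained by a single feedback document, can be illustrated with a small mixture-model sketch. This is not the authors' estimator: the function name `significant_words_lm`, the fixed mixing weights in `lambdas`, the particular form of the document-specific model, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter


def significant_words_lm(feedback_docs, collection_lm,
                         lambdas=(0.4, 0.3, 0.3), iters=50, eps=1e-12):
    """Illustrative three-component mixture fitted with EM.

    feedback_docs: non-empty list of non-empty token lists.
    collection_lm: dict mapping term -> collection probability.
    lambdas: assumed fixed weights (significant, general, specific).
    """
    l_sw, l_g, l_s = lambdas
    counts = [Counter(doc) for doc in feedback_docs]
    vocab = set().union(*counts)

    # Maximum-likelihood language model of each feedback document.
    doc_lms = []
    for cnt in counts:
        length = sum(cnt.values())
        doc_lms.append({t: n / length for t, n in cnt.items()})

    # Assumed "specific" model: a term scores high when it is probable
    # in one document and improbable in all the others.
    specific = {}
    for t in vocab:
        score = 0.0
        for i, lm in enumerate(doc_lms):
            others = 1.0
            for j, other in enumerate(doc_lms):
                if j != i:
                    others *= 1.0 - other.get(t, 0.0)
            score += lm.get(t, 0.0) * others
        specific[t] = score
    z = sum(specific.values())
    specific = {t: (v / z if z > 0 else 1.0 / len(vocab))
                for t, v in specific.items()}

    # Pooled term counts; the significant-words model starts at their MLE.
    total = Counter()
    for cnt in counts:
        total.update(cnt)
    n_tokens = sum(total.values())
    sw = {t: total[t] / n_tokens for t in vocab}

    for _ in range(iters):
        expected = {}
        for t in vocab:
            p_sw = l_sw * sw[t]
            p_g = l_g * collection_lm.get(t, eps)   # general component
            p_s = l_s * specific[t]                 # specific component
            # E-step: share of the term's occurrences credited to sw.
            posterior = p_sw / (p_sw + p_g + p_s)
            expected[t] = total[t] * posterior
        # M-step: renormalise the expected counts.
        z = sum(expected.values())
        sw = {t: v / z for t, v in expected.items()}
    return sw


# Toy feedback set: "the" is common, "zebra"/"apple"/"piano" are
# document-specific, "significant" and "words" are shared.
docs = [
    ["the", "the", "significant", "words", "zebra", "zebra"],
    ["the", "the", "significant", "words", "apple", "apple"],
    ["the", "the", "significant", "words", "piano", "piano"],
]
collection_lm = {"the": 0.5, "significant": 0.001, "words": 0.001,
                 "zebra": 0.001, "apple": 0.001, "piano": 0.001}

model = significant_words_lm(docs, collection_lm)
for term, prob in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{prob:.3f}")
```

With fixed weights, each EM iteration shifts probability mass away from terms that the general or specific component already explains, so the shared terms end up ranked above both the common term and the document-specific terms in this toy input.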

Item Type: Conference or Workshop Paper (Full Paper, Talk)
Research Group: EWI-DB: Databases
Research Program: CTIT-General
Research Project: COMMIT/Infiniti: Information Retrieval for Information Services
ID Code: 27808
Status: Published
Deposited On: 20 April 2017
Refereed: Yes
International: Yes