EEMCS EPrints Service


How Different are Language Models and Word Clouds?

Kaptein, R. and Hiemstra, D. and Kamps, J. (2010) How Different are Language Models and Word Clouds? In: 32nd European Conference on Information Retrieval (ECIR 2010). pp. 556-568. Lecture Notes in Computer Science 5993. Springer Verlag. ISSN 0302-9743 ISBN 978-3-642-12274-3

Official URL: http://dx.doi.org/10.1007/978-3-642-12275-0_48

Abstract

Word clouds are a summarised representation of a document’s text, similar to tag clouds which summarise the tags assigned to documents. Word clouds are similar to language models in the sense that they represent a document by its word distribution. In this paper we investigate the differences between word cloud and language modelling approaches, and specifically whether effective language modelling techniques also improve word clouds. We evaluate the quality of the language model using a system evaluation test bed, and evaluate the quality of the resulting word cloud with a user study. Our experiments show that different language modelling techniques can be applied to improve a standard word cloud that uses a TF weighting scheme in combination with stopword removal. Including bigrams in the word clouds and a parsimonious term weighting scheme are the most effective in both the system evaluation and the user study.

Item Type: Conference or Workshop Paper (Full Paper, Talk)
Research Group: EWI-DB: Databases
Research Program: CTIT-NICE: Natural Interaction in Computer-mediated Environments
Research Project: EfFoRT: Effective Focused Retrieval Techniques
ID Code: 18654
Status: Published
Deposited On: 02 November 2010
Refereed: Yes
International: Yes