
EEMCS EPrints Service

The Influence of Basic Tokenization on Biomedical Document Retrieval

Trieschnigg, R.B. and Kraaij, W. and de Jong, F.M.G. (2007) The Influence of Basic Tokenization on Biomedical Document Retrieval. In: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 23-27 Jul 2007, Amsterdam. pp. 803-804. ACM Press. ISBN 978-1-59593-597-7

Tokenization is a fundamental preprocessing step in Information Retrieval (IR) systems in which text is turned into index terms. This paper quantifies and compares the influence of various simple tokenization techniques on document retrieval effectiveness in two domains: biomedicine and news. As expected, biomedical retrieval is more sensitive to small changes in the tokenization method. The tokenization strategy can make the difference between a mediocre and a well-performing IR system, especially in the biomedical domain.
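
As a rough illustration of why small tokenization choices matter for biomedical text, the sketch below applies three simple strategies to a short query. The three functions, their names, and the example query are assumptions made for illustration only; they are not taken from the paper and do not reproduce the techniques it evaluates.

```python
import re


def tokenize_whitespace(text):
    """Lowercase and split on whitespace only; punctuation stays inside tokens."""
    return text.lower().split()


def tokenize_alnum(text):
    """Lowercase and treat every non-alphanumeric character as a boundary."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_letter_digit(text):
    """Like tokenize_alnum, but also split where letters and digits meet."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


# Hypothetical query containing a hyphenated name and a name ending in a digit.
query = "MIP-1alpha binds CCR5"

print(tokenize_whitespace(query))    # ['mip-1alpha', 'binds', 'ccr5']
print(tokenize_alnum(query))         # ['mip', '1alpha', 'binds', 'ccr5']
print(tokenize_letter_digit(query))  # ['mip', '1', 'alpha', 'binds', 'ccr', '5']
```

The same string yields three different sets of index terms, so a document indexed with one strategy may not match a query tokenized with another. Names mixing letters, digits, and hyphens are common in biomedical text, which is consistent with the sensitivity the abstract reports for that domain.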

Item Type: Conference or Workshop Paper (Extended Abstract, Poster)
Research Group: EWI-HMI: Human Media Interaction
Research Program: CTIT-NICE: Natural Interaction in Computer-mediated Environments
Research Project: BioRange: A research programme to shape the future for bioinformatics in the Netherlands
ID Code: 11033
Deposited On: 06 September 2007