EEMCS EPrints Service


MeSH Up: effective MeSH text classification for improved document retrieval

Trieschnigg, R.B. and Pezik, P. and Lee, Vivian and de Jong, F.M.G. and Kraaij, W. and Rebholz-Schuhmann, D. (2009) MeSH Up: effective MeSH text classification for improved document retrieval. Bioinformatics, 25 (11). pp. 1412-1418. ISSN 1367-4803 *** ISI Impact 5,766 ***

Full text available as: PDF (215 Kb, Open Access)

Official URL: http://dx.doi.org/10.1093/bioinformatics/btp249

Exported to Metis

Abstract

Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared with a limited number of other systems.
Results: We compare the performance of six MeSH classification systems [MetaMap, EAGL, a language model-based approach, a vector space model-based approach, a K-Nearest Neighbor (KNN) approach and MTI] in terms of reproducing and complementing manual MeSH annotations. A KNN system clearly outperforms the other published approaches and scales well with large amounts of text using the full MeSH thesaurus. Our measurements demonstrate to what extent manual MeSH annotations can be reproduced and how they can be complemented by automatic annotations. We also show that a statistically significant improvement can be obtained in information retrieval (IR) when the text of a user's query is automatically annotated with MeSH concepts, compared to using the original textual query alone.
Conclusions: The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR. Furthermore, the automatic MeSH annotation system we propose is highly scalable and it generates improvements in IR comparable with those observed for manual annotations.
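
As an illustration of the general idea behind a KNN approach to MeSH annotation, the sketch below labels a new text with the headings of its most similar, already annotated documents. It is a toy example under stated assumptions, not the system evaluated in the paper: the abstract does not describe the similarity measure, the value of k, or the ranking scheme, so the bag-of-words cosine similarity, the names `vectorize`, `cosine`, `knn_annotate` and `TOY_DOCS`, and the heading assignments in the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (text, manually assigned headings).
# The assignments are illustrative, not taken from MEDLINE.
TOY_DOCS = [
    ("insulin resistance in type 2 diabetes patients",
     ["Diabetes Mellitus, Type 2", "Insulin Resistance"]),
    ("insulin therapy for diabetes",
     ["Diabetes Mellitus", "Insulin"]),
    ("gene expression in breast cancer tumors",
     ["Breast Neoplasms", "Gene Expression"]),
]


def vectorize(text):
    """Turn a text into lowercased bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_annotate(text, annotated_docs, k=2):
    """Rank headings by the summed similarity of the k nearest documents."""
    query = vectorize(text)
    neighbours = sorted(
        ((cosine(query, vectorize(doc)), labels)
         for doc, labels in annotated_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, labels in neighbours:
        if similarity > 0:  # ignore neighbours sharing no terms
            for label in labels:
                votes[label] += similarity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for label, score in knn_annotate("diabetes insulin treatment", TOY_DOCS):
        print(f"{score:.3f}  {label}")
```

The same function could be applied to a user's query text to obtain candidate headings for query annotation, which is the retrieval setting the abstract refers to; how those headings are then combined with the textual query is not specified here.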

Item Type: Article
Research Group: EWI-HMI: Human Media Interaction
Research Program: CTIT-NICE: Natural Interaction in Computer-mediated Environments
Research Project: BioRange: A research programme to shape the future for bioinformatics in the Netherlands
Uncontrolled Keywords: Text classification, Genomics Information Retrieval
ID Code: 15378
Status: Published
Deposited On: 27 May 2009
Refereed: Yes
International: Yes
ISI Impact Factor: 5,766
