EEMCS EPrints Service


An exploration of language identification techniques for the Dutch folktale database

Trieschnigg, R.B. and Hiemstra, D. and Theune, M. and de Jong, F.M.G. and Meder, T. (2012) An exploration of language identification techniques for the Dutch folktale database. In: Proceedings of the Workshop on Adaptation of Language Resources and Tools for Processing Cultural Heritage (LREC 2012), 26 May 2012, Istanbul, Turkey. pp. 47-51. LREC organization. ISBN not assigned

Full text available as: PDF (329 Kb, Open Access)



Official URL: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf


Abstract

The Dutch Folktale Database contains fairy tales, traditional legends, urban legends, and jokes written in a large variety and combination of languages, including (Middle and 17th-century) Dutch, Frisian, and a number of Dutch dialects. In this work we compare a number of approaches to automatic language identification for this collection. We show that, in comparison to typical language identification tasks, classification performance for highly similar languages with little training data is low. The studied dataset, consisting of over 39,000 documents in 16 languages and dialects, is available on request for follow-up research.

Item Type: Conference or Workshop Paper (Full Paper, Talk)
Research Group: EWI-HMI: Human Media Interaction, EWI-DB: Databases
Research Program: CTIT-NICE: Natural Interaction in Computer-mediated Environments
Research Project: FACT: Folktales As Classifiable Texts, DIRKA: Distributed Information Retrieval by means of Keyword Auctions
Uncontrolled Keywords: language detection, text classification
ID Code: 22321
Status: Published
Deposited On: 15 October 2012
Refereed: Yes
International: Yes