EEMCS EPrints Service


Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment

Nguyen, Dong-Phuong and Trieschnigg, R.B. and Doğruöz, A.S. and Gravel, R. and Theune, M. and Meder, T. and de Jong, F.M.G. (2014) Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment. In: Proceedings of the 25th International Conference on Computational Linguistics, COLING 2014, 23-29 Aug 2014, Dublin, Ireland. pp. 1950-1961. Association for Computational Linguistics. ISBN 978-1-941643-26-6

Full text available as: PDF (906 Kb, Open Access)



Official URL: http://anthology.aclweb.org/C/C14/C14-1184.pdf


Abstract

There is a growing interest in automatically predicting the gender and age of authors from texts. However, most research so far ignores that language use is related to the social identity of speakers, which may be different from their biological identity. In this paper, we combine insights from sociolinguistics with data collected through an online game to underline the importance of approaching age and gender as social variables rather than static biological variables. In our game, thousands of players guessed the gender and age of Twitter users based on tweets alone. We show that more than 10% of the Twitter users do not employ language that the crowd associates with their biological sex. We also show that older Twitter users are often perceived to be younger. Our findings highlight the limitations of current approaches to gender and age prediction from texts.

Item Type: Conference or Workshop Paper (Full Paper, Talk)
Research Group: EWI-HMI: Human Media Interaction
Research Program: CTIT-NICE: Natural Interaction in Computer-mediated Environments
Research Project: FACT: Folktales As Classifiable Texts
Uncontrolled Keywords: gender, age, classification, natural language processing, crowdsourcing, twitter
ID Code: 25496
Status: Published
Deposited On: 27 January 2015
Refereed: Yes
International: Yes