Hi, I am a research fellow at the Alan Turing Institute. I am also affiliated with Edinburgh University. I was a PhD student at the University of Twente and I was also affiliated with the Meertens Institute. I received a master's degree from the Language Technologies Institute at Carnegie Mellon University and a bachelor's degree in Computer Science from the University of Twente. I have interned at Facebook (fall 2011), Microsoft Research (fall 2013), and Google (summer 2014). In fall 2015 I visited Georgia Tech.
Maria Liakata and I run the NLP Interest Group at the Turing Institute.
I'm joining the University of Utrecht in May 2019.
Email: (Languages: Dutch and English)
I defended my PhD thesis, "Text as social and cultural data: A computational perspective on variation in text", at the University of Twente in 2017.
Research (full publication list)
I'm interested in Natural Language Processing and Information Retrieval, and in particular computational text analysis for research questions from the social sciences and humanities. I especially enjoy working with social media data. Some topics I have worked on:
- Computational sociolinguistics: Gender and age (LATECH 2011, ICWSM 2013, COLING 2014), multilingualism (EMNLP 2013), dialects (ICWSM 2015), language change in online communities (LSM 2011), and geographical language variation (Computational Linguistics, 2017). We also wrote a survey on this topic (Computational Linguistics, 2016).
- Folktale similarity: Automatic classification and clustering of folktales from the Dutch Folktale Database (ECIR 2013, CIKM 2014).
- Online health campaigns: With a focus on cancer campaigns using data from the Twitter Data Grant (EMNLP 2015).
- Federated web search: I have been involved in organizing the Federated Web Search track at TREC (2013, 2014) and the creation of various datasets to support research in federated search (CIKM 2012, WWW 2015).
- Co-organizer of the Federated Web Search track at TREC (2013-2014).
- Program Committee (conferences): ACL (2014-2018, area chair 2019), EACL 2017, ECIR (2015-2019), NAACL (2016, 2018-2019), SIGIR short + full (2015-2016), CIKM (2015-2016), EMNLP (2013, 2017-2018), FAT* 2019, etc.
- Journal reviewer: TACL (2014-2018), PLOS ONE (2015), ACM TIST (2015), IEEE Transactions on Big Data (2015), Transactions on Information Systems (2016), Artificial Intelligence Review (2016), etc.
- Together with Vincent Traag, I organized a 4TU seminar on Computational Social Science on 7 April 2017.
- I gave a tutorial on NLP for computational social science at the Language, Data and Knowledge (LDK) conference (2017). [webpage] [slides]
- I gave a tutorial on Computational Sociolinguistics at the 3rd International Conference on Computational Social Science (2017).
Conference and workshop organization
- Digital diasporas: Interdisciplinary perspectives conference in London, 2019.
- Lorentz workshop: New methods in computational sociolinguistics, Leiden, 2018.
- Computational sociolinguistics workshop at NWAV47, New York, 2018 (slides are here).
- Workshop on "Bridging disciplines in analysing text as social and cultural data", 21-22 September 2017 at the Turing Institute in London.
- Media coverage: New York Times, Time Magazine, New Scientist, Radio 538, Volkskrant, etc.
- Invited talks: Eindhoven University (2014), Google (2014), Radboud University Nijmegen (2014), CLIN panel (2015), University of Amsterdam (2015), CBS Heerlen (2016), University of Edinburgh (Oct 2016), Women in Machine Intelligence Dinner (2017), Understanding Euroscepticism Through the Lens of Big-Data workshop (2017), Data Science Festival meetup (2018), University of Amsterdam (2018), Bell Labs Cambridge (2018), SAGE Ocean Speaker Series (2018), University of Cambridge (2019)
- A video on YouTube of a talk I gave in Feb 2017 summarizing some of my work.
- Featured in Women in NLP spotlights.
- Special issue on computational sociolinguistics.
- My resume (pdf)