Federated Web Search
We have released several datasets to support research on federated web search. The datasets contain samples from real search engines.
- FedWeb Greatest Hits (WWW 2015)
Recommended. This dataset combines FedWeb 2013 and 2014 and contains additional data.
- FedWeb 2014 dataset (TREC 2014)
- FedWeb 2013 dataset (TREC 2013)
- FedWeb 2012 dataset (CIKM 2012)
Motivations provided in Movember profiles, annotated according to the Social Identity Model of Collective Action (van Zomeren et al., 2008).
Download: [zip file]
D. Nguyen, T. van den Broek, C. Hauff, D. Hiemstra and M. Ehrenhard: #SupportTheCause: Identifying Motivations to Participate in Online Health Campaigns at EMNLP 2015. [pdf]
NL-TR word level language identification
Posts from a Turkish/Dutch online forum with word level language annotations
[Download it here]
D. Nguyen, A.S. Doğruöz: Word Level Language Identification in Online Multilingual Communication at EMNLP 2013. [pdf]
Annotations used in the ICWSM 2013 paper on age prediction.
Download: [zip file]
D. Nguyen, R. Gravel, D. Trieschnigg and T. Meder: "How Old Do You Think I Am?": A Study of Language and Age in Twitter at ICWSM 2013. [pdf]
Kernel independence testing
D. Nguyen and J. Eisenstein. A Kernel Independence Test for Geographical Language Variation. Computational Linguistics, Volume 43, Issue 3, 2017. arXiv: [pdf]
Evaluating local explanations for text classification
D. Nguyen. Comparing automatic and human evaluation of local explanations for text classification. NAACL 2018.
D. Nguyen, B. McGillivray, T. Yasseri. Emo, love, and god: Making sense of Urban Dictionary, a crowd-sourced online dictionary. To appear in Royal Society Open Science. [Arxiv preprint]