SIGIR’22 contributions

The following papers were accepted at SIGIR’22:

  • “Analyzing and Simulating User Utterance Reformulation in Conversational Recommender Systems” — full paper with Shuo Zhang and Mu Chun Wang [PDF]
  • “On Natural Language User Profiles for Transparent and Scrutable Recommendation” — perspectives paper with Filip Radlinski, Fernando Diaz, Lucas Dixon, and Ben Wedin [PDF]
  • “Would You Ask it that Way? Measuring and Improving Question Naturalness for Knowledge Graph Question Answering” — resource paper with Trond Linjordet [PDF]

Additionally, together with Alessandro Piscopo, Oana Inel, Sanne Vrijenhoek, and Martijn Millecamp, I’ll be co-organizing a workshop on Measuring Quality of Explanations in Recommender Systems [PDF].

Update (May 31): preprints added

Best paper award at DESIRES’21

I was honoured to receive a Best Paper Award for my paper “Conversational AI from an Information Retrieval Perspective: Remaining Challenges and a Case for User Simulation” at the 2nd International Conference on Design of Experimental Search & Information REtrieval Systems (DESIRES’21), which took place last week. The paper as well as the presentation slides are available online.

SIGIR’21 preprints and resources

Thanks to a fruitful collaboration with colleagues at Google, Bloomberg, Radboud University, Shandong University, and the University of Amsterdam, and, of course, students at the University of Stavanger, I have the following papers appearing at SIGIR this year. All revolve around conversational and/or recommender systems and come with publicly released resources.

Three recent papers

In this post I’m sharing preprints of three recent full papers, covering a diverse set of topics, that will appear in the coming weeks.

The KDD’20 paper “Evaluating Conversational Recommender Systems via User Simulation” (w/ Shuo Zhang) [PDF] represents a new line of work that I’m really excited about. We develop a user simulator for evaluating conversational agents on an item recommendation task. Our user simulator aims to generate responses that a real human would give by considering both individual preferences and the general flow of interaction with the system. We compare three existing conversational recommender systems and show that our simulation methods can achieve high correlation with real users using both automatic evaluation measures and manual human assessments.

The ICTIR’20 paper “Sanitizing Synthetic Training Data Generation for Question Answering over Knowledge Graphs” (w/ Trond Linjordet) [PDF] studies template-based synthetic data generation for neural KGQA systems. We show that there is a leakage of information in current approaches between training and test splits, which affects performance. We raise a series of challenging questions around training models with synthetic (template-based) data using fair conditions, which extend beyond the particular flavor of question answering task we study here.

The CIKM’20 paper “Generating Categories for Sets of Entities” (w/ Shuo Zhang and Jamie Callan) [PDF] addresses problems associated with the maintenance of category systems of large knowledge repositories, like Wikipedia. We aim to aid knowledge editors in the manual process of expanding a category system. Given a set of entities, e.g., in a list or table, we generate suggestions for new categories, which are specific, important, and non-redundant. In addition to generating category labels, we also find the appropriate place of these new categories in the hierarchy by locating the parent nodes that should be extended.

Two journal papers on online evaluation

I am a co-author of two journal papers that appeared in the special issues of the Journal of Data and Information Quality on Reproducibility in IR.

The article entitled “OpenSearch: Lessons Learned from an Online Evaluation Campaign” by Jagerman et al. reports on our experience with TREC OpenSearch, an online evaluation campaign that enabled researchers to evaluate their experimental retrieval methods using real users of a live website. TREC OpenSearch focused on the task of ad hoc document retrieval within the academic search domain. We describe our experimental platform, which is based on the living labs methodology, and report on the experimental results obtained. We also share our experiences, challenges, and the lessons learned from running this track in 2016 and 2017.

The article entitled “Evaluation-as-a-Service for the Computational Sciences: Overview and Outlook” by Hopfgartner et al. discusses the Evaluation-as-a-Service paradigm, where data sets are not provided for download, but can be accessed via application programming interfaces (APIs), virtual machines (VMs), or other possibilities to ship executables. We summarize and compare current approaches, consolidate the experiences of these approaches, and outline next steps toward sustainable research infrastructures.