In this post, I’m sharing the preprints of three recent full papers that will appear in the coming weeks, covering a diverse set of topics.
The KDD’20 paper “Evaluating Conversational Recommender Systems via User Simulation” (w/ Shuo Zhang) [PDF] represents a new line of work that I’m really excited about. We develop a user simulator for evaluating conversational agents on an item recommendation task. Our user simulator aims to generate the responses a real human would give, by considering both individual preferences and the general flow of interaction with the system. We compare three existing conversational recommender systems and show that our simulation methods achieve high correlation with real users, in terms of both automatic evaluation measures and manual human assessments.
The ICTIR’20 paper “Sanitizing Synthetic Training Data Generation for Question Answering over Knowledge Graphs” (w/ Trond Linjordet) [PDF] studies template-based synthetic data generation for neural KGQA systems. We show that current approaches leak information between the training and test splits, which affects performance. We raise a series of challenging questions about training models on synthetic (template-based) data under fair conditions; these questions extend beyond the particular flavor of question answering task we study here.
The CIKM’20 paper “Generating Categories for Sets of Entities” (w/ Shuo Zhang and Jamie Callan) [PDF] addresses problems associated with maintaining the category systems of large knowledge repositories, such as Wikipedia. We aim to aid knowledge editors in the manual process of expanding a category system. Given a set of entities (e.g., in a list or table), we generate suggestions for new categories that are specific, important, and non-redundant. In addition to generating category labels, we also determine where these new categories belong in the hierarchy by locating the parent nodes that should be extended.