Evaluating document filtering systems over time


Performance of three systems over time. Systems A and B degrade while System C improves, yet all three have the same average performance over the entire period. We express the change in system performance using the derivative of the fitted line (in orange) and compare performance at what we call the “estimated end-point” (the large orange dots).
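The caption does not say how the line is fitted, so the following is only a sketch that assumes a linear trend fitted by ordinary least squares to per-batch scores p_1, ..., p_T (these symbols are introduced here for illustration). It shows why equal averages can still give different end-points: an OLS line always passes through the point (mean batch index, mean score), so the end-point differs from the average only through the slope.

```latex
% Assumed: linear trend, ordinary least squares, batches indexed t = 1, ..., T
\hat{p}(t) = \hat{\alpha} + \hat{\beta}\, t, \qquad
(\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha, \beta} \sum_{t=1}^{T} \bigl(p_t - \alpha - \beta t\bigr)^2

% Derivative of the fitted line (change in performance per batch)
\hat{p}'(t) = \hat{\beta}

% Estimated end-point; the OLS line passes through (\bar{t}, \bar{p}) with \bar{t} = (T+1)/2
\hat{p}(T) = \hat{\alpha} + \hat{\beta}\, T = \bar{p} + \hat{\beta}\,\bigl(T - \bar{t}\bigr)
```

With the same average \bar{p}, a system with a negative slope (like A or B) ends below \bar{p} and one with a positive slope (like C) ends above it.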

Our IPM paper “Evaluating document filtering systems over time”, co-authored with Tom Kenter and Maarten de Rijke, is available online. In this paper we propose a framework for measuring the performance of document filtering systems. Until now, such systems have been evaluated with traditional metrics such as precision, recall, MAP, nDCG, F1, and utility. We argue that these metrics do not account for the temporal dimension of the task. We propose a time-sensitive way of measuring performance that employs trend estimation. In short, performance is computed separately for each batch, a trend line is fitted to the per-batch results, and the estimated performance of each system at the end of the evaluation period is used to compare systems. To demonstrate our proposed evaluation methodology, we analyze the runs submitted to the Cumulative Citation Recommendation task of the 2012 and 2013 editions of the TREC Knowledge Base Acceleration track and show that important new insights emerge.
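The three-step procedure (score each batch, fit a trend line, compare at the end-point) can be sketched in a few lines of Python. This is not the paper's implementation: it assumes equally spaced batches, a linear trend fitted by ordinary least squares, and per-batch scores that have already been computed with some metric such as F1. The names `Trend`, `fit_trend`, and `rank_by_end_point`, as well as the scores for runs A, B, and C, are hypothetical and chosen only to mirror the figure (same average, different trends).

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Trend:
    slope: float      # derivative of the fitted line: change in score per batch
    intercept: float  # fitted score at batch index 0
    end_point: float  # fitted score at the last batch ("estimated end-point")


def fit_trend(batch_scores: list[float]) -> Trend:
    """Fit score = intercept + slope * t by ordinary least squares.

    t = 0, 1, ..., n-1 is the batch index; batches are assumed to be
    equally spaced in time (an assumption of this sketch).
    """
    n = len(batch_scores)
    if n < 2:
        raise ValueError("need at least two batches to fit a trend")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = fmean(batch_scores)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, batch_scores))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return Trend(slope, intercept, intercept + slope * (n - 1))


def rank_by_end_point(runs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank runs by estimated end-point, best first."""
    ends = [(name, fit_trend(scores).end_point) for name, scores in runs.items()]
    return sorted(ends, key=lambda item: item[1], reverse=True)


# Hypothetical per-batch scores: all three runs average 0.45.
runs = {
    "A": [0.6, 0.5, 0.4, 0.3],
    "B": [0.5, 0.5, 0.4, 0.4],
    "C": [0.3, 0.4, 0.5, 0.6],
}
for name, end in rank_by_end_point(runs):
    print(name, f"{end:.2f}")  # prints C 0.60, then B 0.39, then A 0.30
```

Ranking by the average would tie all three runs; ranking by the estimated end-point separates them according to where each run is heading at the end of the evaluation period.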