I am a co-author of two journal papers that appeared in the special issues of the Journal of Data and Information Quality (JDIQ) on Reproducibility in Information Retrieval (IR).
The article entitled “OpenSearch: Lessons Learned from an Online Evaluation Campaign” by Jagerman et al. reports on our experience with TREC OpenSearch, an online evaluation campaign that enabled researchers to evaluate their experimental retrieval methods using real users of a live website. TREC OpenSearch focused on the task of ad hoc document retrieval within the academic search domain. In the article, we describe our experimental platform, which is based on the living labs methodology, and report on the experimental results obtained. We also share our experiences, challenges, and lessons learned from running the track in 2016 and 2017.
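For readers unfamiliar with the living labs setup, the participant-side workflow is roughly: fetch queries and candidate documents, upload a ranking, and later collect click feedback from impressions interleaved with the site's production ranking. Below is a minimal Python sketch of that loop; the endpoint URL, API key, and JSON field names are illustrative assumptions on my part, not the platform's actual API.

```python
import requests

# Hypothetical endpoint and key, for illustration only; the actual
# living-labs-style API used by TREC OpenSearch differs in detail.
API = "https://example.org/api/participant"
HEADERS = {"Authorization": "my-api-key"}


def get_queries():
    """Fetch the queries sampled from the live site's search logs."""
    r = requests.get(f"{API}/query", headers=HEADERS)
    r.raise_for_status()
    return r.json()["queries"]


def get_doclist(qid):
    """Fetch the candidate documents for one query."""
    r = requests.get(f"{API}/doclist/{qid}", headers=HEADERS)
    r.raise_for_status()
    return r.json()["doclist"]


def submit_run(qid, ranked_doc_ids):
    """Upload a ranking; the platform interleaves it with the site's
    production ranking and serves the result to real users."""
    payload = {"qid": qid, "doclist": [{"docid": d} for d in ranked_doc_ids]}
    r = requests.put(f"{API}/run/{qid}", headers=HEADERS, json=payload)
    r.raise_for_status()


def get_feedback(qid):
    """Retrieve click feedback collected from interleaved impressions."""
    r = requests.get(f"{API}/feedback/{qid}", headers=HEADERS)
    r.raise_for_status()
    return r.json()["feedback"]


if __name__ == "__main__":
    for query in get_queries():
        qid = query["qid"]
        candidates = get_doclist(qid)
        # Rank the candidates with your experimental retrieval method here;
        # as a placeholder we keep the order in which they were returned.
        submit_run(qid, [doc["docid"] for doc in candidates])
```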
The article entitled “Evaluation-as-a-Service for the Computational Sciences: Overview and Outlook” by Hopfgartner et al. discusses the Evaluation-as-a-Service (EaaS) paradigm, in which data sets are not provided for download but are instead accessed via application programming interfaces (APIs), virtual machines (VMs), or other mechanisms that let researchers ship executables to the data. In the article, we summarize and compare current approaches, consolidate the experiences gained with them, and outline next steps toward sustainable research infrastructures.
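To make the paradigm concrete, here is a hedged Python sketch of what API-mediated access can look like: the collection and the relevance judgments stay on the platform, and participants interact through search and evaluation endpoints, receiving only aggregate metrics back. The URL, token, endpoint names, and response fields are hypothetical and do not correspond to any specific EaaS platform discussed in the article.

```python
import requests

# Illustrative only: endpoint, token, and response fields are assumptions.
EVAL_API = "https://example.org/eaas"
HEADERS = {"Authorization": "Bearer participant-token"}


def search(query, k=10):
    """Query the collection through the platform's API instead of
    downloading the (possibly non-distributable) data set."""
    r = requests.get(
        f"{EVAL_API}/search",
        params={"q": query, "k": k},
        headers=HEADERS,
    )
    r.raise_for_status()
    return r.json()["hits"]


def evaluate(run_id, ranked_lists):
    """Submit rankings for server-side scoring; the judgments never
    leave the platform, only aggregate metrics come back."""
    r = requests.post(
        f"{EVAL_API}/evaluate",
        headers=HEADERS,
        json={"run": run_id, "results": ranked_lists},
    )
    r.raise_for_status()
    return r.json()["metrics"]  # e.g. {"ndcg@10": ..., "map": ...}


if __name__ == "__main__":
    hits = search("reproducibility in information retrieval")
    metrics = evaluate("baseline-run", {"q1": [h["docid"] for h in hits]})
    print(metrics)
```

The design point this illustrates is the one the article makes: because the data never leaves the platform, sensitive or license-restricted collections become usable for evaluation, and runs are scored under identical server-side conditions, which aids reproducibility.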