Authors
Jin Young Kim, Emine Yilmaz, Paul Thomas
Description
Offline evaluation characterizes an information retrieval (IR) system without relying on actual users in a real-world environment. Offline evaluation, notably test-collection-based evaluation, has been the dominant approach in IR evaluation, and it is no exaggeration to say that shared evaluation efforts such as the TREC conferences have defined IR research over the years. The reason for this success lies in the ability to compare retrieval systems in a reusable manner. Several recent trends, however, necessitate a change in the role and methods of offline evaluation. First and foremost, online search engines with large-scale user bases have become commonplace, enabling online evaluation based on user behavior. Second, there are new endpoints for search, such as mobile phones and conversational agents, and the types of search results have diversified beyond a list of web documents to include other result types. Finally, crowdsourcing has provided ways for human judgments of any kind to be collected at large scale. However, online evaluation based on user behavior has its own challenges, owing to repeatability issues as well as the extensive amount of time needed to collect online evaluation signals from users. Furthermore, most smaller companies and academic researchers do not have access to such a large-scale user base. Hence, recent research in IR evaluation has focused on the advent of new offline evaluation paradigms that are more user-centric, diverse, and agile.