PUBLICATIONS

An analysis of systematic judging errors in information retrieval

Authors

Gabriella Kazai,

Nick Craswell,

Emine Yilmaz,

Publication date

2012

Total citations

Cited by 29

Description

Test collections are powerful mechanisms for the evaluation and optimization of information retrieval systems. However, there is reported evidence that experiment outcomes can be affected by changes to the judging guidelines or changes in the judge population. This paper examines such effects in a web search setting, comparing the judgments of four groups of judges: NIST Web Track judges, untrained crowd workers, and two groups of trained judges of a commercial search engine. Our goal is to identify systematic judging errors by comparing the labels contributed by the different groups, working under the same or different judging guidelines. In particular, we focus on detecting systematic differences in judging that depend on specific characteristics of the queries and URLs. For example, we ask whether a given population of judges, working under a given set of judging guidelines, is more likely to consistently …
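One way to make "systematic difference depending on a query or URL characteristic" concrete is sketched below. This is a minimal illustration, not the authors' method: it assumes that two judge groups labeled the same (query, URL) pairs on a shared graded relevance scale, and that each pair carries one characteristic tag (the tags "navigational" and "informational" are hypothetical). For each characteristic, it reports how often group A labels higher than group B and applies an exact two-sided sign test, a swapped-in choice of statistic, to flag directional disagreement. The function names and tuple layout are also assumptions.

```python
from collections import defaultdict
from math import comb


def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign test p-value; ties are dropped before calling."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    # Probability of a split at least this lopsided under a fair coin (one tail),
    # doubled for the two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def systematic_differences(items):
    """Hypothetical helper. `items` holds (characteristic, label_a, label_b)
    tuples, one per (query, URL) pair judged by both groups."""
    diffs_by_characteristic = defaultdict(list)
    for characteristic, label_a, label_b in items:
        diffs_by_characteristic[characteristic].append(label_a - label_b)

    report = {}
    for characteristic, diffs in diffs_by_characteristic.items():
        a_higher = sum(1 for d in diffs if d > 0)
        b_higher = sum(1 for d in diffs if d < 0)
        report[characteristic] = {
            "n": len(diffs),
            "mean_diff": sum(diffs) / len(diffs),  # > 0: group A labels higher on average
            "a_higher": a_higher,
            "b_higher": b_higher,
            "p_value": sign_test_p(a_higher, b_higher),
        }
    return report


if __name__ == "__main__":
    # Toy, made-up labels on a 0-3 scale; not data from the paper.
    toy_items = [
        ("navigational", 2, 1), ("navigational", 3, 2),
        ("navigational", 2, 1), ("navigational", 1, 1),
        ("informational", 1, 2), ("informational", 2, 2),
        ("informational", 2, 1),
    ]
    for characteristic, stats in systematic_differences(toy_items).items():
        print(characteristic, stats)
```

The sign test only looks at the direction of each disagreement, which suits the question of whether one group is consistently more lenient or strict on a given kind of query; with many characteristics, a multiple-comparison correction would also be needed.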
