Authors
Javed A Aslam,
Virgiliu Pavlu,
Emine Yilmaz
Publication date
2005
Description
We consider the problem of evaluating the performance of query retrieval systems, and we propose a sampling technique for efficiently estimating standard measures of retrieval performance using incomplete judgments. Unlike existing techniques, which (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed sampling technique produces unbiased estimates of the standard measures themselves. Our technique is based on random sampling, and as such, the greater the number of random samples (i.e., relevance judgments), the higher the accuracy of our estimators. We further derive a number of enhancements to the general technique which allow one to determine accurate estimates for the standard performance measures associated with large collections of systems from a single, small judgment pool. Our experiments with the benchmark TREC data collection indicate that highly accurate estimates of these standard measures can be obtained using a number of relevance judgments as small as 2% of the typical judgment pool.
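To illustrate the core idea of unbiased estimation from sampled judgments, the following is a minimal sketch (not the authors' actual estimator or experimental setup): it estimates precision at cutoff k by judging only a uniform random sample of m of the top-k documents and averaging their relevance. Because each top-k document is equally likely to be sampled, the estimate is unbiased for the true precision@k, and averaging over many independent samples converges to the true value. The synthetic relevance data and all function names here are hypothetical.

```python
import random

def true_precision_at_k(rel, k):
    """Exact precision@k given complete relevance judgments."""
    return sum(rel[:k]) / k

def sampled_precision_at_k(rel, k, m, rng):
    """Unbiased estimate of precision@k from only m judged documents.

    We judge a uniform random sample of m of the top-k documents;
    their mean relevance is an unbiased estimator of precision@k,
    since every top-k document is sampled with equal probability.
    """
    sample = rng.sample(range(k), m)
    return sum(rel[i] for i in sample) / m

# Synthetic ranked list with known (simulated) relevance judgments.
rng = random.Random(0)
rel = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]

k, m, trials = 100, 10, 20000
estimates = [sampled_precision_at_k(rel, k, m, random.Random(t))
             for t in range(trials)]
avg_estimate = sum(estimates) / trials

# Each estimate uses only 10 judgments instead of 100, yet the mean
# of the estimates converges to the true precision@k (unbiasedness).
print(true_precision_at_k(rel, k), avg_estimate)
```

The variance of each individual estimate shrinks as m grows, which mirrors the paper's observation that more random samples (relevance judgments) yield higher-accuracy estimators; the paper's own estimators target standard measures such as average precision rather than this simplified precision@k example.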