PUBLICATIONS

A statistical method for system evaluation using incomplete judgments

Authors

Javed A Aslam, Virgil Pavlu, Emine Yilmaz

Publication date

2006

Total citations

Cited by 205

Description

We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techniques that (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed statistical technique produces unbiased estimates of the standard measures themselves. Our proposed technique is based on random sampling. While our estimates are unbiased by statistical design, their variance is dependent on the sampling distribution employed; as such, we derive a sampling distribution likely to yield low variance estimates. We test our proposed technique using benchmark TREC data, demonstrating that a sampling pool derived …
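The general idea of an unbiased, sampling-based estimate can be illustrated with a minimal Python sketch. This is not the authors' estimator or their derived sampling distribution, which the truncated description above does not spell out; it is a generic importance-sampling estimate of precision over a ranked list, in which only the sampled documents are judged. The names `estimate_precision`, `judge`, and `sampling_weights` are hypothetical and introduced only for this illustration.

```python
import random


def estimate_precision(ranked_docs, judge, sampling_weights, n_samples, rng):
    """Estimate precision over ranked_docs by sampling with replacement.

    Assumption (illustrative, not from the paper): documents are drawn
    from a fixed distribution proportional to sampling_weights, and
    judge(doc) returns 1 for relevant and 0 for non-relevant documents.
    """
    k = len(ranked_docs)
    total_weight = float(sum(sampling_weights))
    # Normalize so the probabilities sum to 1; each must be positive.
    probs = [w / total_weight for w in sampling_weights]

    draws = rng.choices(range(k), weights=probs, k=n_samples)

    # Each draw contributes rel(d) / (k * p(d)). Its expectation is
    # sum_d p(d) * rel(d) / (k * p(d)) = (1 / k) * sum_d rel(d),
    # which is exactly the precision, so the average is unbiased.
    total = 0.0
    for i in draws:
        total += judge(ranked_docs[i]) / (k * probs[i])
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    docs = list(range(10))
    relevant = {0, 1, 3, 6}  # hypothetical ground truth, true precision 0.4
    weights = [1.0 / (rank + 1) for rank in range(len(docs))]
    estimate = estimate_precision(
        docs, lambda d: 1 if d in relevant else 0, weights, 2000, rng
    )
    print(f"estimated precision: {estimate:.3f}")
```

The estimate is unbiased for any sampling distribution that gives every document positive probability, but its variance depends on that distribution, which is why the choice of distribution matters.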
