PUBLICATIONS

Detecting duplicate web documents using clickthrough data

Authors

Filip Radlinski,

Paul N Bennett,

Emine Yilmaz

Publication date

2011

Total citations

Cited by 29

Description

The web contains many duplicate and near-duplicate documents. Given that user satisfaction is negatively affected by redundant information in search results, a significant amount of research has been devoted to developing duplicate detection algorithms. However, most such algorithms rely solely on document content to detect duplication, ignoring the fact that a primary goal of duplicate detection is to identify documents that contain redundant information with respect to a particular user query. Similarly, although query-dependent result diversification algorithms compute a query-dependent ranking, they tend to do so on the basis of a query-independent content similarity score. In this paper, we bridge the gap between query-dependent redundancy and query-independent duplication by showing how user click behavior following a query provides evidence about the relative novelty of web documents. While most …
