Authors
Dunja Mladenic, Marko Grobelnik
Publication date
1999
Description
This paper describes an approach to predicting the content of a document from the hyperlink that points to it. The k-Nearest Neighbor algorithm is used to predict a set of words that appear in the document. Experiments are performed on real-world data obtained from the Web. The proposed approach gives promising results. On the tested data, on average about 33% of the document words are correctly predicted, while about 15% of all predicted words appear in the document. The predicted words are chosen from about 4,000 to 8,000 different words and word pairs that appear in the training examples.
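
The two percentages in the description correspond to recall (the share of document words that were predicted) and precision (the share of predicted words that occur in the document). The following is a minimal sketch of how a nearest-neighbor word predictor and these two measures could be set up. It is not the authors' implementation: the Jaccard similarity, the values of `k` and `min_votes`, the voting rule, and all function names are assumptions made for illustration.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two word sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_words(link_words, training, k=3, min_votes=2):
    """Predict document words from the words of a hyperlink.

    training is a list of (hyperlink word set, document word set) pairs.
    The k most similar hyperlinks vote for the words of their documents;
    a word is predicted if at least min_votes neighbors contain it.
    """
    neighbors = sorted(
        training,
        key=lambda example: jaccard(link_words, example[0]),
        reverse=True,
    )[:k]
    votes = Counter(w for _, doc_words in neighbors for w in doc_words)
    return {w for w, n in votes.items() if n >= min_votes}


def recall_precision(predicted, actual):
    """Return (recall, precision) of a predicted word set."""
    hits = len(predicted & actual)
    recall = hits / len(actual) if actual else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


# Toy data, invented for illustration only.
training = [
    ({"machine", "learning", "course"}, {"machine", "learning", "lecture", "notes"}),
    ({"learning", "group"}, {"machine", "learning", "research", "people"}),
    ({"football", "scores"}, {"football", "league", "results"}),
]
predicted = predict_words({"machine", "learning"}, training, k=2, min_votes=2)
actual = {"machine", "learning", "tutorial", "examples"}
recall, precision = recall_precision(predicted, actual)
```

In this toy case the two nearest hyperlinks agree only on "machine" and "learning", so half of the document words are recovered and every predicted word is correct. Lowering `min_votes` would typically raise recall at the cost of precision, which is the kind of trade-off reflected in the reported 33% and 15%.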