PUBLICATIONS

Feature selection on hierarchy of web documents

Authors

Dunja Mladenić

Marko Grobelnik

Publication date

2003

Publisher

North-Holland

Total citations

Cited by 178

Description

The paper describes feature subset selection as used in learning on text data (text learning) and gives a brief overview of the feature subset selection methods commonly used in machine learning. Several known and some new feature scoring measures appropriate for feature subset selection on large text data are described and related to each other. An experimental comparison of the described measures is given on real-world data collected from the Web. Machine learning techniques are used on data collected from Yahoo, a large text hierarchy of Web documents. Our approach includes some original ideas for handling a large number of features, categories and documents. The high number of features is reduced by feature subset selection and additionally by using a ‘stop-list’, pruning low-frequency features and using a short description of each document given in the hierarchy instead of the document itself. Documents are …
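The feature-reduction steps named in the description (stop-list, pruning of low-frequency features, then scoring and keeping the best features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tiny `STOP_LIST`, the function names `tokenize` and `select_features`, the `min_df` and `top_k` parameters, and the choice of a smoothed log odds ratio as the scoring measure, which is one common measure for text and not necessarily the one the authors recommend.

```python
from collections import Counter
import math

# Hypothetical stop-list for illustration; the paper's actual list is not given here.
STOP_LIST = {"the", "a", "of", "and", "in", "for", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def select_features(docs, labels, min_df=2, top_k=3):
    """Return the top_k words ranked by a smoothed log odds ratio.

    docs   -- list of short document descriptions (strings)
    labels -- list of bools, True if the document is in the target category
    min_df -- words occurring in fewer documents than this are pruned
    """
    # Step 1: remove stop-list words; one set of words per document.
    token_sets = [set(tokenize(d)) - STOP_LIST for d in docs]

    # Step 2: prune low-frequency features by document frequency.
    df = Counter(w for s in token_sets for w in s)
    vocab = [w for w, count in df.items() if count >= min_df]

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos

    def score(word):
        pos = sum(1 for s, y in zip(token_sets, labels) if y and word in s)
        neg = df[word] - pos
        # Additive smoothing (an assumption) keeps both probabilities in (0, 1).
        p_pos = (pos + 0.5) / (n_pos + 1.0)
        p_neg = (neg + 0.5) / (n_neg + 1.0)
        return math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))

    # Step 3: keep the highest-scoring features; ties are broken alphabetically.
    return sorted(vocab, key=lambda w: (-score(w), w))[:top_k]
```

Calling `select_features(docs, labels)` with a list of short descriptions and matching boolean labels returns the selected words. Working from short descriptions rather than full documents mirrors the reduction mentioned in the description, though the exact preprocessing used in the paper may differ.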
