Authors
Marko Grobelnik, Dunja Mladenic
Publication date
1998
Description
We present an approach to text categorization using machine learning and data mining techniques. The approach is developed and tested on Yahoo, a large text hierarchy available on the Web. The goal is to classify an arbitrary text document into the right category within the Yahoo hierarchy as accurately and as quickly as possible. To achieve this, we have to handle a large number of features (words, word sequences) and training examples (items in the Yahoo nodes) by taking into account the hierarchical structure of the examples and using feature subset selection adapted for large text data. In our previous work we showed that a rather high quality of classification can be achieved. Here our main concern is to classify as fast as possible while still keeping classification quality high. Classification of a document is performed by collecting votes from the category nodes. To achieve fast classification the number of category nodes involved in …
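
The voting step described above can be illustrated with a minimal sketch. The description is truncated before it says how the set of participating nodes is reduced, so the pruning rule used here (consult only nodes whose model shares at least one word with the document, found through an inverted index) is an assumption for illustration, not the authors' method. The names `build_index`, `classify`, `NODE_WEIGHTS`, and the toy weights are all hypothetical.

```python
from collections import defaultdict


def build_index(node_weights):
    """Map each word to the set of category nodes whose model uses it."""
    index = defaultdict(set)
    for node, weights in node_weights.items():
        for word in weights:
            index[word].add(node)
    return index


def classify(doc_words, node_weights, index, top_k=3):
    """Collect votes only from nodes sharing at least one word with the document.

    Assumed pruning rule: nodes with no overlapping word are never scored.
    """
    candidates = set()
    for word in set(doc_words):
        candidates |= index.get(word, set())

    votes = {}
    for node in candidates:
        weights = node_weights[node]
        # A node's vote is the sum of its weights over the document's words.
        votes[node] = sum(weights.get(w, 0.0) for w in doc_words)

    # Highest vote first; ties broken by node name for a stable order.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy, made-up per-node word weights (not data from the paper).
NODE_WEIGHTS = {
    "Science/Computers": {"learning": 2.0, "data": 1.5, "web": 0.5},
    "Recreation/Sports": {"football": 2.0, "league": 1.0},
    "Business/Finance": {"market": 2.0, "data": 0.5},
}
INDEX = build_index(NODE_WEIGHTS)
result = classify(["machine", "learning", "data", "data"], NODE_WEIGHTS, INDEX)
```

In this toy run only two of the three nodes are scored, because "Recreation/Sports" shares no word with the document; the cost of classification then depends on the number of candidate nodes rather than on the size of the whole hierarchy.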