PUBLICATIONS

Large-scale hierarchical text classification using SVM and coding matrices

Authors

Janez Brank, D Mladenic, Marko Grobelnik

Publication date

2010

Total citations

Cited by 9

Description

We deal with the problem of classifying textual documents into a topical hierarchy of categories. Multi-class classification problems such as this one are often dealt with by converting them into several two-class classification problems; a binary classifier can be trained for each of these problems and their predictions are then combined to form the final classification of a document into the topic hierarchy. The conversion from the original multi-class problem into a group of two-class problems can be succinctly described by a "coding matrix". In traditional approaches, the coding matrix is either completely random or (more commonly) completely fixed in advance (e.g. 1-vs-1, 1-vs-rest); in both cases, the training data does not affect the design of the coding matrix. Our approach constructs the coding matrix gradually, one column at a time, with each new column being defined in such a way that the new binary classifier attempts to rectify the most common mistakes of the ensemble of binary classifiers built up to that point. The goal is to achieve good performance with a smaller number of binary classifiers. We also present systematic experiments on a small dataset which demonstrate that good coding matrices with a small number of columns exist, but are rare.
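To make the notion of a coding matrix concrete, here is a minimal Python sketch of the two fixed designs named in the description (1-vs-rest and 1-vs-1) together with a simple decoding step. It is illustrative only and does not reproduce the paper's adaptive, column-by-column construction. The function names, the +1/-1/0 entry convention, and the "fewest disagreements" decoding rule are assumptions chosen for this example, not details taken from the paper.

```python
from itertools import combinations


def one_vs_rest(k):
    """Coding matrix with k rows (classes) and k columns (binary problems).

    Column j treats class j as positive (+1) and every other class as negative (-1).
    """
    return [[1 if i == j else -1 for j in range(k)] for i in range(k)]


def one_vs_one(k):
    """Coding matrix with k rows and k*(k-1)/2 columns.

    The column for pair (a, b) has +1 for class a, -1 for class b,
    and 0 for classes that this binary problem ignores.
    """
    pairs = list(combinations(range(k), 2))
    return [
        [1 if i == a else -1 if i == b else 0 for (a, b) in pairs]
        for i in range(k)
    ]


def decode(matrix, predictions):
    """Pick the class whose row disagrees least with the binary predictions.

    `predictions` holds one +1/-1 output per column. Zero entries in a row
    are skipped. Ties go to the lowest class index (an arbitrary choice).
    """
    def disagreements(row):
        return sum(1 for m, p in zip(row, predictions) if m != 0 and m != p)

    return min(range(len(matrix)), key=lambda i: disagreements(matrix[i]))


if __name__ == "__main__":
    ovr = one_vs_rest(3)
    ovo = one_vs_one(3)
    print(ovr)
    print(ovo)
    # Hypothetical outputs of three binary classifiers for one document.
    print(decode(ovr, [-1, 1, -1]))
    print(decode(ovo, [-1, -1, 1]))
```

Each column of the matrix corresponds to one binary classifier to train, so the column count is the cost the description aims to reduce: for k classes, 1-vs-rest needs k columns while 1-vs-1 needs k(k-1)/2.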

Publication

OptimalAI