Authors
Janez Brank, Marko Grobelnik, Natasa Milic-Frayling
Publication date
2002
Description
In this paper we explore the effects of various feature selection algorithms on document classification performance. We propose to use two possibly distinct linear classifiers: the first is used exclusively for feature selection, yielding the feature space in which the second classifier is trained, possibly on a different training set. The resulting classifier is then used to classify new documents. Experiments show that feature selection based on the linear SVM algorithm combines well with different types of classifiers. Based on the experimental results, we conjecture that it is the level of sophistication at which the scoring method takes into account information about features, rather than its compatibility with the classifier in terms of its design, that makes a feature selection method more or less successful.
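The two-stage scheme described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: a simple perceptron stands in for the linear SVM, features are ranked by the absolute value of their learned weight (an assumption about how a linear model might be used for scoring), and the function names and toy data are all hypothetical.

```python
# Illustrative sketch: a perceptron stands in for the linear SVM of the abstract.
# All names and data below are hypothetical.

def train_perceptron(X, y, epochs=20):
    """Train a linear classifier. X: list of feature vectors, y: labels in {+1, -1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
    return w, b

def select_features(w, k):
    """Return the indices of the k features with the largest absolute weight."""
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return sorted(ranked[:k])

def project(X, keep):
    """Restrict every feature vector to the selected feature indices."""
    return [[x[j] for j in keep] for x in X]

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Stage 1: a linear classifier used only to score and select features.
X_select = [[2, 1, 0, 0], [0, 1, 2, 0], [3, 1, 0, 0], [0, 1, 3, 0]]
y_select = [1, -1, 1, -1]
w_sel, _ = train_perceptron(X_select, y_select)
keep = select_features(w_sel, k=2)

# Stage 2: a second classifier trained in the reduced feature space,
# here on a different training set.
X_train = [[1, 1, 0, 1], [0, 1, 1, 1], [2, 0, 0, 0], [0, 0, 2, 1]]
y_train = [1, -1, 1, -1]
w, b = train_perceptron(project(X_train, keep), y_train)

# Classify new documents using only the selected features.
new_docs = [[4, 1, 1, 0], [0, 1, 5, 1]]
predictions = [predict(w, b, x) for x in project(new_docs, keep)]
```

In this sketch the second classifier happens to be another perceptron; any learner that accepts the projected vectors could be trained in the second stage instead.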