PUBLICATIONS

Department of Intelligent Systems, J. Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia, Dunja.Mladenic@ijs.si

Authors

Dunja Mladenic,

Marko Grobelnik,

Description

This paper describes an approach to feature subset selection that takes into account problem specifics and learning algorithm characteristics. It is developed for the Naive Bayesian classifier applied to text data, since it combines well with the addressed learning problems. We focus on domains with many features that also have a highly unbalanced class distribution and asymmetric misclassification costs given only implicitly in the problem. By asymmetric misclassification costs we mean that one of the class values is the target class value for which we want to get predictions, and we prefer false positives over false negatives. Our example problem is automatic document categorization using machine learning, where we want to identify documents relevant to the selected category. Usually, only about 1%-10% of examples belong to the selected category. Our experimental comparison of eleven feature scoring measures shows …
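As a rough illustration of what scoring and selecting features for a rare target class can look like, the sketch below ranks words by a smoothed log odds ratio and keeps the top k. The visible part of the description does not name the eleven measures compared, so the choice of odds ratio here is an assumption made only for illustration, and the function names `odds_ratio_scores` and `select_top_k`, the smoothing constant, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def odds_ratio_scores(docs, labels, smoothing=1.0):
    """Score each word by a smoothed log odds ratio (illustrative measure only).

    docs   -- list of tokenized documents (lists of words)
    labels -- list of booleans, True for the target (usually rare) category
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos

    # Document frequencies: in how many target / non-target documents a word occurs.
    pos_df, neg_df = Counter(), Counter()
    for words, y in zip(docs, labels):
        (pos_df if y else neg_df).update(set(words))

    scores = {}
    for w in set(pos_df) | set(neg_df):
        # Smoothed estimates of P(word | target) and P(word | non-target).
        p_pos = (pos_df[w] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_df[w] + smoothing) / (n_neg + 2 * smoothing)
        scores[w] = math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy data: one target document among five (an unbalanced class distribution).
docs = [
    ["machine", "learning", "text"],
    ["football", "match"],
    ["weather", "report"],
    ["football", "weather"],
    ["learning", "football"],
]
labels = [True, False, False, False, False]

selected = select_top_k(odds_ratio_scores(docs, labels), 2)
```

A measure of this kind favours words that are characteristic of the target class, which fits a setting where missing a relevant document is the costlier error; the selected vocabulary would then be passed to the classifier in place of the full feature set.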

