PUBLICATIONS

Training text classifiers with SVM on very few positive examples

Authors

Janez Brank, Marko Grobelnik, Nataša Milic-Frayling

Publication date

2003

Publisher

Technical Report MSR-TR-2003-34, Microsoft Corp

Total citations

Cited by 64

Description

Text categorization involves a predefined set of categories and a set of documents that need to be classified using that categorization scheme. Each document can be assigned one or more categories (or none at all). We address the multi-class categorization problem as a set of binary problems where, for each category, the set of positive examples consists of the documents belonging to that category, while all other documents are considered negative examples. Labelled documents are used as input to various learning algorithms to train classifiers that automatically categorize new unlabelled documents. Traditionally, machine learning research has assumed that the class distribution in the training data is reasonably balanced. More recently, it has been recognized that this is often not the case with realistic data sets, where many more negative examples than positive ones are available. The question then …
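
The decomposition into per-category binary problems described above can be illustrated with a minimal sketch. This is not the report's code: the function names (`one_vs_rest_labels`, `positive_fraction`) and the toy documents and categories are hypothetical, chosen only to show how positives and negatives are derived for each category and how few positives a category can end up with.

```python
def one_vs_rest_labels(doc_categories, categories):
    """For each category, label a document 1 if it belongs to the
    category and 0 otherwise (every other document is a negative)."""
    return {
        category: [1 if category in cats else 0 for cats in doc_categories]
        for category in categories
    }


def positive_fraction(labels):
    """Share of positive examples in one binary problem."""
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical toy corpus: each entry is the set of categories assigned
# to one document (a document may have several categories, or none).
doc_categories = [
    {"sports"},
    {"politics", "economy"},
    set(),
    {"economy"},
    {"sports", "politics"},
    {"economy"},
]
categories = ["sports", "politics", "economy", "science"]

binary_problems = one_vs_rest_labels(doc_categories, categories)

for category in categories:
    labels = binary_problems[category]
    print(category, labels, f"positives: {positive_fraction(labels):.2f}")
```

Each entry of `binary_problems` is the label vector for one binary classifier. On realistic collections the positive fraction for most categories is far smaller than in this toy data, which is the imbalance the description refers to.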
