PUBLICATIONS

J. Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia

Authors

Dunja Mladenic, Marko Grobelnik

Description

This paper proposes an efficient algorithm for generating new features that enrich the known bag-of-words document representation. New features are generated from word sequences of different lengths. Learning is performed using a Naive Bayesian classifier on feature vectors, where only highly scored features are used. The performance of the enriched document representation is evaluated on the problem of automatic document categorization using the Yahoo text hierarchy. Our experiments show that using word sequences of length up to 3, instead of only single words, improves performance, while longer sequences on average have no influence on performance.
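As a rough illustration of the idea of enriching a bag-of-words representation with word sequences, the following minimal sketch enumerates all contiguous word sequences of length 1 to 3 and keeps those that occur in at least a given number of documents. It is not the algorithm from the paper: the function names, the whitespace tokenization, the example documents, and the document-frequency threshold (used here as a simple stand-in for the feature scoring mentioned above) are all assumptions made for illustration.

```python
from collections import Counter


def word_sequences(tokens, max_len=3):
    """Return all contiguous word sequences of length 1..max_len."""
    features = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def frequent_features(documents, max_len=3, min_count=2):
    """Keep features that occur in at least min_count documents.

    The threshold is a hypothetical stand-in for feature scoring;
    each feature is counted at most once per document.
    """
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        counts.update(set(word_sequences(tokens, max_len)))
    return {f: c for f, c in counts.items() if c >= min_count}


# Toy documents, invented for this example.
docs = [
    "machine learning for text",
    "text learning with machine learning",
    "naive bayes for text",
]
features = frequent_features(docs)
```

In this toy run, the two-word feature "machine learning" survives the threshold alongside single words such as "text", while sequences seen in only one document (for example "naive bayes") are dropped. The retained features could then serve as the vocabulary for feature vectors passed to a classifier.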
