Authors
Janez Brank, Dunja Mladenić, Marko Grobelnik
Description
This report addresses very large ontologies by focusing on ontology population, an aspect identified as important for ontology learning with OntoGen. We present an approach for scalable population of ontologies with a large number of concepts and instances. The ontology is assumed to consist of concepts organized into a hierarchy via the is-a relation. The task of ontology population is to assign each instance (e.g., a textual document) to a suitable, relevant concept of the ontology. We treat this as a machine learning problem, assuming that some training data is available (i.e., a set of instances whose correct assignment to concepts is already known). In principle, one classifier could be trained per concept to predict whether a given instance belongs to that concept. However, this requires as many classifiers as there are concepts and may therefore be difficult to scale to large ontologies. We instead present an approach that uses coding matrices to convert this multi-class classification problem (with a large number of classes) into a moderate number of binary classification problems.
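
To make the coding-matrix idea concrete, the following is a minimal sketch and not the report's actual method. It assumes a randomly drawn {-1, +1} coding matrix, a nearest-centroid binary learner as a stand-in for whatever classifier is really used, and Hamming-distance decoding. All function names and the toy data are hypothetical. Each concept receives a codeword (one row of the matrix), each column defines one binary problem, and an instance is assigned to the concept whose codeword is closest to the vector of binary predictions.

```python
import random


def make_coding_matrix(n_classes, n_bits, seed=0):
    """Draw a random {-1, +1} matrix: one codeword (row) per class, one binary
    problem (column) per bit. Assumes 2 <= n_classes <= 2 ** n_bits."""
    rng = random.Random(seed)
    while True:
        m = [[rng.choice((-1, 1)) for _ in range(n_bits)] for _ in range(n_classes)]
        # Codewords must be distinct so that every class can be decoded.
        rows_distinct = len({tuple(row) for row in m}) == n_classes
        # Every column must contain both labels, or its binary problem is trivial.
        cols_split = all(len({row[j] for row in m}) == 2 for j in range(n_bits))
        if rows_distinct and cols_split:
            return m


def train_centroid(X, y):
    """Stand-in binary learner (an assumption): mean vector of each side."""
    dim = len(X[0])
    centroids = {}
    for label in (-1, 1):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
    return centroids


def predict_centroid(centroids, x):
    """Return +1 if x is closer to the positive centroid, otherwise -1."""
    dist = {lab: sum((a - b) ** 2 for a, b in zip(c, x)) for lab, c in centroids.items()}
    return 1 if dist[1] < dist[-1] else -1


def train_ecoc(X, labels, code):
    """Train one binary model per column of the coding matrix."""
    models = []
    for j in range(len(code[0])):
        y = [code[c][j] for c in labels]  # relabel each instance by bit j of its class
        models.append(train_centroid(X, y))
    return models


def predict_ecoc(models, code, x):
    """Predict all bits, then pick the class with the nearest codeword (Hamming)."""
    bits = [predict_centroid(m, x) for m in models]
    hamming = [sum(b != r for b, r in zip(bits, row)) for row in code]
    return min(range(len(code)), key=hamming.__getitem__)


# Toy data (hypothetical): 16 concepts, each with one distinctive feature,
# loosely like a keyword that occurs only in documents of that concept.
n_classes, n_bits = 16, 8
X = [[1.0 if d == k else 0.0 for d in range(n_classes)] for k in range(n_classes)]
labels = list(range(n_classes))

code = make_coding_matrix(n_classes, n_bits)
models = train_ecoc(X, labels, code)  # 8 binary models instead of 16
predicted = [predict_ecoc(models, code, x) for x in X]
print(len(models), predicted == labels)
```

In this toy setting, 8 binary models cover 16 concepts, whereas one classifier per concept would need 16. The saving grows with the number of concepts, since the number of columns only has to be large enough to keep the codewords distinct and reasonably far apart. A random matrix is used here purely for illustration; how the coding matrix is actually constructed is not stated in the description above.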