Authors
Marko Grobelnik,
Janez Brank,
D Mladenic,
Blaž Fortuna,
Blaž Fortuna,
Publication date
2006
Publisher
IEEE
Total citations
Description
This paper presents an approach for constructing an ontology from a stream of documents. Named entities extracted from the documents are used as instances of the ontology. Entities and co-occurring entity pairs are represented by feature vectors based on the content of the documents where they occurred. In general, concepts and relations can be formed into an ontological structure either by clustering or by classification into an existing topic hierarchy. We propose the latter using DMoz as an existing topic hierarchy. The approach is efficient and can scale to large data sets. We propose a framework that incorporates the stream mining process into a formal definition of the ontology. We describe a software component implementing this approach, and present experiments using a large collection of news