Authors
Gregor Leban,
Mauricio Ciprián,
Lorand Dali,
Jasna Škrbec,
Description
This deliverable describes the first prototype of the module for knowledge extraction from unstructured sources. We present our results in constructing the Annotation ontology, processing and annotating posts, and processing user requests. The Annotation ontology consists of two parts: one containing general computer science (CS) terminology and one containing project-specific concepts. The part with the general CS terminology will be shared among all projects using ALERT. To create it, we used two websites that contain dictionaries of CS terms and their descriptions. We used the terms as ontology concepts and the descriptions of the terms to create relationships between the concepts. We also enriched this set of terms with additional related terms found on Wikipedia. To find project-specific terms, we used the project's existing data sources. We applied the k-means algorithm to identify clusters of similar documents and then, for each cluster, extracted a set of the most descriptive keywords. These keywords are considered good candidates for project-specific terms. To identify new terms in the online phase, we used a similar algorithm that inspects recently created posts and identifies the most relevant keywords.
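The clustering and keyword-extraction step can be illustrated with a minimal sketch. It is not the deliverable's implementation: the TF-IDF weighting, the length-based token filter, the farthest-point initialisation, the use of the highest-weighted centroid terms as "most descriptive keywords", and the sample posts are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase alphabetic tokens longer than two characters.
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return [w for w in cleaned.split() if len(w) > 2]

def tfidf_vectors(docs):
    # Sparse, L2-normalised TF-IDF vectors stored as {term: weight} dicts.
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def sq_dist(x, c):
    # Squared Euclidean distance between two sparse vectors.
    return dot(x, x) - 2.0 * dot(x, c) + dot(c, c)

def mean(vectors):
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=20):
    # Assumption: deterministic farthest-point initialisation instead of random seeds.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = [None] * len(vectors)
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

def top_keywords(centroid, n=2):
    # Highest-weighted centroid terms serve as candidate project-specific terms.
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]

# Hypothetical sample posts, invented for this sketch.
docs = [
    "compiler error in parser module",
    "parser crash after compiler upgrade",
    "compiler warning from parser code",
    "login page rejects valid password",
    "password reset email missing for login",
    "login form forgets saved password",
]
labels, centroids = kmeans(tfidf_vectors(docs), k=2)
keywords = {j: top_keywords(centroids[j]) for j in sorted(set(labels))}
print(labels, keywords)
```

Ranking terms by their centroid weight is only one plausible reading of "most descriptive keywords"; a term-frequency contrast between a cluster and the rest of the corpus would be another. The same ranking could in principle be rerun over a window of recent posts to approximate the online phase.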