Authors
Blaž Fortuna,
Delia Rusu,
Mitja Trampuš,
Tadej Štajner
Publication date
2011
Publisher
RENDER Project Deliverable
Description
This deliverable describes a prototype of the Fact Mining Toolkit, which provides fact extraction functionality and is designed as a Web Service with nine components: one for pre-processing plain text, six core components providing annotations, assertions and categories, and two components for rendering the output, either in RDF or as a graphical visualization. As an example application we developed a News Fact Extraction Service, which applies fact extraction to a stream of news articles. Finally, we present ongoing research on improving fact extraction by identifying fact templates across multiple documents. The Fact Extraction Service (also referred to as Enrycher) provides both shallow and deep text processing at the level of individual text documents. Shallow text processing covers topic and keyword detection and named entity extraction: names of people, locations and organizations, as well as dates, percentages and money amounts occurring in the text. Deep text processing comprises named entity resolution and merging, word sense disambiguation and assertion extraction. Named entity resolution is performed against existing knowledge bases (DBpedia, YAGO and OpenCyc), with the goal of using existing knowledge to enrich the set of features associated with named entities. Entity merging involves co-reference and anaphora resolution for named entities, while assertion extraction identifies subject–predicate–object sentence elements together with their modifiers (adjectives, adverbs) and negations.
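
To make the shape of an extracted assertion concrete, the following is a minimal Python sketch of a subject–predicate–object record with modifiers and a negation flag, rendered as N-Triples-style lines. It is an illustration only and not Enrycher's actual data model or output format: the `Assertion` class, the `to_ntriples` function, the `http://example.org/fact/` namespace and the property names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/fact/"


@dataclass
class Assertion:
    """One subject-predicate-object assertion with optional modifiers."""
    subject: str
    predicate: str
    obj: str
    # Modifiers (adjectives, adverbs), keyed by the element they attach to:
    # "subject", "predicate" or "object".
    modifiers: Dict[str, List[str]] = field(default_factory=dict)
    negated: bool = False


def slug(text: str) -> str:
    """Join whitespace-separated tokens with underscores (illustrative only)."""
    return "_".join(text.strip().split())


def escape(text: str) -> str:
    """Escape backslashes and double quotes for use inside a literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(a: Assertion, node_id: str = "a1") -> List[str]:
    """Render the assertion as statements about one blank node."""
    node = f"_:{node_id}"
    lines = [
        f"{node} <{EX}subject> <{EX}{slug(a.subject)}> .",
        f"{node} <{EX}predicate> <{EX}{slug(a.predicate)}> .",
        f"{node} <{EX}object> <{EX}{slug(a.obj)}> .",
    ]
    if a.negated:
        lines.append(f'{node} <{EX}negated> "true" .')
    for element, words in sorted(a.modifiers.items()):
        for word in words:
            lines.append(f'{node} <{EX}{element}Modifier> "{escape(word)}" .')
    return lines


if __name__ == "__main__":
    # "The company did not announce large layoffs."
    example = Assertion(
        subject="company",
        predicate="announce",
        obj="layoffs",
        modifiers={"object": ["large"]},
        negated=True,
    )
    print("\n".join(to_ntriples(example)))
```

The sketch attaches the negation and the modifiers to a blank node standing for the whole assertion, so that a negated statement is not confused with a plain subject–predicate–object triple. This is one possible design choice among several.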