Authors
Jure Leskovec,
Marko Grobelnik,
Natasa Milic-Frayling
Description
In this paper we present a method for learning a relevance function that assesses the importance of substructures, in the form of subject–predicate–object triples, in documents represented by semantic graphs. We start with a deep syntactic analysis of the document text and, for each sentence, extract logical-form triples (subject–predicate–object). We then apply cross-sentence pronoun resolution, co-reference resolution, and semantic normalization to refine the set of triples and merge them into a semantic graph. This procedure is applied to both the documents and the corresponding human-made summary extracts. We train a linear Support Vector Machine on the logical-form triples to learn how to extract triples that belong to sentences in document summaries; this yields a real-valued function that weights how strongly each individual triple belongs in the document summary. Our experiments with the DUC 2002 data show that extending the attribute set to include semantic properties and topological graph properties of the logical triples yields a statistically significant improvement in the micro-averaged F1 measure for the extracted triples. We also observe that attributes describing various aspects of the semantic graph are weighted highly by the SVM in the learned model. The work fits naturally in the context of ontology learning from text, where the problem formulation corresponds to a variant of the standard document summarization task.
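The core learning step described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the feature vectors, labels, and data are synthetic assumptions standing in for the paper's linguistic and semantic-graph attributes of each triple, and scikit-learn's `LinearSVC` stands in for the linear SVM.

```python
# Sketch: rank subject-predicate-object triples with a linear SVM.
# All data below is synthetic; real features would include semantic
# properties and topological graph properties of each triple.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# One row per triple: a hypothetical 5-dimensional feature vector
# (e.g. linguistic attributes plus graph measures of the triple's nodes).
X_train = rng.random((200, 5))
# Label 1 if the triple's sentence appears in the human-made summary, else 0
# (synthetic rule here, purely for demonstration).
y_train = (X_train[:, 0] + X_train[:, 3] > 1.0).astype(int)

svm = LinearSVC()          # linear SVM classifier
svm.fit(X_train, y_train)

# The signed distance to the separating hyperplane serves as the
# real-valued relevance score used to weight and rank triples.
scores = svm.decision_function(X_train)
ranked = np.argsort(-scores)  # most summary-worthy triples first
pred = svm.predict(X_train)
```

Ranking by `decision_function` rather than the hard `predict` output is what turns the classifier into the real-valued relevance function the abstract describes.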