Authors
Jure Leskovec, Marko Grobelnik, Natasa Milic-Frayling
Publication date
2004
Description
We present a method for summarizing a document by creating a semantic graph of the original document and identifying the substructure of that graph that can be used to extract sentences for a document summary. We start with deep syntactic analysis of the text and, for each sentence, extract logical form triples of the form subject–predicate–object. We then apply cross-sentence pronoun resolution, co-reference resolution, and semantic normalization to refine the set of triples and merge them into a semantic graph. This procedure is applied both to the documents and to the corresponding summary extracts. We train a Support Vector Machine on the logical form triples to learn how to create document summaries automatically. Our experiments with the DUC 2002 data show that extending the set of attributes to include semantic properties and topological graph properties of the logical form triples yields a statistically significant improvement in the F1 measure for the extracted summaries.
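The step of merging triples into a graph and deriving topological attributes can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `build_semantic_graph` and `triple_features`, the toy triples, and the choice of node degrees as the topological attributes are hypothetical and are not taken from the paper, whose actual attribute set is not listed in this description. Linguistic analysis, co-reference resolution, and the Support Vector Machine training itself are omitted.

```python
from collections import defaultdict


def build_semantic_graph(triples):
    """Merge (subject, predicate, object) triples into a directed graph.

    Each triple contributes one edge subject -> object labelled with the
    predicate. Returns two adjacency maps: outgoing and incoming edges.
    """
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for subj, pred, obj in triples:
        out_edges[subj].add((pred, obj))
        in_edges[obj].add((pred, subj))
    return out_edges, in_edges


def triple_features(triple, out_edges, in_edges):
    """Illustrative topological attributes for one triple (hypothetical set)."""
    subj, _pred, obj = triple
    return {
        "subj_out_degree": len(out_edges.get(subj, ())),
        "subj_in_degree": len(in_edges.get(subj, ())),
        "obj_out_degree": len(out_edges.get(obj, ())),
        "obj_in_degree": len(in_edges.get(obj, ())),
    }


# Toy triples, assumed to be already normalized and co-reference resolved.
triples = [
    ("committee", "approve", "budget"),
    ("committee", "reject", "amendment"),
    ("budget", "fund", "school"),
]

out_edges, in_edges = build_semantic_graph(triples)
features = [triple_features(t, out_edges, in_edges) for t in triples]
```

In a setup like the one described, each feature dictionary would be turned into a numeric vector and paired with a label saying whether the triple also occurs in the summary extract, and a classifier would be trained on those pairs.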