GraphLSS: Integrating Lexical, Structural, and Semantic Features for Long Document Extractive Summarization

Margarita Bugueño; Hazem Abou Hamdan; Gerard de Melo

arXiv:2410.21315 · cs.CL · October 30, 2024

TL;DR

GraphLSS introduces a novel heterogeneous graph model for long document extractive summarization that combines lexical, structural, and semantic features without auxiliary models, achieving competitive results.

Contribution

It presents a new graph construction method that integrates multiple feature types for summarization, eliminating the need for external tools or additional learning models.

Findings

01 Outperforms recent non-graph models on benchmark datasets

02 Achieves competitive results with top graph-based methods

03 Uses a simplified, intuitive graph structure without auxiliary models

Abstract

Heterogeneous graph neural networks have recently gained attention for long document summarization, modeling the extraction as a node classification task. Although effective, these models often require external tools or additional machine learning models to define graph components, producing highly complex and less intuitive structures. We present GraphLSS, a heterogeneous graph construction for long document extractive summarization, incorporating Lexical, Structural, and Semantic features. It defines two levels of information (words and sentences) and four types of edges (sentence semantic similarity, sentence occurrence order, word in sentence, and word semantic similarity) without any need for auxiliary learning models. Experiments on two benchmark datasets show that GraphLSS is competitive with top-performing graph-based methods, outperforming recent non-graph models. We release…
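
The graph layout described in the abstract (two node types, four edge types) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similarity is computed, so the bag-of-words cosine for sentences, the caller-supplied toy `word_vectors`, both thresholds, and every name below (`build_graph`, `cosine`, the edge-type labels) are assumptions chosen only to make the structure concrete.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(sentences, word_vectors, sent_threshold=0.3, word_threshold=0.8):
    """Build a toy heterogeneous graph with word and sentence nodes.

    Assumptions (not from the paper): whitespace tokenization, bag-of-words
    cosine as the sentence similarity, and caller-supplied word vectors.
    """
    tokens = [s.lower().split() for s in sentences]
    bows = [Counter(t) for t in tokens]
    vocab = sorted({w for t in tokens for w in t})

    edges = {
        "sent_similar_sent": [],  # sentence semantic similarity
        "sent_next_sent": [],     # sentence occurrence order
        "word_in_sent": [],       # word in sentence
        "word_similar_word": [],  # word semantic similarity
    }

    # Connect sentence pairs whose similarity reaches the threshold.
    for i, j in combinations(range(len(sentences)), 2):
        if cosine(bows[i], bows[j]) >= sent_threshold:
            edges["sent_similar_sent"].append((i, j))

    # Connect each sentence to the one that follows it in the document.
    for i in range(len(sentences) - 1):
        edges["sent_next_sent"].append((i, i + 1))

    # Connect each distinct word to every sentence that contains it.
    for i, bow in enumerate(bows):
        for w in sorted(bow):
            edges["word_in_sent"].append((w, i))

    # Connect word pairs whose (toy) vectors are close enough.
    for a, b in combinations(vocab, 2):
        if a in word_vectors and b in word_vectors:
            if cosine(word_vectors[a], word_vectors[b]) >= word_threshold:
                edges["word_similar_word"].append((a, b))

    return {"sentences": list(range(len(sentences))), "words": vocab, "edges": edges}


# Hypothetical three-sentence document and hand-made word vectors.
sentences = [
    "graphs model documents",
    "graphs model sentences",
    "summaries are short",
]
word_vectors = {
    "documents": {"x": 1.0, "y": 0.1},
    "sentences": {"x": 0.9, "y": 0.2},
    "short": {"x": 0.0, "y": 1.0},
}
graph = build_graph(sentences, word_vectors)
```

In the setting the abstract describes, a heterogeneous graph neural network would then classify each sentence node as in or out of the summary. The sketch stops at graph construction and stores edges as plain lists per edge type.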

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management