Topic-Selective Graph Network for Topic-Focused Summarization
Shi Zesheng, Zhou Yucheng

TL;DR
This paper introduces a topic-selective graph network with a topic-arc recognition objective to improve topic-focused summarization, effectively filtering non-relevant information and achieving state-of-the-art results on benchmark datasets.
Contribution
The paper proposes a novel topic-arc recognition objective and a topic-selective graph network to enhance topic-focused summarization beyond prompt-guided methods.
Findings
Achieves state-of-the-art performance on NEWTS and COVIDET datasets.
Effectively discriminates relevant topics for improved summarization.
Outperforms existing prompt-guided topic summarization methods.
| Method | R-1 | R-2 | R-L | Topic Focus |
|---|---|---|---|---|
| BART [2] | 16.48 | 0.75 | 11.71 | 0.0080 |
| BART+T-W [2] | 31.14 | 10.46 | 19.94 | 0.1375 |
| BART+CNN-DM [2] | 26.23 | 7.24 | 17.12 | 0.1338 |
| T5+T-W [2] | 31.78 | 10.83 | 20.54 | 0.1386 |
| T5+CNN-DM [2] | 27.87 | 8.55 | 18.41 | 0.1305 |
| ProphetNet+T-W [2] | 31.91 | 10.8 | 20.66 | 0.1362 |
| ProphetNet+CNN-DM [2] | 28.71 | 8.53 | 18.69 | 0.1295 |
| PPLM [2] | 29.63 | 9.08 | 18.76 | 0.1482 |
| Ours | 34.24 | 12.65 | 23.08 | 0.1512 |
| Method | Ang | Dis | Fea | Joy | Sad | Tru | Ant | AVG. |
|---|---|---|---|---|---|---|---|---|
| Metric: R-L | | | | | | | | |
| BART [31] | 0.161 | 0.138 | 0.164 | 0.149 | 0.157 | 0.158 | 0.164 | 0.156 |
| PEGASUS-FT [31] | 0.185 | 0.155 | 0.199 | 0.158 | 0.173 | 0.164 | 0.193 | 0.175 |
| BART-FT [31] | 0.190 | 0.159 | 0.206 | 0.165 | 0.177 | 0.162 | 0.198 | 0.180 |
| BART-FT-JOINT [31] | 0.190 | 0.158 | 0.203 | 0.163 | 0.175 | 0.165 | 0.196 | 0.179 |
| Ours | 0.202 | 0.177 | 0.223 | 0.208 | 0.166 | 0.201 | 0.186 | 0.195 |
| Metric: BERT Score | | | | | | | | |
| BART [31] | 0.587 | 0.558 | 0.529 | 0.551 | 0.559 | 0.571 | 0.558 | 0.559 |
| PEGASUS-FT [31] | 0.681 | 0.713 | 0.739 | 0.683 | 0.705 | 0.663 | 0.736 | 0.703 |
| BART-FT [31] | 0.705 | 0.695 | 0.748 | 0.699 | 0.718 | 0.653 | 0.749 | 0.710 |
| BART-FT-JOINT [31] | 0.701 | 0.706 | 0.729 | 0.694 | 0.713 | 0.659 | 0.746 | 0.707 |
| Ours | 0.885 | 0.880 | 0.889 | 0.888 | 0.879 | 0.881 | 0.881 | 0.883 |
| Method | R-1 | R-2 | R-L | Topic Focus |
|---|---|---|---|---|
| Ours | 34.24 | 12.65 | 23.08 | 0.1512 |
| Ours w/o TSGN | 34.12 | 12.08 | 22.61 | 0.1407 |
| Ours w/o TAR | 34.30 | 11.97 | 22.27 | 0.1453 |
| Ours w/o TSGN, TAR | 31.14 | 10.46 | 19.94 | 0.1375 |
| Type of evaluation | BART+T-W (win percentage) | Ours (win percentage) |
|---|---|---|
| Topic Relevance | 0.27 | 0.73 |
| Content Consistency | 0.36 | 0.64 |
| Logic | 0.41 | 0.59 |
Zesheng Shi (Nankai University), Yucheng Zhou (University of Technology Sydney)
Abstract
Owing to the success of pre-trained language models (PLMs), existing PLM-based summarization models show powerful generative capability. However, these models are trained on general-purpose summarization datasets, so the generated summaries fail to satisfy the needs of different readers. To generate summaries focused on given topics, many efforts have been made on topic-focused summarization. However, these works generate a summary guided only by a prompt comprising topic words. Despite their success, these methods still ignore the disturbance of sentences with non-relevant topics and only conduct cross-interaction between tokens through the attention module. To address this issue, we propose a topic-arc recognition objective and a topic-selective graph network. First, the topic-arc recognition objective is used in model training, which endows the model with the capability to discriminate topics. Moreover, the topic-selective graph network conducts topic-guided cross-interaction on sentences based on the results of topic-arc recognition. In the experiments, we conduct extensive evaluations on the NEWTS and COVIDET datasets. Results show that our method achieves state-of-the-art performance.
Keywords: Text Summarization · Topic Model · Graph Neural Network
1 Introduction
Text summarization aims to compress a long article into a short and clear summary, which is a fundamental task in many NLP applications. With the success of sequence-to-sequence (seq2seq) language models, summarization has been widely integrated into many real-world applications, e.g., document snippet generation in search engines [25], automatic news summaries [1] and legal document summarization [14]. In recent years, text summarization has been an essential area in academia and industry.
With advances in deep learning, summarization models are generally designed based on the seq2seq framework [5]. Recently, many pre-trained language models (PLMs) have been proposed by pre-training a Transformer [26] on a large-scale unlabeled corpus in a self-supervised manner. Since a PLM encapsulates large-scale prior knowledge of language, it shows excellent generative capability on many natural language generation tasks, e.g., caption generation [33], machine translation [19] and event generation [35]. Therefore, PLM-based models have also become a mainstream paradigm for text summarization. However, since training a summarization model by finetuning alone is still insufficient, researchers have proposed many methods to improve PLM-based summarization models, e.g., contrastive learning [23] and information retrieval [4].
Although PLM-based text summarization is very successful, these methods suffer from a non-focused problem. Since most existing summarization models are trained on general-purpose datasets, they focus on generating general-purpose summaries. A general-purpose summary reflects the full range of the article's content and therefore fails to satisfy the needs of different readers. Recently, many text generation methods have focused on controllable generation processes guided by sentiment polarity [24] and specific topic distributions [13]. However, these methods lacked effective evaluation because no topic-focused summarization dataset existed. Therefore, Bahrainian et al. [2] introduce a NEWs Topic-focused Summarization (NEWTS) corpus to close this gap and propose prompt-based methods to improve PLM-based summarization methods.
Despite the success of these works, they neglect topic-guided cross-interaction between sentences. As shown in Figure 1, there are multiple sentences with different topics in an article. To generate a summary more relevant to a given topic, a topic-focused summarization model is required to distinguish the topics of sentences and conduct cross-interaction between sentences with the given topic. In contrast, existing methods [22] generate a summary guided only by a prompt comprising topic words, which ignores the disturbance of sentences with non-relevant topics and only conducts cross-interaction between tokens through the attention module.
Because an article contains sentences with multiple topics, we propose a topic-arc recognition objective to distinguish the topic of each sentence in the article. Since a summary has a definite topic, we leverage the summaries of an article and their topics to train a model that can predict the topic of sentences in the article. Moreover, summaries written for the same article have a similar context, which pushes the model to focus on the topic instead of on biases in content. In addition, we propose a topic-selective graph network, which conducts topic-guided cross-interaction on sentences in the article. Specifically, we first construct a graph based on sentences and on topic candidates sampled from the prediction results of topic-arc recognition on the article. Then, the graph nodes are updated by relational graph convolution layers. Lastly, the updated node representations are delivered to the decoder for summary generation.
In the experiments, we conduct extensive evaluations on two datasets, i.e., NEWTS [2] and COVIDET [31]. Experimental results show that our method outperforms other strong competitors and achieves state-of-the-art performance on NEWTS and COVIDET. Moreover, we further analyze the effectiveness of our method by providing additional quantitative and qualitative results.
2 Related Work
2.1 PLM-based Summarization
With the rise of seq2seq models, researchers are increasingly interested in text summarization. The original topic-based model was the TOPIARY model proposed by Li et al. [11] in 2004, which combined language-driven compression techniques and unsupervised topic detection methods. With the advance of the pre-training technique [16], the development of PTMs in text summarization is also booming [10]. Many sequence-to-sequence pre-trained language models show powerful capability for summarization, e.g., BART [9], T5 [18] and ProphetNet [15]. In addition, to be better compatible with both discriminative and generation tasks, Dong et al. [7] propose UniLM, which can be used in natural language understanding and generation tasks. The model also shows excellent summarization capability.
2.2 Topic-Guided Summarization
With the advance of text summarization, an increasing number of researchers are interested in generating topic-specific summaries. Initially, the LDA model was used to guide the topic of the summary [11]. For example, Xing et al. [29] propose a topic-aware seq2seq model named Twitter LDA for response generation, which introduces topic information through a joint attention mechanism and a biased generation probability. Another work, CATS [3], is a neural sequence-to-sequence model based on an attentional encoder-decoder architecture, which introduces a new attention mechanism named topic attention controlled by an unsupervised topic model. Among PTM-based models, the Plug and Play Language Model (PPLM) [6] is based on GPT-2 [17]. In addition, the BART-FT-JOINT model proposed in [31] can simultaneously use a sentiment inducer for sentiment-specific summary generation.
2.3 Graph Neural Network
Graph neural networks (GNNs) [20] have been valued in the field of deep learning for their excellent processing ability on unstructured data and their node-centric information aggregation mode. With the advance of graph neural networks, many graph networks with special structures have appeared, e.g., GCN [21], GAT [27], HAN [30] and r-GCN [21]. Moreover, GNNs are often used for downstream tasks such as text classification, information extraction, and text generation. In text summarization, Wang et al. [28] propose a heterogeneous graph-based neural network for extractive summarization, which contains semantic nodes of different granularity levels in addition to sentences. These extra nodes act as “intermediaries” between sentences and enrich cross-sentence relations. The introduction of document nodes allows the graph structure to be flexibly extended from a single-document setup to multiple documents. Another work [8] proposes a multiplex graph summarization (Multi-GraS) model based on multiplex graph convolutional networks, which can be used to extract text summaries. This model considers not only various types of inter-sentential relations (such as semantic similarity and natural connection) but also intra-sentential relations (such as semantic and syntactic relations between words).
3 Method
This section starts with a base topic-focused summarization model, followed by our proposed methods, i.e., topic-arc recognition and topic-selective graph network. Lastly, we elaborate on the details of our model training.
3.1 Base Topic-Focused Summarization Model
Topic-focused summarization aims to generate a topic-relevant summary based on a long article. Since pre-trained language models (PLMs) based on Transformer show powerful capability for text generation, a recent trend is to finetune a PLM as a summarization model. In this work, we first introduce a base topic-focused summarization model based on a PLM. Specifically, given an article $a$ and topic words $w^{t}$ corresponding to topic $t$, we first use the topic words as a prefix prompt for the article and pass them into the Transformer encoder, i.e.,
$$H = \text{Trans-Enc}([w^{t}; a]), \tag{1}$$
where $H \in \mathbb{R}^{n \times d}$ denotes the token representations generated by the Transformer encoder, $d$ is the hidden size, and $n$ is the length of the input.
Next, we deliver the token representations $H$ and the gold summary $y$ into the Transformer decoder, i.e.,
$$P = \text{Trans-Dec}(H, y), \tag{2}$$
where $P = [p_1, \dots, p_m]$ and $p_i$ is a probability distribution over the vocabulary $\mathcal{V}$; $m$ is the length of the summary.
Lastly, we train the pre-trained Transformer by maximum likelihood estimation, and its loss function is defined as:
$$\mathcal{L}^{(\mathrm{base})} = -\sum_{i=1}^{m} \log p_i(y_i), \tag{3}$$
where $y_i$ is the $i$-th word in the ground truth summary $y$.
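As an illustration of the base model, the following minimal Python sketch builds a topic-prefixed input and computes the maximum likelihood loss from per-step decoder distributions. The separator token, the helper names (`build_prompted_input`, `nll_loss`) and the toy probabilities are assumptions made for this example; they are not taken from the paper or from any library.

```python
import math


def build_prompted_input(topic_words, article_tokens, sep="</s>"):
    """Prefix the article with the topic words.

    The separator token is an assumption; the paper only states that the
    topic words are used as a prefix prompt for the article.
    """
    return list(topic_words) + [sep] + list(article_tokens)


def nll_loss(step_probs, gold_ids):
    """Sum of negative log-probabilities assigned to the gold summary tokens.

    step_probs[i] is the decoder's distribution over the vocabulary at step i,
    gold_ids[i] is the vocabulary index of the i-th gold summary token.
    """
    if len(step_probs) != len(gold_ids):
        raise ValueError("one distribution is needed per gold token")
    return -sum(math.log(probs[y]) for probs, y in zip(step_probs, gold_ids))


# Toy example: a 3-word vocabulary and a 2-token gold summary.
prompted = build_prompted_input(["economy", "market"], ["Stocks", "fell", "today"])
toy_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
toy_gold = [0, 1]
toy_loss = nll_loss(toy_probs, toy_gold)
```

In practice the distributions would come from the finetuned decoder under teacher forcing; the sketch only shows how the loss is assembled from them.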
3.2 Topic-Arc Recognition
Because an article contains multiple sentences with different topics, distinguishing the topics of its sentences is essential to topic-focused summarization. To recognize the topic of each sentence in an article, we propose a topic-arc recognition (TAR) objective, as shown in Figure 2. In this objective, we concatenate two summaries ($y^{(1)}$ and $y^{(2)}$) with different topics ($t^{(1)}$ and $t^{(2)}$) from an article and pass them into the Transformer encoder, i.e.,
$$H^{\prime} = \text{Trans-Enc}([y^{(1)}; y^{(2)}]). \tag{4}$$
Next, we apply mean pooling to the token representations of each sentence to obtain sentence representations $c_1, \dots, c_{k_1 + k_2}$, where $k_1$ and $k_2$ denote the number of sentences in the two summaries, respectively. Then, we deliver the sentence representations to a multilayer perceptron (MLP) to predict the topic of each sentence, i.e.,
$$q_j = \mathrm{softmax}(\mathrm{MLP}(c_j)), \tag{5}$$
where $q_j$ denotes the topic probability distribution of the $j$-th sentence over all topic categories. Lastly, we optimize the Transformer encoder and the MLP with a cross-entropy loss, i.e.,
$$\mathcal{L}^{(\mathrm{tar})} = -\sum_{j=1}^{k_1 + k_2} \log q_j(t_j^{*}), \tag{6}$$
where $t_j^{*}$ is the ground truth topic category of the $j$-th sentence.
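The topic-arc recognition steps (mean pooling, topic prediction, cross-entropy) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the MLP is reduced to a single linear layer, sentence boundaries are given as token spans, and all function names and toy values are hypothetical.

```python
import math


def mean_pool(token_vecs, sentence_spans):
    """Average the token vectors inside each (start, end) span; end is exclusive."""
    pooled = []
    for start, end in sentence_spans:
        rows = token_vecs[start:end]
        dim = len(rows[0])
        pooled.append([sum(r[d] for r in rows) / len(rows) for d in range(dim)])
    return pooled


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vec, weight, bias):
    """Single linear layer standing in for the MLP (an assumption of this sketch)."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def tar_loss(sentence_vecs, gold_topics, weight, bias):
    """Cross-entropy between predicted topic distributions and gold topic labels."""
    loss = 0.0
    for vec, gold in zip(sentence_vecs, gold_topics):
        probs = softmax(linear(vec, weight, bias))
        loss -= math.log(probs[gold])
    return loss


# Toy example: four 2-d token vectors forming two sentences, two topic categories.
tokens = [[1, 0], [3, 0], [0, 2], [0, 4]]
sentences = mean_pool(tokens, [(0, 2), (2, 4)])
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_tar_loss = tar_loss(sentences, [0, 1], identity, [0.0, 0.0])
```

Because the two concatenated summaries come from the same article, the classifier cannot rely on article content alone and has to pick up topic cues, which is the motivation given in the text.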
3.3 Summarization with Topic-Selective Graph Network
As illustrated in Figure 1, sentences with the same topic in an article are usually not adjacent. To integrate semantic information on the same topic, we propose a topic-selective graph network (TSGN) that conducts sentence-level cross-interaction using topic nodes as bridges. As shown in Figure 3, we first extract the token representations $H$ of an article via Equ.1. Next, we apply mean pooling to the token representations to obtain sentence representations $s_1, \dots, s_k$, where $k$ is the number of sentences. Then, we deliver the sentence representations to the MLP in Equ.5 to predict the topic probability distribution $q_j$ of each sentence. According to the ranking of the probabilities in $q_j$, we select the top-$K$ topics for each sentence as its topic candidates.
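The candidate selection step amounts to a per-sentence top-k over the predicted topic distribution. A minimal sketch follows, with a hypothetical helper name and made-up probabilities; the value 3 is used because Section 4.5 reports it as the best-performing number of topics.

```python
def top_k_topics(topic_probs, k):
    """Return the indices of the k most probable topics, most probable first."""
    ranked = sorted(range(len(topic_probs)), key=lambda t: topic_probs[t], reverse=True)
    return ranked[:k]


# Toy example: two sentences, four topic categories.
sentence_topic_probs = [
    [0.05, 0.60, 0.25, 0.10],
    [0.40, 0.10, 0.15, 0.35],
]
candidates = [top_k_topics(probs, 3) for probs in sentence_topic_probs]
```

Keeping several candidates per sentence trades recall of the correct topic against the noise introduced by wrong topic nodes.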
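To conduct topic-guided cross-interaction, we construct a semantic graph whose sentence nodes and topic nodes are initialized with the sentence representations $s_1, \dots, s_k$ and with topic representations $e_1, \dots, e_T$ derived from an embedding layer, where $T$ is the number of topic categories. Because sentence and topic nodes are heterogeneous, we connect them through edges with different relations $r \in \{r_{ss}, r_{st}\}$. The semantic graph is built with two rules for connecting nodes:
- We use the edge $r_{ss}$ to connect sentence nodes in sentence order.
- Each sentence node is connected only to its topic candidate nodes through the edge $r_{st}$, which alleviates the disturbance of sentences with non-relevant topics.
Moreover, we use relational graph convolution layers [21, 34] to update the sentence representations, i.e.,
$$g_i^{(l+1)} = \sigma\Big( W_0^{(l)} g_i^{(l)} + \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^{r}} \frac{1}{|\mathcal{N}_i^{r}|} W_r^{(l)} g_j^{(l)} \Big), \quad l = 0, \dots, L-1, \tag{7}$$
where $g_i^{(l)}$ is the representation of node $i$ at layer $l$, $\mathcal{R}$ denotes the set of all edge types, $\mathcal{N}_i^{r}$ is the neighborhood of node $i$ under relation $r$, $W_0^{(l)}$ and $W_r^{(l)}$ are learnable weight matrices, and $\sigma$ is an activation function. $L$ is the number of relational graph convolution layers. The updated sentence representations are denoted as $\tilde{s}_1, \dots, \tilde{s}_k$. Furthermore, we add each updated sentence representation $\tilde{s}_j$ to the representations of the tokens in the corresponding sentence and pass the result to Equ. 2 to predict the probability distributions $\tilde{P} = [\tilde{p}_1, \dots, \tilde{p}_m]$, where $m$ is the length of the summary. Lastly, we train our model with a cross-entropy loss, which is defined as:
$$\mathcal{L}^{(\mathrm{tsgn})} = -\sum_{i=1}^{m} \log \tilde{p}_i(y_i), \tag{8}$$
where $y_i$ is the $i$-th word in the ground truth summary $y$.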
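To make the graph construction and the node update concrete, the sketch below builds the two edge types described above and applies one relational graph convolution layer in plain Python. It is a sketch under stated assumptions, not the authors' implementation: edges are treated as undirected, messages are mean-normalized per relation, ReLU is used as the activation, and the weights and node features are toy values. All names are hypothetical.

```python
def build_topic_graph(num_sentences, topic_candidates):
    """Build adjacency lists for the two relations.

    Nodes are ("s", i) for sentences and ("t", j) for topics.
    Treating edges as undirected is an assumption of this sketch.
    """
    sequence_edges, topic_edges = {}, {}

    def connect(adjacency, u, v):
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    for i in range(num_sentences - 1):            # rule 1: sentence order
        connect(sequence_edges, ("s", i), ("s", i + 1))
    for i, cands in enumerate(topic_candidates):  # rule 2: topic candidates only
        for t in cands:
            connect(topic_edges, ("s", i), ("t", t))
    return {"sequence": sequence_edges, "topic": topic_edges}


def matvec(weight, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def rgcn_layer(feats, graph, rel_weights, self_weight):
    """One relational graph convolution layer with mean-normalized messages."""
    updated = {}
    for node, h in feats.items():
        out = matvec(self_weight, h)              # self-connection term
        for rel, adjacency in graph.items():
            neighbors = adjacency.get(node, [])
            for nb in neighbors:
                msg = matvec(rel_weights[rel], feats[nb])
                out = [o + m / len(neighbors) for o, m in zip(out, msg)]
        updated[node] = [max(0.0, v) for v in out]  # ReLU (assumed activation)
    return updated


# Toy example: sentences 0 and 2 share topic 0 but are not adjacent.
graph = build_topic_graph(3, [[0], [1], [0]])
identity = [[1.0, 0.0], [0.0, 1.0]]
feats = {
    ("s", 0): [1.0, 0.0], ("s", 1): [0.0, 1.0], ("s", 2): [1.0, 1.0],
    ("t", 0): [0.5, 0.5], ("t", 1): [0.0, 0.0],
}
weights = {"sequence": identity, "topic": identity}
updated = rgcn_layer(feats, graph, weights, identity)
```

After one layer the shared topic node has aggregated both of its sentences; applying a second layer would pass that information on to each of them, which is how non-adjacent sentences on the same topic interact through the topic node.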
3.4 Training
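During training, we jointly train our model with the topic-arc recognition and topic-selective graph network objectives and minimize a weighted sum of their loss functions:
$$\mathcal{L} = \alpha \mathcal{L}^{(\mathrm{tar})} + \beta \mathcal{L}^{(\mathrm{tsgn})}. \tag{9}$$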
4 Experiments
4.1 Dataset and Evaluation Metrics
In the experiments, we train and evaluate our approach on two datasets, i.e., NEWTS [2] and COVIDET [31]. The NEWTS dataset is based on the well-known CNN/DailyMail dataset and annotated through online crowd-sourcing. Every source article is paired with two summaries focusing on different topics, and topic words are provided to denote each topic. The dataset consists of 2,400 and 600 samples in the training and test sets, respectively. Results on the test set are reported with ROUGE (i.e., R-1, R-2, R-L) [12] and Topic Focus [2]. The COVIDET dataset is sourced from 1,883 English Reddit posts about the COVID-19 pandemic. Each post is annotated with 7 fine-grained emotion labels; for each emotion, annotators provided a concise, abstractive summary describing the triggers of the emotion. We follow the official data split with 2,234/1,526 samples in the training/test sets. The evaluation metrics are ROUGE-L (R-L) and BERT Score [32].
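For readers unfamiliar with ROUGE-L, the following sketch computes a sentence-level ROUGE-L F1 score from the longest common subsequence (LCS) of two token sequences. It is a simplified illustration: whitespace tokenization, lowercasing and the balanced F1 are assumptions of this example, and official ROUGE implementations differ in tokenization, stemming and weighting, so the numbers are not directly comparable to those reported in the tables.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 on whitespace tokens (simplified)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy example: the LCS is "the cat on the mat" (5 of 6 tokens on each side).
score = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")
```

Note that the NEWTS results table appears to report ROUGE on a 0 to 100 scale, whereas the COVIDET table uses a 0 to 1 scale like this sketch.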
4.2 Experimental Setting
The pre-trained Transformer we use is BART-base. We use AdamW optimizer with learning rate of 5 , and learning rate of the relational graph convolution layer is 1 . The weight decay and the dropout rate are both 1.0. The maximum number of training epochs, the maximum number of sentences and the batch size are set to 3, 60 and 2, respectively. The input and output lengths are set to 1024 and 128. We use NLTK to split sentences. In Equ.9, the two loss weights are set to 1.0 and 0.8, respectively. Experiments are conducted on an NVIDIA RTX3080 GPU, and training time is around 3 hours.
4.3 Main Results
Comparison results of our method and other strong competitors are shown in Table 1 and Table 2. From the tables, we make two observations.
- Methods using topic words as a prompt outperform those without a prompt, which demonstrates that topic words as a prompt are important for generating topic-focused summaries.
- Our approach achieves the best scores on all NEWTS metrics and the best average R-L and BERT Score on COVIDET, demonstrating the effectiveness of our method. Meanwhile, it shows that topic-guided sentence-level cross-interaction can improve topic-focused summarization by capturing the topic-relevant content in the article.
4.4 Ablation Study
As shown in Table 3, we conduct an ablation study on our method. First, we remove the topic-arc recognition (TAR) objective; R-2, R-L and Topic Focus drop, while R-1 stays essentially unchanged (34.30 vs. 34.24). This suggests that the TAR objective helps distinguish the topic category of each sentence, which improves the topic awareness of the model. Moreover, we investigate the effectiveness of TSGN by removing it from our method. All four metrics drop, which indicates that TSGN improves topic-focused summarization through cross-sentence interaction. Lastly, we remove both TAR and TSGN. The results show a large drop on all metrics (e.g., R-L falls from 23.08 to 19.94), which further supports the effectiveness of our method.
4.5 Impact of Topic Node
It can be seen from Figure 4 that the best performance is achieved when the number of topics selected in TSGN is 3. Moreover, the performance drops as the number of topics increases or decreases. The reason is that the topics are sampled from a predicted distribution that contains some noise. Introducing too few topic nodes (e.g., 1 node) into TSGN risks missing the correct topic. Conversely, introducing too many topic nodes (e.g., more than 3) to make sure the correct topic is included brings too much noise into TSGN, which also hurts the performance of the model.
4.6 Case Study
To evaluate our model more extensively, we also conduct a qualitative analysis. We randomly sample some generated summaries together with their articles and topic words, as shown in Figure 5. In the articles, the green area denotes sentences whose topic is recognized as the given topic. We can observe that the recognized sentences are relevant to the given topics, which verifies the effectiveness of the TAR objective. In addition, we can see that the generated summaries cover the sentence content of the green area, which demonstrates the effectiveness of TSGN.
4.7 Human Evaluation
To comprehensively evaluate our method, we conducted a human evaluation to compare our model and BART+T-W [2]. We considered three criteria: topic relevance, content consistency and logic. We randomly sampled 150 samples from the test set, and each sample includes an article, topic words and the generated summaries. We displayed these samples to 3 recruited annotators, who were asked to decide which summary is of better quality under each criterion. As shown in Table 4, our model is preferred over BART+T-W on all three criteria (0.73, 0.64 and 0.59 of the comparisons for topic relevance, content consistency and logic, respectively).
5 Conclusion
In this work, we dive into the limitations of previous topic-guided summarization methods, i.e., these methods still ignore the disturbance of sentences with non-relevant topics and only conduct cross-interaction between tokens by attention module. To address the limitations and improve the summarization model, we propose a topic-arc recognition objective and topic-selective graph network. The topic-arc recognition objective aims to discriminate topics of sentences. Moreover, the topic-selective graph network conducts topic-guided cross-interaction on sentences based on the results of topic-arc recognition. Experimental results show that our methods achieve state-of-the-art performance on NEWTS and COVIDET datasets.
- [1] Ahuja, O., Xu, J., Gupta, A., Horecka, K., Durrett, G.: ASPECTNEWS: aspect-oriented summarization of news documents. In: ACL 2022. pp. 6494–6506. Association for Computational Linguistics (2022)
- [2] Bahrainian, S.A., Feucht, S., Eickhoff, C.: NEWTS: A corpus for news topic-focused summarization. In: Findings of the Association for Computational Linguistics: ACL 2022. pp. 493–503. Association for Computational Linguistics, Dublin, Ireland (2022)
- [3] Bahrainian, S.A., Zerveas, G., Crestani, F., Eickhoff, C.: CATS: Customizable abstractive topic-based summarization. ACM Trans. Inf. Syst. 40(1) (Oct 2021)
- [4] Bouras, C., Tsogkas, V.: Improving text summarization using noun retrieval techniques. In: KES 2008. Lecture Notes in Computer Science, vol. 5178, pp. 593–600. Springer (2008)
- [5] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
- [6] Dathathri, S., Madotto, A., Lan, J., Hung, J., Frank, E., Molino, P., Yosinski, J., Liu, R.: Plug and play language models: A simple approach to controlled text generation. arXiv preprint arXiv:1912.02164 (2019)
- [7] Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., Wang, Y., Gao, J., Zhou, M., Hon, H.W.: Unified language model pre-training for natural language understanding and generation. Advances in Neural Information Processing Systems 32 (2019)
- [8] Jing, B., You, Z., Yang, T., Fan, W., Tong, H.: Multiplex graph neural network for extractive text summarization. arXiv preprint arXiv:2108.12870 (2021)
