BERT-VBD: Vietnamese Multi-Document Summarization Framework
Tuan-Cuong Vuong, Trang Mai Xuan, and Thien Van Luong

TL;DR
This paper introduces a Vietnamese multi-document summarization framework that combines extractive and abstractive methods, using a modified BERT for extraction and VBD-LLaMA2-7B-50b for abstraction, and reports a ROUGE-2 score of 39.6% on the VN-MDS dataset.
Contribution
It proposes a novel two-component pipeline, tailored to Vietnamese MDS, that integrates a modified BERT for extraction with VBD-LLaMA2 for abstraction.
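To make the two-stage data flow concrete, here is a minimal sketch in Python. It is not the authors' implementation: the word-frequency scorer merely stands in for the modified BERT extractor, the `abstractor` callable stands in for VBD-LLaMA2, and every function name and the parameter `k` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption, not the paper's)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_scores(sentences):
    """Stand-in scorer: mean in-document frequency of each sentence's tokens.

    The paper uses a modified BERT here; this placeholder only keeps the
    sketch runnable without model weights.
    """
    counts = Counter(w for s in sentences for w in s.lower().split())
    return [
        sum(counts[w] for w in s.lower().split()) / len(s.split())
        for s in sentences
    ]


def extract_key_sentences(document, k=2, scorer=frequency_scores):
    """Stage 1 (extractive): keep the k best-scoring sentences of one document."""
    sentences = split_sentences(document)
    scores = scorer(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore original order


def summarize(documents, abstractor, k=2):
    """Stage 2 (abstractive): rewrite the pooled extracts with `abstractor`.

    `abstractor` is any callable str -> str; in the paper this role is played
    by VBD-LLaMA2, which is not loaded here.
    """
    extracted = [s for doc in documents for s in extract_key_sentences(doc, k)]
    return abstractor(" ".join(extracted))
```

Passing an identity function as `abstractor` makes it easy to inspect what the extractive stage hands to the second stage before a real language model is plugged in.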
Findings
Achieved ROUGE-2 score of 39.6% on VN-MDS dataset
Outperformed existing state-of-the-art baselines
Demonstrated effectiveness of combined extractive-abstractive approach
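For readers unfamiliar with the metric, ROUGE-2 measures bigram overlap between a generated summary and a reference summary. The sketch below is a simplified illustration, not the evaluation script behind the reported score: it assumes whitespace tokenization (which yields syllables rather than words for Vietnamese), and this page does not say whether the 39.6% figure is recall, precision, or F1. The function names are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate, reference):
    """Return ROUGE-2 (precision, recall, F1) for two whitespace-tokenized strings."""
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    # Counter intersection clips each bigram's count to the smaller side.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, "the cat sat on the mat" against the reference "the cat lay on the mat" shares three of five bigrams on each side, so precision, recall, and F1 all come out to 0.6.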
Abstract
In tackling the challenge of Multi-Document Summarization (MDS), numerous methods have been proposed, spanning both extractive and abstractive summarization techniques. However, each approach has its own limitations, making it less effective to rely solely on either one. An emerging and promising strategy involves a synergistic fusion of extractive and abstractive summarization methods. Despite the plethora of studies in this domain, research on the combined methodology remains scarce, particularly in the context of Vietnamese language processing. This paper presents a novel Vietnamese MDS framework leveraging a two-component pipeline architecture that integrates extractive and abstractive techniques. The first component employs an extractive approach to identify key sentences within each document. This is achieved by a modification of the pre-trained BERT network, which derives…
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Text Analysis Techniques · Topic Modeling
Methods: Attention Is All You Need · Softmax · Layer Normalization · Dropout · Attention Dropout · WordPiece · Dense Connections · Residual Connection · Linear Layer
