Transformer over Pre-trained Transformer for Neural Text Segmentation with Enhanced Topic Coherence
Kelvin Lo, Yuan Jin, Weicong Tan, Ming Liu, Lan Du, Wray Buntine

TL;DR
This paper introduces Transformer$^2$, a hierarchical transformer framework that leverages pre-trained models for improved neural text segmentation and topic coherence, outperforming existing methods.
Contribution
It presents a novel transformer-over-transformer architecture utilizing pre-trained sentence encoders and a multi-task training approach for better segmentation accuracy.
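To illustrate roughly what a multi-task objective of this kind can look like, the sketch below combines a per-sentence boundary loss with a per-sentence topic-classification loss. It is a minimal pure-Python sketch, not the authors' implementation. Several choices are assumptions: binary cross-entropy for boundaries, categorical cross-entropy for topic labels, averaging over sentences, and the hypothetical weighting parameter `topic_weight`.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    # Clamp p to avoid log(0); y is 1 if the sentence ends a segment, else 0 (assumed labeling).
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def cross_entropy(probs, label, eps=1e-12):
    # probs is a probability distribution over topic labels for one sentence.
    return -math.log(max(probs[label], eps))

def multitask_loss(boundary_probs, boundary_labels, topic_probs, topic_labels, topic_weight=1.0):
    """Hypothetical combined loss: mean boundary BCE + topic_weight * mean topic CE."""
    n = len(boundary_probs)
    seg_loss = sum(binary_cross_entropy(p, y)
                   for p, y in zip(boundary_probs, boundary_labels)) / n
    topic_loss = sum(cross_entropy(pr, t)
                     for pr, t in zip(topic_probs, topic_labels)) / n
    return seg_loss + topic_weight * topic_loss
```

Setting `topic_weight=0` reduces this to a boundary-only objective. Under this sketch, that makes it easy to see how much the auxiliary topic task contributes.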
Findings
Outperforms state-of-the-art segmentation models on semantic coherence.
Pre-trained knowledge enhances segmentation performance.
Language-specific pre-trained encoders yield better results than domain-specific ones.
Abstract
This paper proposes a transformer-over-transformer framework, called Transformer$^2$, to perform neural text segmentation. It consists of two components: bottom-level sentence encoders using pre-trained transformers, and an upper-level transformer-based segmentation model that operates on the resulting sentence embeddings. The bottom-level component transfers pre-trained knowledge, learned from large external corpora under both single-sentence and sentence-pair supervised NLP tasks, to produce the sentence embeddings for each document. Given these sentence embeddings, the upper-level transformer is trained to recover the segmentation boundaries as well as the topic label of each sentence. Equipped with a multi-task loss and the pre-trained knowledge, Transformer$^2$ can better capture the semantic coherence within the same segment. Our experiments show that (1) Transformer$^2$ manages to surpass state-of-the-art text…
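Recovering segmentation boundaries ultimately means turning per-sentence predictions into contiguous segments. The sketch below shows one simple way to do that decoding step. It is an illustrative assumption, not the paper's procedure: it assumes the upper-level model emits a boundary probability for each sentence, that a positive label marks the last sentence of a segment, and that a fixed `threshold` of 0.5 is used.

```python
def boundaries_to_segments(boundary_probs, threshold=0.5):
    """Group sentence indices into segments.

    Assumption: a sentence whose boundary probability is >= threshold
    is treated as the LAST sentence of its segment.
    """
    segments, current = [], []
    for i, p in enumerate(boundary_probs):
        current.append(i)
        if p >= threshold:
            # Close the current segment at this sentence.
            segments.append(current)
            current = []
    if current:
        # Any trailing sentences form the final segment.
        segments.append(current)
    return segments

# Usage: five sentences with predicted boundary probabilities.
print(boundaries_to_segments([0.1, 0.9, 0.2, 0.3, 0.7]))  # [[0, 1], [2, 3, 4]]
```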
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
