Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness
Lars Hillebrand, Prabhupad Pradhan, Christian Bauckhage, Rafet Sifa

TL;DR
This paper presents a novel pre-training technique called pointer-guided segment ordering that improves large language models' understanding of paragraph structures by restoring shuffled text segments, leading to better performance in document classification tasks.
Contribution
The paper introduces a self-attention-based pointer network for segment ordering pre-training and a dynamic sampling fine-tuning method to enhance document-level contextual understanding in language models.
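The summary does not say how the dynamic sampling step works, so the following is only one plausible reading, not the authors' procedure. It assumes that each document is a list of labeled segments and that every epoch draws a fresh random contiguous window of segments from it, so the model sees different groupings over time. The names `sample_window`, `epoch_samples`, and `max_segments` are hypothetical.

```python
import random


def sample_window(segments, labels, max_segments, rng):
    """Return a random contiguous window of at most `max_segments` segments.

    Assumption: "dynamic sampling" means re-drawing such a window every
    epoch; the summary itself does not define the procedure.
    """
    if len(segments) <= max_segments:
        return list(segments), list(labels)
    start = rng.randrange(len(segments) - max_segments + 1)
    end = start + max_segments
    return list(segments[start:end]), list(labels[start:end])


def epoch_samples(documents, max_segments, seed):
    """Draw one window per document; a different seed gives different windows."""
    rng = random.Random(seed)
    return [sample_window(segs, labs, max_segments, rng)
            for segs, labs in documents]


if __name__ == "__main__":
    doc = (["s0", "s1", "s2", "s3", "s4", "s5"],
           ["background", "background", "method", "method", "result", "result"])
    for epoch in range(3):
        # Using the epoch index as the seed keeps runs reproducible.
        print(epoch, epoch_samples([doc], max_segments=3, seed=epoch))
```

Keeping each window contiguous preserves the local order of segments, which is what a sequential classifier relies on.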
Findings
Achieves state-of-the-art results in sequential text classification.
Enhances model understanding of document structure and coherence.
Demonstrates effectiveness across scientific and financial datasets.
Abstract
We introduce "pointer-guided segment ordering" (SO), a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations in large language models. Our methodology leverages a self-attention-driven pointer network to restore the original sequence of shuffled text segments, addressing the challenge of capturing the structural coherence and contextual dependencies within documents. This pre-training approach is complemented by a fine-tuning methodology that incorporates dynamic sampling, augmenting the diversity of training instances and improving sample efficiency for various downstream applications. We evaluate our method on a diverse set of datasets, demonstrating its efficacy in tasks requiring sequential text classification across scientific literature and financial reporting domains. Our experiments show that pointer-guided…
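The segment ordering objective described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: segment embeddings are given as small hand-written vectors, the projection matrices `w_q` and `w_k` are hypothetical, and row j of the attention matrix is read as a distribution over the original positions of shuffled segment j, trained with cross-entropy.

```python
import math
import random


def shuffle_segments(segments, rng):
    """Shuffle segments; labels[j] is the original position of shuffled[j]."""
    order = list(range(len(segments)))
    rng.shuffle(order)
    shuffled = [segments[i] for i in order]
    return shuffled, order


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_probs(h, w_q, w_k):
    """Scaled dot-product self-attention over segment embeddings `h`.

    Assumption: row j of the result is the pointer distribution over
    the n original positions for shuffled segment j.
    """
    q = matmul(h, w_q)
    k = matmul(h, w_k)
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scale = math.sqrt(len(q[0]))
    return [softmax([s / scale for s in row]) for row in matmul(q, k_t)]


def ordering_loss(probs, labels):
    """Mean cross-entropy between pointer rows and true original positions."""
    return -sum(math.log(probs[j][labels[j]])
                for j in range(len(labels))) / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    shuffled, labels = shuffle_segments(["Intro", "Method", "Results", "Outlook"], rng)
    # Toy 2-dimensional embeddings, one per shuffled segment (made up).
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    probs = pointer_probs(h, identity, identity)
    print(shuffled, labels, ordering_loss(probs, labels))
```

In a real model the embeddings would come from the language model and the projections would be learned; with untrained, uniform attention the loss equals log(n) for n segments, which is a useful baseline when monitoring pre-training.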
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Softmax · Pointer Network
