DEPTH: Discourse Education through Pre-Training Hierarchically

Zachary Bamberger; Ofek Glick; Chaim Baskin; Yonatan Belinkov

arXiv:2405.07788 · cs.CL · February 17, 2026

TL;DR

DEPTH is a hierarchical pre-training approach for language models that enhances discourse understanding by learning sentence-level representations with novel objectives, leading to improved performance on discourse and NLU tasks.

Contribution

The paper introduces DEPTH, a discourse-oriented pre-training method that combines hierarchical sentence representations with two objectives, Sentence Un-Shuffling and Span-Corruption, to improve the discourse capabilities of language models.

Findings

1. DEPTH outperforms T5 in span-corruption loss.
2. DEPTH learns faster and better on discourse tasks.
3. Minimal impact on other NLU capabilities.

Abstract

Language Models (LMs) struggle with linguistic understanding at the discourse level, even though discourse patterns such as coherence, cohesion, and narrative flow are prevalent in their pre-training data. To improve the discourse capabilities of LMs already at the pre-training stage, we introduce DEPTH, an encoder-decoder model that learns latent representations for sentences using a discourse-oriented pre-training objective. DEPTH combines hierarchical sentence representations with two objectives: (1) Sentence Un-Shuffling, and (2) Span-Corruption. Our approach trains the model to represent both sub-word-level and sentence-level dependencies over a pre-training corpus. When trained either from scratch or continuing from a pre-trained T5 checkpoint, DEPTH learns semantic and discourse-level representations faster than T5, outperforming it in span-corruption loss despite the additional…
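To make the two objectives named in the abstract concrete, here is a minimal sketch of how training examples for each might be built. It is not the DEPTH implementation: the function names `shuffle_sentences` and `corrupt_spans` are hypothetical, the `<extra_id_k>` sentinels follow the convention of T5 tokenizers, and the way DEPTH combines the two objectives with hierarchical sentence representations is not shown.

```python
import random


def shuffle_sentences(sentences, rng):
    """Shuffle sentences and return the order needed to undo the shuffle.

    target[i] is the slot in `shuffled` that holds the i-th original
    sentence, so [shuffled[t] for t in target] restores the original.
    """
    order = list(range(len(sentences)))
    rng.shuffle(order)  # order[j] = original index placed at slot j
    shuffled = [sentences[i] for i in order]
    target = [order.index(i) for i in range(len(sentences))]
    return shuffled, target


def corrupt_spans(tokens, spans):
    """T5-style span corruption over a list of tokens.

    `spans` holds sorted, non-overlapping (start, end) pairs, end exclusive.
    Each span becomes one sentinel in the input; the target lists every
    sentinel followed by the tokens it hides, then a closing sentinel.
    """
    inp, tgt, cursor = [], [], 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        inp.extend(tokens[cursor:start])
        inp.append(sentinel)
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])
        cursor = end
    inp.extend(tokens[cursor:])
    tgt.append(f"<extra_id_{len(spans)}>")
    return inp, tgt


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    shuffled, target = shuffle_sentences(["A.", "B.", "C."], rng)
    print(shuffled, target)
    print(corrupt_spans("the cat sat on the mat".split(), [(1, 2), (4, 6)]))
```

In this sketch the un-shuffling target is a permutation over sentence slots, while the span-corruption target is a token sequence; a model trained on both would need to predict sentence order alongside the masked spans.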

Code & Models

Repositories

zbambergerNLP/depth (PyTorch, official)

Videos

DEPTH: Discourse Education through Pre-Training Hierarchically · Underline

Taxonomy

Topics: Discourse Analysis in Language Studies · EFL/ESL Teaching and Learning

Methods: Gated Linear Unit · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Attention Dropout · Adafactor · SentencePiece