DOREMI: Optimizing Long Tail Predictions in Document-Level Relation Extraction

Laura Menotti; Stefano Marchesin; Gianmaria Silvello

arXiv:2601.11190 · cs.CL · January 19, 2026

Open Access

TL;DR

DOREMI is an iterative framework that improves document-level relation extraction on long-tail relation types by actively selecting informative examples and adding a minimal number of targeted manual annotations for them.

Contribution

The paper introduces a scalable, model-agnostic method that improves predictions for rare relations without relying on large-scale noisy data or heuristic denoising.

Findings

01. Improves performance on rare relations in DocRE tasks.

02. Reduces reliance on large-scale noisy data.

03. Enhances model robustness and generalization.

Abstract

Document-Level Relation Extraction (DocRE) presents significant challenges due to its reliance on cross-sentence context and the long-tail distribution of relation types, where many relations have scarce training examples. In this work, we introduce DOcument-level Relation Extraction optiMizing the long taIl (DOREMI), an iterative framework that enhances underrepresented relations through minimal yet targeted manual annotations. Unlike previous approaches that rely on large-scale noisy data or heuristic denoising, DOREMI actively selects the most informative examples to improve training efficiency and robustness. DOREMI can be applied to any existing DocRE model and is effective at mitigating long-tail biases, offering a scalable solution to improve generalization on rare relations.
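
The abstract describes selecting informative examples for underrepresented relations but does not state the selection criterion. As a rough illustration only, and not the authors' method, one round of such a loop could be sketched as below. The sketch assumes that "long tail" means relations with at most `max_count` training examples and that "informative" means high binary entropy of the model's predicted probability; the function names, the threshold, and the candidate tuple layout are all hypothetical.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli(p) prediction; highest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def tail_relations(train_labels, max_count):
    """Relations with at most max_count training examples.

    Assumption: a simple count threshold stands in for the long-tail definition.
    """
    counts = Counter(train_labels)
    return {rel for rel, n in counts.items() if n <= max_count}


def select_for_annotation(candidates, tail, budget):
    """Pick up to `budget` tail-relation candidates the model is least sure about.

    candidates: list of (candidate_id, relation, predicted_probability).
    Assumption: uncertainty is measured with binary entropy; ties are broken
    by candidate_id so the output is deterministic.
    """
    scored = [
        (binary_entropy(p), cid, rel)
        for cid, rel, p in candidates
        if rel in tail
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(cid, rel) for _, cid, rel in scored[:budget]]
```

In an iterative setting, the selected pairs would be sent to a human annotator, the confirmed labels added to the training set, and the model retrained before the next round. Restricting candidates to tail relations keeps the annotation budget focused on the classes where extra labels are scarce.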

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification