VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction

Khai Phan Tran; Wen Hua; Xue Li

arXiv:2412.13503 · cs.CL · January 14, 2025

TL;DR

This paper introduces VaeDiff-DocRE, an end-to-end data augmentation framework that combines a Variational Autoencoder (VAE) with a diffusion model to improve document-level relation extraction (DocRE), especially on imbalanced datasets.

Contribution

It presents a VAE-based data augmentation method that augments entity-pair representations in the embedding space, uses a diffusion model to parameterize the VAE's latent space, and plugs into DocRE systems through a hierarchical training framework.
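For background, the block below gives the standard textbook forms of the two generative components named above. The first is the VAE evidence lower bound with its reparameterized latent sample. The second is a DDPM-style forward noising process applied to that latent. This is a reference sketch, not the paper's objective. Treating $x$ as an entity-pair representation, setting $z_0 = z$, and assuming a DDPM-style noise schedule $\beta_t$ are our assumptions, and the authors' exact formulation and losses may differ.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{ELBO}}(\theta,\phi;x) &= \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big),\\
z &= \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),\\
q(z_t \mid z_0) &= \mathcal{N}\big(z_t;\ \sqrt{\bar\alpha_t}\, z_0,\ (1-\bar\alpha_t)\, I\big), \qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s).
\end{aligned}
```

Parameterizing the latent space with a diffusion model, as the abstract describes, amounts to replacing the fixed prior $p(z)$ with a learned one. The abstract motivates this choice by the multi-label nature of DocRE.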

Findings

01. Outperforms state-of-the-art models on two benchmark datasets.

02. Effectively addresses the long-tail relation distribution in DocRE.

03. Augments data for underrepresented relations.

Abstract

Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach using generative models to enhance data from the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity pair representations and augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE's latent space with a Diffusion Model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our…
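The augmentation idea in the abstract can be illustrated with a deliberately simplified sketch. The idea is to model each relation's distribution over entity-pair embeddings, then sample extra embeddings for underrepresented relations. The sketch swaps the paper's VAE and diffusion prior for a per-relation diagonal Gaussian. Several other choices are hypothetical and made only for illustration: the balancing rule (top every relation up to a fixed `target` count), the function names, and the toy relation labels. None of this is the authors' code from khaitran22/vaediff-docre.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical input: each entity-pair embedding carries one or more relation labels.
Samples = list[tuple[list[float], set[str]]]


def fit_relation_gaussians(samples: Samples) -> dict[str, tuple[list[float], list[float]]]:
    """Fit a diagonal Gaussian (per-dimension mean and std) for each relation.

    Simplified stand-in for the relation-wise distributions the paper's VAE models.
    """
    by_rel: dict[str, list[list[float]]] = defaultdict(list)
    for emb, labels in samples:
        for rel in labels:  # a multi-label pair contributes to every relation it has
            by_rel[rel].append(emb)
    stats = {}
    for rel, embs in by_rel.items():
        n, dim = len(embs), len(embs[0])
        means = [sum(e[d] for e in embs) / n for d in range(dim)]
        # Small floor keeps the std strictly positive for relations seen only once.
        stds = [math.sqrt(sum((e[d] - means[d]) ** 2 for e in embs) / n) + 1e-3
                for d in range(dim)]
        stats[rel] = (means, stds)
    return stats


def augmentation_budget(samples: Samples, target: int) -> dict[str, int]:
    """Hypothetical rule: top every relation up to `target` labeled pairs."""
    counts = Counter(rel for _, labels in samples for rel in labels)
    return {rel: max(0, target - n) for rel, n in counts.items()}


def augment(samples: Samples, target: int, seed: int = 0) -> Samples:
    """Sample synthetic embeddings for relations below `target`."""
    rng = random.Random(seed)
    stats = fit_relation_gaussians(samples)
    synthetic: Samples = []
    for rel, k in augmentation_budget(samples, target).items():
        means, stds = stats[rel]
        for _ in range(k):
            # mean + std * eps with eps ~ N(0, 1): the reparameterized form a VAE also uses.
            emb = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, stds)]
            synthetic.append((emb, {rel}))  # single-label simplification
    return synthetic


data: Samples = [
    ([0.9, 0.1], {"founded_by"}),
    ([1.1, -0.1], {"founded_by"}),
    ([1.0, 0.0], {"founded_by", "employee_of"}),
    ([-1.0, 0.5], {"spouse"}),
]
new = augment(data, target=3)
print(len(new), sorted(next(iter(labels)) for _, labels in new))
```

The toy data has three pairs labeled `founded_by` (one of which is also labeled `employee_of`) and a single `spouse` pair. With this data, `augment(data, target=3)` returns four synthetic embeddings, two for `employee_of` and two for `spouse`. Each synthetic sample carries a single label. This ignores the multi-label structure that the paper targets with its diffusion-parameterized latent space.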

Code & Models

Repositories

khaitran22/vaediff-docre (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Diffusion