# Representation learning approach for understanding structured documents

**Authors:** Akkshita Trivedi, Sandeep Khanna, Santanu Chaudhury, Gaurav Harit

PMC · DOI: 10.1038/s41598-025-33642-y · Scientific Reports · 2025-12-26

## TL;DR

This paper introduces D-REEL, a new method for understanding document layouts by learning relationships between elements like text and figures.

## Contribution

The paper contributes the D-REEL framework and the Semantic Structural Congruence (SSC) metric for measuring relationships between document elements.

## Key findings

- D-REEL improves correlation accuracy and extraction performance on public datasets.
- D-REEL improves the SSC score by almost 10% on the PRIMA dataset.
- The method effectively handles diverse and irregular document layouts.

## Abstract

Current document understanding methods struggle with complex layouts and fail to grasp the deep logical connections between elements like text, figures, and tables. To address this, we introduce the Document Relationship Entity Embedding Learner (D-REEL), a novel representation learning framework designed to model intricate semantic relationships within documents. D-REEL generates extraction candidates for each article and then learns dense vector representations (embeddings) for these candidates. By comparing these embeddings, the system assesses semantic correlations between document fields, which allows it to determine whether articles are related regardless of their position on the page. The approach combines spatial information with domain-specific schemas, enabling precise extraction and robust correlation scoring even across diverse and irregular document layouts. To quantify these connections, we also propose the Semantic Structural Congruence (SSC), a new metric that uses location-agnostic localization to measure relationships. Experiments on public datasets show significant improvements in correlation accuracy and extraction performance. We achieved an average mAP increase of 2-3% and an SSC improvement of almost 10% for the PRIMA dataset.
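
The abstract describes scoring relatedness by comparing candidate embeddings but does not say which comparison function is used. As a minimal sketch, assuming cosine similarity and a fixed threshold (both assumptions, not details taken from the paper), the comparison step might look like the following. The function names, the toy three-dimensional vectors, and the `0.8` threshold are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_pairs(embeddings, threshold=0.8):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold.

    `embeddings` maps a candidate name to its vector. The threshold is an
    assumed hyperparameter, not a value reported in the paper.
    """
    names = sorted(embeddings)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine_similarity(embeddings[a], embeddings[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


# Toy embeddings (hypothetical): a headline and its body text point in a
# similar direction, while an unrelated advert does not.
candidates = {
    "headline": [0.9, 0.1, 0.0],
    "body_text": [0.8, 0.2, 0.1],
    "advert": [0.0, 0.1, 0.9],
}

result = related_pairs(candidates)
for a, b, score in result:
    print(f"{a} <-> {b}: {score:.3f}")
```

Because the score depends only on the embedding vectors, two candidates can be judged related wherever they sit on the page, which is the position-independent behaviour the abstract describes. In the actual framework the embeddings are learned, whereas here they are hard-coded.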

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12847979/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12847979/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12847979/full.md

---
Source: https://tomesphere.com/paper/PMC12847979