# Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach

**Authors:** Wilailack Meesawad, Jen-Chieh Han, Chun-Yu Hsueh, Yu Zhang, Hsi-Chuan Hung, Richard Tzong-Han Tsai

PMC · DOI: 10.1093/database/baae127 · Database: The Journal of Biological Databases and Curation · 2025-05-22

## TL;DR

This paper presents a biomedical relation extraction system for the BioCreative VIII BioRED Track that uses an ensemble of BERT-based classifiers and surpasses the track's established benchmark score.

## Contribution

The study introduces a data-centric and preprocessing-robust ensemble learning method for biomedical relation extraction.

## Key findings

- The system outperforms the established benchmark score in the BioRED Track.
- A data-centric approach that prioritizes high-quality training instances improves model performance and robustness.
- Using PubMedBERT with a Max Rule ensemble over diverse classifiers improves relation extraction performance.
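The Max Rule named in these findings is a standard way to fuse classifier outputs: for each relation type, keep the highest probability that any classifier assigns to it, then predict the type with the largest kept score. The following is a minimal Python sketch of that standard rule, not the authors' code; the label names, the `max_rule` function, and the probability values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Hypothetical label names for illustration only.
LABELS = ["None", "Association", "Positive_Correlation", "Negative_Correlation"]


def max_rule(prob_vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Combine per-classifier class probabilities with the Max Rule.

    For each class, keep the highest probability any classifier assigns to it,
    then return the class whose kept score is largest (first one wins ties).
    """
    if not prob_vectors:
        raise ValueError("need at least one classifier output")
    n = len(labels)
    if any(len(p) != n for p in prob_vectors):
        raise ValueError("each probability vector must have one entry per label")
    # Column-wise maximum across classifiers.
    class_max: List[float] = [max(p[i] for p in prob_vectors) for i in range(n)]
    best = max(range(n), key=lambda i: class_max[i])
    return labels[best]


# Made-up outputs from three classifiers for one entity pair.
outputs = [
    [0.50, 0.30, 0.15, 0.05],
    [0.20, 0.15, 0.60, 0.05],
    [0.40, 0.45, 0.10, 0.05],
]
print(max_rule(outputs, LABELS))  # Positive_Correlation
```

With these made-up scores, averaging the three vectors would instead favor `None` (mean about 0.37 versus about 0.28 for `Positive_Correlation`), which shows how the Max Rule lets a single confident classifier decide the outcome.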

## Abstract

This paper describes our biomedical relation extraction system, designed for BioCreative VIII Track 1 (the BioRED Track), which focuses on extracting relations from biomedical literature. The system uses an ensemble learning method that combines the PubTator API with multiple pretrained Bidirectional Encoder Representations from Transformers (BERT) models. It draws on several preprocessed inputs, including prompt questions, entity ID pairs, and co-occurrence contexts, and adds special tokens and boundary tags to enhance model comprehension. Specifically, we use PubMedBERT together with the Max Rule ensemble mechanism to combine the outputs of diverse classifiers. Our results surpass the established benchmark score, providing a robust benchmark for evaluating performance on this task. Moreover, our study introduces a data-centric approach and demonstrates its effectiveness, showing that prioritizing high-quality data instances improves model performance and robustness.
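The abstract names the preprocessed inputs (prompt question, entity ID pair, co-occurrence context) and the use of boundary tags, but not their exact format. The following is a minimal sketch of how these pieces might be joined into one input string for a BERT classifier. Everything specific here is an assumption: the `<E1>`/`<E2>` tag strings, the `[SEP]` separator, the question wording, the `Mention` type and helper names, and the placeholder ID `CHEM_X`; the context is also assumed to be already selected as a passage in which both entities co-occur, with non-overlapping mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    start: int        # character offset of the first character
    end: int          # character offset one past the last character
    entity_id: str    # normalized ID, e.g. NCBI Gene "2875" for GPT


def tag_pair(text: str, mentions: List[Mention], id1: str, id2: str) -> str:
    """Wrap mentions of id1 in <E1>...</E1> and of id2 in <E2>...</E2>.

    Tag strings are hypothetical; mentions are assumed not to overlap.
    """
    tags = {id1: ("<E1>", "</E1>"), id2: ("<E2>", "</E2>")}
    targets = [m for m in mentions if m.entity_id in tags]
    # Insert from right to left so earlier character offsets stay valid.
    for m in sorted(targets, key=lambda m: m.start, reverse=True):
        open_tag, close_tag = tags[m.entity_id]
        text = text[:m.start] + open_tag + text[m.start:m.end] + close_tag + text[m.end:]
    return text


def build_input(question: str, context: str, mentions: List[Mention],
                id1: str, id2: str, sep: str = " [SEP] ") -> str:
    """Join a prompt question, the entity ID pair, and the tagged context."""
    return sep.join([question, f"{id1} ; {id2}", tag_pair(context, mentions, id1, id2)])


# Hypothetical example passage; "CHEM_X" is a placeholder ID, not a real one.
text = "GPT activity was measured after exposure to compound X."
start = text.index("compound X")
mentions = [Mention(0, 3, "2875"), Mention(start, start + len("compound X"), "CHEM_X")]
question = "What is the relation between the tagged entities?"
print(build_input(question, text, mentions, "2875", "CHEM_X"))
```

Inserting tags from the rightmost mention leftward avoids recomputing offsets after each insertion; varying the question, the ID pair format, or the tag scheme would yield the kind of differently preprocessed inputs whose classifiers an ensemble can then combine.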

## Full-text entities

- **Genes:** GPT (glutamic-pyruvic transaminase) [NCBI Gene 2875] {aka AAT1, ALT, ALT1, GPT1, SGPT}
- **Chemicals:** BioREx (-)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12097206/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12097206/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12097206/full.md

---
Source: https://tomesphere.com/paper/PMC12097206