Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments

Marc Felix Brinner; Sina Zarrieß

arXiv:2502.06551 · cs.CL · February 11, 2025

TL;DR

This paper presents methods for efficiently classifying scientific full texts with small BERT-based models and local large language models such as Llama-3.1 8B. It focuses on sentence selection strategies that shrink the input while improving classification accuracy on EICAT impact assessments.

Contribution

It introduces a novel dataset of full-text invasion biology papers aligned with publicly available IUCN impact assessments, and demonstrates that sentence selection and repeated sampling improve classification performance and efficiency over full-text models.

Findings

01 Sentence selection improves model accuracy and efficiency.

02 Repeated sampling of shorter inputs further boosts performance.

03 Models trained on selected sentences outperform full-text models.
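
The first two findings can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the score-weighted sampling scheme, and the majority vote are assumptions chosen for illustration, and `classify` stands in for any trained model that maps a list of sentences to a label. Sentence scores are assumed to be positive relevance values produced by some separate selection model.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def select_top_k(sentences: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Keep the k highest-scoring sentences, preserving document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


def classify_with_repeated_sampling(
    sentences: Sequence[str],
    scores: Sequence[float],
    classify: Callable[[List[str]], str],
    k: int,
    n_samples: int,
    seed: int = 0,
) -> str:
    """Classify several short, score-weighted subsets and return the majority label.

    Hypothetical scheme: each subset is drawn without replacement, with
    higher-scoring sentences more likely to be chosen.
    """
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        # Weighted sampling without replacement (Efraimidis-Spirakis keys):
        # draw u ~ U(0, 1) per sentence and keep the k largest u ** (1 / weight).
        keys = [rng.random() ** (1.0 / max(s, 1e-9)) for s in scores]
        top = sorted(range(len(sentences)), key=lambda i: keys[i], reverse=True)[:k]
        subset = [sentences[i] for i in sorted(top)]
        votes[classify(subset)] += 1
    return votes.most_common(1)[0][0]


def toy_classify(sentences: List[str]) -> str:
    """Stand-in classifier for illustration only (keyword match, not a real model)."""
    return "harmful" if any("decline" in s.lower() for s in sentences) else "minimal"
```

In this sketch, `select_top_k` produces one short input per document, while `classify_with_repeated_sampling` trades extra classifier calls for a vote over several different short inputs. Both keep each individual input far smaller than the full text.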

Abstract

This study explores strategies for efficiently classifying scientific full texts using both small, BERT-based models and local large language models like Llama-3.1 8B. We focus on developing methods for selecting subsets of input sentences to reduce input size while simultaneously enhancing classification performance. To this end, we compile a novel dataset consisting of full-text scientific papers from the field of invasion biology, specifically addressing the impacts of invasive species. These papers are aligned with publicly available impact assessments created by researchers for the International Union for Conservation of Nature (IUCN). Through extensive experimentation, we demonstrate that various sources like human evidence annotations, LLM-generated annotations or explainability scores can be used to train sentence selection models that improve the performance of both encoder-…

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques