A Strategy to Combine 1stGen Transformers and Open LLMs for Automatic Text Classification

Claudio M. V. de Andrade; Washington Cunha; Davi Reis; Adriana Silvina Pagano; Leonardo Rocha; Marcos André Gonçalves

arXiv:2408.09629 · cs.CL · August 20, 2024

Open Access

TL;DR

This paper proposes a confidence-based hybrid strategy combining first-generation transformers and open LLMs for cost-effective sentiment analysis, outperforming standalone models and reducing costs.

Contribution

It introduces a novel confidence-based method to integrate 1stTRs and open LLMs, improving performance while lowering computational costs.
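The confidence-based routing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: confidence is taken here to be the maximum softmax probability of the 1stTR, the threshold of 0.9 is an arbitrary placeholder, and `hybrid_classify`, `small_model`, and `llm` are hypothetical names standing in for a fine-tuned 1stTR (such as BERT) and an open LLM.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures (assumptions, not from the paper):
#   small_model(doc) -> list of class logits from a fine-tuned 1stTR
#   llm(doc)         -> class index predicted by an open LLM
SmallModel = Callable[[str], Sequence[float]]
LLM = Callable[[str], int]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_classify(
    doc: str,
    small_model: SmallModel,
    llm: LLM,
    threshold: float = 0.9,  # placeholder value; would need tuning in practice
) -> Tuple[int, str]:
    """Return (label, source), where source names the model that produced the label."""
    probs = softmax(small_model(doc))
    confidence = max(probs)  # assumed confidence measure: highest class probability
    if confidence >= threshold:
        # High-confidence case: keep the cheaper 1stTR prediction.
        return probs.index(confidence), "1stTR"
    # Low-confidence case: defer to the more expensive LLM.
    return llm(doc), "LLM"


if __name__ == "__main__":
    # Toy stand-ins for real models, for illustration only.
    def toy_1sttr(doc: str) -> List[float]:
        return [4.0, 0.0] if "great" in doc else [0.2, 0.3]

    def toy_llm(doc: str) -> int:
        return 1

    for text in ["a great movie", "hard to say"]:
        print(text, hybrid_classify(text, toy_1sttr, toy_llm))
```

In this sketch, raising the threshold sends more documents to the LLM (higher cost, potentially higher accuracy), while lowering it keeps more predictions with the 1stTR.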

Findings

1. The hybrid approach outperforms the individual models in sentiment analysis
2. Cost savings are achieved by using less expensive models for high-confidence cases
3. Performance is close to that of fine-tuned LLMs at a fraction of the cost
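The cost argument in the second finding can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that every document is first scored by the 1stTR and only the low-confidence fraction is also sent to the LLM; `hybrid_cost`, `llm_only_cost`, and all per-document costs are hypothetical placeholders, not figures from the paper.

```python
def hybrid_cost(n_docs: int, frac_to_llm: float, cost_1sttr: float, cost_llm: float) -> float:
    """Estimated total cost of the hybrid pipeline, in arbitrary cost units.

    Assumes every document pays the 1stTR cost and a fraction frac_to_llm
    of them additionally pays the LLM cost.
    """
    if not 0.0 <= frac_to_llm <= 1.0:
        raise ValueError("frac_to_llm must be in [0, 1]")
    return n_docs * cost_1sttr + n_docs * frac_to_llm * cost_llm


def llm_only_cost(n_docs: int, cost_llm: float) -> float:
    """Cost of sending every document to the LLM."""
    return n_docs * cost_llm


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 documents, a quarter of them routed
    # to an LLM assumed to be 20x as expensive per document as the 1stTR.
    hybrid = hybrid_cost(1000, 0.25, 1.0, 20.0)
    full = llm_only_cost(1000, 20.0)
    print(hybrid, full, 1 - hybrid / full)
```

With these placeholder numbers the hybrid pipeline costs 6,000 units versus 20,000 for an LLM-only pipeline; actual savings depend on the real per-document costs and on how many documents fall below the confidence threshold.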

Abstract

Transformer models have achieved state-of-the-art results, with Large Language Models (LLMs), an evolution of first-generation transformers (1stTR), being considered the cutting edge in several NLP tasks. However, the literature has yet to conclusively demonstrate that LLMs consistently outperform 1stTRs across all NLP tasks. This study compares three 1stTRs (BERT, RoBERTa, and BART) with two open LLMs (Llama 2 and Bloom) across 11 sentiment analysis datasets. The results indicate that open LLMs may moderately outperform or match 1stTRs in 8 out of 11 datasets, but only when fine-tuned. Given the substantial cost of fine-tuning for only moderate gains, the practical applicability of these models in cost-sensitive scenarios is questionable. In this context, a confidence-based strategy that seamlessly integrates 1stTRs with open LLMs based on prediction certainty is proposed. High-confidence documents…

Taxonomy

Topics: Text and Document Classification Technologies

Methods: Softmax · Linear Layer · Attention Dropout · Dropout · WordPiece · Residual Connection · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay