Can Large Language Models Follow Concept Annotation Guidelines? A Case Study on Scientific and Financial Domains

Marcio Fonseca; Shay B. Cohen

arXiv:2311.08704 · cs.CL · June 28, 2024 · 1 citation

TL;DR

This study evaluates how well large language models follow concept annotation guidelines in sentence labeling tasks across scientific and financial domains. It finds advantages for larger and proprietary models and highlights gaps in open-source models.

Contribution

The paper introduces a systematic evaluation of LLMs' capacity to follow in-context concept guidelines, comparing open-source and proprietary models across different contexts.

Findings

01

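Concept definitions consistently improve task performance, but only larger models (70B parameters or more) show even limited ability to work under counterfactual contexts.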

02

Only proprietary models such as GPT-3.5 and GPT-4 recognize nonsensical guidelines.

03

Fine-tuning improves model effectiveness more than increasing model scale does.

Abstract

Although large language models (LLMs) exhibit remarkable capacity to leverage in-context demonstrations, it is still unclear to what extent they can learn new concepts or facts from ground-truth labels. To address this question, we examine the capacity of instruction-tuned LLMs to follow in-context concept guidelines for sentence labeling tasks. We design guidelines that present different types of factual and counterfactual concept definitions, which are used as prompts for zero-shot sentence classification tasks. Our results show that although concept definitions consistently help in task performance, only the larger models (with 70B parameters or more) have limited ability to work under counterfactual contexts. Importantly, only proprietary models such as GPT-3.5 and GPT-4 can recognize nonsensical guidelines, which we hypothesize is due to more sophisticated alignment methods.…
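The setup described in the abstract, concept definitions placed in a prompt for zero-shot sentence classification, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `build_prompt` and `swap_definitions`, the prompt wording, and the example labels are hypothetical and are not taken from the paper or its repository. Rotating definitions across labels is only one simple way to produce a counterfactual guideline and may differ from the authors' construction.

```python
def build_prompt(definitions, sentence):
    """Format concept definitions and one sentence into a zero-shot labeling prompt.

    `definitions` maps each label name to its textual definition.
    The prompt wording here is hypothetical, not the paper's template.
    """
    lines = ["Annotation guidelines:"]
    for label, definition in definitions.items():
        lines.append(f"- {label}: {definition}")
    lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append("Label (choose one of: " + ", ".join(definitions) + "):")
    return "\n".join(lines)


def swap_definitions(definitions):
    """Build a counterfactual guideline by rotating definitions across labels.

    Each label receives the definition of the next label, so no label
    keeps its original meaning (assuming at least two distinct definitions).
    """
    labels = list(definitions)
    texts = list(definitions.values())
    rotated = texts[1:] + texts[:1]
    return dict(zip(labels, rotated))


# Hypothetical labels and definitions, for illustration only.
factual = {
    "METHOD": "The sentence describes a procedure or technique.",
    "RESULT": "The sentence reports an outcome or measurement.",
}
counterfactual = swap_definitions(factual)

sentence = "We fine-tune the model for three epochs."
factual_prompt = build_prompt(factual, sentence)
counterfactual_prompt = build_prompt(counterfactual, sentence)
```

Sending both prompts to the same model and comparing the predicted labels would indicate whether the model follows the in-context definitions or falls back on its prior notion of each label.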

Code & Models

Repositories

thefonseca/concept-guidelines
Official

Videos

Can Large Language Models Follow Concept Annotation Guidelines? A Case Study on Scientific and Financial Domains

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Machine Learning in Materials Science

Methods: Multi-Head Attention · Linear Layer · Adam · Softmax · Attention Is All You Need · Attention Dropout · Weight Decay · Cosine Annealing