Can Large Language Models Follow Concept Annotation Guidelines? A Case Study on Scientific and Financial Domains
Marcio Fonseca, Shay B. Cohen

TL;DR
This study evaluates large language models' ability to follow concept annotation guidelines in sentence labeling tasks across scientific and financial domains, revealing advantages for larger and proprietary models and highlighting gaps in open-source models.
Contribution
The paper introduces a systematic evaluation of LLMs' capacity to follow in-context concept guidelines, comparing open-source and proprietary models across different contexts.
Findings
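Concept definitions consistently improve task performance, but only the larger models (70B parameters or more) show even limited ability to work under counterfactual contexts.
Only proprietary models such as GPT-3.5 and GPT-4 recognize nonsensical guidelines.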
Fine-tuning improves model effectiveness more than increasing model scale.
Abstract
Although large language models (LLMs) exhibit remarkable capacity to leverage in-context demonstrations, it is still unclear to what extent they can learn new concepts or facts from ground-truth labels. To address this question, we examine the capacity of instruction-tuned LLMs to follow in-context concept guidelines for sentence labeling tasks. We design guidelines that present different types of factual and counterfactual concept definitions, which are used as prompts for zero-shot sentence classification tasks. Our results show that although concept definitions consistently help in task performance, only the larger models (with 70B parameters or more) have limited ability to work under counterfactual contexts. Importantly, only proprietary models such as GPT-3.5 and GPT-4 can recognize nonsensical guidelines, which we hypothesize is due to more sophisticated alignment methods.…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Machine Learning in Materials Science
Methods: Multi-Head Attention · Linear Layer · Adam · Softmax · Attention Is All You Need · Attention Dropout · Weight Decay · Cosine Annealing
