Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement

Joseph Shtok; Amit Alfassy; Foad Abo Dahood; Eliyahu Schwartz; Sivan Doveh; Assaf Arbelle

arXiv:2410.10348 · cs.CL · October 15, 2024

TL;DR

This paper introduces ADLR, a method that automatically generates and filters demonstrations containing intermediate steps, starting from a small seed of manually crafted examples. It improves in-context learning in LLMs by up to 5.5% on code-based table QA and mathematical reasoning.

Contribution

The paper presents an automatic data labeling and refinement technique that generates demonstrations with intermediate steps (executable programs or step-by-step reasoning), reducing the manual effort needed to build them and improving LLM performance.

Findings

01. Achieved a gain of up to 5.5% across code-based table QA and mathematical reasoning.

02. Automatically generated and filtered demonstrations that include intermediate steps.

03. Enhanced in-context learning starting from only a small seed of manually crafted examples.

Abstract

It has been shown that Large Language Models' (LLMs) performance can be improved for many tasks using Chain of Thought (CoT) or In-Context Learning (ICL), which involve demonstrating the steps needed to solve a task using a few examples. However, while datasets with input-output pairs are relatively easy to produce, providing demonstrations which include intermediate steps requires cumbersome manual work. These steps may be executable programs, as in agentic flows, or step-by-step reasoning as in CoT. In this work, we propose Automatic Data Labeling and Refinement (ADLR), a method to automatically generate and filter demonstrations which include the above intermediate steps, starting from a small seed of manually crafted examples. We demonstrate the advantage of ADLR in code-based table QA and mathematical reasoning, achieving up to a 5.5% gain. The code implementing our method is…
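The generate-and-filter idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Demo` type, the `build_prompt` and `label_and_filter` functions, and the `generate` and `run_steps` callables are all hypothetical names introduced here. The filtering rule (keep a generated demonstration only when its steps reproduce the known output) is an assumption, since the abstract does not state the actual criterion, and the refinement stage is not modeled at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Demo:
    """One demonstration: input, intermediate steps, and final output."""
    question: str
    steps: str   # e.g. an executable program or a reasoning chain
    answer: str


def build_prompt(demos: List[Demo], question: str) -> str:
    """Format few-shot demonstrations followed by the new question."""
    parts = [f"Q: {d.question}\nSteps: {d.steps}\nA: {d.answer}" for d in demos]
    parts.append(f"Q: {question}\nSteps:")
    return "\n\n".join(parts)


def label_and_filter(
    seed: List[Demo],
    pairs: List[Tuple[str, str]],
    generate: Callable[[str], str],   # hypothetical LLM call: prompt -> steps
    run_steps: Callable[[str], str],  # hypothetical executor: steps -> answer
) -> List[Demo]:
    """Label input-output pairs with generated steps and keep the consistent ones.

    Assumed filter: a candidate survives only if running its steps
    yields the known output for that input.
    """
    pool = list(seed)
    for question, gold in pairs:
        # Prompts use the manual seed only; reusing the growing pool
        # would be another possible design.
        steps = generate(build_prompt(seed, question))
        try:
            predicted = run_steps(steps)
        except Exception:
            continue  # steps that fail to run are discarded
        if predicted == gold:
            pool.append(Demo(question, steps, gold))
    return pool
```

In this sketch, `generate` would wrap whatever LLM is in use and `run_steps` would execute the generated program or extract the final answer from a reasoning chain. The returned pool holds the seed examples plus every automatically labeled demonstration that passed the assumed consistency check.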


Taxonomy

Topics: Semantic Web and Ontologies