Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement
Joseph Shtok, Amit Alfassy, Foad Abo Dahood, Eliyahu Schwartz, Sivan Doveh, Assaf Arbelle

TL;DR
This paper introduces ADLR, a method that automatically generates and refines intermediate-step demonstrations to enhance in-context learning in LLMs, significantly improving performance in code-based table QA and mathematical reasoning tasks.
Contribution
The paper presents an automatic data labeling and refinement technique that generates demonstrations containing intermediate steps, reducing the manual effort needed to build in-context examples and improving LLM performance.
Findings
Achieved up to 5.5% performance gain in code-based table QA.
Effective automatic generation and filtering of demonstrations.
Enhanced in-context learning with minimal manual examples.
Abstract
It has been shown that Large Language Models' (LLMs) performance can be improved for many tasks using Chain of Thought (CoT) or In-Context Learning (ICL), which involve demonstrating the steps needed to solve a task using a few examples. However, while datasets with input-output pairs are relatively easy to produce, providing demonstrations which include intermediate steps requires cumbersome manual work. These steps may be executable programs, as in agentic flows, or step-by-step reasoning as in CoT. In this work, we propose Automatic Data Labeling and Refinement (ADLR), a method to automatically generate and filter demonstrations which include the above intermediate steps, starting from a small seed of manually crafted examples. We demonstrate the advantage of ADLR in code-based table QA and mathematical reasoning, achieving up to a 5.5% gain. The code implementing our method is…
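The generate-then-filter idea described in the abstract can be illustrated with a minimal sketch. The details here are assumptions, not the authors' implementation: the filter rule (keep a candidate only if executing its steps reproduces the known output), the names Demo, run_steps, label_and_filter, and toy_generate, and the convention that generated code stores its answer in a variable called result are all hypothetical. A real system would call an LLM where toy_generate is used.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Demo(NamedTuple):
    question: str
    steps: str   # intermediate steps; here, a small Python program
    answer: str  # known output from the input-output dataset


def run_steps(steps: str) -> Optional[str]:
    """Execute candidate steps and return the value bound to `result`, if any."""
    scope: dict = {}
    try:
        exec(steps, scope)  # illustration only; sandbox untrusted code in practice
    except Exception:
        return None
    return str(scope["result"]) if "result" in scope else None


def label_and_filter(
    seed: List[Demo],
    pairs: List[Tuple[str, str]],
    generate: Callable[[List[Demo], str], str],
) -> List[Demo]:
    """Grow a demonstration pool from a small seed (assumed filter rule)."""
    pool = list(seed)
    for question, answer in pairs:
        steps = generate(pool, question)   # label: propose intermediate steps
        if run_steps(steps) == answer:     # filter: keep only verified candidates
            pool.append(Demo(question, steps, answer))
    return pool


def toy_generate(demos: List[Demo], question: str) -> str:
    """Hypothetical stand-in for an LLM call prompted with the current demos."""
    expr = question.replace("What is ", "").rstrip("?")
    return f"result = {expr}"


seed = [Demo("What is 1 + 1?", "result = 1 + 1", "2")]
pairs = [
    ("What is 2 + 3?", "5"),  # steps reproduce the label: kept
    ("What is 4 * 2?", "9"),  # steps disagree with the label: dropped
    ("What is 1 / 0?", "0"),  # steps fail to execute: dropped
]
pool = label_and_filter(seed, pairs, toy_generate)
print([d.question for d in pool])
```

In this toy run only the first pair survives the filter, so the pool grows from one seed demonstration to two. Checking candidates against known outputs is one plausible way to exploit cheap input-output pairs; the paper may use a different or additional refinement criterion.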
Taxonomy
Topics: Semantic Web and Ontologies
