From Brittle to Robust: Improving LLM Annotations for SE Optimization
Lohith Senthilkumar, Tim Menzies

TL;DR
This paper introduces SynthCore, a novel ensemble prompting strategy for LLMs that improves annotation quality in high-dimensional multi-objective software engineering tasks, outperforming traditional methods without human input.
Contribution
SynthCore is a simple yet effective ensemble approach that combines multiple LLM prompts to enhance multi-objective task annotation, addressing blind spots in existing LLM labeling techniques.
Findings
SynthCore outperforms state-of-the-art optimization methods.
Ensemble approach effectively handles high-dimensional multi-objective tasks.
Achieved high-quality annotations without human labels.
Abstract
Software analytics often builds from labeled data. Labeling can be slow, error-prone, and expensive. When human expertise is scarce, SE researchers sometimes ask large language models (LLMs) for the missing labels. While this has been successful in some domains, recent results show that LLM-based labeling has blind spots. Specifically, their labeling is not effective for higher-dimensional multi-objective problems. To address this problem, we propose a novel LLM prompting strategy called SynthCore. When one opinion fails, SynthCore combines multiple separate opinions generated by LLMs (with no knowledge of each other's answers) into an ensemble of few-shot learners. Simpler than other strategies (e.g., chain-of-thought or multi-agent debate), SynthCore aggregates results from multiple single-prompt sessions (with no crossover between them). SynthCore has been tested on 49 SE…
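The abstract describes aggregating answers from several independent single-prompt sessions. A minimal sketch of that idea follows. It is not the authors' implementation. The abstract does not state how answers are combined, so majority voting is an assumption here. The names `ensemble_label` and `stub_llm`, and the labels "best" and "rest", are hypothetical placeholders.

```python
from collections import Counter
from typing import Callable


def ensemble_label(ask: Callable[[str, int], str], prompt: str,
                   n_sessions: int = 5) -> str:
    """Query `ask` in n independent sessions and return the most common answer."""
    # Each call sees only the prompt and its own session id, so no session
    # can read another session's answer (no crossover).
    answers = [ask(prompt, session) for session in range(n_sessions)]
    # Majority vote is an assumed aggregation rule, chosen for illustration.
    return Counter(answers).most_common(1)[0][0]


def stub_llm(prompt: str, session: int) -> str:
    """Stand-in for a real LLM call: even-numbered sessions answer 'best'."""
    return "best" if session % 2 == 0 else "rest"


if __name__ == "__main__":
    # Hypothetical few-shot prompt; a real one would list labeled examples.
    print(ensemble_label(stub_llm, "Label this configuration: ...", n_sessions=5))
```

Replacing `stub_llm` with a function that opens a fresh LLM session per call would keep the sessions isolated in the way the abstract describes.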
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Machine Learning and Data Classification
