Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

Daniel D'souza; Julia Kreutzer; Adrien Morisot; Ahmet Üstün; Sara Hooker

arXiv:2506.14702 · cs.CL · June 18, 2025

TL;DR

This paper introduces a training protocol that adds explicit data and task markers during training, so that a model can be steered toward long-tail, underrepresented use cases at inference time. The reported gains reach up to 14.1% (relative) on specialized tasks.

Contribution

The paper presents a training approach that uses explicit markers for data and task characteristics, improving both controllability and performance on rare and underrepresented tasks.
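As a rough illustration of what marker-based training data could look like, here is a minimal sketch. The tag syntax, the field names (`domain`, `task`), the `drop_prob` parameter, and both helper functions are hypothetical choices for illustration and are not taken from the paper; in particular, randomly dropping markers during training is an assumption about how a model might be kept usable when no markers are supplied.

```python
import random


def format_markers(markers):
    """Serialize a dict of markers into a tagged prefix, in sorted key order.

    The <key>value</key> syntax is a hypothetical format, not the paper's.
    """
    return "".join(f"<{key}>{value}</{key}>" for key, value in sorted(markers.items()))


def build_training_example(prompt, completion, markers, drop_prob=0.5, rng=random):
    """Attach markers to one training pair.

    Each marker is dropped independently with probability drop_prob
    (an assumed mechanism), so the model also sees unmarked inputs.
    """
    kept = {k: v for k, v in markers.items() if rng.random() >= drop_prob}
    return {"input": format_markers(kept) + prompt, "target": completion}


def build_inference_prompt(prompt, markers=None):
    """At inference time, markers are optional control levers for the caller."""
    return format_markers(markers or {}) + prompt


example = build_training_example(
    "Fix this bug",
    "Here is the corrected code.",
    {"task": "repair", "domain": "code"},
    drop_prob=0.0,  # keep every marker so the output is deterministic
)
print(example["input"])
```

At inference time, a caller targeting an underrepresented use case would pass the relevant markers to `build_inference_prompt`, while callers with no preference would pass none.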

Findings

01 Average 5.7% improvement in open-ended generation quality.

02 Over 9.1% gains in underrepresented domains.

03 Up to 14.1% relative lift on specialized tasks.

Abstract

One of the most profound challenges of modern machine learning is performing well on the long tail of rare and underrepresented features. Large general-purpose models are trained for many tasks, but work best on high-frequency use cases. After training, it is hard to adapt a model to perform well on specific use cases underrepresented in the training corpus. Relying on prompt engineering or few-shot examples to maximize the output quality on a particular test case can be frustrating, as models can be highly sensitive to small changes, react in unpredictable ways, or rely on a fixed system prompt to maintain performance. In this work, we ask: "Can we optimize our training protocols to both improve controllability and performance on underrepresented use cases at inference time?" We revisit the divide between training and inference techniques to improve long-tail performance while…

Videos

Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers · SlidesLive

Taxonomy

Topics: Stock Market Forecasting Methods · Time Series Analysis and Forecasting

Methods: Sparse Evolutionary Training · Balanced Selection