LOCUS: A System and Method for Low-Cost Customization for Universal Specialization

Dhanasekar Sundararaman; Keying Li; Wayne Xiong; Aashna Garg

arXiv:2512.06239·cs.CL·December 9, 2025

TL;DR

LOCUS is a low-cost pipeline that builds NLP models from few-shot data using targeted retrieval, synthetic data generation, and parameter-efficient tuning, and it outperforms larger models on named entity recognition (NER) and text classification (TC) tasks.

Contribution

Introduces LOCUS, a novel pipeline combining retrieval, synthetic data, and low-rank tuning for cost-effective NLP model customization with minimal data.

Findings

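01. Consistently outperforms strong baselines, including GPT-4o, on named entity recognition (NER) and text classification (TC) benchmarks.

02. Memory-optimized models retain 99% of fully fine-tuned accuracy while using about 5% of the memory footprint.

03. Beats GPT-4o on several benchmarks with less than 1% of GPT-4o's parameters.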
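To give a rough feel for why low-rank (LoRA) adaptation cuts the amount of trainable state, the sketch below applies the standard LoRA accounting to a single dense layer: the frozen weight matrix has `d_out * d_in` entries, while the low-rank update trains only `rank * (d_out + d_in)`. The layer dimensions and rank are hypothetical values chosen for illustration. They are not taken from the paper, and the resulting fraction is not the paper's reported 5% memory figure.

```python
def lora_trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of one dense layer's weights trained by a rank-`rank` LoRA update."""
    full = d_out * d_in            # entries in the frozen weight matrix W
    lora = rank * (d_out + d_in)   # entries in B (d_out x rank) plus A (rank x d_in)
    return lora / full


# Hypothetical layer size and rank, for illustration only.
fraction = lora_trainable_fraction(d_out=768, d_in=768, rank=8)
print(f"{fraction:.4f}")  # prints 0.0208
```

Actual memory savings also depend on optimizer state, activations, and which layers receive adapters, so this per-layer ratio is only a first-order intuition.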

Abstract

We present LOCUS (LOw-cost Customization for Universal Specialization), a pipeline that consumes few-shot data to streamline the construction and training of NLP models through targeted retrieval, synthetic data generation, and parameter-efficient tuning. With only a small number of labeled examples, LOCUS discovers pertinent data in a broad repository, synthesizes additional training samples via in-context data generation, and fine-tunes models using either full or low-rank (LoRA) parameter adaptation. Our approach targets named entity recognition (NER) and text classification (TC) benchmarks, consistently outperforming strong baselines (including GPT-4o) while substantially lowering costs and model sizes. Our resultant memory-optimized models retain 99% of fully fine-tuned accuracy while using barely 5% of the memory footprint, also beating GPT-4o on several benchmarks with less than…
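The retrieval step described in the abstract (using a few labeled examples to find pertinent data in a broad repository) can be sketched as a similarity ranking. This is a minimal illustration that assumes a bag-of-words cosine similarity as a stand-in retriever. The abstract does not specify the retrieval method LOCUS actually uses, and the function names and example sentences below are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(few_shot: list[str], repository: list[str], k: int) -> list[str]:
    """Return the k repository texts most similar to the pooled few-shot examples."""
    query = Counter()
    for example in few_shot:
        query.update(bag_of_words(example))
    ranked = sorted(
        repository,
        key=lambda doc: cosine(query, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical few-shot examples and repository, for illustration only.
few_shot = [
    "Alice Smith joined Acme Corp in Paris.",
    "Bob Lee works at Globex in Berlin.",
]
repository = [
    "Carol Jones joined Initech in Madrid.",
    "The recipe needs two cups of flour.",
    "Dave works at Hooli in London.",
]
top = retrieve(few_shot, repository, k=2)
print(top)
```

In a pipeline of this shape, the retrieved texts would then be combined with synthetically generated samples before fine-tuning. A real system would more likely use dense embeddings than raw token counts.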

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Machine Learning and Data Classification