LingGen: Scalable Multi-Attribute Linguistic Control via Power-Law Masking
Mohamed Elgaar, Hadi Amiri

TL;DR
LingGen is a scalable controlled text generation model that enables fine-grained multi-attribute control through Pareto-sampled attribute masking (P-MASKING) and BOS embedding injection, achieving the lowest average control error among evaluated methods and the highest fluency scores in human evaluation.
Contribution
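The paper introduces P-MASKING, which samples per-example attribute masking rates from a truncated Pareto distribution during training, and BOS embedding injection, which conditions the language model on the encoded target attribute values, for robust multi-attribute control.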
Findings
Achieves the lowest average control error among evaluated methods across 1-40 control attributes
Receives the highest fluency scores in human evaluation
Remains efficient at inference while scaling to as many as 40 control attributes
Abstract
We present LingGen, a controlled text generation model that allows fine-grained control over a large number of real-valued linguistic attributes. It encodes target attribute values with a dedicated linguistic attribute encoder and conditions the language model by injecting the resulting representation into the language model using the beginning-of-sequence (BOS) embeddings. To improve robustness when controlling different attribute subsets, we introduce P-MASKING, which samples per-example attribute masking rates from a truncated Pareto distribution during training. Across 1-40 control attributes, LingGen achieves the lowest average control error among evaluated methods, while remaining efficient at inference and receiving the highest fluency scores in human evaluation. Ablations show that Pareto-sampled masking and BOS-based injection are effective choices compared to alternative…
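The P-MASKING idea described in the abstract, drawing one masking rate per training example from a truncated Pareto distribution, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the shape parameter `alpha`, the truncation bound `x_max`, the linear rescaling of the sample to a rate in [0, 1), and the choice to mask a rounded count of randomly chosen attributes are all assumptions made for illustration.

```python
import random


def sample_truncated_pareto(rng, alpha, x_min, x_max):
    """Inverse-CDF sample from a Pareto(alpha) truncated to [x_min, x_max)."""
    u = rng.random()  # uniform in [0, 1)
    tail = (x_min / x_max) ** alpha  # survival probability at x_max before truncation
    return x_min * (1.0 - u * (1.0 - tail)) ** (-1.0 / alpha)


def sample_mask_rate(rng, alpha=1.0, x_max=10.0):
    """Map a truncated Pareto sample on [1, x_max) to a rate in [0, 1).

    The linear rescaling is an assumption, not taken from the paper.
    """
    x = sample_truncated_pareto(rng, alpha, 1.0, x_max)
    return (x - 1.0) / (x_max - 1.0)


def mask_attributes(values, rng, alpha=1.0, x_max=10.0):
    """Mask a random subset of attribute values for one training example.

    Masked positions are replaced with None (a hypothetical placeholder).
    Returns the masked list and the sampled masking rate.
    """
    rate = sample_mask_rate(rng, alpha, x_max)
    n_masked = round(rate * len(values))
    masked_idx = set(rng.sample(range(len(values)), n_masked))
    masked = [None if i in masked_idx else v for i, v in enumerate(values)]
    return masked, rate


if __name__ == "__main__":
    rng = random.Random(0)
    attributes = [0.1 * i for i in range(40)]  # 40 hypothetical real-valued attributes
    masked, rate = mask_attributes(attributes, rng)
    kept = sum(v is not None for v in masked)
    print(f"mask rate = {rate:.3f}, attributes kept = {kept} of {len(masked)}")
```

Under these assumed settings most examples receive a low masking rate while a heavy tail of examples has most attributes masked, so a model trained this way would see attribute subsets of widely varying size. Whether the paper skews the distribution in this direction is not stated in the abstract.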
Taxonomy
Topics: Neural Networks and Reservoir Computing · Power System Optimization and Stability · Neural Networks and Applications
Methods: Balanced Selection
