COLLIE: Systematic Construction of Constrained Text Generation Tasks
Shunyu Yao, Howard Chen, Austin W. Hanjie, Runzhe Yang, Karthik Narasimhan

TL;DR
COLLIE introduces a grammar-based framework for constructing diverse constrained text generation tasks, enabling systematic evaluation of language models on complex, compositional constraints beyond simple keyword inclusion.
Contribution
The paper presents COLLIE, a flexible, extensible framework and dataset for constrained text generation with rich, compositional constraints, addressing limitations of existing benchmarks.
Findings
State-of-the-art models struggle with complex constraints.
COLLIE-v1 dataset contains 2080 instances with diverse constraints.
Analysis reveals significant performance gaps on challenging constraints.
Abstract
Text generation under constraints has seen increasing interest in natural language processing, especially with the rapidly improving capabilities of large language models. However, existing benchmarks for constrained generation usually focus on fixed constraint types (e.g., generate a sentence containing certain words) that have proved to be easy for state-of-the-art models like GPT-4. We present COLLIE, a grammar-based framework that allows the specification of rich, compositional constraints with diverse generation levels (word, sentence, paragraph, passage) and modeling challenges (e.g., language understanding, logical reasoning, counting, semantic planning). We also develop tools for automatic extraction of task instances given a constraint structure and a raw text corpus. Using COLLIE, we compile the COLLIE-v1 dataset with 2080 instances comprising 13 constraint structures. We…
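To make the idea of compositional constraints and corpus-based instance extraction concrete, here is a minimal Python sketch. It is not the COLLIE library's API: the names `Constraint`, `word_count_is`, `word_at_is`, and `extract_instances` are hypothetical, and the whitespace tokenizer is a simplifying assumption. It only illustrates how a counting constraint and a position constraint might be combined, and how sentences in a raw corpus that already satisfy the combined constraint could be collected as task instances.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names below are hypothetical illustrations, not the COLLIE API.


@dataclass(frozen=True)
class Constraint:
    """A described predicate over a candidate text."""
    description: str
    check: Callable[[str], bool]

    def __and__(self, other: "Constraint") -> "Constraint":
        # Compose two constraints: both must hold for the same text.
        return Constraint(
            f"{self.description} and {other.description}",
            lambda text: self.check(text) and other.check(text),
        )


def words(text: str) -> List[str]:
    # Assumption: naive whitespace tokenization, edge punctuation stripped.
    stripped = (w.strip(".,;:!?\"'") for w in text.split())
    return [w for w in stripped if w]


def word_count_is(n: int) -> Constraint:
    # Counting constraint at the sentence level.
    return Constraint(f"has exactly {n} words", lambda t: len(words(t)) == n)


def word_at_is(position: int, target: str) -> Constraint:
    # Position constraint; `position` is 1-indexed.
    def check(text: str) -> bool:
        ws = words(text)
        if not 0 < position <= len(ws):
            return False
        return ws[position - 1].lower() == target.lower()

    return Constraint(f"word {position} is '{target}'", check)


def extract_instances(corpus: List[str], constraint: Constraint) -> List[str]:
    # Keep only corpus sentences that already satisfy the constraint.
    return [sentence for sentence in corpus if constraint.check(sentence)]


corpus = [
    "The old dog barked loudly.",
    "The dog barked.",
    "A small cat slept quietly.",
]
constraint = word_count_is(5) & word_at_is(3, "dog")
matches = extract_instances(corpus, constraint)
```

In this sketch the same predicate serves two purposes: filtering a corpus for reference sentences and checking whether a model's output satisfies the constraint. Whether COLLIE structures its tooling this way is not stated in the text above.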
Peer Reviews
Decision: ICLR 2024 poster
1. Well-written paper with evaluation on competitive LLM baselines.
2. The combination of rule-based and neural generation enables NLP-grounded generations.
3. Open-sourcing of the code and the related dataset promotes further research.
4. Comprehensive analysis highlights the shortcomings of current LLMs that need to be addressed.
1. Some important details for instruction rendering should be moved to the main paper.
2. The paper mentions that the technique can be used for constraining words and word blacklisting; however, a qualitative analysis of these is missing in the current version.
1. The authors evaluated different off-the-shelf LLMs and showed that they don't fully solve this task.
1. The idea is not very novel, and similar ideas have already been explored. For example, [1] also constructed a dataset using similar constraints for instruction fine-tuning.
2. The dataset may not be very useful. Specifically, because the rules are too vague/arbitrary, the extracted ground truth is not useful for the evaluation process: the authors only use them for comparing fluency. In addition, since the rules can be arbitrarily designed, this compiled dataset does not hold much value, beca
1. This paper provides a method which allows future work to construct data of their interest in a scalable manner.
2. The analyses that this paper conducts provide insights to researchers who are focusing on developing LLMs with better logical, reasoning, and compositional capacities.
3. This paper is generally well-written and easy to follow.
There are no great weaknesses that I can find, but there is a minor one: although I understand that the authors are focusing on more "basic" units, such as tokens, sentences, etc., so that this paper can be more practical and useful for downstream applications, such as pretraining and evaluating LLMs, most current work on constrained text generation seems to focus on text summarization, including controllable text summarization (e.g., MACSum, Zhang et al., 2023). However this paper does not mention an
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization
