Task-Adaptive Tokenization: Enhancing Long-Form Text Generation Efficacy in Mental Health and Beyond
Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea

TL;DR
This paper introduces task-adaptive tokenization, which customizes token segmentation for a specific downstream task. In mental health applications, it improves long-form text generation quality while using fewer tokens.
Contribution
The paper presents a novel task-adaptive tokenization method that samples variable segmentations and integrates task-specific tokens into pre-trained models, tailored for mental health text generation.
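The idea of sampling variable segmentations can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, its probabilities, and the helper names `all_segmentations` and `sample_segmentation` are assumptions made for illustration. Each candidate segmentation is weighted by the product of its piece probabilities, in the style of a unigram language model tokenizer, and one candidate is drawn at random according to those weights.

```python
import math
import random

# Hypothetical toy vocabulary: piece -> probability (values are made up).
TOY_VOCAB = {
    "anxiety": 0.20, "anx": 0.10, "iety": 0.10,
    "a": 0.05, "n": 0.05, "x": 0.05, "i": 0.05,
    "e": 0.05, "t": 0.05, "y": 0.05,
}

def all_segmentations(text, vocab):
    """Return every way to split `text` into pieces that all occur in `vocab`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        if piece in vocab:
            for rest in all_segmentations(text[end:], vocab):
                results.append([piece] + rest)
    return results

def sample_segmentation(text, vocab, rng):
    """Draw one segmentation with probability proportional to the product
    of its piece probabilities."""
    candidates = all_segmentations(text, vocab)
    if not candidates:
        raise ValueError(f"no segmentation of {text!r} under this vocabulary")
    weights = [math.prod(vocab[p] for p in seg) for seg in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_segmentation("anxiety", TOY_VOCAB, rng))
```

Exhaustive enumeration is only practical for short strings. A production tokenizer would use dynamic programming over a lattice, and in the paper's setting the sampling probabilities would be optimized on task-specific data rather than fixed by hand.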
Findings
Significant performance improvements in psychological question-answering tasks.
Up to 60% reduction in token usage while maintaining quality.
Promising initial results with large language models.
Abstract
We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive tokenizer samples variable segmentations from multiple outcomes, with sampling probabilities optimized based on task-specific data. We introduce a strategy for building a specialized vocabulary and introduce a vocabulary merging protocol that allows for the integration of task-specific tokens into the pre-trained model's tokenization step. Through extensive experiments on psychological question-answering tasks in both Chinese and English, we find that our task-adaptive tokenization approach brings a significant improvement in generation performance while using up to 60% fewer tokens. Preliminary experiments point to promising results when using our…
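To make the vocabulary merging step and its effect on token count more concrete, here is a small sketch under stated assumptions. It is not the paper's merging protocol: the base vocabulary, the task-specific tokens, the functions `merge_vocab` and `greedy_tokenize`, and the use of greedy longest-match segmentation are all hypothetical choices made for illustration.

```python
def merge_vocab(base_vocab, task_tokens):
    """Return a copy of `base_vocab` with unseen task tokens appended.

    Existing tokens keep their ids; new tokens get fresh ids after the
    current maximum. (Assumed scheme, for illustration only.)
    """
    merged = dict(base_vocab)
    next_id = max(merged.values(), default=-1) + 1
    for tok in task_tokens:
        if tok not in merged:
            merged[tok] = next_id
            next_id += 1
    return merged

def greedy_tokenize(text, vocab):
    """Segment `text` by repeatedly taking the longest prefix found in `vocab`."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens

# Hypothetical base vocabulary and task-specific additions.
BASE_VOCAB = {"feel": 0, "ing": 1, " ": 2, "over": 3, "whelm": 4, "ed": 5}
TASK_TOKENS = ["feeling", "overwhelmed", "ing"]

if __name__ == "__main__":
    merged = merge_vocab(BASE_VOCAB, TASK_TOKENS)
    text = "feeling overwhelmed"
    print(greedy_tokenize(text, BASE_VOCAB))  # pieces from the base vocabulary only
    print(greedy_tokenize(text, merged))      # longer task-specific pieces available
```

In this toy case the merged vocabulary covers the same text with fewer, longer tokens, which is the kind of effect behind the reported reduction in token usage. In a real pre-trained model, each newly added token would also need an embedding, which this sketch does not address.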
Taxonomy
Topics: Mental Health via Writing · Topic Modeling · Machine Learning in Healthcare
