Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs

Dongxing Yu

arXiv:2505.04637 · cs.CL · May 9, 2025

TL;DR

This paper introduces a dynamic tokenization framework inspired by human cognitive chunking. It reports gains of 7.8% on Visual Question Answering and 5.3% on Complex Scene Description, along with closer alignment to human processing patterns.

Contribution

It proposes a novel adaptive tokenization method that incorporates hierarchical and context-sensitive boundaries, bridging cognitive science insights with AI model design.

Findings

01. 7.8% improvement on Visual Question Answering

02. 5.3% improvement on Complex Scene Description

03. More human-like error and attention patterns

Abstract

Recent advancements in multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing diverse data types, yet significant disparities persist between human cognitive processes and computational approaches to multimodal information integration. This research presents a systematic investigation into the parallels between human cross-modal chunking mechanisms and token representation methodologies in MLLMs. Through empirical studies comparing human performance patterns with model behaviors across visual-linguistic tasks, we demonstrate that conventional static tokenization schemes fundamentally constrain current models' capacity to simulate the dynamic, context-sensitive nature of human information processing. We propose a novel framework for dynamic cross-modal tokenization that incorporates adaptive boundaries, hierarchical representations, and alignment…

Taxonomy

Methods: Softmax · Attention Is All You Need