FreeChunker: A Cross-Granularity Chunking Framework
Wenxuan Zhang, Yuan-Hao Jiang, Yang Cao, Yonghe Wu

TL;DR
FreeChunker introduces a flexible, cross-granularity encoding framework for chunking in RAG systems, improving adaptability and efficiency by treating sentences as atomic units and supporting arbitrary sentence combinations.
Contribution
It shifts from static boundary-based chunking to a flexible retrieval approach that treats sentences as atomic units, reducing computational overhead and enhancing query adaptability.
Findings
Outperforms existing chunking methods in retrieval accuracy.
Reduces computational overhead for semantic boundary detection.
Enhances adaptability to complex queries.
Abstract
Chunking strategies significantly impact the effectiveness of Retrieval-Augmented Generation (RAG) systems. Existing methods operate within fixed-granularity paradigms that rely on static boundary identification, limiting their adaptability to diverse query requirements. This paper presents FreeChunker, a Cross-Granularity Encoding Framework that fundamentally transforms the traditional chunking paradigm: the framework treats sentences as atomic units and shifts from static chunk segmentation to flexible retrieval supporting arbitrary sentence combinations. This paradigm shift not only avoids the computational overhead of semantic boundary detection but also improves adaptability to complex queries. Experimental evaluation on LongBench V2 demonstrates that FreeChunker possesses significant advantages in both retrieval performance and time efficiency compared to…
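The abstract describes encoding sentences as atomic units and assembling arbitrary sentence combinations at query time, instead of fixing chunk boundaries in advance. The sketch below illustrates only that general idea; it is not FreeChunker's implementation. Assumptions: a naive regex sentence splitter, a bag-of-words cosine score standing in for the paper's learned cross-granularity encoder, and hypothetical helpers `build_index` and `retrieve` that use a simple top-`k` selection rule.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(sentence: str) -> Counter:
    """Hypothetical stand-in encoder: lowercase bag-of-words term counts.
    FreeChunker uses a learned cross-granularity encoder; this is NOT it."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text: str) -> tuple[list[str], list[Counter]]:
    """Encode every sentence exactly once; no chunk boundaries are computed."""
    sentences = split_sentences(text)
    return sentences, [embed(s) for s in sentences]


def retrieve(sentences: list[str], vectors: list[Counter], query: str, k: int = 2) -> str:
    """Hypothetical selection rule: take the k best-scoring sentences for this
    query (contiguous or not) and return them in original document order."""
    q = embed(query)
    ranked = sorted(range(len(sentences)), key=lambda i: cosine(vectors[i], q), reverse=True)
    chosen = sorted(ranked[:k])  # restore document order
    return " ".join(sentences[i] for i in chosen)


doc = ("Paris is the capital of France. The Seine flows through Paris. "
       "Bananas are rich in potassium. France borders Spain.")
sentences, vectors = build_index(doc)
print(retrieve(sentences, vectors, "capital of France", k=2))
# → Paris is the capital of France. France borders Spain.
```

Because each sentence is encoded once at indexing time, changing `k` (or any other selection rule) per query changes the granularity of the retrieved context without re-chunking or re-encoding the document. The sketch has no boundary-detection step, which loosely mirrors the cost argument in the abstract.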
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Natural Language Processing Techniques
