TC-SSA: Token Compression via Semantic Slot Aggregation for Gigapixel Pathology Reasoning
Zhuo Chen, Shawn Young, Lijian Xu

TL;DR
This paper introduces TC-SSA, a learnable token compression method that aggregates patch features into semantic slots, enabling efficient and accurate gigapixel pathology analysis with significantly fewer tokens.
Contribution
The paper presents a novel semantic slot aggregation framework that effectively compresses visual tokens in large pathology images, maintaining diagnostic accuracy while reducing computational load.
Findings
Achieves 78.34% overall accuracy on SlideBench with only 1.7% of original tokens.
Outperforms sampling-based methods under similar token budgets.
Generalizes well to multiple MIL classification tasks with high AUC scores.
Abstract
The application of large vision-language models to computational pathology holds great promise for diagnostic assistants but faces a critical computational bottleneck: the gigapixel scale of Whole Slide Images (WSIs). A single WSI typically contains over 10^5 patches, creating sequence lengths that exceed the constraints of standard Transformer architectures. Existing solutions often resort to spatial sampling, which risks discarding diagnostically critical evidence. To address this, we propose TC-SSA (Token Compression via Semantic Slot Aggregation), a learnable token compression framework that aggregates patch features into a fixed number of semantic slots. A gated routing module assigns patches to slots using sparse Top-2 routing, followed by weighted aggregation, enabling global slide coverage under a strict token budget. The resulting representation retains diagnostically relevant…
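The routing step described in the abstract (each patch sent to its two highest-scoring slots, then a weighted aggregation per slot) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The linear gate `w_gate`, the softmax over the two selected logits, the per-slot normalization by total routing weight, and the function name `top2_slot_aggregate` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def top2_slot_aggregate(patches, w_gate, k=2):
    """Hypothetical sketch of sparse top-k routing into slots.

    patches: (N, D) patch features; w_gate: (D, S) assumed linear gate.
    Returns slots (S, D) and the sparse routing weights (N, S).
    """
    logits = patches @ w_gate                                # (N, S) gate scores
    # Indices of the k highest-scoring slots for every patch.
    topk_idx = np.argpartition(logits, -k, axis=1)[:, -k:]   # (N, k)
    topk_logits = np.take_along_axis(logits, topk_idx, axis=1)
    # Softmax restricted to the selected slots (assumed normalization).
    topk_logits = topk_logits - topk_logits.max(axis=1, keepdims=True)
    gates = np.exp(topk_logits)
    gates /= gates.sum(axis=1, keepdims=True)
    # Scatter the gate values into a sparse (N, S) weight matrix.
    weights = np.zeros_like(logits)
    np.put_along_axis(weights, topk_idx, gates, axis=1)
    # Weighted aggregation: each slot is a weighted mean of its patches.
    mass = weights.sum(axis=0)                               # (S,)
    slots = (weights.T @ patches) / np.maximum(mass, 1e-8)[:, None]
    return slots, weights

# Toy usage: 1000 patches of dimension 16 compressed into 8 slots.
rng = np.random.default_rng(0)
demo_patches = rng.normal(size=(1000, 16))
demo_gate = rng.normal(size=(16, 8))
demo_slots, demo_weights = top2_slot_aggregate(demo_patches, demo_gate)
```

Because every patch contributes to exactly two slots, no region of the slide is dropped outright, in contrast with spatial sampling. The output length depends only on the number of slots, not on the number of patches.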
Taxonomy
Topics: AI in cancer detection · Multimodal Machine Learning Applications · Face recognition and analysis
