TL;DR
This paper introduces HistoSelect, a tissue-aware, question-guided retrieval framework for pathology slide analysis that improves accuracy and efficiency by focusing on relevant tissue regions and informative patches.
Contribution
HistoSelect mimics how pathologists examine tissue by selectively retrieving relevant regions and patches, reducing visual tokens by 70% on average and improving pathology question-answering accuracy.
Findings
Reduces visual token usage by 70% on average.
Outperforms existing methods on three pathology QA tasks.
Produces answers grounded in interpretable, pathologist-consistent regions.
Abstract
Computational pathology has advanced rapidly in recent years, driven by domain-specific image encoders and growing interest in using vision-language models to answer natural-language questions about diseases. Yet the core problem behind pathology question-answering remains unsolved: a gigapixel slide contains far more information than any single question requires. Pathologists naturally navigate the complexity of tissue and morphology by scanning broadly and zooming in selectively according to the clinical question. Current models, in contrast, rely on uniform patch sampling or broad attention maps, often attending equally to irrelevant regions while overlooking key visual evidence. In this work, we aim to bring models closer to how humans actually examine slides. We propose a question-guided, tissue-aware, and coarse-to-fine retrieval framework, HistoSelect, that…
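The abstract is cut off before it describes the mechanism, so the following is only a minimal sketch of what a question-guided, coarse-to-fine selection could look like, not the authors' implementation. It assumes that the question, each tissue region, and each patch already have embeddings in a shared space, and it uses cosine similarity with top-k selection as a stand-in scoring rule. The function names (`cosine`, `top_k`, `coarse_to_fine_select`), the data layout, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def coarse_to_fine_select(question_emb, regions, k_regions, k_patches):
    """Hypothetical two-stage selection.

    Coarse stage: keep the k_regions regions most similar to the question.
    Fine stage: inside each kept region, keep the k_patches most similar patches.
    Returns (region_index, patch_index) pairs in slide order.
    """
    region_scores = [cosine(question_emb, r["embedding"]) for r in regions]
    selected = []
    for ri in sorted(top_k(region_scores, k_regions)):
        patches = regions[ri]["patches"]
        patch_scores = [cosine(question_emb, p) for p in patches]
        for pi in sorted(top_k(patch_scores, k_patches)):
            selected.append((ri, pi))
    return selected


# Toy data (arbitrary values, assumed for illustration only).
question = [1.0, 0.0]
regions = [
    {"embedding": [0.9, 0.1], "patches": [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]},
    {"embedding": [0.0, 1.0], "patches": [[0.0, 1.0], [0.1, 0.9]]},
    {"embedding": [0.7, 0.3], "patches": [[0.5, 0.5], [1.0, 0.1]]},
]

selected = coarse_to_fine_select(question, regions, k_regions=2, k_patches=1)
total_patches = sum(len(r["patches"]) for r in regions)
reduction = 1 - len(selected) / total_patches  # fraction of patches dropped

print(selected)
print(f"patches kept: {len(selected)} of {total_patches}")
```

In this toy setup, the region that points away from the question is discarded at the coarse stage, so its patches are never scored. That is the intuition behind a coarse-to-fine design: the fine stage only pays for regions that already look relevant. The fraction of patches dropped here depends entirely on the arbitrary `k_regions` and `k_patches` values and says nothing about the token reduction reported for HistoSelect.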
