Beyond Masks: The Case for Medical Image Parsing
Siddharth Gupta, Alan L. Yuille, Zongwei Zhou

TL;DR
The paper argues that medical imaging analysis should move from per-voxel masks to structured parsing, in which entities, attributes, and relationships are emitted together and kept mutually consistent.
Contribution
The paper proposes medical image parsing as the field's central output, characterizes a good parse by three properties (decision, reconstruction, and prediction), and audits current systems against those properties.
Findings
Current systems largely solve entity identification.
Attributes and relationships are underdeveloped in existing models.
No existing system produces a well-formed parse as defined.
Abstract
Medical imaging research has spent a decade getting very good at one thing: producing per-voxel masks. Masks tell us size, volume, and location, and a decade of clinical infrastructure rests on those outputs. Yet the report a radiologist writes contains almost nothing a mask can express. We argue that medical imaging research should adopt medical image parsing as its central output: a structured representation in which entities, attributes, and relationships are emitted together and mutually consistent. Entities are the named structures and findings, present or absent. Attributes describe those entities, capturing things like margin regularity, enhancement pattern, or severity grade. Relationships connect them, naming where one structure sits relative to another, what abuts what, and what has changed since the prior scan. A good parse satisfies three properties, in order: (1) decision…
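As a rough illustration of the structure the abstract describes, the sketch below models a parse as three linked collections and checks one possible reading of "mutually consistent". Everything here is an assumption made for illustration: the class names (`Entity`, `Attribute`, `Relationship`, `Parse`), the field names, the example values, and the two consistency rules are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str       # named structure or finding (illustrative values only)
    present: bool   # entities can be reported as present or absent


@dataclass(frozen=True)
class Attribute:
    entity: str     # name of the entity being described
    key: str        # e.g. "margin", "enhancement_pattern", "severity_grade"
    value: str


@dataclass(frozen=True)
class Relationship:
    subject: str    # entity name
    predicate: str  # e.g. "abuts", "located_in", "changed_since_prior"
    obj: str        # entity name


@dataclass
class Parse:
    entities: list[Entity] = field(default_factory=list)
    attributes: list[Attribute] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

    def consistency_errors(self) -> list[str]:
        """Return violations of two assumed rules (not the paper's definition):
        attributes must describe known, present entities, and relationships
        must connect known entities."""
        present = {e.name: e.present for e in self.entities}
        errors = []
        for a in self.attributes:
            if a.entity not in present:
                errors.append(f"attribute {a.key!r} refers to unknown entity {a.entity!r}")
            elif not present[a.entity]:
                errors.append(f"attribute {a.key!r} describes absent entity {a.entity!r}")
        for r in self.relationships:
            for name in (r.subject, r.obj):
                if name not in present:
                    errors.append(f"relationship {r.predicate!r} refers to unknown entity {name!r}")
        return errors


# Hypothetical example: the attribute on an absent entity should be flagged.
parse = Parse(
    entities=[
        Entity("liver", True),
        Entity("liver lesion", True),
        Entity("ascites", False),
    ],
    attributes=[
        Attribute("liver lesion", "margin", "irregular"),
        Attribute("ascites", "severity_grade", "moderate"),
    ],
    relationships=[Relationship("liver lesion", "located_in", "liver")],
)
errors = parse.consistency_errors()
```

A mask can populate little more than the `entities` list; in this sketch the attribute and relationship lists carry the content that, per the abstract, a mask cannot express.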
