TL;DR
DermAgent is a multi-tool, self-reflective system that enhances dermatological image analysis by integrating domain-specific reasoning, external evidence retrieval, and self-correction, significantly outperforming existing models.
Contribution
The paper introduces DermAgent, a collaborative multi-tool agent that uses a Plan-Execute-Reflect framework to improve dermatological diagnosis and reasoning.
Findings
Outperforms state-of-the-art models in skin disease diagnosis accuracy by 17.6%.
Achieves superior results in concept annotation and clinical captioning tasks.
Demonstrates effective self-correction and evidence anchoring in dermatological analysis.
Abstract
Dermatological diagnosis requires integrating fine-grained visual perception with expert clinical knowledge. Although Multimodal Large Language Models (MLLMs) facilitate interactive medical image analysis, their application in dermatology is hindered by insufficient domain-specific grounding and hallucinations. To address these issues, we propose DermAgent, a collaborative multi-tool agent that orchestrates seven specialized vision and language modules within a Plan-Execute-Reflect framework. DermAgent delivers stepwise, traceable diagnostic reasoning through three core components. First, it employs complementary visual perception tools for comprehensive morphological description, dermoscopic concept annotation, and disease diagnosis. Second, to overcome the lack of domain prior, a dual-modality retrieval module anchors every prediction in external evidence by cross-referencing 413,210…
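The Plan-Execute-Reflect loop named in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the tool names (`describe`, `retrieve`), the `run_agent` function, the retry-on-empty-output reflection rule, and the `max_rounds` parameter are all hypothetical, and the real system orchestrates seven specialized modules whose interfaces are not given here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical tool type: takes an image identifier, returns a text result.
Tool = Callable[[str], str]


@dataclass
class Trace:
    """Ordered log of agent steps, so the reasoning stays traceable."""
    steps: List[str] = field(default_factory=list)


def plan(query: str, tools: Dict[str, Tool]) -> List[str]:
    # Toy planner (assumption): schedule every registered tool in order.
    return list(tools)


def execute(steps: List[str], tools: Dict[str, Tool],
            image_id: str, trace: Trace) -> Dict[str, str]:
    results = {}
    for name in steps:
        results[name] = tools[name](image_id)
        trace.steps.append(f"execute:{name}")
    return results


def reflect(results: Dict[str, str]) -> List[str]:
    # Toy reflection rule (assumption): retry tools that returned nothing.
    return [name for name, out in results.items() if not out]


def run_agent(query: str, image_id: str, tools: Dict[str, Tool],
              max_rounds: int = 2) -> Tuple[Dict[str, str], Trace]:
    trace = Trace()
    steps = plan(query, tools)
    trace.steps.append("plan")
    results: Dict[str, str] = {}
    for _ in range(max_rounds):
        results.update(execute(steps, tools, image_id, trace))
        failed = reflect(results)
        trace.steps.append("reflect")
        if not failed:
            break
        steps = failed  # re-run only the tools flagged by reflection
    return results, trace


def make_flaky_tool() -> Tool:
    # Placeholder tool that fails on its first call, to exercise reflection.
    calls = {"n": 0}

    def tool(image_id: str) -> str:
        calls["n"] += 1
        return "" if calls["n"] == 1 else f"evidence for {image_id}"

    return tool


tools = {
    "describe": lambda image_id: f"description of {image_id}",
    "retrieve": make_flaky_tool(),
}
results, trace = run_agent("What is this lesion?", "img-001", tools)
```

In this toy run the placeholder retrieval tool returns nothing on its first call, so the reflection step flags it and only that tool is executed again in the second round. The trace records each step in order.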
