DuoTeach: Dual Role Self-Teaching for Coarse-to-Fine Decision Coordination in Vision--Language Models
Wei Yang, Yiran Zhu, Zilin Li, Xunjia Zhang, Jun Xia, Hongtao Wang

TL;DR
This paper introduces DuoTeach, a self-distillation framework that enhances vision-language models' ability to make coherent, multi-level taxonomy decisions in a coarse-to-fine manner, significantly improving accuracy and zero-shot generalization.
Contribution
DuoTeach is a novel dual-role self-teaching distillation method that improves cross-level decision coordination in vision-language models without requiring ground-truth labels.
Findings
Up to 30.24-point improvement in DWPA on benchmarks.
Zero-shot performance on unseen taxonomies increased from 17.17% to 43.66%.
Enhanced within-call multi-level decision coordination.
Abstract
Coarse-to-fine path decision-making requires predicting a valid taxonomy path in which earlier decisions constrain later ones. However, existing benchmarks score each level independently, obscuring cross-level validity and consistency. To better align evaluation with this setting, we introduce a Joint Path Decision (JPD) protocol that requires predicting the full path in one call, together with Depth-Weighted Prefix Accuracy (DWPA), a metric family that measures path reliability with tunable emphasis on deeper levels. Under JPD, strong vision-language models (VLMs) frequently produce invalid parent-child pairs and brittle full-path predictions, suggesting that their failures stem not only from incomplete taxonomic knowledge but also from unstable cross-level decision coordination. To address this problem, we propose DuoTeach, a dual-role self-teaching distillation framework that…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
