Lightweight Transformer Framework for Weakly Supervised Semantic Segmentation
Ali Torabi, Sanjog Gaihre, Yaqoob Majeed

TL;DR
This paper introduces CrispFormer, a lightweight decoder modification for weakly supervised semantic segmentation that improves boundary accuracy and noise resistance without heavy computation or backbone changes.
Contribution
CrispFormer proposes three small, synergistic decoder enhancements (boundary supervision, uncertainty-guided refinement, and dynamic multi-scale fusion) that significantly improve WSSS performance.
Findings
Improves boundary F-score and small-object recall
Enhances mIoU over baseline models
Adds minimal computational overhead
Abstract
Weakly supervised semantic segmentation (WSSS) must learn dense masks from noisy, under-specified cues. We revisit the SegFormer decoder and show that three small, synergistic changes make weak supervision markedly more effective, without altering the MiT backbone or relying on heavy post-processing. Our method, CrispFormer, augments the decoder with: (1) a boundary branch that supervises thin object contours using a lightweight edge head and a boundary-aware loss; (2) an uncertainty-guided refiner that predicts per-pixel aleatoric uncertainty and uses it to weight losses and gate a residual correction of the segmentation logits; and (3) a dynamic multi-scale fusion layer that replaces static concatenation with spatial softmax gating over multi-resolution features, optionally modulated by uncertainty. The result is a single-pass model that preserves crisp boundaries, selects appropriate…
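To make components (2) and (3) of the abstract more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation. The function names (gated_fusion, uncertainty_weighted_ce), the tensor layouts, the reading of "spatial softmax gating" as a per-pixel softmax across scales, the way uncertainty flattens the gate, and the exp(-s) * CE + s loss form (a common heteroscedastic weighting) are all assumptions that the summary does not confirm.

```python
import numpy as np


def softmax(x, axis):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_fusion(features, gate_logits, uncertainty=None):
    """Fuse S feature maps with per-pixel softmax weights over scales.

    features:    (S, C, H, W), assumed already resized to a common H x W
    gate_logits: (S, H, W), one gating score per scale and pixel
    uncertainty: optional (H, W), non-negative. ASSUMPTION: higher
                 uncertainty divides the logits, flattening the gate.
    """
    if uncertainty is not None:
        gate_logits = gate_logits / (1.0 + uncertainty)[None]
    weights = softmax(gate_logits, axis=0)              # sums to 1 over scales
    fused = (weights[:, None] * features).sum(axis=0)   # (C, H, W)
    return fused, weights


def uncertainty_weighted_ce(logits, labels, log_var):
    """Per-pixel cross-entropy attenuated by a predicted log-variance.

    logits: (K, H, W), labels: (H, W) integer class ids, log_var: (H, W).
    ASSUMPTION: loss = exp(-log_var) * CE + log_var, averaged over pixels.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    ce = -log_probs[labels, rows, cols]                 # (H, W)
    return float((np.exp(-log_var) * ce + log_var).mean())
```

With all gate logits equal, gated_fusion reduces to a plain average over scales, and with log_var set to zero the loss reduces to ordinary mean cross-entropy. Both reductions are handy sanity checks when experimenting with a sketch like this.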
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Generative Adversarial Networks and Image Synthesis
