BridgeTA: Bridging the Representation Gap in Knowledge Distillation via Teacher Assistant for Bird's Eye View Map Segmentation
Beomjun Kim, Suhan Woo, Sejong Heo, Euntai Kim

TL;DR
BridgeTA introduces a lightweight Teacher Assistant network to effectively bridge the representation gap in knowledge distillation for Bird's Eye View map segmentation, improving accuracy without increasing inference costs.
Contribution
Proposes a cost-effective distillation framework that pairs a Teacher Assistant network with a distillation loss derived from Young's Inequality to improve knowledge transfer.
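As a rough illustration of how Young's Inequality can motivate routing distillation through an intermediate representation, consider the following sketch. The symbols F_T, F_S, and F_A (teacher, student, and Teacher Assistant features) and the parameter \epsilon are notation assumed here for illustration; this is not necessarily the exact loss used in the paper.

\[
2ab \le \epsilon a^2 + \frac{1}{\epsilon} b^2 \qquad (a, b \ge 0,\ \epsilon > 0)
\]

Applying this to the cross term of \(\|u + v\|^2\), with \(u = F_T - F_A\) and \(v = F_A - F_S\), gives

\[
\|F_T - F_S\|^2 \le (1 + \epsilon)\,\|F_T - F_A\|^2 + \left(1 + \frac{1}{\epsilon}\right)\|F_A - F_S\|^2 .
\]

Under these assumptions, shrinking the teacher-to-TA and TA-to-student distances also shrinks an upper bound on the direct teacher-to-student distance.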
Findings
Improves mIoU by 4.2% over the Camera-only baseline
The gain is up to 45% higher than that of other KD methods
Evaluated on the nuScenes dataset
Abstract
Bird's-Eye-View (BEV) map segmentation is one of the most important and challenging tasks in autonomous driving. Camera-only approaches have drawn attention as cost-effective alternatives to LiDAR, but they still fall behind LiDAR-Camera (LC) fusion-based methods. Knowledge Distillation (KD) has been explored to narrow this gap, but existing methods mainly enlarge the student model by mimicking the teacher's architecture, leading to higher inference cost. To address this issue, we introduce BridgeTA, a cost-effective distillation framework to bridge the representation gap between LC fusion and Camera-only models through a Teacher Assistant (TA) network while keeping the student's architecture and inference cost unchanged. A lightweight TA network combines the BEV representations of the teacher and student, creating a shared latent space that serves as an intermediate representation. To…
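The idea of an intermediate representation that sits between teacher and student features can be sketched numerically. In the minimal example below, the TA is replaced by a fixed convex combination of two random feature vectors, which is a deliberately simplified stand-in: the paper's TA is a learned lightweight network, and the names `mix`, `sq_dist`, and `young_bound`, the weight `alpha`, and the parameter `eps` are all hypothetical. The sketch only checks that the direct teacher-to-student distance is bounded by the two distances measured through the intermediate point.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mix(teacher, student, alpha=0.3):
    """Hypothetical stand-in for a TA: a fixed convex combination.

    The real TA is described as a lightweight learned network; a fixed
    weighted average is used here only to obtain an intermediate point.
    """
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher, student)]


def young_bound(teacher, assistant, student, eps=1.0):
    """Upper bound on ||T - S||^2 obtained from Young's Inequality."""
    return ((1 + eps) * sq_dist(teacher, assistant)
            + (1 + 1 / eps) * sq_dist(assistant, student))


random.seed(0)
# Flattened toy "BEV features"; real features would be multi-channel maps.
teacher = [random.gauss(0, 1) for _ in range(64)]
student = [random.gauss(0, 1) for _ in range(64)]
assistant = mix(teacher, student)

direct = sq_dist(teacher, student)
bound = young_bound(teacher, assistant, student)
print(f"direct = {direct:.4f}, bound via TA = {bound:.4f}")
```

Because the intermediate point is closer to both endpoints than they are to each other, each of the two terms in the bound is an easier matching problem than the direct one, which is the intuition behind distilling through a TA.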
