Seeing in the Dark: A Teacher-Student Framework for Dark Video Action Recognition via Knowledge Distillation and Contrastive Learning
Sharana Dharshikgan Suresh Dass, Hrishav Bakul Barua, Ganesh Krishnasamy, Raveendran Paramesran, Raphael C.-W. Phan

TL;DR
This paper introduces ActLumos, a teacher-student framework that leverages contrastive learning and knowledge distillation to improve dark video action recognition accuracy while maintaining single-stream inference efficiency.
Contribution
It proposes a novel teacher-student architecture with dynamic feature fusion and contrastive loss, achieving state-of-the-art results in dark video action recognition.
Findings
Student model achieves 96.92% Top-1 accuracy on ARID V1.0
Dynamic feature fusion outperforms static fusion methods
Two-view SSL surpasses spatial-only or temporal-only variants
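The difference between static and dynamic fusion in the findings above can be illustrated with a minimal sketch. It is not the paper's DFF module. The summary only states that DFF re-weights the dark and enhanced streams at each time step, so the gate used here (a softmax over two scalar scores produced by the hypothetical scoring vectors `gate_dark` and `gate_enh`) is an assumption chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def static_fusion(dark, enhanced, alpha=0.5):
    """Blend the two streams with one fixed weight for every time step."""
    return [
        [alpha * d + (1.0 - alpha) * e for d, e in zip(dark_t, enh_t)]
        for dark_t, enh_t in zip(dark, enhanced)
    ]


def dynamic_fusion(dark, enhanced, gate_dark, gate_enh):
    """Blend the two streams with weights recomputed at each time step.

    dark, enhanced: per-time-step feature vectors, shape [T][C].
    gate_dark, gate_enh: hypothetical scoring vectors of length C
    (assumed here; in a trained model they would be learned).
    """
    fused, weights = [], []
    for dark_t, enh_t in zip(dark, enhanced):
        # One scalar score per stream, turned into weights that sum to 1.
        w_dark, w_enh = softmax([dot(gate_dark, dark_t), dot(gate_enh, enh_t)])
        fused.append([w_dark * d + w_enh * e for d, e in zip(dark_t, enh_t)])
        weights.append((w_dark, w_enh))
    return fused, weights


if __name__ == "__main__":
    # Toy features: 3 time steps, 2 channels per stream.
    dark = [[1.0, 0.0], [0.0, 0.0], [0.5, 0.5]]
    enhanced = [[0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
    fused, weights = dynamic_fusion(dark, enhanced, [1.0, 1.0], [1.0, 1.0])
    for t, (w_dark, w_enh) in enumerate(weights):
        print(f"t={t}: dark weight {w_dark:.3f}, enhanced weight {w_enh:.3f}")
```

With static fusion every time step receives the same mix, whereas the dynamic variant shifts weight toward whichever stream scores higher at that step, which is the behaviour the summary attributes to DFF.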
Abstract
Action recognition in dark or low-light (under-exposed) videos is challenging because visibility degradation can obscure critical spatiotemporal details. This paper proposes ActLumos, a teacher-student framework that attains single-stream inference while retaining multi-stream-level accuracy. The teacher consumes dual-stream inputs, namely the original dark frames and Retinex-enhanced frames, which are processed by weight-shared R(2+1)D-34 backbones and fused by a Dynamic Feature Fusion (DFF) module. DFF dynamically re-weights the two streams at each time step, emphasising the most informative temporal segments. The teacher is also trained with a supervised contrastive loss (SupCon) that sharpens class margins. The student shares the R(2+1)D-34 backbone but uses only dark frames and no fusion at test time. The student is first pre-trained with self-supervision on dark clips of…
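The supervised contrastive loss mentioned in the abstract can be sketched as follows. This is a minimal pure-Python version of the commonly used SupCon formulation, in which each anchor is pulled toward same-class samples and pushed away from all others. The temperature value, the L2 normalisation, and the batch layout are assumptions, since the excerpt does not give the paper's exact settings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have positives.

    embeddings: list of feature vectors, one per clip.
    labels: list of integer class ids, same length as embeddings.
    temperature: assumed value; the excerpt does not state the one used.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Scaled cosine similarity between the anchor and every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp over all non-anchor samples, computed stably.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    print("labels match clusters:", supcon_loss(feats, [0, 0, 1, 1]))
    print("labels cut across clusters:", supcon_loss(feats, [0, 1, 0, 1]))
```

The loss is lower when samples of the same class already sit close together in embedding space, which is the sense in which it "sharpens class margins".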
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
Methods: Attention Is All You Need · Adam · Softmax · Linear Warmup With Linear Decay · Dropout · Weight Decay · Attention Dropout · WordPiece · Layer Normalization
