TempT: Temporal consistency for Test-time adaptation
Onur Cezmi Mutlu, Mohammadmahdi Honarmand, Saimourya Surabhi, Dennis P. Wall

TL;DR
TempT leverages temporal coherence in videos to improve test-time adaptation, demonstrating competitive results in facial expression recognition using a simple 2D CNN backbone.
Contribution
Introduces TempT, a novel test-time adaptation method utilizing temporal consistency as self-supervision for video analysis tasks.
Findings
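TempT achieves performance on the AffWild2 dataset that is competitive with results reported in previous years.
TempT uses a popular 2D CNN backbone rather than the larger sequential or attention-based models used in other approaches.
Preliminary results offer a proof of concept for TempT's use in real-world applications.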
Abstract
We introduce Temporal consistency for Test-time adaptation (TempT), a novel method for test-time adaptation on videos that uses the temporal coherence of predictions across sequential frames as a self-supervision signal. TempT is an approach with broad potential applications in computer vision tasks, including facial expression recognition (FER) in videos. We evaluate TempT's performance on the AffWild2 dataset. Our approach focuses solely on the unimodal visual aspect of the data and utilizes a popular 2D CNN backbone, in contrast to the larger sequential or attention-based models used in other approaches. Our preliminary experimental results demonstrate that TempT performs competitively with the results reported in previous years, and its efficacy provides a compelling proof of concept for its use in various real-world applications.
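The abstract does not state the exact form of the self-supervision signal, so the following is only a minimal sketch of one plausible temporal-consistency objective, not the authors' loss. It assumes per-frame class logits from some frame-level classifier and penalizes the mean squared change in class probabilities between consecutive frames. The function names `softmax` and `temporal_consistency_loss` and the toy logits are hypothetical, chosen for illustration.

```python
import math


def softmax(logits):
    """Convert one frame's logits to class probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_consistency_loss(frame_logits):
    """Assumed loss: mean squared change in class probabilities
    between each pair of consecutive frames. Requires no labels."""
    probs = [softmax(logits) for logits in frame_logits]
    if len(probs) < 2:
        return 0.0  # a single frame has no temporal neighbour to compare with
    total = 0.0
    for prev, curr in zip(probs, probs[1:]):
        total += sum((p - c) ** 2 for p, c in zip(prev, curr))
    return total / (len(probs) - 1)


# Toy logits for 4 frames and 3 expression classes (illustrative values only).
steady = [[2.0, 0.0, 0.0]] * 4
flicker = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

steady_loss = temporal_consistency_loss(steady)
flicker_loss = temporal_consistency_loss(flicker)
print(steady_loss, flicker_loss)
```

Predictions that flicker between classes from frame to frame receive a higher loss than stable ones. In a test-time adaptation setting, a quantity of this kind would be minimized with respect to the model parameters on the unlabeled test video; the sketch covers only the loss computation, not the update step.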
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Image Enhancement Techniques
