Continual Multimodal Egocentric Activity Recognition via Modality-Aware Novel Detection
Wonseon Lim, Hyejeong Im, Dae-Won Kim

TL;DR
This paper introduces MAND, a novel framework for open-world continual learning in egocentric activity recognition that effectively detects novel activities by leveraging modality-aware scoring and stabilizing modality-specific features.
Contribution
The paper proposes MAND, a modality-aware framework that improves novelty detection and knowledge retention in multimodal egocentric continual learning.
Findings
MAND improves novel activity detection AUC by up to 10%.
MAND increases known-class classification accuracy by up to 2.8%.
The approach effectively exploits multimodal cues and mitigates catastrophic forgetting.
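The detection gains above are reported as AUC. In novelty detection this usually means the area under the ROC curve (AUROC) when novelty scores separate novel from known samples. The page does not state the paper's exact evaluation protocol, so the following is only a minimal sketch of the standard rank-based AUROC. The function name `auroc` is hypothetical, and higher scores are assumed to mean "more likely novel".

```python
from typing import Sequence


def auroc(scores_novel: Sequence[float], scores_known: Sequence[float]) -> float:
    """Rank-based AUROC: the probability that a randomly drawn novel sample
    receives a higher novelty score than a randomly drawn known sample.
    Ties count as half a win. Assumes higher score = more likely novel."""
    if not scores_novel or not scores_known:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s_n in scores_novel:
        for s_k in scores_known:
            if s_n > s_k:
                wins += 1.0
            elif s_n == s_k:
                wins += 0.5
    return wins / (len(scores_novel) * len(scores_known))


# Usage: 3 of the 4 (novel, known) pairs are ranked correctly.
print(auroc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

The pairwise loop is O(n·m) and is meant for clarity. A sort-based implementation gives the same value faster on large evaluation sets.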
Abstract
Multimodal egocentric activity recognition integrates visual and inertial cues for robust first-person behavior understanding. However, deploying such systems in open-world environments requires detecting novel activities while continuously learning from non-stationary streams. Existing methods rely on the main logits for novelty scoring, without fully exploiting the complementary evidence available from individual modalities. Because these logits are often dominated by RGB, cues from other modalities, particularly IMU, remain underutilized, and this imbalance worsens over time under catastrophic forgetting. To address this, we propose MAND, a modality-aware framework for multimodal egocentric open-world continual learning. At inference, Modality-aware Adaptive Scoring (MoAS) estimates sample-wise modality reliability from energy scores and adaptively integrates modality logits to…
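The abstract states that MoAS estimates sample-wise modality reliability from energy scores and adaptively integrates modality logits. It is truncated before giving the exact weighting rule, so the code below is a hypothetical sketch, not the authors' method. It uses the standard free-energy score E = -T · logsumexp(logits / T), where lower energy means higher confidence. It then assumes reliability weights come from a softmax over negative energies across modalities (temperature `tau`) and scores novelty as the energy of the fused logits. All function names and parameters are illustrative.

```python
import math
from typing import Dict, List, Tuple


def energy(logits: List[float], T: float = 1.0) -> float:
    """Free-energy score E = -T * logsumexp(logits / T); lower = more confident."""
    scaled = [l / T for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    return -T * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def fuse_modality_logits(
    modality_logits: Dict[str, List[float]], T: float = 1.0, tau: float = 1.0
) -> Tuple[List[float], Dict[str, float]]:
    """Hypothetical reliability weighting: softmax over negative energies.
    A modality with lower energy (more confident) gets a larger weight."""
    names = list(modality_logits)
    neg_e = [-energy(modality_logits[n], T) / tau for n in names]
    m = max(neg_e)
    exps = [math.exp(v - m) for v in neg_e]
    z = sum(exps)
    weights = {n: e / z for n, e in zip(names, exps)}
    num_classes = len(modality_logits[names[0]])
    fused = [
        sum(weights[n] * modality_logits[n][k] for n in names)
        for k in range(num_classes)
    ]
    return fused, weights


def novelty_score(fused_logits: List[float], T: float = 1.0) -> float:
    """Energy of the fused logits; higher = more likely a novel activity."""
    return energy(fused_logits, T)


# Usage: a confident RGB head and a flat, uncertain IMU head for one sample.
fused, w = fuse_modality_logits({"rgb": [4.0, 0.5, 0.2], "imu": [0.3, 0.2, 0.1]})
print(w, novelty_score(fused))
```

Because the weights are computed per sample, a sample whose IMU logits are sharper than its RGB logits would shift weight toward IMU. This is the kind of per-sample adaptivity the abstract describes, although the paper's actual reliability estimate may differ.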
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Multimodal Machine Learning Applications
