Ensemble Modeling for Multimodal Visual Action Recognition
Jyoti Kini, Sarah Fleischer, Ishan Dave, Mubarak Shah

TL;DR
This paper introduces an ensemble approach with a novel exponentially decaying focal loss for multimodal visual action recognition, effectively handling long-tailed data distributions and combining RGB and Depth modalities.
Contribution
It presents a new exponentially decaying focal loss and a late fusion strategy for multimodal action recognition, improving performance on long-tailed datasets.
Findings
Effective handling of long-tailed class distributions.
Improved accuracy with ensemble and late fusion.
Demonstrated success on the MECCANO dataset.
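To make the late fusion idea concrete, here is a minimal sketch that combines per-class probabilities from an RGB model and a Depth model. The weighted average, the `rgb_weight` parameter, and the function name `late_fusion` are assumptions made for illustration; the summary above does not specify how the authors combine the distributions.

```python
def late_fusion(rgb_probs, depth_probs, rgb_weight=0.5):
    """Fuse two per-class probability lists by weighted averaging.

    Assumption: a convex combination of the two distributions. This is one
    common late fusion rule, not necessarily the one used in the paper.
    """
    if len(rgb_probs) != len(depth_probs):
        raise ValueError("both modalities must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")

    depth_weight = 1.0 - rgb_weight
    fused = [rgb_weight * r + depth_weight * d
             for r, d in zip(rgb_probs, depth_probs)]

    # Renormalize so the result is a valid distribution even if the
    # inputs were only approximately normalized.
    total = sum(fused)
    fused = [f / total for f in fused]

    # Predicted class is the index with the highest fused probability.
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Hypothetical scores for three action classes.
fused, predicted = late_fusion([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
```

Because each modality model is trained independently, a fusion step like this only needs their output probabilities, so either model can be retrained or swapped without touching the other.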
Abstract
In this work, we propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset. Based on the underlying principle of focal loss, which captures the relationship between tail (scarce) classes and their prediction difficulties, we propose an exponentially decaying variant of focal loss for our current task. It initially emphasizes learning from the hard misclassified examples and gradually adapts to the entire range of examples in the dataset. This annealing process encourages the model to strike a balance between focusing on the sparse set of hard samples and leveraging the information provided by the easier ones. Additionally, we opt for the late fusion strategy to combine the resultant probability distributions…
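The annealing behavior described in the abstract can be illustrated with a small sketch. It uses the standard focal loss, -(1 - p_t)^γ · log(p_t), and lets the focusing parameter γ decay exponentially over training. The schedule shape, the endpoints `gamma_start=2.0` and `gamma_end=0.1`, and the function names are assumptions chosen for illustration, not values confirmed by the text above.

```python
import math


def decayed_gamma(epoch, total_epochs, gamma_start=2.0, gamma_end=0.1):
    """Exponentially interpolate gamma from gamma_start down to gamma_end.

    Assumption: a geometric schedule over epochs; the endpoints are
    hypothetical defaults.
    """
    if total_epochs <= 1:
        return gamma_end
    progress = epoch / (total_epochs - 1)  # 0.0 at the start, 1.0 at the end
    return gamma_start * (gamma_end / gamma_start) ** progress


def focal_loss(probs, target, gamma):
    """Standard focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# An "easy" sample: the model already assigns 0.9 to the correct class.
easy = [0.9, 0.05, 0.05]
early = focal_loss(easy, 0, decayed_gamma(0, 100))   # large gamma
late = focal_loss(easy, 0, decayed_gamma(99, 100))   # small gamma
```

With a large γ early in training, the easy sample contributes almost nothing to the loss, so hard examples dominate. As γ decays, the loss approaches ordinary cross-entropy and easy samples regain weight, which matches the "gradually adapts to the entire range of examples" description.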
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Focal Loss · OPT
