Ensemble Modeling for Multimodal Visual Action Recognition

Jyoti Kini; Sarah Fleischer; Ishan Dave; Mubarak Shah

arXiv:2308.05430 · cs.CV · September 26, 2023

TL;DR

This paper introduces an ensemble approach with a novel exponentially decaying focal loss for multimodal visual action recognition, effectively handling long-tailed data distributions and combining RGB and Depth modalities.

Contribution

It proposes an exponentially decaying variant of focal loss and combines the independently trained modality models through late fusion, improving performance on long-tailed datasets.
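The abstract below only states that late fusion combines the probability distributions produced by the modality models; the exact combination rule is cut off. As a minimal sketch, assuming a (optionally weighted) average of per-model class probabilities followed by an argmax, late fusion of RGB and depth predictions could look like this. The function names `late_fusion` and `predict`, the `weights` parameter, and the averaging rule itself are hypothetical illustrations, not code from the jkini/Meccano repository.

```python
from typing import Dict, List, Optional


def late_fusion(
    probs_by_model: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Fuse per-model class-probability vectors by weighted averaging.

    Assumption: fusion is a weighted mean of softmax outputs; equal
    weights are used when none are given.
    """
    if weights is None:
        weights = {name: 1.0 for name in probs_by_model}
    total_w = sum(weights[name] for name in probs_by_model)
    num_classes = len(next(iter(probs_by_model.values())))
    fused = [0.0] * num_classes
    for name, probs in probs_by_model.items():
        if len(probs) != num_classes:
            raise ValueError(f"model {name!r} has {len(probs)} classes, expected {num_classes}")
        w = weights[name] / total_w  # normalize so fused values still sum to 1
        for c, p in enumerate(probs):
            fused[c] += w * p
    return fused


def predict(fused: List[float]) -> int:
    """Return the index of the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical per-modality outputs for a 3-class toy problem.
    rgb = [0.6, 0.3, 0.1]
    depth = [0.2, 0.7, 0.1]
    fused = late_fusion({"rgb": rgb, "depth": depth})
    print(fused, predict(fused))
```

In this toy case the RGB model alone would pick class 0, but averaging with the more confident depth model shifts the fused prediction to class 1. Late fusion keeps each modality model independent at training time, which matches the abstract's description of training the modality models separately.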

Findings

01 Effective handling of long-tailed class distributions.
02 Improved accuracy from ensembling modality models with late fusion.
03 Demonstrated effectiveness on the MECCANO dataset.

Abstract

In this work, we propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset. Based on the underlying principle of focal loss, which captures the relationship between tail (scarce) classes and their prediction difficulties, we propose an exponentially decaying variant of focal loss for our current task. It initially emphasizes learning from the hard misclassified examples and gradually adapts to the entire range of examples in the dataset. This annealing process encourages the model to strike a balance between focusing on the sparse set of hard samples and still leveraging the information provided by the easier ones. Additionally, we opt for the late fusion strategy to combine the resultant probability distributions…
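The abstract describes a focal loss that first concentrates on hard examples and then anneals toward all examples, but the exact decay schedule is not given here. A minimal sketch, assuming the focusing parameter γ of the standard focal loss FL(p_t) = -(1 - p_t)^γ · log(p_t) decays exponentially with the epoch index e as γ_e = γ_0 · exp(-λ·e): as γ shrinks toward 0 the loss approaches ordinary cross-entropy, which matches the "gradually adapts to the entire range of examples" behavior. The schedule, the defaults `gamma0=2.0` and `decay_rate=0.1`, and every function name are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def decayed_gamma(gamma0: float, decay_rate: float, epoch: int) -> float:
    """Hypothetical schedule: gamma shrinks exponentially with the epoch index."""
    return gamma0 * math.exp(-decay_rate * epoch)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], target: int, gamma: float) -> float:
    """Standard multiclass focal loss -(1 - p_t)^gamma * log(p_t) for one sample."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def decaying_focal_loss(
    logits: Sequence[float],
    target: int,
    epoch: int,
    gamma0: float = 2.0,
    decay_rate: float = 0.1,
) -> float:
    """Focal loss whose focusing strength anneals toward cross-entropy (gamma -> 0)."""
    return focal_loss(logits, target, decayed_gamma(gamma0, decay_rate, epoch))


if __name__ == "__main__":
    easy = [4.0, 0.0, -1.0]  # confidently correct prediction for class 0
    hard = [0.0, 2.0, 1.0]   # misclassified prediction for class 0
    for epoch in (0, 10, 30):
        print(epoch,
              round(decaying_focal_loss(easy, 0, epoch), 4),
              round(decaying_focal_loss(hard, 0, epoch), 4))
```

With these assumed defaults, γ falls below 0.3 by epoch 20, so the loss on a well-classified sample climbs back toward its cross-entropy value while misclassified samples, whose (1 - p_t) factor is already close to 1, keep roughly the same weight throughout.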

Code & Models

Repositories

jkini/Meccano · PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Focal Loss