$\beta$-Multivariational Autoencoder for Entangled Representation Learning in Video Frames
Fatemeh Nouri, Robert Bergevin

TL;DR
This paper introduces $\beta$MVAE and $\beta$MVUnet, novel models for learning entangled multivariate Gaussian priors from video frames to improve object tracking and segmentation accuracy.
Contribution
The paper proposes the $\beta$MVAE and $\beta$MVUnet models that learn complex entangled priors directly from video data, enhancing object tracking and segmentation.
Findings
$\beta$MVUnet improves posterior estimation accuracy.
$\beta$MVUnet enhances segmentation performance.
Models trained on large-scale video datasets.
Abstract
It is crucial to choose actions from an appropriate distribution while learning a sequential decision-making process in which a set of actions is expected given the states and previous reward. Yet, if there are more than two latent variables and every pair of variables has a covariance value, learning a known prior from data becomes challenging, because many posterior estimation methods experience posterior collapse when the data are large and diverse. In this paper, we propose the $\beta$-Multivariational Autoencoder ($\beta$MVAE) to learn a multivariate Gaussian prior from video frames for use as part of single-object tracking formulated as a decision-making process. We present a novel formulation for object motion in videos with a set of dependent parameters to address a single-object tracking task. The true values of the motion parameters are obtained through data analysis on the training…
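The abstract's setting, a multivariate Gaussian prior in which every pair of latent variables has a covariance, can be illustrated with the closed-form KL divergence between two full-covariance Gaussians. The following is a minimal sketch only: it assumes the standard $\beta$-VAE form of the objective (reconstruction error plus $\beta$ times the KL term), which the truncated abstract does not confirm as the authors' loss, and the names `gaussian_kl`, `beta_objective`, and the default `beta=4.0` are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kl(mu_q, cov_q, mu_p, cov_p):
    """KL( N(mu_q, cov_q) || N(mu_p, cov_p) ) for full-covariance Gaussians.

    Hypothetical helper, not taken from the paper. Both covariance
    matrices are assumed to be symmetric positive definite.
    """
    k = mu_q.shape[0]
    diff = mu_p - mu_q
    # tr(cov_p^{-1} cov_q), computed with a linear solve instead of an inverse
    trace_term = np.trace(np.linalg.solve(cov_p, cov_q))
    # Mahalanobis distance between the means under the prior covariance
    maha_term = diff @ np.linalg.solve(cov_p, diff)
    # log-determinants via slogdet for numerical stability
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + maha_term - k + logdet_p - logdet_q)


def beta_objective(recon_error, kl, beta=4.0):
    """Standard beta-VAE style loss (an assumption, not the paper's stated loss)."""
    return recon_error + beta * kl


# Example: a correlated 2-D prior and a posterior with a shifted mean
mu_p = np.zeros(2)
cov_p = np.array([[1.0, 0.5], [0.5, 1.0]])  # off-diagonal terms = entanglement
mu_q = np.array([0.3, -0.2])
cov_q = np.array([[0.8, 0.1], [0.1, 0.6]])

kl = gaussian_kl(mu_q, cov_q, mu_p, cov_p)
loss = beta_objective(recon_error=1.25, kl=kl)
```

The off-diagonal entries of `cov_p` are what distinguish this from the diagonal-covariance prior of a standard VAE. Under the assumed objective, a KL term that shrinks to zero for every input is the usual symptom of the posterior collapse the abstract mentions.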
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Convolution · Max Pooling · Concatenated Skip Connection · U-Net
