$\beta$-Multivariational Autoencoder for Entangled Representation Learning in Video Frames

Fatemeh Nouri; Robert Bergevin

arXiv:2211.12627 · cs.CV · November 24, 2022

TL;DR

This paper introduces $\beta$MVAE and $\beta$MVUnet, novel models for learning entangled multivariate Gaussian priors from video frames to improve object tracking and segmentation accuracy.

Contribution

The paper proposes the $\beta$MVAE and $\beta$MVUnet models, which learn complex entangled priors directly from video data to enhance object tracking and segmentation.
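This is a minimal sketch of what an "entangled prior" means. It assumes the models follow the standard $\beta$-VAE objective with a full-covariance Gaussian prior, and the paper's exact loss may differ. The prior $p(z)=\mathcal{N}(\mu_p,\Sigma_p)$ is entangled when $\Sigma_p$ has nonzero off-diagonal entries. For a $k$-dimensional Gaussian posterior $q(z\mid x)=\mathcal{N}(\mu_q,\Sigma_q)$, the KL term then has a closed form:

$$\mathcal{L} = \mathbb{E}_{q(z\mid x)}\left[\log p(x\mid z)\right] - \beta\, D_{\mathrm{KL}}\!\left(q(z\mid x)\,\|\,p(z)\right)$$

$$D_{\mathrm{KL}}\!\left(q\,\|\,p\right) = \tfrac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_p^{-1}\Sigma_q\right) + (\mu_p-\mu_q)^\top\Sigma_p^{-1}(\mu_p-\mu_q) - k + \ln\frac{\det\Sigma_p}{\det\Sigma_q}\right]$$

With a diagonal $\Sigma_p$, the trace and log-determinant terms split into independent per-dimension sums. With an entangled $\Sigma_p$, they couple all latent dimensions.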

Findings

01

$\beta$MVUnet improves posterior estimation accuracy.

02

$\beta$MVUnet enhances segmentation performance.

03

Models trained on large-scale video datasets.

Abstract

It is crucial to choose actions from an appropriate distribution when learning a sequential decision-making process in which a set of actions is expected given the states and the previous reward. Yet when there are more than two latent variables and every pair of variables has a nonzero covariance, learning a known prior from data becomes challenging: when the data are large and diverse, many posterior estimation methods suffer from posterior collapse. In this paper, we propose the $\beta$-Multivariational Autoencoder ($\beta$MVAE) to learn a multivariate Gaussian prior from video frames for use in single-object tracking formulated as a decision-making process. We present a novel formulation of object motion in videos as a set of dependent parameters to address the single-object tracking task. The true values of the motion parameters are obtained through data analysis on the training…
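The setting described in the abstract is a Gaussian prior in which every pair of latent variables has a nonzero covariance. It can be sketched as a $\beta$-weighted loss whose KL term uses the full prior covariance. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation (their official code in fatemehN/entangled_representation is PyTorch). It assumes a standard $\beta$-VAE objective, treats the reconstruction error as a precomputed number, and passes the prior mean and covariance in as fixed inputs. The helper names `cholesky`, `solve_lower`, `kl_mvn` and `beta_vae_loss` are hypothetical.

```python
# Hypothetical sketch: assumes a standard beta-VAE loss with a full-covariance
# Gaussian prior. It is NOT the paper's implementation; all names are illustrative.
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L @ L^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance is not positive definite")
                low[i][i] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low: Matrix, b: Vector) -> Vector:
    """Forward substitution: solve low @ x = b for x."""
    x: Vector = []
    for i, bi in enumerate(b):
        s = sum(low[i][k] * x[k] for k in range(i))
        x.append((bi - s) / low[i][i])
    return x


def kl_mvn(mu_q: Vector, cov_q: Matrix, mu_p: Vector, cov_p: Matrix) -> float:
    """KL( N(mu_q, cov_q) || N(mu_p, cov_p) ) for full (entangled) covariances."""
    k = len(mu_q)
    lq, lp = cholesky(cov_q), cholesky(cov_p)
    # tr(cov_p^{-1} cov_q) equals the squared Frobenius norm of lp^{-1} lq.
    trace = 0.0
    for j in range(k):
        z = solve_lower(lp, [lq[i][j] for i in range(k)])
        trace += sum(v * v for v in z)
    # Mahalanobis term (mu_p - mu_q)^T cov_p^{-1} (mu_p - mu_q).
    w = solve_lower(lp, [p - q for p, q in zip(mu_p, mu_q)])
    maha = sum(v * v for v in w)
    # log det(cov) = 2 * sum(log(diag(L))).
    logdet_p = 2.0 * sum(math.log(lp[i][i]) for i in range(k))
    logdet_q = 2.0 * sum(math.log(lq[i][i]) for i in range(k))
    return 0.5 * (trace + maha - k + logdet_p - logdet_q)


def beta_vae_loss(recon_error: float, mu_q: Vector, cov_q: Matrix,
                  mu_p: Vector, cov_p: Matrix, beta: float) -> float:
    """Negative beta-weighted ELBO: reconstruction error + beta * KL(q || prior)."""
    return recon_error + beta * kl_mvn(mu_q, cov_q, mu_p, cov_p)


if __name__ == "__main__":
    prior_cov = [[1.0, 0.8], [0.8, 1.0]]  # entangled prior: off-diagonal 0.8
    post_cov = [[1.0, 0.0], [0.0, 1.0]]   # factorized (diagonal) posterior
    print(kl_mvn([0.0, 0.0], post_cov, [0.0, 0.0], prior_cov))  # ≈ 1.267
```

The sketch uses Cholesky factors instead of explicit matrix inverses. This keeps the trace and Mahalanobis terms numerically stable, and it fails loudly if a covariance is not positive definite. In the usage example, a factorized posterior pays a KL penalty against the correlated prior even though both have zero mean.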

Code & Models

Repositories

fatemehN/entangled_representation
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Convolution · Max Pooling · Concatenated Skip Connection · U-Net