Deep kernel video approximation for unsupervised action segmentation
Silvia L. Pintea, Jouke Dijkstra

TL;DR
This paper introduces a novel unsupervised video action segmentation method using deep kernel space approximation with neural tangent kernels and maximum mean discrepancy, achieving competitive results on standard benchmarks.
Contribution
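It proposes per-video unsupervised action segmentation by learning in deep kernel space, using MMD defined over NTKs; the abstract argues that MMD gives more reliable estimates and is easier and faster to optimize than the commonly used optimal transport metric.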
Findings
Achieves competitive results on six standard benchmarks.
Outperforms prior agglomerative methods when segment count is unknown.
Uses MMD with NTKs for more reliable and faster distribution approximation.
Abstract
This work focuses on per-video unsupervised action segmentation, which is of interest to applications where storing large datasets is either not possible or not permitted. We propose to segment videos by learning in deep kernel space, to approximate the underlying frame distribution as closely as possible. To define this closeness metric between the original video distribution and its approximation, we rely on maximum mean discrepancy (MMD), which is a geometry-preserving metric in distribution space and thus gives more reliable estimates. Moreover, unlike the commonly used optimal transport metric, MMD is both easier to optimize and faster. We choose to use neural tangent kernels (NTKs) to define the kernel space where MMD operates, because of their improved descriptive power as opposed to fixed kernels, and also because NTKs sidestep the trivial solution when jointly learning…
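To make the MMD closeness metric concrete, the following is a minimal sketch of a biased squared-MMD estimate between two sets of feature vectors. It is not the paper's method: it assumes a fixed RBF kernel in place of the neural tangent kernels the abstract describes, and the function names, the `bandwidth` parameter, and the random "frame features" are all hypothetical, chosen only for illustration.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=4.0):
    """Fixed RBF kernel matrix between rows of x and rows of y.

    Assumption: a fixed kernel stands in for the learned NTK here.
    """
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=4.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical data: 200 "frame features" and two candidate approximations.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 16))
close_approx = rng.normal(size=(50, 16))          # same distribution
far_approx = rng.normal(loc=2.0, size=(50, 16))   # shifted distribution

mmd_close = mmd_squared(frames, close_approx)
mmd_far = mmd_squared(frames, far_approx)
print(f"MMD^2 to close approximation: {mmd_close:.4f}")
print(f"MMD^2 to far approximation:   {mmd_far:.4f}")
```

In this sketch, a small sample drawn from the same distribution as the frames yields a much lower MMD estimate than a shifted sample, which is the sense in which MMD can score how closely an approximation matches the original frame distribution.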
