CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
Yuliang Zhan, Jian Li, Wenbing Huang, Wenbing Huang, Yang Liu, Hao Sun

TL;DR
This paper introduces CloDS, an unsupervised framework for learning cloth dynamics solely from multi-view visual observations, capable of handling unknown conditions and severe deformations without prior physical knowledge.
Contribution
CloDS is a novel unsupervised learning framework that models cloth dynamics from visual data using a three-stage pipeline with mesh-based Gaussian splatting and dual-position opacity modulation.
Findings
CloDS effectively learns cloth dynamics from visual data.
The method generalizes well to unseen configurations.
It handles large deformations and self-occlusions robustly.
Abstract
Deep learning has demonstrated remarkable capabilities in simulating complex dynamic systems. However, existing methods require known physical properties as supervision or inputs, limiting their applicability under unknown conditions. To explore this challenge, we introduce Cloth Dynamics Grounding (CDG), a novel scenario for unsupervised learning of cloth dynamics from multi-view visual observations. We further propose Cloth Dynamics Splatting (CloDS), an unsupervised dynamic learning framework designed for CDG. CloDS adopts a three-stage pipeline that first performs video-to-geometry grounding and then trains a dynamics model on the grounded meshes. To cope with large non-linear deformations and severe self-occlusions during grounding, we introduce a dual-position opacity modulation that supports bidirectional mapping between 2D observations and 3D geometry via mesh-based Gaussian…
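One way to picture the mesh-based Gaussian binding mentioned in the abstract is the minimal sketch below. It rests on an assumption not stated in the text above: that each Gaussian is attached to a triangle face through fixed barycentric weights (its relative position), and that its world-space center (its absolute position) is recomputed from the current vertex positions. The names `gaussian_centers`, `face_ids`, and `barycentric` are hypothetical, and the opacity modulation itself is not reproduced here.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def gaussian_centers(
    vertices: Sequence[Vec3],
    faces: Sequence[Tuple[int, int, int]],
    face_ids: Sequence[int],
    barycentric: Sequence[Vec3],
) -> List[Vec3]:
    """Map face-relative (barycentric) positions to absolute 3D centers.

    Assumption for illustration: Gaussian k lives on faces[face_ids[k]]
    with fixed weights barycentric[k] that sum to 1.
    """
    centers: List[Vec3] = []
    for f, (w0, w1, w2) in zip(face_ids, barycentric):
        i0, i1, i2 = faces[f]
        v0, v1, v2 = vertices[i0], vertices[i1], vertices[i2]
        # Weighted sum of the three corner positions, one coordinate at a time.
        center = tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2))
        centers.append(center)
    return centers


# A single triangle carrying two Gaussians: one at the centroid, one on an edge.
faces = [(0, 1, 2)]
face_ids = [0, 0]
barycentric = [(1 / 3, 1 / 3, 1 / 3), (0.5, 0.5, 0.0)]

rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
deformed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]  # vertex 2 lifted

rest_centers = gaussian_centers(rest, faces, face_ids, barycentric)
deformed_centers = gaussian_centers(deformed, faces, face_ids, barycentric)
```

Because the weights stay fixed while the vertices move, the Gaussians follow the mesh through a deformation without any per-frame re-fitting of their relative positions, which is the property a mesh-bound representation relies on.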
Peer Reviews
Decision: ICLR 2026 Poster
- The use of Spatial Mapping Gaussian Splatting to establish a mapping between the 2D pixel space and 3D space is interesting. SMGS handles large deformations and severe self-occlusion by using both relative and absolute positions of the Gaussian components. This design ensures an accurate mapping between the 2D and 3D spaces during rendering.
- The visual results are shown under wind force. It would have been interesting to see cloth dynamics under various types and sources of forces, e.g. objects colliding with the cloth. How could these be modeled within the current framework?
- A detailed analysis of cloth-cloth and cloth-object collision is missing.
- Please add the following relevant references on neural garment simulators: "GarSim: Particle Based Neural Garment Simulator" (WACV 2023) and "GenSim: Unsupervised Generic Garment Simulator" (CVPR 2023).
* The method learns cloth dynamics directly from video.
* The method achieves performance comparable to approaches trained on ground-truth mesh data.
1. Clarity and consistency. The writing is unnecessarily complex. If I understand correctly, a simple and clear description would be: learn dynamics directly from videos by first performing video-to-geometry grounding, then training a dynamics model on the grounded meshes. Also, there appears to be a typo/inconsistency: Equations (7) and (9) for geometry should take the same input parameters; please verify and correct.
2. Related work coverage (missing citations). Given the focus on data-driven,
- The paper clearly defines a new and challenging problem, Cloth Dynamics Grounding, which focuses on unsupervised learning of cloth dynamics solely from visual data.
- The introduction of Spatial Mapping Gaussian Splatting, a mesh-based Gaussian splatting module, provides a differentiable mapping between 2D pixel space and 3D geometry. The proposed dual-position opacity modulation in SMGS is a clever solution to address severe self-occlusions and large non-linear deformations inherent to cloth
- The method assumes an initial mesh state ($M_1$) is available to build the initial Gaussian component representation. Although robustness to initial mesh errors is analyzed, I still suggest that some visual results be prepared and presented to complement the results reported in Figure S.2.
- Performance degrades under complex lighting conditions due to temporal inconsistency caused by shadows and illumination, suggesting the current approach is sensitive to visual changes beyond pure g
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
