VideoCAD: A Dataset and Model for Learning Long-Horizon 3D CAD UI Interactions from Video
Brandon Man, Ghadi Nehme, Md Ferdous Alam, Faez Ahmed

TL;DR
VideoCAD introduces a large-scale synthetic dataset and a model for learning long-horizon 3D CAD UI interactions from video, aimed at AI-driven engineering tools and multimodal reasoning.
Contribution
The paper presents VideoCAD, a large-scale dataset of professional CAD UI interactions that the authors describe as the first attempt to model UI interactions for precision engineering tasks, and VideoCADFormer, a model that learns long-horizon tasks from this data.
Findings
VideoCAD contains over 41K annotated CAD interaction videos.
VideoCADFormer outperforms existing behavior cloning methods.
The dataset enables new benchmarks for spatial reasoning and UI understanding.
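Behavior cloning, the baseline family named above, treats UI control as supervised learning: given what the screen shows, predict the action an expert took. The sketch below illustrates that idea only. The `UIAction` record, the `ACTION_KINDS` list, and the loss weighting are hypothetical and are not taken from the VideoCAD schema or from VideoCADFormer.

```python
import math
from dataclasses import dataclass

# Hypothetical action vocabulary; the real dataset's action set may differ.
ACTION_KINDS = ["click", "type", "scroll"]


@dataclass
class UIAction:
    """Hypothetical expert action: a kind plus normalized screen coordinates."""
    kind: str
    x: float  # in [0, 1]
    y: float  # in [0, 1]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_loss(kind_logits, pred_xy, expert):
    """Per-step loss: cross-entropy on action kind plus squared coordinate error."""
    probs = softmax(kind_logits)
    ce = -math.log(probs[ACTION_KINDS.index(expert.kind)])
    sq_err = (pred_xy[0] - expert.x) ** 2 + (pred_xy[1] - expert.y) ** 2
    return ce + sq_err


def trajectory_loss(predictions, expert_actions):
    """Mean per-step loss over one recorded trajectory."""
    losses = [bc_loss(logits, xy, a) for (logits, xy), a in zip(predictions, expert_actions)]
    return sum(losses) / len(losses)
```

In this setup, a policy's predicted logits and coordinates for each frame would be scored against the recorded expert action, and the mean loss over the trajectory would be minimized during training. Long-horizon tasks make this harder because small per-step errors can compound over many steps.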
Abstract
Computer-Aided Design (CAD) is a time-consuming and complex process, requiring precise, long-horizon user interactions with intricate 3D interfaces. While recent advances in AI-driven user interface (UI) agents show promise, most existing datasets and methods focus on short, low-complexity tasks in mobile or web applications, failing to capture the demands of professional engineering tools. In this work, we introduce VideoCAD, the first attempt to model UI interactions for precision engineering tasks. Specifically, VideoCAD is a large-scale synthetic dataset consisting of over 41K annotated video recordings of CAD operations, generated using an automated framework for collecting high-fidelity UI action data from human-made CAD designs. Compared to existing datasets, VideoCAD offers an order-of-magnitude increase in complexity for real-world engineering UI tasks, with time horizons up to…
Taxonomy
Topics: Manufacturing Process and Optimization · Software Engineering Research
