Summer-22B: A Systematic Approach to Dataset Engineering and Training at Scale for Video Foundation Model
Simo Ryu, Chunghwan Han

TL;DR
This paper details the comprehensive process of developing and training Summer-22B, a large-scale video foundation model, emphasizing dataset engineering, architectural choices, and lessons learned from scaling to 50 million clips.
Contribution
It introduces a systematic approach to dataset curation and model training for large-scale video models, documents the engineering challenges involved, and presents new tooling such as the Lavender Data system for dataset management.
Findings
Dataset engineering was the most resource-intensive step.
Architectural variants had limited impact on performance.
$\mu$P hyperparameter transfer appeared effective even under geometric constraints.
Abstract
We describe our experience training Summer-22B, a video foundation model developed from scratch. This report documents the engineering challenges, design decisions, and lessons learned while scaling from raw footage collection to a functional model trained on approximately 50 million clips. We outline our approach combining metadata-driven dataset curation, multi-stage filtering, $\mu$P parameterization, and hypersphere-constrained optimization. We developed the Lavender Data system for dataset management and adopted inference-aware architectural choices. We share observations on what worked in our setting: dataset engineering consumed the majority of effort, architectural variants showed smaller differences than we expected, and $\mu$P hyperparameter transfer appeared effective even under geometric constraints. We hope this account proves useful to others undertaking similar projects.
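The abstract names two optimization ideas without giving details, so the following is only a minimal sketch of how they are commonly combined, not the authors' implementation. It assumes the usual $\mu$P rule that the Adam learning rate of hidden weight matrices scales inversely with width, and it models the "hypersphere constraint" as renormalizing a weight vector to a fixed radius after each update. The function names (`mup_hidden_lr`, `project_to_sphere`, `constrained_sgd_step`) and the plain SGD update are hypothetical choices made for illustration.

```python
import math


def mup_hidden_lr(base_lr, base_width, width):
    """Assumed muP-style rule: hidden-layer LR shrinks as 1/width
    relative to a small proxy model tuned at base_width."""
    return base_lr * base_width / width


def project_to_sphere(vec, radius=1.0):
    """Rescale vec so its Euclidean norm equals radius (zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [radius * x / norm for x in vec]


def constrained_sgd_step(weights, grads, lr, radius=1.0):
    """One plain gradient step followed by projection back onto the sphere."""
    stepped = [w - lr * g for w, g in zip(weights, grads)]
    return project_to_sphere(stepped, radius)


# Learning rate tuned on a width-256 proxy, transferred to a width-1024 model.
lr_large = mup_hidden_lr(base_lr=1e-3, base_width=256, width=1024)
new_weights = constrained_sgd_step([1.0, 0.0], [0.0, 1.0], lr=0.5)
```

Under this sketch the weight norm is fixed by the projection, so the learning rate controls only the angular change per step; that is one plausible reading of why hyperparameter transfer could remain effective "under geometric constraints".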
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Cell Image Analysis Techniques
