High-Performance Self-Supervised Learning by Joint Training of Flow Matching
Kosuke Ukita, Tsuyoshi Okita

TL;DR
This paper introduces FlowFM, a joint training framework combining a representation encoder and a flow matching generator, achieving high-quality data generation and recognition with improved efficiency and speed for self-supervised learning.
Contribution
The paper presents a novel FlowFM model that jointly trains a flow matching generator and encoder, enhancing SSL performance and efficiency compared to diffusion-based methods.
Findings
FlowFM reduces training time by 50.4% compared to a diffusion-based approach on wearable sensor data.
FlowFM outperforms state-of-the-art SSL methods on wearable sensor datasets.
FlowFM achieves up to 51.0x inference speedup while maintaining high quality.
Abstract
Diffusion models can learn rich representations during data generation, showing potential for Self-Supervised Learning (SSL), but they face a trade-off between generative quality and discriminative performance. Their iterative sampling also incurs substantial computational and energy costs, hindering industrial and edge AI applications. To address these issues, we propose the Flow Matching-based Foundation Model (FlowFM), which jointly trains a representation encoder and a conditional flow matching generator. This decoupled design achieves both high-fidelity generation and effective recognition. By using flow matching to learn a simpler velocity field, FlowFM accelerates and stabilizes training, improving its efficiency for representation learning. Experiments on wearable sensor data show FlowFM reduces training time by 50.4% compared to a diffusion-based approach. On downstream tasks,…
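The abstract does not state which probability path or loss FlowFM uses, so the following is only a minimal sketch of generic conditional flow matching, assuming a straight-line path from Gaussian noise to data. All names here (`interpolate`, `flow_matching_loss`, `euler_sample`, `oracle_velocity`) are hypothetical and not taken from the paper. In a joint setup such as the one described, `cond` would presumably be the encoder's representation of the input; here a toy oracle that is handed the data point itself stands in for a trained network.

```python
import random

def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_model, x1, cond, rng):
    """Single-sample estimate of the flow matching regression loss.

    velocity_model(x, t, cond) is assumed to return a list the same length as x.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # noise sample
    t = rng.random()                          # time in [0, 1)
    xt = interpolate(x0, x1, t)
    pred = velocity_model(xt, t, cond)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(x1)

def euler_sample(velocity_model, x0, cond, num_steps):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_model(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x, t, cond):
    """Toy stand-in for a trained network: cond is the data point itself (assumption)."""
    return [(c - xi) / (1.0 - t) for c, xi in zip(cond, x)]

if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [0.5, -1.0, 2.0]                     # one made-up "sensor" sample
    print(flow_matching_loss(oracle_velocity, x1, x1, rng))
    print(euler_sample(oracle_velocity, [0.0, 0.0, 0.0], x1, num_steps=4))
```

Because the target velocity along a straight path is constant in time, a model that fits it well can be integrated with few Euler steps. That is the general intuition behind flow matching's sampling efficiency, though the step counts behind the reported speedup are not given in the text above.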
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics
