CaLMFlow: Volterra Flow Matching using Causal Language Models
Sizhuang He, Daniel Levine, Ivan Vrkic, Marco Francesco Bressana, David Zhang, Syed Asad Rizvi, Yangtian Zhang, Emanuele Zappala, David van Dijk

TL;DR
CaLMFlow is a flow matching framework built on causal language models. It formulates flow matching as a Volterra integral equation, which the authors report improves scalability and context-awareness when modeling complex, high-dimensional data.
Contribution
It presents a new approach that leverages large language models for flow matching by casting it as a Volterra integral equation, bridging discrete language modeling with continuous data generation.
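For orientation, the standard link between an ODE-defined flow and a Volterra integral equation can be sketched as follows. This is textbook background rather than the paper's exact formulation, which this summary does not reproduce; the symbols $x$, $x_0$, $f$, $g$, and $K$ are assumed notation.

```latex
\begin{align}
  % ODE view of a flow, with an assumed velocity field f
  \frac{dx}{dt} &= f\bigl(x(t), t\bigr), \qquad x(0) = x_0 \\
  % Equivalent integral form of the same initial value problem
  x(t) &= x_0 + \int_0^t f\bigl(x(s), s\bigr)\, ds \\
  % General Volterra integral equation of the second kind
  x(t) &= g(t) + \int_0^t K\bigl(t, s, x(s)\bigr)\, ds
\end{align}
```

Setting $g(t) = x_0$ and $K(t, s, x) = f(x, s)$ recovers the ODE case. A kernel that also depends on $t$ lets the state at time $t$ depend on the whole earlier trajectory, which is what makes a sequence-modeling view natural.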
Findings
Outperforms ODE-solver-dependent methods such as conditional flow matching (CFM).
Effective on synthetic and real-world data, including single-cell perturbation prediction.
Incorporates textual context and generalizes to unseen conditions.
Abstract
We introduce CaLMFlow (Causal Language Models for Flow Matching), a novel framework that casts flow matching as a Volterra integral equation (VIE), leveraging the power of large language models (LLMs) for continuous data generation. CaLMFlow enables the direct application of LLMs to learn complex flows by formulating flow matching as a sequence modeling task, bridging discrete language modeling and continuous generative modeling. Our method implements tokenization across space and time, thereby solving a VIE over these domains. This approach enables efficient handling of high-dimensional data and outperforms ODE solver-dependent methods like conditional flow matching (CFM). We demonstrate CaLMFlow's effectiveness on synthetic and real-world data, including single-cell perturbation response prediction, showcasing its ability to incorporate textual context and generalize to unseen…
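To make the sequence-modeling view concrete, here is a minimal numerical sketch of stepping a Volterra integral equation forward in time, where each new state is computed from the entire prefix of earlier states. It is not the authors' method and involves no language model: the function name `solve_volterra`, the left-rectangle quadrature, and the toy kernel are all assumptions chosen for illustration.

```python
import math


def solve_volterra(g, kernel, t_end, n_steps):
    """Explicit left-rectangle scheme for x(t) = g(t) + int_0^t K(t, s, x(s)) ds.

    Hypothetical helper for illustration only.
    """
    h = t_end / n_steps
    ts = [i * h for i in range(n_steps + 1)]
    xs = [g(ts[0])]  # at t = 0 the integral term vanishes
    for n in range(1, n_steps + 1):
        # The next state uses the whole prefix xs[0..n-1], not only xs[n-1],
        # loosely analogous to next-token prediction over a context window.
        integral = h * sum(kernel(ts[n], ts[j], xs[j]) for j in range(n))
        xs.append(g(ts[n]) + integral)
    return ts, xs


# Toy problem: x(t) = 1 + int_0^t x(s) ds, whose exact solution is exp(t).
ts, xs = solve_volterra(
    g=lambda t: 1.0,
    kernel=lambda t, s, x: x,
    t_end=1.0,
    n_steps=1000,
)
print("absolute error at t = 1:", abs(xs[-1] - math.exp(1.0)))
```

In this toy case the kernel ignores `t`, so the scheme reduces to forward Euler for an ODE. A kernel that depends on `t` would reweight the whole history at every step, which is the part an ODE solver cannot express directly.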
Peer Reviews
Decision · Submitted to ICLR 2025
It is only natural to extend any of these iterative refinement based approaches to learning to sample from a complex distribution to make each refinement step less local and more global. Although it is natural, it has been challenging to do so until recently, as it was not clear whether we can build a powerful neural net that can take as input a long sequence of a trajectory and use it properly. With the recent advances in language models, this doubt is no more, and the authors in this paper dem
Unfortunately, the current manuscript is extremely difficult to read. One reason I can point out is the lack of a clear exposition of what the underlying autoregressive model looks like: what does it take as input, and how is the prefix processed to predict the next step’s refined observation? Furthermore, the authors’ use of the term “tokenization” confused me quite a lot, as “tokenization” is often used to refer to the process by which we quantize a continuous observ
The technical idea is original and natural, blending LLMs into the framework of VIE flow matching. Extensive experiments were carried out on a range of tasks, covering synthetic and real-world data. The analysis and ablation studies also show the importance of each component/technique in the method, providing insights for future improvements. The paper is generally clear.
The method seems general, but the only real data in the experiments are single-cell data. Without further evidence, it is hard to judge whether the significance will be high or broad. The writing has some typesetting issues, e.g., line 213, $T$ tokens; line 219, $D_{\text{in}}$.
The writing is clear and accessible. The idea is easy to understand. Experiments are conducted on both synthetic and real data, enhancing the validity of the results.
Trivial Task: The paper focuses on continuous data modeling, which may not present a sufficiently challenging or novel task. The modeling of Gaussian distributions and the MNIST dataset appears trivial, and many existing multimodal methods can already handle cell data modeling effectively. Mismatched Motivation and Experiment: Although the paper initially highlights the difficulty of solving ODEs, the experiments focus only on continuous data modeling and do not fully support or address the sta
Taxonomy
Topics: Topic Modeling · Time Series Analysis and Forecasting · Data Stream Mining Techniques
