Cliqueformer: Model-Based Optimization with Structured Transformers
Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine

TL;DR
Cliqueformer is a transformer-based model that learns the structure of black-box functions using functional graphical models, improving offline model-based optimization in design tasks like protein engineering and materials discovery.
Contribution
The paper introduces Cliqueformer, a novel architecture that captures the target function's structure via functional graphical models (FGMs), addressing distribution shift without relying on explicit conservative approaches.
Findings
Outperforms existing MBO algorithms across multiple domains.
Effectively addresses distribution shift in design tasks.
Demonstrates superior results in chemical and genetic design problems.
Abstract
Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems. While predictive models may not directly translate to effective design, recent MBO algorithms incorporate reinforcement learning and generative modeling approaches. Meanwhile, theoretical work suggests that exploiting the target function's structure can enhance MBO performance. We present Cliqueformer, a transformer-based architecture that learns the black-box function's structure through functional graphical models (FGM), addressing distribution shift without relying on explicit conservative approaches. Across various domains, including chemical and genetic design tasks, Cliqueformer demonstrates superior performance compared to existing methods.
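The abstract's claim that exploiting the target function's structure can help optimization can be illustrated on a toy problem. The sketch below is not the Cliqueformer architecture. It assumes a hypothetical function over binary variables that decomposes as a sum of terms over small overlapping cliques arranged in a chain, which is the kind of additive decomposition an FGM is meant to describe. All names (`make_chain_tables`, `chain_max`, and so on) are illustrative.

```python
import itertools
import random


def make_chain_tables(d, seed=0):
    """Hypothetical clique terms: one 2x2 table per clique (i, i+1) of a chain."""
    rng = random.Random(seed)
    return [[[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)]
            for _ in range(d - 1)]


def f(x, tables):
    """Evaluate f(x) = sum_i f_i(x_i, x_{i+1}) for a binary design x."""
    return sum(t[x[i]][x[i + 1]] for i, t in enumerate(tables))


def brute_force_max(d, tables):
    """Ignore the structure: enumerate all 2**d designs."""
    return max(f(x, tables) for x in itertools.product((0, 1), repeat=d))


def chain_max(tables):
    """Use the structure: dynamic programming along the chain of cliques."""
    # best[v] = best partial sum over designs whose current variable equals v
    best = [0.0, 0.0]
    for t in tables:
        best = [max(best[u] + t[u][v] for u in (0, 1)) for v in (0, 1)]
    return max(best)


if __name__ == "__main__":
    d = 10
    tables = make_chain_tables(d)
    print("joint configurations:", 2 ** d)             # 1024
    print("clique-level table entries:", 4 * (d - 1))  # 36
    print("same optimum:",
          abs(brute_force_max(d, tables) - chain_max(tables)) < 1e-9)
```

In this toy setting, both the amount of information needed to specify the function and the cost of maximizing it grow with the number of cliques rather than with the size of the joint design space. This is only an intuition for why learned structure might help. The paper's actual model and training procedure are not described in this summary.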
Peer Reviews
Decision: Submitted to ICLR 2025
- Cliqueformer effectively addresses the limitations of FGM by eliminating the need for explicit graph discovery steps, while still maintaining the scalability advantages of MBO with FGM.
- The proposed method is carefully designed based on theoretical observations.
- It demonstrates significant performance improvements over COMs across a wide variety of tasks.
- Well-written, with clear motivation.
While the paper makes significant contributions, some claims might be somewhat overstated:
- The authors state that "we consistently obtained good performance by setting the clique size to $ d_{\text{clique}} = 3 $." However, the experimental results suggest that the choice of $ d_{\text{clique}} $ or $ N_{\text{clique}} $ significantly affects performance. For instance, Figure 5(b) shows that only $ N=4 $ achieves better performance than $ N=1 $ (assuming $ N=1 $ represents the whole graph), a
The paper presents a novel method for offline model-based optimization. The idea of learning a latent space and imposing FGM on it is interesting. I also like the simplicity of the model and the clear writing of the paper. The empirical performance of the proposed method is strong against different baselines.
- Looking at Figure 5, the performance varies significantly with different values of cliques, and there is not a universal value across different datasets. How did the authors pick the value in an offline setting where you're not allowed to query the oracle?
- The baselines seem quite outdated. Have the authors tried comparing with more recent baselines such as generative methods using transformers or diffusion models [1, 2, 3, 4]? Some of these papers should also be cited and discussed in the p
1. The notations and model definitions were well-described. I had an easy time understanding the approach, and the explanations were easy on the mind.
2. The intuitions in the paper are stated clearly, and the desiderata were easy to follow.
3. The figures seemed effective at explaining the related concepts. Figure 4 presents a clear summary of the model, Figure 3 clearly demonstrates the problem with FGM choices, and Figures 1 and 2 are also helpful.
4. A limited set of ablation studies sh
1. As is, I fear the introduction and model description sections are quite disconnected from the experiments. The authors spend a substantial amount of time in Sections 1-4 building intuition on how the proposed model should work, what features are desired, and so on. The experiments don't do much more than present some benchmarks. This makes it difficult for me to verify that the gained improvements are actually the result of the claims and contributions, rather than being side effects of aux
Taxonomy
Topics: Music Technology and Sound Studies
