Cliqueformer: Model-Based Optimization with Structured Transformers

Jakub Grudzien Kuba; Pieter Abbeel; Sergey Levine

arXiv:2410.13106·cs.LG·March 20, 2026

TL;DR

Cliqueformer is a transformer-based model that learns the structure of black-box functions using functional graphical models, improving offline model-based optimization in design tasks like protein engineering and materials discovery.

Contribution

The paper introduces Cliqueformer, an architecture that captures the target function's structure via functional graphical models (FGMs), improving optimization without relying on explicit conservative methods.

Findings

01. Outperforms existing MBO algorithms across multiple domains.

02. Effectively addresses distribution shift in design tasks.

03. Demonstrates superior results in chemical and genetic design problems.

Abstract

Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems. While predictive models may not directly translate to effective design, recent MBO algorithms incorporate reinforcement learning and generative modeling approaches. Meanwhile, theoretical work suggests that exploiting the target function's structure can enhance MBO performance. We present Cliqueformer, a transformer-based architecture that learns the black-box function's structure through functional graphical models (FGM), addressing distribution shift without relying on explicit conservative approaches. Across various domains, including chemical and genetic design tasks, Cliqueformer demonstrates superior performance compared to existing methods.
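
To make the notion of function structure concrete, the sketch below shows a toy function that decomposes additively over cliques of input variables, which is the kind of structure a functional graphical model describes, and then optimizes a design by gradient ascent on it. It is illustrative only and not the authors' implementation: the clique layout `CLIQUES`, the per-clique function `clique_term`, and the helper `optimize_design` are all hypothetical, and the cliques here are fixed in input space rather than learned.

```python
import numpy as np

# Hypothetical cliques: overlapping index subsets of a 5-dimensional design x.
# (Assumption for illustration; the page does not give the paper's cliques.)
CLIQUES = [(0, 1, 2), (2, 3, 4)]


def clique_term(x_c):
    """Toy per-clique function f_C(x_C); an assumed stand-in for a learned one."""
    return -np.sum((x_c - 1.0) ** 2)


def decomposed_f(x, cliques=CLIQUES):
    """Evaluate f(x) as the sum of per-clique terms f_C(x_C)."""
    x = np.asarray(x, dtype=float)
    return sum(clique_term(x[list(c)]) for c in cliques)


def optimize_design(x0, steps=200, lr=0.05, eps=1e-5):
    """Gradient ascent on decomposed_f using central finite differences."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            grad[i] = (decomposed_f(x + e) - decomposed_f(x - e)) / (2 * eps)
        x += lr * grad  # move the design toward higher predicted value
    return x


if __name__ == "__main__":
    x_star = optimize_design(np.zeros(5))
    print(x_star, decomposed_f(x_star))
```

Because each term depends only on its own clique, a model of this form only has to be accurate on the marginal distribution of each clique rather than on the full joint design space, which is the intuition behind using such structure to cope with distribution shift.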

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 6·Confidence 3

Strengths

- Cliqueformer effectively addresses the limitations of FGM by eliminating the need for explicit graph discovery steps, while still maintaining the scalability advantages of MBO with FGM.
- The proposed method is carefully designed based on theoretical observations.
- It demonstrates significant performance improvements over COMs across a wide variety of tasks.
- Well-written with clear motivation.

Weaknesses

While the paper makes significant contributions, some claims might be somewhat overstated:
- The authors state that "we consistently obtained good performance by setting the clique size to $d_{\text{clique}} = 3$." However, the experimental results suggest that the choice of $d_{\text{clique}}$ or $N_{\text{clique}}$ significantly affects performance. For instance, Figure 5(b) shows that only $N=4$ achieves better performance than $N=1$ (assuming $N=1$ represents the whole graph), a

Reviewer 02·Rating 6·Confidence 4

Strengths

The paper presents a novel method for offline model-based optimization. The idea of learning a latent space and imposing FGM on it is interesting. I also like the simplicity of the model and the clear writing of the paper. The empirical performance of the proposed method is strong against different baselines.

Weaknesses

- Looking at Figure 5, the performance varies significantly with different values of cliques, and there is not a universal value across different datasets. How did the authors pick the value in an offline setting where you're not allowed to query the oracle?
- The baselines seem quite outdated. Have the authors tried comparing with more recent baselines such as generative methods using transformers or diffusion models [1, 2, 3, 4]? Some of these papers should also be cited and discussed in the p

Reviewer 03·Rating 5·Confidence 3

Strengths

1. The notations and model definitions were well-described. I had an easy time understanding the approach, and the explanations were easy on the mind.
2. The intuitions in the paper are stated clearly, and the desiderata were easy to follow.
3. The figures seemed effective at explaining the related concepts. Figure 4 presents a clear summary of the model, Figure 3 clearly demonstrates the problem with FGM choices, and Figures 1 and 2 are also helpful.
4. A limited set of ablation studies sh

Weaknesses

1. As is, I fear the introduction and model description sections are quite disconnected from the experiments. The authors spend a substantial amount of time in Sections 1-4 building intuition on how the proposed model should work, what features are desired, and so on. The experiments don't do much more than presenting some benchmarks. This makes it difficult for me to verify that the gained improvements are actually the result of the claims and contributions, rather than being side effects of aux

Code & Models

Repositories

znowu/cliqueformer-code
PyTorch·Official

Videos

Cliqueformer: Model-Based Optimization with Structured Transformers

Taxonomy

Topics·Music Technology and Sound Studies