Multi-Objective-Guided Discrete Flow Matching for Controllable Biological Sequence Design

Tong Chen; Yinuo Zhang; Sophia Tang; Pranam Chatterjee

arXiv:2505.07086·cs.LG·May 15, 2025

TL;DR

This paper introduces MOG-DFM, a framework that guides pretrained discrete flow matching models to generate biological sequences with Pareto-efficient trade-offs across multiple, often conflicting objectives.

Contribution

The paper presents MOG-DFM, a general method for multi-objective control of pretrained discrete flow models in biological sequence generation.

Findings

01 Effective multi-property optimization in peptide design.

02 Successful generation of DNA sequences with specific enhancer functions.

03 Demonstrated Pareto-efficient trade-offs in sequence generation.

Abstract

Designing biological sequences that satisfy multiple, often conflicting, functional and biophysical criteria remains a central challenge in biomolecule engineering. While discrete flow matching models have recently shown promise for efficient sampling in high-dimensional sequence spaces, existing approaches address only single objectives or require continuous embeddings that can distort discrete distributions. We present Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), a general framework to steer any pretrained discrete flow matching generator toward Pareto-efficient trade-offs across multiple scalar objectives. At each sampling step, MOG-DFM computes a hybrid rank-directional score for candidate transitions and applies an adaptive hypercone filter to enforce consistent multi-objective progression. We also trained two unconditional discrete flow matching models, PepDFM for…
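As a rough illustration of the per-step guidance described in the abstract, the Python sketch below scores candidate transitions with a combined rank and directional term, then keeps only candidates whose improvement vector falls inside a cone around a chosen trade-off direction. The function names, the weight `lam`, the fixed angle `phi`, and the score formulas themselves are assumptions made for illustration; the abstract does not state the paper's formulas, and the adaptive update of the cone angle is omitted here.

```python
import numpy as np

def rank_directional_scores(delta, w, lam=1.0):
    """Hypothetical hybrid score.

    delta: (n_candidates, n_objectives) objective improvements per candidate.
    w:     (n_objectives,) positive trade-off direction.
    """
    delta = np.asarray(delta, dtype=float)
    w = np.asarray(w, dtype=float)
    n = delta.shape[0]
    # Rank term: per-objective rank of each improvement, scaled to [0, 1].
    ranks = delta.argsort(axis=0).argsort(axis=0) / max(n - 1, 1)
    rank_term = ranks @ (w / w.sum())
    # Directional term: improvement projected onto the unit trade-off direction.
    dir_term = delta @ (w / np.linalg.norm(w))
    return rank_term + lam * dir_term

def hypercone_mask(delta, w, phi):
    """True where the angle between an improvement vector and w is <= phi (radians)."""
    delta = np.asarray(delta, dtype=float)
    w = np.asarray(w, dtype=float)
    norms = np.linalg.norm(delta, axis=1) * np.linalg.norm(w)
    # Zero-length improvement vectors are treated as pointing away from w.
    cos = np.divide(delta @ w, norms, out=np.full(len(delta), -1.0), where=norms > 0)
    return np.arccos(np.clip(cos, -1.0, 1.0)) <= phi

def select_transition(delta, w, phi, lam=1.0):
    """Index of the best-scoring candidate inside the cone, or None if the cone is empty."""
    scores = rank_directional_scores(delta, w, lam)
    mask = hypercone_mask(delta, w, phi)
    if not mask.any():
        return None
    return int(np.argmax(np.where(mask, scores, -np.inf)))

# Four candidate transitions, two objectives, equal trade-off weights.
delta = [[1.0, 1.0], [2.0, -1.0], [-1.0, 2.0], [0.0, 0.0]]
print(select_transition(delta, w=[1.0, 1.0], phi=np.pi / 4))
```

In this toy example only the first candidate improves both objectives in a direction close to `w`, so it is the only one that survives a 45-degree cone. Candidates that improve one objective at the expense of the other are filtered out even when their raw score is competitive.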

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 3

Strengths

- Multi-objective sequence generation is an important problem. Specifically, generating sequences that bind strongly to target molecules while exhibiting low affinity for off-targets has significant practical value.
- Performing biological sequence generation in a discrete space offers advantages.

Weaknesses

- The main weakness of this paper is the lack of technical motivation for the proposed method. The method consists of four steps for sequence generation, but the paper does not explain why these specific steps are expected to improve the generation process. There are several simpler approaches for multi-objective generation, yet the paper does not discuss the advantages of the proposed method relative to these alternatives. This weakens the overall contribution.
- The experimental sect

Reviewer 02 · Rating 2 · Confidence 4

Strengths

- The problem statement and the proposed method are well-motivated.
- MOG-DFM extends discrete flow matching to support Pareto-guided generation across multiple objectives.

Weaknesses

- The proposed method seems to be a simple adaptation of the ParetoFlow method [1] to the discrete sequence case, where each of the key steps shares significant similarity with ParetoFlow. This limits the novelty and methodological contribution of the proposed method. Also, the differences and novelty, especially in comparison to ParetoFlow, are not clearly stated.
- In step 1, the authors randomly select one position on the sequence to update. This is very inefficient, especially when the sequence

Reviewer 03 · Rating 2 · Confidence 3

Strengths

The paper introduces a new paradigm for performing multi-objective optimization directly within a discrete flow-matching model. The hypercone filtering mechanism is novel, and the rank–directional combination is original and intuitive, offering a practical way to handle non-differentiable or noisy objectives. The paper applies the method to two distinct domains (DNA and peptides) and compares against multiple baselines (diffusion, evolutionary, and hybrid methods).

Weaknesses

Many of the design features, such as the exponential reweighting and the use of EMA to maintain $\Phi$, lack an intuitive explanation and strong motivation. The ablation experiments reinforce this concern: in Table 6, the w/o filtering and w/o adaptation variants achieve better performance on the Hemolysis metric. The paper would benefit from consistent variable definitions and a clearer flow between subsections. Some mathematical symbols, e.g. $\mathcal{T}$, $T$, $[K]^d$, should be clearly ex

Reviewer 04 · Rating 6 · Confidence 3

Strengths

1. Clear problem formulation and practical relevance. Multi-objective sequence design with discrete tokens is important in therapeutic peptide and regulatory DNA design. Framing controllability directly over discrete token transitions addresses a real gap in methods that assume continuous embeddings.
2. Clarity and structure. The paper is generally well written, with a clean decomposition (score, direction, filtering).
3. The method seems to have solid empirical evidence on two modalities. An

Weaknesses

1. Validation relies heavily on proxy predictors with limited external ground truth. The oracle predictors do not appear to be reliable. One paper (https://arxiv.org/abs/2503.17286) discusses lookup-table oracles/tasks that the authors may want to use, or at least discuss.
2. While the tables report per-objective means, no traditional metrics such as hypervolume (HV) are reported.
3. There appears to be a lack of error bars/variances in the paper.
4. The rank-directional score is overly complicated and needs to tune
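For reference on the hypervolume (HV) metric raised in point 2, here is a minimal Python sketch for two objectives under maximization. It is a generic textbook-style computation, not taken from the paper or the review; the function names, the example points, and the reference point `ref` are all hypothetical.

```python
def dominates(q, p):
    """True if q is at least as good as p everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))

def pareto_front(points):
    """Non-dominated subset of `points` (all objectives maximized)."""
    pts = [tuple(p) for p in points]
    return [p for p in pts if not any(dominates(q, p) for q in pts)]

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    # Sweep the front from largest to smallest first objective; along a
    # 2-D Pareto front the second objective then increases.
    front = sorted(set(pareto_front(points)), key=lambda p: p[0], reverse=True)
    hv = 0.0
    prev_y = ref[1]
    for x, y in front:
        if x > ref[0] and y > prev_y:
            hv += (x - ref[0]) * (y - prev_y)  # new horizontal slab added by this point
            prev_y = y
    return hv

# Hypothetical objective values for four generated sequences.
scores = [(3, 1), (2, 2), (1, 3), (1, 1)]
print(pareto_front(scores))            # (1, 1) is dominated by (2, 2)
print(hypervolume_2d(scores, (0, 0)))  # 3*1 + 2*1 + 1*1 = 6.0
```

A larger hypervolume means the set of generated sequences covers more of the objective space relative to the reference point, which is why it is often reported alongside per-objective means.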

Code & Models

Models

ChatterjeeLab/MOG-DFM (Hugging Face model)

Taxonomy

Methods: Balanced Selection