DisProtEdit: Exploring Disentangled Representations for Multi-Attribute Protein Editing
Max Ku, Sun Sun, Hongyu Guo, Wenhu Chen

TL;DR
DisProtEdit is a novel framework for controllable protein editing that learns disentangled representations of structure and function using dual-language supervision, enabling interpretable and modular modifications.
Contribution
DisProtEdit introduces a disentangled representation learning approach for protein editing, supported by a large-scale multimodal dataset (SwissProtDis), and reports improved interpretability and control over edits.
Findings
Performs competitively on protein editing benchmarks.
Achieves up to a 61.7% success rate on multi-attribute editing.
Demonstrates improved interpretability and controllability of edits.
Abstract
We introduce DisProtEdit, a controllable protein editing framework that leverages dual-channel natural language supervision to learn disentangled representations of structural and functional properties. Unlike prior approaches that rely on joint holistic embeddings, DisProtEdit explicitly separates semantic factors, enabling modular and interpretable control. To support this, we construct SwissProtDis, a large-scale multimodal dataset where each protein sequence is paired with two textual descriptions, one for structure and one for function, automatically decomposed using a large language model. DisProtEdit aligns protein and text embeddings using alignment and uniformity objectives, while a disentanglement loss promotes independence between structural and functional semantics. At inference time, protein editing is performed by modifying one or both text inputs and decoding from the…
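The abstract names three training signals: alignment, uniformity, and a disentanglement loss. The sketch below illustrates what such objectives can look like on L2-normalized embeddings. It is not the paper's implementation. The alignment and uniformity terms follow the common formulation (mean squared distance between paired embeddings, and the log of the mean Gaussian potential over distinct pairs). The disentanglement term is an assumed stand-in, a mean squared cosine similarity between structure and function embeddings, because the abstract does not specify the actual loss. All function names and the temperature `t` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(protein_emb, text_emb):
    """Mean squared distance between paired protein and text embeddings."""
    pairs = list(zip(protein_emb, text_emb))
    return sum(sq_dist(p, t) for p, t in pairs) / len(pairs)


def uniformity_loss(emb, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values mean the embeddings are spread more evenly on the sphere.
    """
    n = len(emb)
    potentials = [
        math.exp(-t * sq_dist(emb[i], emb[j]))
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.log(sum(potentials) / len(potentials))


def disentanglement_penalty(struct_emb, func_emb):
    """Assumed surrogate: mean squared cosine similarity between each
    protein's structure embedding and its function embedding.
    Zero when the two are orthogonal for every protein."""
    total = 0.0
    for s, f in zip(struct_emb, func_emb):
        cos = sum(a * b for a, b in zip(s, f))  # inputs are unit-norm
        total += cos ** 2
    return total / len(struct_emb)


# Toy batch of two proteins with 2-D embeddings (illustrative values only).
struct = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
func = [normalize([0.0, 1.0]), normalize([1.0, 0.0])]

align = alignment_loss(struct, struct)          # identical pairs
uniform = uniformity_loss(struct, t=2.0)        # one orthogonal pair
disent = disentanglement_penalty(struct, func)  # orthogonal channels
```

In this toy batch the alignment term vanishes because each embedding is paired with itself, and the disentanglement penalty vanishes because the structure and function vectors are orthogonal. In training, terms like these would typically be combined as a weighted sum, with weights chosen empirically.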
Taxonomy
Topics: Innovative Microfluidic and Catalytic Techniques Innovation
