MolDA: Molecular Understanding and Generation via Large Language Diffusion Model
Seohyeon Shin, HanJun Choi, Jun-Hyung Park, Hong Kook Kim, and Mansu Kim

TL;DR
MolDA introduces a diffusion-based multimodal framework for molecular understanding and generation, replacing left-to-right autoregressive decoding to improve structural validity and global coherence.
Contribution
MolDA replaces the conventional autoregressive backbone with a discrete large language diffusion model, and pairs it with a hybrid graph encoder and a Q-Former that align molecular structure with the language token space for molecular reasoning and generation.
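The Q-Former's role of turning a variable-size set of graph node embeddings into a fixed number of tokens that a language model can consume can be illustrated with a single cross-attention step. This is a simplified sketch under stated assumptions, not MolDA's module: it uses one head, no learned projections, hand-set query vectors, and toy 2-dimensional node features. It also omits the query self-attention layers found in BLIP-2-style Q-Formers and the final projection into the language model's embedding space.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def query_cross_attention(queries: Matrix, node_feats: Matrix) -> Matrix:
    """One single-head cross-attention step with no learned projections (toy).

    Each query vector attends over all node embeddings of a molecular graph,
    so the output always has len(queries) rows, whatever the atom count.
    """
    d = len(queries[0])
    out: Matrix = []
    for q in queries:
        # Scaled dot-product scores between this query and every node embedding.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in node_feats]
        weights = softmax(scores)
        # Attention-weighted average of the node embeddings.
        out.append([sum(w * v[c] for w, v in zip(weights, node_feats)) for c in range(d)])
    return out

# Hypothetical inputs: 2 hand-set queries; graphs with 3 and 5 atoms (feature dim 2).
queries = [[1.0, 0.0], [0.0, 1.0]]
small_graph = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
large_graph = small_graph + [[2.0, 0.0], [0.0, 2.0]]

print(len(query_cross_attention(queries, small_graph)))  # 2
print(len(query_cross_attention(queries, large_graph)))  # 2
```

The design point is that the number of output tokens is fixed by the number of queries, so molecules of any size map to a same-length sequence of soft tokens for the language backbone.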
Findings
Improves global structural coherence during molecule generation
Achieves higher chemical validity in generated molecules
Supports molecule captioning and property prediction
Abstract
Large Language Models (LLMs) have significantly advanced molecular discovery, but existing multimodal molecular architectures fundamentally rely on autoregressive (AR) backbones. This strict left-to-right inductive bias is sub-optimal for generating chemically valid molecules, as it struggles to account for non-local global constraints (e.g., ring closures) and often accumulates structural errors during sequential generation. To address these limitations, we propose MolDA (Molecular language model with masked Diffusion with mAsking), a novel multimodal framework that replaces the conventional AR backbone with a discrete Large Language Diffusion Model. MolDA extracts comprehensive structural representations using a hybrid graph encoder, which captures both local and global topologies, and aligns them into the language token space via a Q-Former. Furthermore, we mathematically reformulate…
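To make the contrast with left-to-right decoding concrete, the toy sketch below shows confidence-based iterative unmasking, a common sampling scheme for masked discrete diffusion language models. It is not MolDA's sampler: the denoiser `toy_predict`, the character-level SMILES tokens, and the step schedule are hypothetical. Because every masked position is scored at each step, both ring-closure digits of benzene (`c1ccccc1`) can be committed in the first step, before the atoms between them, which a strictly left-to-right decoder cannot do.

```python
from typing import List, Optional, Tuple

# Hypothetical mask token; real models reserve a vocabulary id for it.
MASK: Optional[str] = None

def masked_diffusion_sample(predict, length: int, steps: int) -> Tuple[List[Optional[str]], List[List[int]]]:
    """Toy reverse process of a masked discrete diffusion model.

    Starts from an all-mask sequence; at each step the denoiser proposes a
    (token, confidence) pair for every position, and the most confident
    masked positions are committed wherever they sit in the sequence.
    """
    assert steps >= 1
    seq: List[Optional[str]] = [MASK] * length
    order: List[List[int]] = []  # positions revealed at each step
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        preds = predict(seq)
        # Reveal ceil(masked / steps_left) tokens so the sequence is complete after `steps` steps.
        k = -(-len(masked) // (steps - step))
        chosen = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in chosen:
            seq[i] = preds[i][0]
        order.append(sorted(chosen))
    return seq, order

TARGET = list("c1ccccc1")  # benzene as character-level SMILES tokens (toy tokenizer)

def toy_predict(seq):
    # Hypothetical denoiser: ignores its input and proposes the target token everywhere.
    # Ring-closure digits get the highest confidence, so the ring is closed first.
    return [(tok, 0.9 if tok.isdigit() else 0.5 - 0.01 * i) for i, tok in enumerate(TARGET)]

tokens, order = masked_diffusion_sample(toy_predict, length=len(TARGET), steps=4)
print("".join(tokens))  # c1ccccc1
print(order)            # [[1, 7], [0, 2], [3, 4], [5, 6]]
```

In a trained model the confidences come from the denoiser's predicted token probabilities, and predictions at still-masked positions are recomputed at every step with the committed tokens as context.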
