Bolek: A Multimodal Language Model for Molecular Reasoning

Frederic Grabowski; Jacek Szczerbiński; Maciej Jaśkowski; Kalina Jasińska-Kobus; Paweł Dąbrowski-Tumański; Tomasz Jetka; Bartosz Topolski

arXiv:2605.02745·cs.LG·May 5, 2026

TL;DR

Bolek is a multimodal language model that grounds molecular reasoning in structure, outperforming larger models in drug-discovery tasks with more grounded explanations and verifiable features.

Contribution

Introduces Bolek, a compact multimodal model injecting molecular structure into language reasoning, achieving superior performance and explainability over larger models.

Findings

01

Bolek outperforms baseline models on 13 of 15 classification tasks.

02

Bolek's explanations cite descriptors 10-100x more often, aligning with RDKit values.

03

Bolek generalizes well to unseen molecular endpoints.

Abstract

Molecular property models increasingly support high-stakes drug-discovery decisions, but their outputs are often difficult to audit: classical predictors return scores without rationale, while language models can produce fluent explanations weakly grounded in the input molecule. We introduce Bolek, a compact multimodal language model that grounds natural-language reasoning in molecular structure by injecting a Morgan fingerprint embedding into an instruction-tuned text decoder. Bolek is fine-tuned on molecular alignment tasks, including molecule description, RDKit descriptor prediction, and substructure detection, and on downstream reasoning over 15 TDC binary classification tasks using synthetic chains-of-thought anchored in concrete molecular features. Across these tasks, Bolek outperforms its Qwen3-4B-Instruct base on all endpoints in yes/no mode and on 13 of 15 in…
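The abstract says a Morgan fingerprint embedding is injected into the text decoder but, as quoted here, does not say how. The sketch below illustrates one common way such an injection could work, and it is an assumption rather than the authors' implementation: a learned linear projection maps the binary fingerprint to a single vector of the decoder's embedding width, and that vector is prepended to the token embeddings as a "soft token". The names `make_projection`, `embed_fingerprint`, and `inject`, the 2048-bit length, and the toy width of 8 are all hypothetical. The set-bit indices are made up; in practice they would come from a cheminformatics toolkit such as RDKit.

```python
import random

FP_BITS = 2048  # assumed fingerprint length (hypothetical)
D_MODEL = 8     # toy embedding width; a real decoder is far wider


def make_projection(n_bits, d_model, seed=0):
    """Randomly initialised projection matrix standing in for learned weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(n_bits)]


def embed_fingerprint(on_bits, weights):
    """Project a binary fingerprint into embedding space.

    Multiplying a 0/1 vector by a matrix equals summing the matrix rows
    at the indices of the set bits, so only those indices are needed.
    """
    d_model = len(weights[0])
    vec = [0.0] * d_model
    for bit in on_bits:
        row = weights[bit]
        for j in range(d_model):
            vec[j] += row[j]
    return vec


def inject(mol_embedding, token_embeddings):
    """Prepend the molecule vector to the token embeddings (assumed scheme)."""
    return [mol_embedding] + list(token_embeddings)


weights = make_projection(FP_BITS, D_MODEL)
on_bits = [5, 130, 1024]  # made-up set bits, not a real molecule
mol_vec = embed_fingerprint(on_bits, weights)
token_embeddings = [[0.0] * D_MODEL for _ in range(4)]  # placeholder prompt
sequence = inject(mol_vec, token_embeddings)
print(len(sequence), len(sequence[0]))
```

Under this assumed scheme the decoder sees one extra position carrying structural information, and the projection would be trained together with the alignment tasks the abstract lists (molecule description, RDKit descriptor prediction, substructure detection).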
