Benchmark for Assessing Olfactory Perception of Large Language Models

Eftychia Makri; Nikolaos Nakis; Laura Sisson; Gigi Minsky; Leandros Tassiulas; Vahid Satarifard; Nicholas A. Christakis

arXiv:2604.00002·cs.CL·April 2, 2026

TL;DR

This paper introduces the Olfactory Perception (OP) benchmark, which evaluates how well large language models reason about smell. The results suggest that current models rely mainly on lexical associations and that significant gaps in olfactory reasoning remain.

Contribution

The OP benchmark is the first comprehensive dataset for assessing LLMs' olfactory reasoning across multiple tasks and representations, providing insights into their capabilities and limitations.

Findings

01

Compound-name prompts outperform SMILES prompts in olfactory tasks.

02

The best model achieves 64.4% accuracy, indicating emerging but limited olfactory reasoning.

03

Multilingual aggregation improves prediction accuracy (AUROC = 0.86).
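As a rough illustration of the third finding, the sketch below averages per-language prediction scores and computes AUROC on the result. This summary does not say how the paper aggregates across languages, so simple averaging is an assumption, and the function names (`auroc`, `aggregate_languages`) and the toy scores are hypothetical, not taken from the benchmark.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aggregate_languages(per_language_scores):
    """Average the scores each language assigns to the same item (assumed scheme)."""
    columns = zip(*per_language_scores.values())
    return [sum(col) / len(col) for col in columns]


# Hypothetical toy data: four items, binary ground truth, scores from two prompt languages.
labels = [1, 0, 1, 0]
per_language = {
    "en": [0.9, 0.6, 0.4, 0.2],
    "fr": [0.5, 0.1, 0.8, 0.7],
}

single_language_auroc = {lang: auroc(s, labels) for lang, s in per_language.items()}
aggregated_auroc = auroc(aggregate_languages(per_language), labels)
```

In this toy case each language misranks a different pair of items, so the averaged scores separate the classes better than either language alone. Whether the same mechanism explains the reported AUROC of 0.86 is not something this summary establishes.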

Abstract

Here we introduce the Olfactory Perception (OP) benchmark, designed to assess the capability of large language models (LLMs) to reason about smell. The benchmark contains 1,010 questions across eight task categories spanning odor classification, odor primary descriptor identification, intensity and pleasantness judgments, multi-descriptor prediction, mixture similarity, olfactory receptor activation, and smell identification from real-world odor sources. Each question is presented in two prompt formats, compound names and isomeric SMILES, to evaluate the effect of molecular representations. Evaluating 21 model configurations across major model families, we find that compound-name prompts consistently outperform isomeric SMILES, with gains ranging from +2.4 to +18.9 percentage points (mean of approximately +7 points), suggesting current LLMs access olfactory knowledge primarily through lexical…
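The name-versus-SMILES comparison described in the abstract amounts to scoring the same questions under both prompt formats and taking the accuracy difference in percentage points. A minimal sketch of that arithmetic follows; the helper names (`accuracy`, `format_gain_pp`) and the four toy answers are hypothetical and are not drawn from the benchmark or its evaluation code.

```python
def accuracy(predictions, answers):
    """Fraction of questions where the prediction matches the reference answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def format_gain_pp(name_preds, smiles_preds, answers):
    """Accuracy gain of compound-name prompts over SMILES prompts, in percentage points."""
    return 100.0 * (accuracy(name_preds, answers) - accuracy(smiles_preds, answers))


# Hypothetical toy data: the same four multiple-choice questions asked in both formats.
answers = ["A", "C", "B", "D"]
name_preds = ["A", "C", "B", "A"]    # 3 of 4 correct
smiles_preds = ["A", "B", "B", "A"]  # 2 of 4 correct

gain = format_gain_pp(name_preds, smiles_preds, answers)  # 25.0 points on this toy data
```

Because both formats are scored on identical questions, the difference isolates the effect of the molecular representation rather than differences in question difficulty.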
