Quantifying the human visual exposome with vision language models
Christian Rominger (1), Andreas R. Schwerdtfeger (1), Malay Gaherwar Singh (2), Dimitri Khudyakow (2), Elizabeth A. M. Michels (2), Fabian Wolf (2), Jakob Nikolas Kather (2,3,4), Magdalena Katharina Wekenborg (2) ((1) University of Graz, (2) TU Dresden

TL;DR
This paper introduces a scalable method that combines vision language models (VLMs) and large language models (LLMs) to quantify how the visual environment relates to mental health, going beyond traditional geospatial proxies and self-reports.
Contribution
It presents a novel, scalable approach that applies vision language models and large language models to participant-generated, real-world photographs to quantify visual context and its association with mental health.
Findings
VLM-derived greenness estimates predict momentary affect and chronic stress.
Up to 33% of visual context ratings correlate with mental health measures.
The pipeline enables high-throughput decoding of visual environment effects.
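The second finding is a proportion: the share of the rated visual-context features whose per-photo ratings are significantly associated with a mental health measure. The Python sketch below shows one way such a share could be computed. It uses Pearson correlations, a Fisher z approximation for the p-values, and Benjamini-Hochberg false-discovery-rate control at q = 0.05. The summary confirms none of these choices, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z; needs n > 3)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite at |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; True means 'reject H0'."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * q; reject ranks 1..k.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject


def fraction_associated(
    ratings: Dict[str, List[float]], outcome: List[float], q: float = 0.05
) -> float:
    """Hypothetical helper: share of features whose ratings track the outcome."""
    n = len(outcome)
    p_values = [fisher_p_value(pearson_r(v, outcome), n) for v in ratings.values()]
    flags = benjamini_hochberg(p_values, q)
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical toy data: 50 photos, one outcome, three rated features.
    outcome = [float(i) for i in range(50)]
    ratings = {
        "greenness": [i + (i % 3) for i in outcome],        # strongly related
        "symmetric": [(i - 24.5) ** 2 for i in outcome],    # r = 0 by symmetry
        "alternating": [1.0 if int(i) % 2 == 0 else -1.0 for i in outcome],
    }
    print(fraction_associated(ratings, outcome))
```

In this toy data only the strongly related feature survives correction, so the script reports one of three features (about 0.333). The sketch corrects for multiple testing because screening many features against raw p < 0.05 would let roughly 5% of truly null features through by chance.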
Abstract
The visual environment is a fundamental yet unquantified determinant of mental health. While the concept of the environmental exposome is well established, current methods rely on coarse geospatial proxies or biased self-reports, failing to capture the first-person visual context of daily life. We addressed this gap by coupling ecological momentary assessment with vision language models (VLMs) to quantify the semantic richness of human visual experience. Across 2674 participant-generated photographs, VLM-derived estimates of greenness robustly predicted momentary affect and chronic stress, consistent with established benchmarks. We then developed a semi-autonomous large language model (LLM)-based pipeline that mined over seven million scientific publications to extract nearly 1000 environmental features empirically linked to mental health. When applied to real-world imagery, up to 33…
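Ecological momentary assessment yields repeated photos and affect ratings per participant. A greenness-affect association can therefore be read within persons (a participant's greener moments compared with their own average) or between persons. The sketch below estimates the pooled within-person slope by person-mean centering both variables. This equals the OLS slope from a regression with participant fixed effects. It is an illustration under assumptions: the abstract does not state the authors' statistical model, and the participant IDs, the 0-to-1 greenness scale, and the function names are hypothetical.

```python
from collections import defaultdict


def person_mean_center(pids, values):
    """Subtract each participant's own mean from their observations.

    Returns the centered values and a dict of per-participant means.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pid, v in zip(pids, values):
        sums[pid] += v
        counts[pid] += 1
    means = {pid: sums[pid] / counts[pid] for pid in sums}
    centered = [v - means[pid] for pid, v in zip(pids, values)]
    return centered, means


def within_person_slope(pids, greenness, affect):
    """Pooled within-person OLS slope of affect on greenness.

    Demeaning both variables per participant removes stable
    between-person differences (the fixed-effects estimator).
    """
    g_c, _ = person_mean_center(pids, greenness)
    a_c, _ = person_mean_center(pids, affect)
    num = sum(g * a for g, a in zip(g_c, a_c))
    den = sum(g * g for g in g_c)
    if den == 0:
        raise ValueError("greenness does not vary within any participant")
    return num / den


if __name__ == "__main__":
    # Hypothetical toy data: two participants, three photos each.
    pids = ["p1", "p1", "p1", "p2", "p2", "p2"]
    greenness = [0.1, 0.5, 0.9, 0.2, 0.4, 0.6]  # assumed VLM scores in [0, 1]
    offsets = {"p1": 10.0, "p2": -5.0}  # very different baseline affect
    affect = [2.0 * g + offsets[p] for p, g in zip(pids, greenness)]
    print(round(within_person_slope(pids, greenness, affect), 6))
```

In the toy data, each participant's affect rises by 2 units per unit of greenness despite very different baselines. The within-person slope is therefore 2, and the centering absorbs the participant offsets. For real EMA data, a multilevel model with random intercepts (and possibly random slopes) would be the more usual choice.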
