Contrastive Image-Metadata Pre-Training for Materials Transmission Electron Microscopy
Georgia Channing, Debora Keller, Marta D. Rossell, Philip Torr, Stig Helveg, Henrik Eliasson

TL;DR
This paper introduces a contrastive pre-training method for aligning TEM images with their acquisition metadata, enabling improved image retrieval, parameter recovery, and a physics-informed denoising model.
Contribution
It presents a CLIP-style encoder trained on paired TEM images and acquisition metadata, which reaches high cross-modal retrieval accuracy and supports image re-rendering and denoising.
Findings
Reaches 84.4% Top-1 cross-modal retrieval accuracy on a held-out split.
All seven acquisition parameters are recoverable from the visual embeddings.
The proposed denoising model was preferred over state-of-the-art methods in a user study.
Abstract
The transmission electron microscope facilitates the highest-resolution imaging of any instrument ever created, and its limiting factor is no longer spatial resolution but dose efficiency. Low electron doses avoid sample damage but produce noisy images for which, unlike in classical computer vision, there is no ground truth. Autonomous materials experimentation poses a related problem, since closed-loop instruments need representations grounded in the microscope state at acquisition. Both demand representations grounded in how an image was acquired. We release 7,330 paired high-angle annular dark-field scanning-TEM (HAADF-STEM) images and their seven-dimensional acquisition metadata, and propose Contrastive Image-Metadata Pre-training (CIMP), a CLIP-style encoder that aligns the two modalities and reaches 84.4% Top-1 cross-modal retrieval on a held-out split. All seven parameters are…
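As a rough illustration of what "CLIP-style" alignment and Top-1 cross-modal retrieval mean in this setting, the sketch below computes a symmetric contrastive (InfoNCE) loss and a Top-1 retrieval score over already-computed embeddings. It is not the authors' implementation: the function names, the plain-list embedding format, and the temperature of 0.07 (the initial value used by the original CLIP model) are assumptions made for illustration, and the encoders that would produce the image and metadata embeddings are left out.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity_matrix(image_embs, meta_embs):
    """Cosine similarity between every image and every metadata embedding."""
    imgs = [normalize(v) for v in image_embs]
    metas = [normalize(v) for v in meta_embs]
    return [[sum(a * b for a, b in zip(i, m)) for m in metas] for i in imgs]


def _diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[k] over rows k, i.e. the target is the diagonal."""
    total = 0.0
    for k, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[k]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, meta_embs, temperature=0.07):
    """Average of the image-to-metadata and metadata-to-image InfoNCE losses.

    Pair k (image k, metadata k) is the positive; all other pairs in the
    batch act as negatives. The temperature value is an assumption.
    """
    sims = similarity_matrix(image_embs, meta_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))


def top1_retrieval_accuracy(image_embs, meta_embs):
    """Fraction of images whose most similar metadata embedding is their own."""
    sims = similarity_matrix(image_embs, meta_embs)
    hits = sum(
        1
        for k, row in enumerate(sims)
        if max(range(len(row)), key=row.__getitem__) == k
    )
    return hits / len(sims)
```

With perfectly aligned pairs the loss is close to zero and Top-1 accuracy is 1.0; permuting the metadata so that no pair matches drives the accuracy to 0.0 and the loss up sharply. A reported Top-1 figure such as 84.4% would correspond to this kind of score computed on held-out pairs, though the paper's exact evaluation protocol may differ.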
