High-Accuracy Detection of Odor Presence from Olfactory Bulb Local Field Potentials via Deep Neural Networks
Matin Hassanloo, Ali Zareh, Mehmet Kemal Özdemir

TL;DR
This paper introduces a deep learning system that accurately detects odors using brain signals from mice, offering a new approach for odor sensing.
Contribution
The study introduces a novel deep learning framework using LFPs from the olfactory bulb for robust single-trial odor detection.
Findings
The proposed model achieved 86.2% mean accuracy in detecting odor presence from LFPs.
The model outperformed previous benchmarks with an F1-score of 85.3% and an AUC of 0.942.
t-SNE visualization confirmed the model captures biologically significant odor signatures.
Abstract
Odor detection underpins food safety, environmental monitoring, medical diagnostics, and many more fields. Current artificial sensors developed for odor detection struggle with complex mixtures, while non-invasive recordings lack reliable single-trial fidelity. To develop a general system for odor detection, in this study we present preliminary work where we test two hypotheses: (i) that spectral features of local field potentials (LFPs) are sufficient for robust single-trial odor detection and (ii) that signals from the olfactory bulb alone are adequate. To test these hypotheses, we propose an ensemble of complementary one-dimensional convolutional networks (ResCNN and AttentionCNN) that decodes the presence of odor from multichannel olfactory bulb LFPs. Tested on 2349 trials from seven awake mice, our final ensemble model supports both hypotheses, achieving a mean accuracy of 86.2%,…
National Authority: TUBITAK
1. Introduction
The development of advanced sensor technologies for the reliable detection of odorants is a critical challenge in different fields ranging from environmental safety to medical diagnostics [1,2,3,4,5]. Moving beyond the limitations of conventional electronic noses (e-noses), the field of neural sensing aims to emulate the olfactory system, using its rapid and complex processing to create high-performance brain–computer interfaces (BCIs) [6,7,8,9,10,11]. Central to this approach is the decoding of neural signals, and among these, the LFP offers a uniquely powerful data source. Unlike the sparse firing patterns of individual neurons, the LFP represents the synchronized synaptic activity of thousands of neurons, offering a robust signal for single-trial classification [12,13,14].
LFP recordings overcome the limitations of non-invasive methods in two critical aspects. First, direct LFP recordings offer superior spatial resolution, eliminating contamination from non-neural sources [13]. Second, these recordings provide a high signal bandwidth that allows analysis of high-frequency oscillations known to be important for olfactory processing [15]. By capturing this clean and full spectrum of neural activity, LFP recordings offer a powerful and decodable feature source for the specific task of binary odor detection, potentially overcoming the limitations of traditional sensors.
Previous studies on the decoding of odor presence from neural signals have generally focused on non-invasive methods, which suffer from a critical loss of high-fidelity spectral information [16,17,18]. Scalp electroencephalography (EEG), for instance, struggles with a low signal-to-noise ratio (SNR) due to the deep cortical source of olfactory signals [19,20]. Consequently, achieving successful classification with these non-invasive signals requires methods that are impractical for real-world applications. For example, many studies rely on averaging responses across dozens of trials, making real-time detection impossible [21,22]. The challenge of non-invasive decoding is illustrated in a recent study by Rajabi et al., who targeted olfactory bulb (OB) activity non-invasively with an electrobulbogram (EBG) and a 1D CNN. Their model achieved an Area Under the Curve (AUC) of 0.58, indicating a performance close to the binary chance level for single-trial classification [23].
We introduce an ensemble of complementary convolutional neural networks, combining an AttentionCNN and a ResCNN, to determine odor presence from 32-channel extracellular LFP recordings. We expect that using high-fidelity direct LFP recordings instead of low-fidelity non-invasive signals will help close the existing performance gap. We use this ensemble to test two central hypotheses. First, we hypothesize that the spectro-temporal features within single-trial LFPs provide sufficient information to robustly and accurately classify odor-presence versus odor-absence conditions. Second, we hypothesize that neural activity from the olfactory bulb alone is sufficient for the odor presence detection task, without requiring signals from downstream processing areas like the piriform cortex (PCx). By developing an ensemble of deep convolutional neural networks [24,25], we demonstrate a significant leap in performance, thereby establishing the feasibility of using LFP recordings as a foundation for odor sensing.
2. Materials and Methods
In this section, we detail the pre-processing steps applied to LFP signals, introduce two core architectures (AttentionCNN and ResCNN), and explain the procedures for training and ensembling these models, along with the evaluation metrics. Figure 1 provides an overview of our methodological pipeline.
2.1. Data Source and Experimental Context
Our analysis uses the pcx-1 dataset [26,27] prepared by Bolding and Franks (2018), which is publicly available on CRCNS.org (https://doi.org/10.6080/K00C4SZB). It contains simultaneous extracellular recordings of mouse OB and PCx, acquired with 32-channel NeuroNexus Poly3 silicon probes (NeuroNexus, Ann Arbor, MI, USA) at a sampling rate of 30 kHz, together with respiration traces sampled at 2 kHz. Although both OB and PCx signals are provided, in this study we analyze only the OB recordings.
The dataset comprises two complementary recording sets. The main dataset contains 2349 trials from seven awake head-fixed mice, with odor stimuli (ethyl butyrate, isoamyl acetate, 2-hexanone, hexanal, ethyl acetate, and ethyl tiglate), each delivered at 0.3% v/v concentration; mineral oil served as the baseline control (n = 336 trials). A supplementary concentration dataset contains 2600 trials examining ethyl butyrate across four concentrations (0.03%, 0.1%, 0.3%, and 1.0% v/v; n = 200 trials per concentration) with matched mineral oil controls (n = 200 trials), to enable a systematic analysis of concentration-dependent detection thresholds. All analyses were implemented in Python (version 3.12.12) using NumPy (version 2.0.2) for numerical computations.
2.2. Pre-Processing
2.2.1. Dataset Preparation and Class Balancing
A systematic pre-processing pipeline was implemented to prepare the raw LFP signals for analysis. The ‘Odor-Presence’ class was established from all active odorant trials (n = 2013) while the Mineral Oil controls (n = 336) defined the ‘Odor-Absence’ class. This grouping resulted in a significant 6:1 data imbalance. To mitigate the risk of classification bias, we applied random under-sampling (without replacement) to the ‘Odor-Presence’ group, selecting 336 trials to match the minority class. This procedure yielded a final balanced dataset composed of 672 trials for the binary odor detection task.
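The class-balancing step can be illustrated with a minimal NumPy sketch. The function name `undersample_majority`, the 0/1 label encoding, and the seed value are assumptions made for illustration; the source only states that random under-sampling without replacement reduced the 2013 odor trials to 336.

```python
import numpy as np

def undersample_majority(labels, seed=42):
    """Return sorted trial indices with both classes reduced to the minority-class size."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)          # seed value is an assumption
    idx_pos = np.flatnonzero(labels == 1)      # 'Odor-Presence' trials
    idx_neg = np.flatnonzero(labels == 0)      # 'Odor-Absence' (mineral oil) trials
    n_keep = min(idx_pos.size, idx_neg.size)
    # Sampling without replacement, as described in the text
    keep_pos = rng.choice(idx_pos, size=n_keep, replace=False)
    keep_neg = rng.choice(idx_neg, size=n_keep, replace=False)
    return np.sort(np.concatenate([keep_pos, keep_neg]))

# Trial counts taken from the text: 2013 odor trials, 336 mineral oil controls
labels = np.array([1] * 2013 + [0] * 336)
balanced_idx = undersample_majority(labels)
balanced_labels = labels[balanced_idx]
```

With these counts, the selection contains 672 indices, split evenly between the two classes.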
2.2.2. Signal Filtering and Downsampling
Each of the 32 recording channels was subjected to a fifth-order Butterworth bandpass filter (SciPy version 1.16.3) with cut-off frequencies of 0.5 Hz and 100 Hz. These cut-offs were chosen to preserve odor-relevant frequencies in the delta to gamma range and to eliminate baseline drift and high-frequency artifacts [14,28,29]. After filtering, the data were downsampled from 30 kHz to 1 kHz, resulting in 2000 samples per 2-s trial with no aliasing artifacts (since the 100 Hz cut-off is well below the 500 Hz Nyquist frequency after downsampling).
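A minimal SciPy sketch of this filtering and downsampling step is given below. The use of second-order sections, zero-phase filtering with `sosfiltfilt`, and plain sample skipping for the 30:1 rate reduction are assumptions; the source specifies only the filter order, the cut-offs, and the two sampling rates. The random input stands in for a real recording.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_RAW = 30_000    # Hz, acquisition rate stated in the text
FS_TARGET = 1_000  # Hz, rate after downsampling

def bandpass_and_downsample(x, fs=FS_RAW, fs_out=FS_TARGET,
                            low=0.5, high=100.0, order=5):
    """Band-pass each channel (last axis = time), then keep every (fs // fs_out)-th sample."""
    # Second-order sections keep the narrow 0.5 Hz edge numerically stable at 30 kHz
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, x, axis=-1)   # zero-phase filtering (assumption)
    step = fs // fs_out                       # 30 for the rates above
    # The signal is already band-limited to 100 Hz, far below the new 500 Hz Nyquist limit
    return filtered[..., ::step]

# Synthetic stand-in for one 2-s, 32-channel trial
rng = np.random.default_rng(0)
raw = rng.standard_normal((32, 2 * FS_RAW))
lfp = bandpass_and_downsample(raw)
```

The output has 2000 samples per channel, matching the trial length given in the text.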
2.2.3. Spectral Feature Extraction and Normalization
Following the pre-processing, spectral features were extracted and normalized. First, power spectral densities (PSDs) were computed for each channel using Welch’s method with a 256-point Hann window and 50% overlap [30]. The resulting spectra were then normalized using a RobustScaler (scikit-learn version 1.6.1), which subtracts the median and divides by the inter-quartile range to reduce the impact of spectral outliers [31].
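The feature extraction can be sketched as follows. The Welch parameters follow the text (256-point Hann window, 50% overlap, 1 kHz sampling). The helper `robust_scale` is a hypothetical NumPy re-implementation of median and inter-quartile scaling, used here in place of scikit-learn's RobustScaler. Scaling each frequency-channel feature across trials is an assumption, since the source does not state the axis.

```python
import numpy as np
from scipy.signal import welch

def psd_features(lfp, fs=1000):
    """Welch PSD along the last (time) axis: 256-point Hann window, 50% overlap."""
    freqs, psd = welch(lfp, fs=fs, window="hann", nperseg=256, noverlap=128, axis=-1)
    return freqs, psd

def robust_scale(features):
    """Subtract the per-feature median and divide by the per-feature IQR (axis 0 = trials)."""
    median = np.median(features, axis=0)
    q75, q25 = np.percentile(features, [75, 25], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0   # avoid division by zero for constant features
    return (features - median) / iqr

# Synthetic stand-in: 20 trials x 32 channels x 2000 samples
rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 32, 2000))
freqs, psd = psd_features(trials)         # psd has shape (20, 32, 129)
flat = psd.reshape(len(trials), -1)       # one feature vector per trial
scaled = robust_scale(flat)
```

A 256-point window at 1 kHz gives a frequency resolution of about 3.9 Hz and 129 frequency bins per channel.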
2.3. Network Architecture
AttentionCNN: AttentionCNN is a well-known architecture for processing complex feature sets. Given the richness and variability of odor-evoked spectral activity, we chose this model to focus on the most discriminative temporal patterns. Our implementation begins with a stand-alone max-pooling layer (stride 2), which is followed by two identical convolutional blocks. Each block consists of the sequence [Conv1D → BatchNorm → ReLU → MaxPool (stride 4)], and these operations collectively expand the output to 128 channels. We then apply three parallel Conv1D branches (kernel sizes 1, 3, 5; 64 filters each), concatenate to form 192 channels, and then recalibrate via a squeeze-and-excitation channel attention module [32], followed by spatial attention.
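For a given feature map $\mathbf{X} \in \mathbb{R}^{C \times T}$ with $C$ channels and $T$ time steps, the squeeze-and-excitation operation performs the following:
$$\mathbf{z} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{X}_{:,t}, \qquad \mathbf{s} = \sigma\left(\mathbf{W}_2\,\delta(\mathbf{W}_1 \mathbf{z})\right), \qquad \tilde{\mathbf{X}}_{c,:} = s_c\,\mathbf{X}_{c,:}$$
where $\sigma$ is the sigmoid function, $\delta$ is ReLU, and $\mathbf{W}_1 \in \mathbb{R}^{(C/r) \times C}$, $\mathbf{W}_2 \in \mathbb{R}^{C \times (C/r)}$, with reduction ratio $r$.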
A global average pooling layer reduces the temporal dimension, producing a 192-dimensional vector that passes through dropout (p = 0.3), a 256-unit ReLU fully connected layer, dropout (p = 0.5), and a final linear classifier for binary odor detection.
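The squeeze-and-excitation recalibration described above can be illustrated with a small NumPy forward pass. This is a sketch of the generic operation, not the authors' PyTorch module: bias terms are omitted, the weights are random, and the values of `T` and `r` are hypothetical (the source does not give them in recoverable form). Only the 192-channel width comes from the text.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Channel recalibration for x of shape (C, T); w1: (C//r, C), w2: (C, C//r)."""
    z = x.mean(axis=1)                          # squeeze: average over time -> (C,)
    hidden = np.maximum(w1 @ z, 0.0)            # bottleneck projection with ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate in (0, 1) per channel
    return x * s[:, None]                       # excite: rescale every channel

C, T, r = 192, 31, 16   # C from the text; T and r are hypothetical
rng = np.random.default_rng(0)
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = squeeze_excite(x, w1, w2)
```

Because each gate value lies between 0 and 1, the operation can only attenuate channels relative to one another; it never changes the shape of the feature map.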
ResCNN: ResCNN builds on residual blocks to enable very deep networks, improving the gradient flow and feature reuse. This makes it well suited to capturing hierarchical temporal features in LFP spectra. Our ResCNN adaptation starts with max-pooling (stride 2), a Conv1D layer (kernel 7, stride 2) with BatchNorm and ReLU, and another max-pool (stride 4). Three residual blocks (64 channels each) then refine the features, followed by a Conv1D layer (kernel 3, stride 1) with BatchNorm and ReLU to expand to 128 channels. After a final max-pool (stride 2) and two more 128-channel residual blocks, global average pooling condenses the temporal dimension. A dropout layer (p = 0.4) precedes the final linear classifier, which produces the single-trial odor-presence decision.
2.4. Rationale for Architecture Selection
To justify our architecture choices and ensure reproducibility, we compared eight CNN variants on our dataset: (1) Vanilla CNN (3-layer baseline), (2) Deep CNN (6 layers without skip connections), (3) Dilated CNN (dilated convolutions), (4) Wide CNN (increased channel width), (5) Shallow CNN (2 layers), (6) AttentionCNN (our proposed architecture with CBAM attention), (7) ResCNN (our proposed architecture with squeeze-and-excitation residual blocks), and (8) Ensemble (combination of AttentionCNN and ResCNN). All models were trained using identical protocols to ensure a fair comparison: 5-fold cross-validation, 150 epochs, AdamW optimizer with cosine annealing, and a fixed random seed (42) [33,34,35]. The comprehensive performance comparison is presented in Section 3.1.
2.5. Training Procedure
To train and evaluate our models, we employed a five-fold cross-validation scheme on the 2349 trials. For each fold, the data were partitioned into a training set (80%) and a test set (20%), with 10% of the training set held out for validation. We optimized network parameters using the AdamW optimizer (PyTorch version 2.9.0; initial learning rate , weight decay ) [36]. All GPU computations were performed on an NVIDIA A100-SXM4-80GB (NVIDIA Corporation, Santa Clara, CA, USA) with CUDA (version 12.6). The learning rate schedule depended on the model being evaluated: for ResCNN and AttentionCNN, the learning rate followed a Cosine Warm Restarts schedule ( epochs, ). For the ensemble evaluation, the models within each fold were trained using a One-Cycle policy [37]. Early stopping was applied in each run, with a patience of 15–20 epochs and a minimum required improvement ( ) of 0.001 in the validation loss. The model checkpoint with the best validation loss was then used for the final testing.
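The early-stopping rule can be sketched in plain Python. The class name `EarlyStopping` and its interface are hypothetical; the minimum improvement of 0.001 and the patience range come from the text. A shorter patience of 3 is used here only to keep the toy example small, and the validation losses are made up.

```python
class EarlyStopping:
    """Stop when validation loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=15, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1      # epoch whose checkpoint would be restored
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Toy validation-loss curve (illustrative values only)
val_losses = [0.70, 0.60, 0.55, 0.5495, 0.56, 0.57, 0.58]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In this toy run, the drop from 0.55 to 0.5495 is smaller than 0.001 and therefore does not reset the counter, so training stops at epoch 5 and epoch 2 is treated as the best checkpoint.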
2.6. Ensemble Strategy
To combine the complementary outputs of AttentionCNN and ResCNN without additional training, we employed a late fusion of their softmax probabilities. For a given trial $x$, let
$$p_{\mathrm{Att}}(x) \quad \text{and} \quad p_{\mathrm{Res}}(x)$$
denote the odor-class softmax probabilities produced by AttentionCNN and ResCNN, respectively. We compute the ensemble probability as the arithmetic mean,
$$p_{\mathrm{ens}}(x) = \tfrac{1}{2}\left[p_{\mathrm{Att}}(x) + p_{\mathrm{Res}}(x)\right],$$
and assign an “odor” label if $p_{\mathrm{ens}}(x) \geq 0.5$. This simple fusion does not require additional parameters and incurs minimal inference overhead. Under five-fold cross-validation, this probability averaging was performed within each fold: for each of the five splits, an AttentionCNN and a ResCNN were trained on the training portion, and their ensembled predictions were evaluated on the held-out test portion. The final reported metrics are the mean and standard deviation of the performance across these five folds.
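The late-fusion rule can be sketched in NumPy as follows. The function names, the two-column logit layout (column 1 = odor), the 0.5 decision threshold, and the example logits are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def ensemble_predict(logits_att, logits_res, threshold=0.5):
    """Average the odor-class probabilities of the two models, then threshold."""
    p_att = softmax(logits_att)[:, 1]     # AttentionCNN odor probability
    p_res = softmax(logits_res)[:, 1]     # ResCNN odor probability
    p_ens = 0.5 * (p_att + p_res)         # arithmetic mean, no learned weights
    return p_ens, (p_ens >= threshold).astype(int)

# Three illustrative trials; columns are [no-odor, odor] logits
logits_att = np.array([[2.0, -1.0], [0.0, 0.0], [-1.0, 3.0]])
logits_res = np.array([[1.0, 0.0], [-2.0, 2.0], [0.5, 0.5]])
p_ens, y_hat = ensemble_predict(logits_att, logits_res)
```

In the second and third trials, one model is undecided (probability 0.5) while the other is confident, and the average follows the confident model.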
2.7. Evaluation Metrics
We evaluated the classification performance using the following metrics:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP},$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}},$$
where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively. Additionally, we computed the Area Under the Receiver Operating Characteristic Curve (AUC) and used confusion matrices to examine the error distributions in detail. All metrics were calculated within each fold of a five-fold cross-validation and reported as the mean ± standard deviation. Furthermore, to verify the reliability of the probability estimates, we compared the model output with the actual labels for each test trial to assess the calibration (for example, among trials with a predicted odor probability of ∼80%, roughly 80% should be actual odor trials).
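The confusion-matrix metrics can be computed with a few lines of plain Python. The function name `binary_metrics` and the example counts are hypothetical; the counts describe an imagined test fold and are not results from the paper.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # recall on odor trials
    specificity = tn / (tn + fp)          # recall on control trials
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical fold with 67 odor and 67 control trials
m = binary_metrics(tp=56, tn=60, fp=7, fn=11)
```

On a balanced fold such as this one, accuracy equals the mean of sensitivity and specificity.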
3. Results
3.1. Architecture Comparison and Selection
Table 1 presents the systematic comparison of eight CNN architectures, justifying our selection of AttentionCNN and ResCNN for the ensemble model. The baseline Vanilla CNN (three convolutional layers) achieved 83.9% accuracy, establishing our performance floor. Simply adding depth without architectural innovations proved ineffective—the Deep CNN (six layers without skip connections) matched this performance at 83.9%, likely due to gradient degradation. Dilated convolutions (Dilated CNN: 83.0%) and increased channel width (Wide CNN: 82.1%) similarly failed to improve upon the baseline, while the Shallow CNN (79.9%) confirmed that insufficient model capacity limits the performance.
In contrast, our proposed architectures demonstrated clear advantages. ResCNN with squeeze-and-excitation residual blocks reached 85.6%, demonstrating the value of skip connections and channel attention for multi-electrode LFP classification. AttentionCNN with CBAM attention achieved 84.7%, learning complementary representations through its multi-scale convolutional branches. Combining both architectures via ensemble averaging yielded the highest performance (86.2% accuracy, 0.942 AUC), a 2.3 percentage point improvement over the baseline, validating our design choices.
3.2. Performance Metrics Distribution Across Models
Table 2 summarizes the comprehensive evaluation metrics for our three primary models across five-fold cross-validation. AttentionCNN achieved 84.7% accuracy with high specificity (92.2%), while ResCNN reached 85.6% accuracy with more balanced sensitivity (87.0%) and specificity (86.0%). The ensemble combined these complementary strengths, achieving 86.2% accuracy with 84.0% sensitivity and 90.0% specificity. These results indicate that (i) both architectural innovations yield strong single-trial detection, and (ii) ensemble fusion maintains peak accuracy while balancing the sensitivity and specificity for robust odor detection.
3.3. Learned Feature Representation
Beyond the quantitative metrics, we validated that our model learned biologically significant features. Figure 2 visualizes the feature space learned by the network using t-SNE [38]. The per-odor colored embedding (Figure 2) achieves a silhouette score of 0.544. Mineral oil trials cluster distinctly from odor trials, while the relative positions of individual odorants correlate with their classification accuracy—hexanal and 2-hexanone form tight clusters far from the control region, whereas ethyl acetate trials appear closer to the boundary, consistent with its lower detectability (detailed in Section 3.4). The 3D projections provide additional perspective, clearly showing the spatial separation between the control and odor-evoked neural patterns.
The partial overlap seen in the t-SNE embedding reflects differences in how well the neural responses to different odorants can be distinguished from the baseline. Our per-odor analysis (Table 3) shows that odorants producing weaker neural signatures, such as ethyl acetate (67.1% accuracy), account for most of the overlap region. In contrast, odorants like hexanal (96.3% accuracy) form tight well-separated clusters. This pattern is consistent with what we know about glomerular activation [39], where some odorant molecules produce more distinct responses than others.
3.4. Per-Odor Classification Performance
When we trained separate classifiers for each odorant against the mineral oil control, we found substantial differences in how well each could be detected (Table 3, Figure 3). Hexanal and 2-hexanone both achieved over 95% accuracy, suggesting these compounds produce particularly distinctive neural responses [40]. Ethyl acetate, on the other hand, proved much harder to detect at only 67.1% accuracy. This 29.2 percentage point range tells us that the olfactory bulb does not encode all odors with equal clarity. This pattern indicates that aldehydes (hexanal) and ketones (2-hexanone) show higher detectability compared to esters (ethyl acetate, ethyl tiglate), likely due to their distinct molecular properties. Factors like receptor binding characteristics and the spatial pattern of glomerular activation [41,42] determine how discriminable each odor’s neural signature is from the baseline.
3.5. Concentration-Dependent Detection
We tested whether our detection framework has a concentration threshold by analyzing ethyl butyrate across four concentration levels spanning roughly 1.5 orders of magnitude (0.03% to 1.0% v/v; Table 4). At 0.03% v/v, the classification was no better than binary chance (51.3%), indicating that the neural response at this concentration is too weak to detect [43,44]. The performance rose to 60.8% at 0.1% v/v and reached 75.2% at 0.3% v/v, which we take as the practical detection threshold. At the highest concentration tested (1.0% v/v), the accuracy reached 86.4%. This monotonic increase in accuracy with concentration mirrors the dose–response characteristics of sensory neurons [45,46], suggesting that our classifier is tracking the strength of the neural response rather than picking up on artifacts.
3.6. Inference Speed Analysis
To assess the real-time capability, we benchmarked the inference speed across a CPU and three GPU platforms (Table 5). The lightweight ensemble architecture (461.4 K parameters) achieves single-sample inference in just 2.58 ms on a CPU—faster than the GPU, due to the kernel launch overhead at small batch sizes. Given the 1000 ms trial duration, this corresponds to a real-time factor of 388×, enabling deployment on standard hardware without GPU acceleration. For batch processing of large datasets, GPU throughput reaches 11,780 samples/second at batch size 64, facilitating rapid offline analysis.
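The real-time factor quoted above follows from simple arithmetic, sketched here in Python. The function name is hypothetical; the trial duration, CPU latency, and GPU throughput are the figures given in the text.

```python
def real_time_factor(trial_ms, latency_ms):
    """How many times faster than the trial duration a single inference runs."""
    return trial_ms / latency_ms

# Figures from the text: 1000 ms trial, 2.58 ms single-sample CPU inference
rtf = real_time_factor(1000.0, 2.58)

# GPU batch throughput from the text, converted to an amortized per-sample time
gpu_ms_per_sample = 1000.0 / 11_780
```

Rounded to the nearest integer, the ratio gives the 388× factor reported above. The amortized GPU time is under 0.1 ms per sample, but only when samples are processed in batches.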
3.7. Ensemble Prediction Confidence
Figure 4 displays the distribution of the ensemble prediction confidence for correct and incorrect classifications. The horizontal axis shows the confidence level, and the vertical axis indicates the frequency of trials within each confidence bin. The green bars correspond to correct predictions with higher confidence values (mean confidence ≈ 0.815), while the red bars (incorrect predictions) are distributed more broadly with lower mean confidence (≈0.678). This suggests that the ensemble classifier is well calibrated, as it expresses higher confidence on correct predictions and lower confidence on misclassifications.
4. Discussion
4.1. Overcoming Prior Limitations
The findings presented in this study offer two primary contributions to the field of olfactory decoding. The first is a methodological advance: we establish the feasibility of accurate single-trial odor detection using deep learning on spectral features. This supports our first hypothesis that these signals contain sufficient information for robust classification without averaging. The second is a neurological insight: we demonstrate that this performance can be achieved using signals from the olfactory bulb alone. This supports our second hypothesis that the initial stages of olfactory processing are sufficient for the fundamental task of presence detection for seven odors, without requiring contributions from higher cortical regions.
Prior non-invasive work on human olfactory registration has shown lower performance. Rajabi et al. evaluated logistic regression and an end-to-end 1D ResNet on EEG and EBG signals, finding that their linear baseline remained at AUC ≈ 50%, and their ResNet-1D could push AUC into the high-50% range (e.g., 56.6% for scalp-EBG, 58.0% for EEG) [23]. The difficulty of cross-subject generalization with EEG is further shown by the work of Ezzatdoost et al., who achieved 64.3% accuracy on a more complex four-odor identification task using handcrafted nonlinear features [19]. Similarly, Kato et al. demonstrated that, while odor representations in human EEG can be decoded within 100 ms post-stimulus onset, achieving reliable classification remains challenging due to signal quality limitations [47]. This demonstrates that current non-invasive recordings lack sufficient SNR for accurate single-trial odor detection [18,48] and highlights the advantage of LFP signals.
Table 6 places our results within the larger context of neural odor decoding. Although direct quantitative comparison is limited by differences in recording modality and experimental procedure, the substantial performance gap corroborates our methodological assumption that high-SNR neural recordings are a prerequisite for robust single-trial odor detection. The 22–29 percentage point accuracy gain is consistent with the recognized difference in signal fidelity between intracranial LFP recordings and non-invasive scalp modalities such as EEG or EBG [13]. Finally, we acknowledge that the performance differential may be influenced by interspecific differences in olfactory processing between mice and humans.
4.2. Methodological Contributions
This work establishes a reliable neural-based binary odor detection system, providing both methodological advances and biological insights. Our ensemble approach effectively combines complementary CNN architectures to achieve reliable performance. The clear separation observed in our t-SNE visualization (Figure 2) and the high confidence levels for correct predictions (Figure 4) further validate that our models have learned biologically meaningful features that capture fundamental differences between odor-presence and odor-absence neural states. Compared to prior EEG-based studies, our LFP approach offers higher spatial resolution and direct access to OB circuitry, enabling the extraction of rich spectro-temporal signatures that were previously inaccessible.
4.3. Limitations and Future Directions
Our concentration analysis (Section 3.5) shows that the detection performance depends on the stimulus strength, with a practical threshold at 0.3% v/v for ethyl butyrate. Below this concentration, neural responses become too weak for reliable single-trial classification. This finding has important implications for real-world applications, where target analytes may be present at trace concentrations. Future work should systematically characterize detection thresholds across all six odorants and investigate whether multi-trial averaging or more sensitive pre-processing methods can lower these limits.
Our evaluation focuses on monomolecular odorants at a fixed concentration (0.3% v/v) in head-fixed mice, leaving important questions about generalization to complex mixtures [49], varying concentrations, and naturalistic behavioral conditions. The invasive nature of LFP recordings also limits immediate translational applications, though our findings provide crucial validation of neural-based odor detection principles. Future work will systematically extend our approach to diverse odor mixtures and concentration ranges, validate its performance in freely moving animals, and investigate non-invasive recording modalities to assess translational feasibility.
5. Conclusions
This study demonstrates that deep neural networks can achieve robust single-trial odor detection from olfactory bulb LFPs, establishing two key findings. First, spectral features within single-trial LFPs provide sufficient information for accurate odor presence classification, achieving 86.2% accuracy and 0.942 AUC—substantially outperforming prior non-invasive approaches. Second, signals from the olfactory bulb alone are adequate for this task, without requiring downstream cortical processing. Our ensemble of AttentionCNN and ResCNN architectures leverages complementary feature extraction strategies to capture biologically significant neural signatures, as confirmed by t-SNE visualization and prediction confidence analysis. These findings establish the feasibility of LFP-based odor sensing and provide a foundation for future development of neural-based detection systems.
References
- [1] Sanislav T., Mois G.D., Zeadally S., Folea S., Radoni T.C., Al-Suhaimi E.A. A Comprehensive Review on Sensor-Based Electronic Nose for Food Quality and Safety. Sensors 2025, 25, 4437. doi:10.3390/s25144437. PMID: 40732564. PMCID: PMC12301011.
- [2] Kim C., Lee K.K., Kang M.S., Shin D.M., Oh J.W., Lee C.S., Han D.W. Artificial olfactory sensor technology that mimics the olfactory mechanism: A comprehensive review. Biomater. Res. 2022, 26, 40. doi:10.1186/s40824-022-00287-1. PMID: 35986395. PMCID: PMC9392354.
- [3] Deng H., Chen Z., Feng P., Tian L., Zong H., Nakamoto T. Recent Advances and Applications of Odor Biosensors. Electronics 2025, 14, 1852. doi:10.3390/electronics14091852.
- [4] Dennler N., Drix D., Warner T.P.A., Rastogi S., Della Casa C., Ackels T., Schaefer A.T., van Schaik A., Schmuker M. High-speed odor sensing using miniaturized electronic nose. Sci. Adv. 2024, 10, eadp1764. doi:10.1126/sciadv.adp1764. PMID: 39504378. PMCID: PMC11540037.
- [5] Kim T., Kim Y., Cho W., Kwak J.-H., Cho J., Pyeon Y., Kim J.J., Shin H. Ultralow-power single-sensor-based e-nose system powered by duty cycling and deep learning for real-time gas identification. ACS Sens. 2024, 9, 3557–3572. doi:10.1021/acssensors.4c00471. PMID: 38857120.
- [6] Shor E., Herrero-Vidal P., Dewan A., Uguz I., Curto V.F., Malliaras G.G., Savin C., Bozza T., Rinberg D. Sensitive and robust chemical detection using an olfactory brain-computer interface. Biosens. Bioelectron. 2022, 195, 113664. doi:10.1016/j.bios.2021.113664. PMID: 34624799.
- [7] Lu Q., Yi M., Jiang J. Bioelectronic nose for ultratrace odor detection via brain-computer interface with olfactory bulb electrode arrays. Biosens. Bioelectron. 2025, 285, 117585. doi:10.1016/j.bios.2025.117585. PMID: 40393212.
- [8] Qin C., Wang Y., Hu J., Wang T., Liu D., Dong J., Lu Y. Artificial Olfactory Biohybrid System: An Evolving Sense of Smell. Adv. Sci. 2023, 10, 2204726. doi:10.1002/advs.202204726. PMID: 36529960. PMCID: PMC9929144.
