Differences in event-related potentials between unipolar depression and bipolar II disorder during depressive episodes: a retrospective case-control study

Xiaobo Zhou; Jingwen Liu; Zhonghua Lin; Minjing Xiang; Xia Deng; Zhili Zou

PMC · DOI:10.1186/s12888-025-07433-8·October 22, 2025

Differences in event-related potentials between unipolar depression and bipolar II disorder during depressive episodes: a retrospective case-control study

Xiaobo Zhou, Jingwen Liu, Zhonghua Lin, Minjing Xiang, Xia Deng, Zhili Zou

PDF

Open Access

TL;DR

This study finds differences in brain activity between people with unipolar depression and bipolar II disorder during depressive episodes, which could help in accurate diagnosis.

Contribution

The study identifies specific neurophysiological markers, such as prolonged S2-P50 latency, that distinguish bipolar II disorder from unipolar depression.

Findings

01

Both unipolar depression and bipolar II disorder groups had longer reaction times and larger P2-N2 complex amplitudes compared to healthy controls.

02

The bipolar II disorder group showed significantly longer S2-P50 latency compared to the unipolar depression group.

03

Bipolar II disorder patients had prolonged N2 latency compared to healthy controls.

Abstract

Bipolar II disorder (BD II) is a chronic and severe mental illness frequently misdiagnosed as major depressive disorder (MDD) due to symptom overlap and the absence of objective diagnostic tools. Consequently, establishing pathophysiological markers to differentiate BD II from MDD is critical. A total of 180 patients were enrolled in the study and allocated to three groups: patients with unipolar depression (UD group; MDD currently experiencing a major depressive episode, n = 60), patients with bipolar II disorder during depressive episodes (BD II group; n = 60), and age- and sex- matched healthy controls (HC; n = 60). Sociodemographic data were collected, and all participants underwent psychological assessments using the 7-item Generalized Anxiety Disorder (GAD-7), Patient Health Questionnaire-9 (PHQ-9), and 32-item Hypomania Checklist (HCL-32). Additionally, all participants passed…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Diseases9

bipolar II disorder major depressive disorder unipolar depression mental illness depression Hypomania BD MDD Generalized Anxiety Disorder

Figures2

Click any figure to enlarge with its caption.

Differences in event-related potentials among patients with bipolar II depression (BD II), unipolar depression (UD), and healthy controls (HC). Both UD and BD II patients (a) showed larger amplitude of P2-N2 complex compared to HC (b). The latency of S2-P50 in BD II patients (c) was significantly longer than UD patients (d)

Univariate and ROC analyses for differentiating bipolar II depression (BD II) from unipolar depression (UD). a MMN and P300 components for differentiating BD II from UD: Univariate odds ratios. b CNV and P50 components for differentiating BD II from UD: Univariate odds ratios. c ROC curve analysis of S2-P50 latency for differentiating BD II from UD (AUC = 0.801)

Funding2

—Chengdu Science and Technology Program
—Sichuan Medical Association

Keywords

Bipolar II disorderUnipolar depressionEvent-related potentialCognitionP300

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsBipolar Disorder and Treatment · Electroconvulsive Therapy Studies · Neuroscience and Music Perception

Full text

Introduction

Bipolar II disorder (BD II) is a chronic and severe psychiatric condition characterized by recurrent major depressive episodes alternating with hypomanic episodes. A nationwide epidemiological study [1] reported a lifetime prevalence of 0.09% for bipolar I disorder and 0.04% for bipolar II disorder in China. Compared with bipolar I disorder, BD II is more likely to be misdiagnosed as unipolar depression in clinical settings [2]. Individuals with bipolar II disorder have a significantly higher lifetime suicide risk than the general population [3, 4]. Additionally, a number of studies have reported relatively higher rates of completed suicides in patients with BD II compared to BD I [5].

BD II is inherently diagnostically challenging, with the majority of patients initially presenting with depressive symptomatology. This diagnostic complexity contributes to substantial under-detection [6], with studies indicating that nearly 40% of BD-II cases are misclassified as major depressive disorder (MDD; also known as unipolar depression [UD]) during initial evaluations [7]. The resultant diagnostic uncertainty creates a perilous clinical gap – the median interval from first psychiatric consultation to correct BD II diagnosis exceeds 10 years [8]. The consequences of misdiagnosis of BD II include inappropriate treatment choices, such as under-prescription of mood-stabilizing medications, rapid cycling, relapse and hospitalization, increased risk of suicidal thoughts and behaviors, cognitive and functional impairments, and increased care expenses [8].

However, mental disorders pose complex diagnostic challenges, as current classifications, whether Diagnostic and Statistical Manual of Mental Disorders (DSM-V) or International Classification of Diseases (ICD-11), rely predominantly on symptomatology due to the lack of validated objective biomarkers. Compared to UD, earlier onset of depressive episodes and emotional instability may represent distinguishing clinical features of BD II. However, these features pose challenges for objectively quantifying distinctions between UD and BD II [9, 10]. Identifying robust neurophysiological or biochemical biomarkers to distinguish BD II from UD has proven highly challenging Despite promising accuracy of fMRI [Area under the curve (AUC): 0.84–0.94] [11] and gut microbiome genomic analysis (AUC: 0.72–0.88) [12] in differentiating BD II and UD, their clinical translation is constrained by practical high expense. Meanwhile, serum cytokine profiling exhibits only modest discriminatory capability (AUC: 0.69–0.74) [13], underscoring the need for cost-effective and reliable biomarkers like electrophysiological indices.

Event-related potentials (ERP) offer an objective, non-invasive, and economical method to detect neural activity associated with cognitive and sensory processes [14]. ERP comprise key components such as mismatch negativity (MMN), P300, contingent negative variation (CNV), and P50 suppression. Characteristics of these components, including presence, amplitude, topography, and latency, could serve as indicators of cognitive performance and provide objective measures for detecting changes in neurocognitive functioning associated with specific disorders [15, 16]. Many previous studies have shown impairments in psychosocial functioning in individuals with BD II [17] and UD [18]. The superior temporal resolution of ERP enables the tracking of neural activity associated with impaired cognitive performance, providing vital insights for developing therapeutic interventions [19]. Consequently, ERP components show promise as potential biomarkers for certain aspects of specific mental illnesses [20]. However, relatively few studies in recent decades have directly compared ERP components between patients with BD II and UD. While a systematic review confirmed significant P300 latency differences between bipolar disorder and UD, it highlighted a critical knowledge gap: the lack of subtype-specific data for BD II [21]. Other studies reported that patients with BD II exhibit reduced amplitudes in MMN and P300, when compared to patients with UD or healthy controls (HC) [22, 23].

Therefore, we hypothesize that patients with UD or BD II may exhibit distinct cognitive impairments and that ERP could serve as a significant biomarker for differentiating between UD and BD II. This study aimed to assess neurophysiological responses in patients with BD II, patients with UD, and matched HC, and to determine whether ERP components can function as potential electrophysiological biomarkers for distinguishing BD II from UD.

Methods

Sample size estimation

We aimed to investigate differences in ERP components across three groups: BD II, UD, and HC. An a priori power analysis using G*Power 3.1 [24] determined a minimum sample size of 144 participants (α = 0.05, power (1–β) = 0.90, effect size Cohen's f = 0.3, df = 2,λ = 12.9, n = λ/f^2^ = 12.9/0.09≈143.3) for detecting group differences via analysis of variance (ANOVA). Ultimately, 180 participants were recruited (60 per group).

Participants and procedure

This retrospective case–control study enrolled 180 participants from March 2020 to February 2023, and all participants were stratified into three distinct groups: (1) UD group: 60 patients diagnosed with MDD; (2) BD II group: 60 patients diagnosed with BD II and currently experiencing a depressive episode; and (3) HC group: 60 age- and sex- matched healthy participants. Patients with BD II and UD were recruited through the Department of Psychiatry at Sichuan Provincial People's Hospital, while the HC group were recruited from community.

The inclusion criteria for this research were as follows: (1) Aged 18–60 years, female or male; (2) UD group patients met DSM-5 [25] diagnostic criteria for MDD with current depressive episode; (3) BD II group patients met DSM-5 [25] diagnostic criteria for bipolar II disorder with current depressive episode; (4) Healthy participants were required to have: a) No lifetime history of psychiatric disorder; b) Absence of chronic physical illnesses; (5) No history of use of antipsychotics, antidepressants, or other psychotropic medications within 4 weeks prior to baseline assessment; (6) Complete abstinence from alcohol, tobacco, and caffeine-containing products for at least 24 h before the examination; (7) Provision of written informed consent.

The exclusion criteria for this research were as follows: (1) Meeting DSM-5 [25] diagnostic criteria for other mental disorders, (e.g., bipolar I disorder, schizophrenia, obsessive–compulsive disorder, alcohol/substance use disorders); (2) Clinically significant mental or physical conditions impairing task performance; (3) Abnormal auditory brainstem response (ABR) tests (hearing threshold > 60 dB nHL) to confirm intact auditory pathways; (4) History of concussion or brain injury with loss of consciousness within 6 months; (5) Electroconvulsive therapy within 12 months; (6) Diagnosis of neurological disorders.

The study protocol was approved by the Ethics Committee of Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital. All procedures were conducted in accordance with the World Medical Association Declaration of Helsinki.

Psychological assessment tools

Patient Health Questionnaire-9 (PHQ-9)

The PHQ-9 is a validated and widely used instrument for depression screening and severity assessment in clinical settings [26, 27]. This self-report scale comprises 9 items evaluating symptom frequency over the last two weeks. Each item is rated on a 4-point Likert scale (0 = "not at all" to 3 = "nearly every day"), generating total scores from 0 to 27. Severity is stratified using established cutoff values, with scores ≥ 10 demonstrating optimal sensitivity and specificity for depression diagnosis in clinical populations [28].

Generalized Anxiety Disorder −7 GAD-7)

The GAD-7 is a widely used psychometric instrument assessing anxiety symptoms, with established criterion validity, factorial validity, and diagnostic validity [27, 29]. This instrument demonstrates clinical utility in both research and clinical practice, facilitating efficient screening, diagnosis, and severity assessment of anxiety disorders [29].

32-item Hypomania Checklist (HCL-32)

The HCL-32 [30] is a validated self-report instrument with robust psychometric properties in diverse populations [31, 32]. This 32-item tool uses a dichotomous (yes/no) format to assess lifetime hypomanic manifestations, where yes responses are scored 1. A cutoff of ≥ 14 has been established as optimal for detecting historical hypomanic episodes in mood disorder populations [33, 34].

ERP procedures and data analysis

The ABR testing and ERP recordings were performed by a certified neurophysiology technician with over 10 years of specialized experience. All participants underwent standardized electrophysiological assessments in a controlled environment. Patients in the BD II and UD groups were experiencing current depressive episodes at the time of testing.

All participants underwent testing while seated in a reclining chair within an acoustically shielded chamber, where ambient noise levels were maintained below 30 dB nHL throughout the procedure. Participants with abnormal auditory brainstem response (ABR) results (hearing threshold > 60 dB nHL) were excluded to confirm intact auditory pathways.

All participants completed four ERP paradigms in the following fixed sequence during a single 30-min session: (1) MMN (pre-attentive deviant detection in a passive oddball paradigm), (2) P300 paradigm (target stimulus detection in an auditory oddball task), (3) CNV (anticipatory response to warning tones), (4) P50 suppression (paired-click sensory gating assessment).

Electroencephalogram (EEG) data were required using silver/silver-chloride electrodes placed at Fz and Cz, referenced to the earlobes (the mastoid electrodes served as a reference for all EEG electrode channels online). The Fpz electrode served as ground. Data collection employed a Neuropack MEB-9200 evoked potential system (Nihon Kohden, Japan) coupled to an auditory stimulator.

Signal averaging was performed with 0.1–100 Hz bandpass filtering. Epochs exceeding ± 300 μV in amplitude were automatically rejected. The analysis time windows were paradigm-specific: For MMN and P300: −100 to 1000 ms relative to stimulus onset (pre-stimulus baseline correction). For CNV, −500 to 5000 ms. For P50 suppression, 0 to 600 ms. Following filtering, data were resampled to 1000 Hz. The amplifier specifications were listed as follows: input impedance: ≥ 1000 MΩ, Noise: < 0.6 μV rms (1-10 k Hz bandwidth, input short-circuited), Common mode rejection ratio: ≥ 112 dB. Low-cut filter was 0.1 Hz at 6 dB/oct for MMN and P300 paradigms, and 0.01 Hz at 6 dB/oct for CNV and P50. High-cut filter was set as follow: 50 Hz at 6 dB/oct for MMN, 100 Hz at 12 dB/oct for P300, 20 Hz at 12 dB/oct for CNV, and 1000 Hz at 12 dB/oct for P50. Alternating current interference notch filter: 50 Hz (rejection ratio < 1/20); and skin–electrode contact impedance check: < 5 kΩ. The data were processed using overlapping averaging methods.

MMN paradigm

The MMN paradigm comprised 200 auditory stimuli administered in a pseudorandom sequence. The stimulus set included two categories: rarely ‘deviant’ stimuli (20% probability; 100 ms ‘da’, 80 dB nHL) and frequently ‘standard’ stimuli (80% probability; 100 ms ‘da’, 60 dB nHL), with an inter-stimulus interval (ISI) of 1000 ms between consecutive stimuli. Explicit instructions emphasized passive listening without active sound discrimination. Epochs contaminated by eye blinks were excluded from ERP averaging, with additional artifact-free deviant trials added to maintain a minimum of 200 valid trials per subject. The key metrics of MMN components were MMN latency (time interval between the onset of deviant stimulus and the peak negativity of the MMN components) and MMN amplitude (the voltage difference between the MMN peak negativity and the pre-stimulus baseline).

P300 paradigm

The P300 paradigm employed a two-tone auditory oddball task with 30–36 deviant stimuli serving as targets interspersed among standard stimuli, delivered binaurally via headphones. All stimuli consisted of 100 ms tones presented at 1 Hz, with standard stimuli at 60 dB nHL and deviant stimuli at 80 dB nHL. EEG signals were acquired using a 0.1–100 Hz bandpass filter and epochs exceeding ± 40 μV amplitude were excluded from analysis. Participants were instructed to press a response button immediately upon detecting deviant stimuli.

The P300 encompasses four principal components: (1) N1, a negative deflection peaking at 80–120 ms post-stimulus; (2) P2, a positive peak maximal at 150–200 ms; (3) N2, a negative wave occurring at 200–250 ms; and (4) P3, a late positive component with 300–600 ms latency. Four composite amplitudes were quantified using peak-to-trough/trough-to-peak measurements:

- N1-P2 amplitude: from N1 trough to P2 peak;
- P2-N2 amplitude: from P2 peak to N2 trough;
- N2-P3 amplitude: from N2 trough to P3 peak;
- P2-P3 amplitude: from P2 peak to P3 peak.

CNV paradigm

The CNV paradigm consisted of trial sequences initiated by a warning tone (S1; 85 dB nHL). Following a fixed 2000 ms foreperiod, an imperative visual stimulus (S2; repetitive light flashes) appeared. Participants were required to immediately terminate the S2 flashes via dominant-hand button press upon stimulus detection. Each button press triggered a 10-s inter-trial interval (ITI) before subsequent S1 presentation. The CNV amplitude was calculated by averaging seven artifact-free trials.

The key metrics in the CNV paradigm were as follows:

Point A: The onset latency of the early CNV component, which captures transition from sensory processing to anticipatory attention [35].
Point B: The peak latency of the late CNV component, which marks peak motor readiness potential [36].
A-B Interval: Time span (ms) between Point A and Point B.
A-B Area: Total voltage activity (μV·ms) between Point A and Point B.
A-B Amplitude: Amplitude difference (μV) between Point A and Point B.

P50 suppression paradigm

The sensory gating paradigm employed paired auditory clicks (S1: conditioning stimulus; S2: test stimulus) at 80 dB nHL presented binaurally through headphones. The S1-S2 interval was 500 ms (inter-stimulus interval), and the ITI was 10 s. The subjects were instructed to passively listen, and maintain central fixation on a crosshair without overt behavioral response. Sixteen artifact-free trials were averaged to generate composite waveforms for analysis purposes. Grand averages for S1 and S2 were computed separately using a weighted averaging algorithm.

The key metrics in the P50 suppression paradigm were as follows.

A1 amplitude: The peak-to-baseline voltage difference (µV) of the P50 component evoked by the S1.
A2 amplitude: The peak-to-baseline voltage difference (µV) of the P50 component evoked by the S2.
S1-P50 latency: The time interval (ms) from the onset of S1 to the peak of A1.
S2-P50 latency: The time interval (ms) from the onset of S2 to the peak of A2.

Statistical analyses

Statistical analyses were performed using SPSS 26.0 (IBM Corp) for parametric and non-parametric analyses, and GraphPad Prism 9.0 for scientific graphing. Chi-square tests were applied to categorical variables. Continuous variables were presented as mean ± standard deviations (SD), while categorical variables were reported as frequencies and percentages. Normality was assessed using the Kolmogorov–Smirnov test [37]. To establish baseline comparability, demographic and clinical characteristics across the three groups (HC, BD II, UD) were formally tested using: (1) χ^2^ tests (or Fisher's exact test) for categorical variables; (2) One-way ANOVA (normally distributed) or Kruskal–Wallis test (non-normal) for continuous variables [38]. Bonferroni post hoc correction was performed for multiple comparisons to reduce the risk of type I error [38]. ANCOVA models (SPSS GLM procedure) were constructed with diagnosis group (UD = 1, BD II = 2) as a fixed factor, and age [39], sex (dummy-coded 0/1) [40], and GAD-7 total score [41] as covariates, based on previous studies suggesting their potential impact on ERP components in depression patients. Pearson correlation (normally distributed variables) or Spearman’s rank order correlation (non-normal variables) analyses were performed to assess associations between ERP components and PHQ-9 total score within the UD and BD II groups separately. Benjamini–Hochberg false discovery rate (FDR) correction was subsequently implemented, with significant associations determined at q < 0.05. In addition to statistical significance, effect size estimates were reported alongside inferential statistics, including standardized β coefficients and odds ratios (OR) with 95% confidence intervals (95% CI) for association analyses. The diagnostic accuracy of the logistic regression model was evaluated by receiver operating characteristic (ROC) curve analysis using GraphPad Prism 9.0. An area under the curve (AUC) > 0.7 was considered clinically meaningful [42]. All analyses used two-sided tests with α = 0.05 defining statistical significance. Exact P-values were reported to three decimal places, with values below 0.001 denoted as P < 0.001. All reported P-values for group comparisons were adjusted for covariates.

Results

Sociodemographic characteristics

A total of 180 participants were enrolled into this study. All subjects were categorized into three groups: BD II (n = 60), UD (n = 60), and HC (n = 60). For the UD, BD II, and HC groups, the mean age was 38.0 ± 13.0, 36.1 ± 11.9, and 34.1 ± 7.6 years (F = 1.757,* P* = 0.176), and the proportion of female sex was 76.7%, 80.0%, and 70.0% (χ^2^ = 1.684, P = 0.431), respectively. Other demographic characteristics, including education, marital status, and socioeconomic status, showed no significant differences across groups (all* P* > 0.05). The total scores of the GAD-7 and PHQ-9 were significantly higher in the BD II and UD groups than in the HC group; the HCL-32 total score was significantly higher in the BD II group than in both the UD and HC groups (all P < 0.05) (Table 1).Table 1. Sociodemographic data for the three groups of participantsSociodemographic dataHC group (n = 60)UD group (n = 60)BD-II group (n = 60)ContrastsP valueAge (year) mean ± SD38.0 ± 13.036.1 ± 11.934.1 ± 7.6F = 1.7570.176Sexχ2 = 1.6840.431 Female46 (76.7%)48 (80.0%)42 (70.0%) Male14 (23.3%)12 (20.0%)18 (30.0%)Educational levelχ2 = 6.2880.392 Illiteracy/primary school7 (11.7%)6 (10.0%)4 (6.7%) Middle school6 (10.0%)14 (23.3%)11 (18.3%) High school10 (16.7%)13 (21.7%)14 (23.3%) Bachelor degree or higher37 (61.7%)27.0 (45.0%)31 (51.7%)Marital statusχ2 = 3.6160.460 Unmarried25 (41.7%)20 (33.3%)25 (41.7%) Married33 (55.0%)33 (55.0%)30 (50.0%) Divorced/widowed2 (3.3%)7 (11.7%)5 (8.3%)Socioeconomic statusχ2 = 0.3930.822 Employed44 (73.3%)45 (75%)42 (70%) Unemployed16 (26.7%)15 (25%)18 (30%)PHQ-9 total score1.1 ± 1.217.6 ± 5.518.8 ± 4.2F = 346.355 < 0.001GAD-7 total score2.7 ± 1.110.6 ± 5.59.7 ± 4.6F = 62.161 < 0.001HCL-32 total score3.7 ± 0.94.1 ± 1.122.4 ± 2.0F = 3238.603 < 0.001**GAD-7* 7 item generalized anxiety disorder, PHQ-9 Patient Health Questionnaire-9, HCL-32 The 32-item Hypomania Checklist, HC Healthy control, UD Unipolar depression, BD-II Bipolar II depression^*^P < 0.05 was statistically significant

Similarities and differences in ERP components among the three groups

Table 2 and Fig. 1 showed similarities and differences in ERP components among the three groups. Compared with HC, both UD and BD II groups showed significantly longer reaction times (HC: 254.4 ± 43.8 ms; UD: 297.7 ± 72.2 ms; BD II: 300.3 ± 70.0 ms; P = 0.028) and larger amplitude of P2-N2 complex in the P300 paradigm (HC: 5.7 ± 4.4 μV; UD: 8.1 ± 4.8 μV; BD II: 8.6 ± 5.6 μV; P = 0.001) (all P < 0.05, Fig. 1a, b), whereas comparisons between the UD and BD II groups revealed no significant differences in these measures.Table 2. Similarities and differences in ERP components among the three groupsERP variablesHC groupUD groupBD-II groupF/HP valuePPost Hoc Pairwise* P*-values (P^a^)HC vs.UDHC vs. BD-IIUD vs. BD-IIMMN MMN latency (ms)218.2 ± 22.5216.6 ± 34.1211.4 ± 29.00.9040.4070.7311.0000.5990.987 MMN amplitude (uV)4.2 ± 3.24.1 ± 3.84.0 ± 3.10.0120.9880.0841.0001.0001.000P300 Button press count31.3 ± 1.630.9 ± 1.930.9 ± 1.41.1620.3150.1610.5910.5451.000 Reaction time (ms)254.5 ± 43.8297.7 ± 72.2300.3 ± 70.09.827 < 0.0010.0280.001* < 0.0011.000 N1 latency (ms)100.1 ± 19.597.6 ± 19.195.8 ± 18.90.7630.4680.7881.0000.6621.000 P2 latency (ms)160.1 ± 23.3162.4 ± 20.9158.0 ± 21.90.6100.5450.8931.0001.0000.813 N2 latency (ms)205.2 ± 16.5213.2 ± 17.3216.2 ± 22.15.4460.0050.0440.0660.0051.000 P3 latency (ms)319.6 ± 18.9315.6 ± 29.4310.4 ± 28.91.8510.1600.4371.0000.1700.862 N1-P2 complex (uV)8.6 ± 4.58.6 ± 3.96.7 ± 4.33.4450.0340.1161.0000.0670.070 P2-N2 complex (uV)5.7 ± 4.48.1 ± 4.88.6 ± 5.65.7570.0040.0010.0350.0051.000 P2-P3 complex (uV)12.0 ± 7.18.6 ± 11.36.6 ± 5.16.1170.0030.0580.0860.0020.596 N2-P3 complex (uV)15.5 ± 6.712.4 ± 8.310.7 ± 7.06.3450.0020.3780.0750.0020.642CNV Reaction time (ms)2178.9 ± 196.02262.3 ± 234.62212.1 ± 193.52.4030.0930.2880.0921.0000.574 Point A latency (ms)416.5 ± 207.5350.1 ± 159.1342.1 ± 196.72.7930.0640.1430.1720.0971.000 Point B latency (ms)1430.6 ± 380.31267.9 ± 397.71330.9 ± 447.62.4090.0930.3310.0930.5521.000 A-B interval (ms)1014.0 ± 383.8974.2 ± 433.11085.1 ± 580.40.8330.4360.8881.0001.0000.611 A-B amplitude(uV)23.3 ± 8.218.7 ± 10.421.5 ± 8.43.8680.0230.1260.1190.8690.306 A-B area (uVs)13.1 ± 8.69.3 ± 7.312.1 ± 9.83.0010.0520.2270.0581.0000.257P50 S1-P50 latency (ms)55.8 ± 12.255.3 ± 13.258.8 ± 11.81.2850.2790.1051.0000.5970.424 S2-P50 latency (ms)57.1 ± 13.450.4 ± 11.163.2 ± 11.56.6430.0020.025*0.1530.287 < 0.001 Amplitude A1 (uV)13.4 ± 8.111.8 ± 7.311.2 ± 5.91.488^a^0.4750.8840.3290.2660.855 Amplitude A2 (uV)6.6 ± 3.27.6 ± 3.88.3 ± 4.83.009^a^0.2220.0940.1610.1200.726MMN Mismatch negativity, CNV Contingent negative variation, ERP Event-related potential, HC Healthy control, UD Unipolar depression, BD-II: Bipolar II depression^^P < 0.05 was statistically significant^**^P < 0.001 was statistically significantP*: P-values derived from analysis of covariance (ANCOVA) adjusting for age, sex, and GAD-7 total scoreP^a^: P-values derived from Bonferroni multiple comparison correctionF value derived from ANOVA test^a^H value derived from Kruskal–Wallis H testFig. 1Differences in event-related potentials among patients with bipolar II depression (BD II), unipolar depression (UD), and healthy controls (HC). Both UD and BD II patients (a) showed larger amplitude of P2-N2 complex compared to HC (b). The latency of S2-P50 in BD II patients (c) was significantly longer than UD patients (d)

In addition, the S2-P50 latency in the BD II group was significantly longer than the UD group (UD: 50.4 ± 11.1 ms vs. BD II: 63.2 ± 11.5 ms; P = 0.025), and the N2 latency in the BD II group was longer than the HC group (BD II: 216.2 ± 22.1 ms vs. HC: 205.2 ± 16.5 ms; P = 0.044) (both P < 0.05, Fig. 1c, d).

Due to non-normal distribution, amplitudes A1 and A2 were compared among the three groups using the Kruskal–Wallis H test. No significant differences were observed in either amplitude (all P > 0.05).

Analysis of ERP indices for discriminating UD and BD II.

As shown in Table 3 and Fig. 2, logistic regression identified S2-P50 latency as a significant discriminator of BD II from UD (β = 0.097, P = 0.020; OR = 1.102, 95% CI:1.059–1.147). ROC analysis demonstrated significant diagnostic utility for differentiating BD II from UD, with an AUC of 0.800 (95% CI: 0.720–0.880; P < 0.001). At the optimal cut-off of 57.9 ms of S2-P50 latency, sensitivity reached 0.800 and specificity 0.717.Table 3. Logistic regression analysis of ERP indices for discriminating UD and BD-IIVariablesβStandard errorWardsPOR95% CI (lower limit, upper limit)PAUCMMN MMN latency (ms)−0.0050.0060.7960.3720.9950.9831.0060.571 MMN amplitude (uV)−0.0020.0990.0000.9870.9980.8231.2120.971P300 Button press count−0.0060.1100.0030.9540.9940.8021.2320.939 Reaction time (ms)0.0010.0030.0400.8421.0010.9951.0060.502 N1 latency (ms)−0.0050.0100.2660.6060.9950.9761.0140.448 P2 latency (ms)−0.0100.0091.2850.2570.9900.9731.0070.319 N2 latency (ms)0.0080.0090.6740.4121.0080.9891.0260.402 P3 latency (ms)−0.0060.0060.9170.3380.9940.9811.0060.427 N1-P2 complex (uV)−0.1130.0505.1780.0230.8930.8100.9840.033ROC = 0.362, P = 0.013 P2-N2 complex (uV)0.0220.0350.3830.5361.0220.9541.0960.969 P2-P3 complex (uV)−0.0290.0251.3460.2460.9720.9251.0200.247 N2-P3 complex (uV)−0.0290.0251.4210.2330.9710.9261.0190.119CNV Reaction time (ms)−0.0010.0011.5910.2070.9990.9971.0010.168 Point A latency (ms)0.0000.0010.0620.8041.0000.9981.0020.978 Point B latency (ms)0.0000.0000.6680.4141.0001.0001.0010.283 A-B interval (ms)0.0000.0001.3750.2411.0001.0001.0010.183 A-B amplitude (uV)0.0320.0212.3540.1251.0330.9911.0760.133 A-B area (uVs)0.0390.0232.8160.0931.0390.9941.0880.090P50 S1-P50 latency (ms)0.0220.0152.1020.1471.0220.9921.0530.212 S2-P50 latency (ms)0.0970.02023.071< 0.0011.1021.0591.147< 0.001ROC = 0.800, P < 0.001** Amplitude A1 (uV)−0.0140.0280.2580.6120.9860.9331.0420.389 Amplitude A2 (uV)0.0390.0420.8470.3581.0400.9571.1300.563CI Confidential interval, MMN Mismatch negativity, CNV Contingent negative variation, ERP Event related potential, UD Unipolar depression, HC Healthy controls, BD-II: Bipolar II depression, AUC Area under the receiver operating characteristic (ROC) curve^^P < 0.05 was statistically significant^**^P < 0.001 was statistically significantP*: P-values derived from Logistic regression analysis adjusting for age, sex, and GAD-7 total scoreFig. 2Univariate and ROC analyses for differentiating bipolar II depression (BD II) from unipolar depression (UD). a MMN and P300 components for differentiating BD II from UD: Univariate odds ratios. b CNV and P50 components for differentiating BD II from UD: Univariate odds ratios. c ROC curve analysis of S2-P50 latency for differentiating BD II from UD (AUC = 0.801)

Bivariate logistic regression analysis revealed that reduced amplitude of the N1-P2 complex was associated with BD II diagnosis (β = − 0.113, P = 0.023; OR = 0.893, 95% CI: 0.810–0.984). In contrast, ROC curve analysis demonstrated no diagnostic capacity for this measure, yielding an AUC of 0.362 (95% CI: 0.257–0.468).

No other ERP parameters showed statistically significant differences between BD II and UD in adjusted logistic regression models (all P > 0.05).

Correlation between ERP components and PHQ-9 total score

Pearson correlation analyses revealed no significant associations between any ERP component and PHQ-9 total score in either the UD group (q = 0.44–0.98, |r|= 0.01–0.25) or BD II group (q = 0.28–0.98, |r|= 0.01–0.39) following FDR correction.

Discussion

Our study found that BD II patients exhibited prolonged S2-P50 latency compared to UD patients. To our knowledge, this study provides novel evidence that S2-P50 latency may assist in differentiating BD II from UD in clinical practice. P50 is an effective tool to measure sensory gating function [43], which reflects a neural filtering mechanism that suppresses redundant or non-salient sensory inputs without conscious effort [44]. Sensory gating deficits manifest as excessive influx of irrelevant sensory information, impaired selective attention and information processing, alongside cognitive decline [45]. One study reported prolonged S2-P50 latency in schizophrenia patients, which correlated positively with cognitive impairment [46]. Another study proposed that P50 latency could serve as a clinical biomarker for identifying individuals at elevated risk of measurable cognitive decline [47]. Our study indicates that BD II patients exhibit more severe sensory gating deficits than UD patients, as evidenced by prolonged S2-P50 latency. Given the scarcity of historical research on P50 latency in BD II and UD, future studies are needed to validate the utility of S2-P50 latency in distinguishing between BD II and UD.

Our analysis confirmed that S2-P50 latency demonstrated significant diagnostic value for differentiating BD II from UD. The ROC-derived AUC of 0.800 (95% CI: 0.720–0.880) exceeded the threshold for clinically useful discrimination (AUC > 0.70) [42]. At the optimal cut-off of 57.9 ms — determined by optimizing sensitivity and specificity trade-offs — this measure achieved a sensitivity of 0.800 (correctly identifying 80% of BD II cases) and a specificity of 0.717 (correctly excluding 71.7% of UD cases). These findings suggested that prolonged S2-P50 latency may reflect underlying neurophysiological distinctions between BD II and UD.

In the P300 paradigm, both BD II and UD showed significantly prolonged reaction time compared to healthy controls; although no significant difference was observed between the BD II and UD groups. These findings suggest that impairments in two stages of information processing — stimulus assessment and motor reaction initiation — might represent a core cognitive deficit during the depressive phase across the mood disorder spectrum. This slowed information processing might be related to altered global connectivity within the frontal-parietal cognitive control network, which has been linked to depressive symptoms [48]. Consequently, these impairments might partly account for the psychomotor retardation characteristic of depressive states, affecting both BD II and UD. Moreover, the lack of significant differences between BD II and UD in this paradigm supported the notion that impairment in basic cognitive processing (indexed by P300 reaction time) was predominantly mediated by the depressive state, rather than diagnostic categories (BD II vs. UD).

Our study found that larger P2-N2 amplitudes in both the UD and BD II groups compared to healthy participants. P2 indexes higher-order cognitive mechanisms extending beyond sensory encoding, whereas N2 corresponds to attentional allocation and conflict monitoring processes [49] and the amplitude of P2-N2 complex may reflect a continuum from sensory integration to cognitive control, and abnormalities may indicate broader cognitive dysfunction. These findings suggest that both UD and BD II patients have severe cognitive impairments. However, the lack of significant difference between BD II and UD was inconsistent with previous studies [50], which may suggest similar cognitive impairment between BD II and UD patients. Heterogeneity in study populations across different studies may partly account for this discrepancy. Most studies fail to rigorously distinguish between depressive episodes and other illness phases (e.g., manic or mixed states) in bipolar II disorder, potentially overestimating the specificity of impairments during acute depressive episodes [49]. More studies are necessary to expand the knowledge about the clinical and neuropathological significance of abnormalities in the amplitude of P2-N2 complex in UD and BD II patients.

In addition, significantly prolonged N2 latency was observed in BD II patients compared with healthy controls. This prolonged N2 latency was correlated with impairments of conflict inhibition [51], indicating potential dysfunction in executive control mechanisms in BD II patients. Although a prior study observed prolonged N2 latency in bipolar disorder, critical details – such as bipolar subtype specification – were not reported, thereby limiting direct comparability with our findings [49]. Therefore, the generalizability of our findings requires further validation due to heterogeneity across existing research in study populations and task paradigms.

Limitations

It is necessary to consider the following limitations when interpreting the findings. First, recruiting participants exclusively from Sichuan Province, may limit the generalizability of the findings to populations with diverse ethnic and geographical backgrounds across China. Therefore, future larger-scale, multi-center studies with ethnically diverse samples are warranted to enhance the reliability and validity of the findings. In addition, the ERP paradigms used in the study may not fully reflect the complex cognitive functions observed in real-world scenarios. Therefore, future research should aim to develop and utilize ERP paradigms with higher ecological validity to better capture cognitive processes in naturalistic settings. Furthermore, we utilized self-report scales as the primary instrument to assess symptom severity. In future investigations, clinician-rated scales, such as the Hamilton Anxiety Rating Scale (HAMA) and Hamilton Depression Rating Scale (HAMD), should be used to enhance the reliability and validity of the findings.

Conclusions

This study may identify neurophysiological distinctions between BD II and UD depression, notably a prolonged S2-P50 latency in BD II.