Classifying irritable bowel syndrome using spatio-temporal graph convolution networks on brain functional MRI data

Jiazhen Wu; Shuxin Zhuang; Zhemin Zhuang; Liangqiong Qu; Lei Xie; Mengting Liu

PMC · DOI:10.1093/braincomms/fcag062·February 27, 2026

Classifying irritable bowel syndrome using spatio-temporal graph convolution networks on brain functional MRI data

Jiazhen Wu, Shuxin Zhuang, Zhemin Zhuang, Liangqiong Qu, Lei Xie, Mengting Liu

PDF

Open Access

TL;DR

This study uses brain MRI data and a new deep learning model to classify irritable bowel syndrome with high accuracy and identify key brain regions involved in the condition.

Contribution

A spatio-temporal graph convolution network with an interpretability module for IBS classification and biomarker identification.

Findings

01

The model achieved 83.51% accuracy in classifying IBS patients versus healthy controls.

02

Five brain regions were identified as important for IBS classification, consistent with prior literature.

03

External experiments validated the effectiveness of the interpretability module in identifying relevant brain regions.

Abstract

Irritable bowel syndrome (IBS) is a functional gastrointestinal disorder marked by abdominal pain and changes in stool consistency or frequency. Recent studies have explored the link between IBS and alterations in brain networks using functional MRI. Despite these efforts, an effective diagnostic or predictive model for IBS remains elusive. This shortfall is twofold: firstly, the sample sizes in these studies are typically small, and secondly, the machine learning or deep learning models currently in use fail to adequately detect the subtle and dynamic pathological changes present in functional MRI data for IBS. In this study, we extracted rs-functional MRI of 79 subjects with IBS and 79 healthy controls, then put them into spatio-temporal graph convolution network (ST-GCN) for classification. We also incorporated a novel interpretability module into this model to identify potential…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Diseases4

Irritable bowel syndrome gastrointestinal disorder IBS abdominal pain

Figures6

Click any figure to enlarge with its caption.

Overall structure of the ST-GCN: our overall network architecture consists of three ST-GC blocks with different channel numbers and a final fully connected layer. Figure (A) illustrates the internal structure of the ST-GC block, while figures (B), (C), and (D) respectively depict the detailed network architectures of MSGCN, MSG3D, and MSTCN within the ST-GC block. In the figure, c1, c2, and c3 denote the channel sizes of the three blocks, MLP denotes a multilayer perceptron head, Conv2d and BatchNorm2d indicate standard convolution and normalization layers, respectively.

Interpretability module overview: this figure illustrates the necessary inputs for the interpretability module and the explanation process. In the figure, c1, c2 and c3 denote the channel sizes of blocks 1, 2 and 3, respectively.

Classification accuracy across window sizes: this figure presents the scatter distribution and standard deviation of average classification accuracy under different window sizes. When the window size reached 140, we achieved the optimal classification accuracy of 83.51%. Each individual data point represents the classification accuracy obtained from one fold of the five-fold cross-validation, computed at the subject level (N = 158 subjects in total; 79 IBS patients and 79 HC). For each window size, five data points correspond to the five cross-validation folds, and the central marker denotes the mean accuracy across folds, with error bars indicating the standard deviation (STD).

ROC curves of eight deep learning models: for the eight deep learning models, we plotted the optimal classification accuracy for each and found that the ST-GCN model exhibited the best classification performance. ROC curves were generated based on subject-level predictions from the test folds of five-fold cross-validation (N = 158 subjects). AUC denotes the area under the ROC curve, reflecting the model's overall discriminative ability.

Brain region importance scores (Automated Analytical Labelling; AAL90 atlas): the importance score for each brain region was derived from the averaged feature-weight contributions across five cross-validation folds, using the optimal window size (140). Each data point represents the aggregated importance score of a given brain region across all subjects (N = 158), obtained by averaging region-wise scores over folds and subjects.

Visualization of important cortical and subcortical regions: this figure presents the visualization of important brain regions, with abbreviations for significant areas indicated in the figure; (A) represents crucial brain regions located on the cerebral cortex, while (B) denotes regions situated beneath the cerebral cortex. Cortical and subcortical visualizations represent the relative importance scores derived from the interpretability module, averaged across all subjects (N = 158) and five cross-validation folds. Warmer colours indicate higher relative importance in distinguishing IBS patients from healthy controls.

Tables9

Table 1. Demographics information of the IBS dataset used in this study

Protocols	IBS patients (n = 79)	HCs (n = 79)	P value
Gender (male/female)	54/25	50/29	0.502^①
Age (years)	32.33 ± 9.38	32.48 ± 9.29	0.919^②
Education (years)	13.35 ± 2.97	13.97 ± 2.79	0.179^②

Table 2. Classification performance of deep learning models (STD = standard deviation)

Model	Acc mean(%) (STD)	Sens mean(%) (STD)	Spec mean(%) (STD)	Prec mean(%) (STD)	F1-Score mean(%) (STD)
GCN	61.98(0.014)	57.42(0.010)	66.82(0.020)	62.78(0.017)	0.600(0.068)
GIN	61.74(0.018)	52.58(0.035)	69.58(0.022)	64.82(0.032)	0.580(0.045)
GAT	58.70(0.009)	56.18(0.015)	61.38(0.022)	60.06(0.010)	0.580(0.028)
BrainGNN	61.25(0.032)	47.69(0.191)	53.33(0.163)	79.26(0.088)	0.596(0.019)
BolT	71.28(0.106)	67.61(0.197)	75.18(0.213)	77.16(0.152)	0.720(0.050)
TodyNet	55.63(0.013)	86.67(0.125)	23.94(0.097)	45.60(0.053)	0.597(0.070)
TapNet	65.00(0.046)	90.00(0.109)	40.00(0.188)	61.24(0.065)	0.729(0.022)
ST-GCN (ours)	83.51(0.053)	84.83(0.029)	81.00(0.112)	84.16(0.026)	0.845(0.013)

Table 3. The results of machine learning comparative experiments

Classifier	Average accuracy	Best accuracy	Reduction
GaussianNB	0.4984	0.5688	MDS
RandomF	0.5789	0.6625	LLE
anova_svcl1	0.5297	0.6313	LLE
anova_svcl2	0.5348	0.6438	LLE
knn	0.5699	0.6500	FA2/PCA/PCA2
logistic_l2	0.5645	0.6438	LLE2
neural10	0.5809	0.7000	LLE2
neural10a	0.5855	0.6438	LLE
neural5	0.5695	0.7000	LLE2
neural5a	0.5148	0.6250	FA2
ridge	0.5621	0.6438	LLE2
svc_l1	0.5590	0.6500	LLE2
svc_l2	0.5711	0.6438	LLE2

Table 4. The results of ablation experiments

Ablated submodule	Acc mean(%) (STD)	Sens mean(%) (STD)	Spec mean(%) (STD)	Prec mean(%) (STD)	F1 mean(%) (STD)
MS-GCN	81.71(0.072)	83.67(0.074)	79.83(0.127)	81.61(0.087)	82.62(0.065)
MS-G3D	81.67(0.045)	88.58(0.073)	74.83(0.103)	78.51(0.061)	83.24(0.034)
MS-TCN-1	81.65(0.023)	81.00(0.105)	82.33(0.133)	83.86(0.075)	82.40(0.072)
MS-TCN-2	82.28(0.025)	82.17(0.077)	82.33(0.072)	83.03(0.053)	82.60(0.065)
MS-TCN-3	79.78(0.023)	79.66(0.051)	79.83(0.091)	80.80(0.068)	80.22(0.022)

Table 5. The results of different hyperparameters

gcn_scales	Acc mean(%) (STD)	g3d_scales	Acc mean(%) (STD)	tcn_scales	Acc mean(%) (STD)
1	77.82(0.053)	1	77.82(0.022)	1	78.49(0.053)
2	81.61(0.026)	2	77.80(0.042)	2	75.95(0.043)
3	72.12(0.027)	3	77.84(0.057)	3	76.61(0.041)
4	79.15(0.041)	4	79.74(0.068)	4	83.51(0.053)
5	77.90(0.054)	5	79.11(0.033)	5	81.01(0.045)
6	76.57(0.076)	6	78.45(0.059)	6	75.97(0.036)
7	77.82(0.030)	7	72.78(0.033)
8	83.51(0.053)	8	83.51(0.053)

Table 6. Classification performance of top 20 significant regions and random 20 regions

Group	Acc mean(%) (STD)	Sens mean(%) (STD)	Spec mean(%) (STD)	Prec mean(%) (STD)	F1 mean(%) (STD)
Random 1	75.30(0.043)	77.17(0.034)	73.25(0.098)	75.15(0.064)	76.14(0.053)
Random 2	75.28(0.074)	78.75(0.151)	72.08(0124)	74.84(0.095)	76.74(0.023)
Random 3	74.58(0.083)	70.92(0.187)	78.75(0.146)	80.00(0.130)	75.18(0.078)
Random 4	75.30(0.059)	77.50(0.129)	73.42(0.114)	75.54(0.086)	76.51(0.016)
Random 5	73.95(0.086)	74.50(0.087)	73.25(0.126)	74.46(0.110)	74.48(0.059)
Top 20	78.51(0.057)	81.25(0.040)	75.67(0.082)	77.73(0.067)	79.46(0.090)

Table 7. Top 10 significant regions obtained by interpretability module and PBM

Ranking	Interpretability module	PBM
1	IPL.R	OLF.R
2	ORBinf.R	PAL.R
3	PCG.R	PCL.L
4	ORBmid.R	REC.R
5	ORBsupmed.L	DCG.L
6	PCL.R	ORBinf.R
7	ORBsup.R	PHG.R
8	CAU.R	PHG.L
9	REC.R	THA.L
10	ACG.R	IPL.R

Table 8. Top 10 significant regions’ correlation with node attributes

Regions	IPL.R	ORBinf.R	PCG.R	ORBmid.R	ORBsupmed.L
Node degree	√	√		√	√
Closeness centrality	√	√		√	√
Clustering coefficient	√	√	√	√

Table 9. Regions correlation with node attributes identified by PBM

Regions	OLF.R	PAL.R	PCL.L	REC.R	DCG.L
Node degree	√				√
Closeness centrality		√		√
Clustering coefficient	√		√		√

Equations2

Funding3

—Shenzhen Medical Research Fund
—National Natural Science Foundation of China10.13039/501100001809
—Shenzhen Science and Technology Innovation Program10.13039/501100017610

Keywords

irritable bowel syndrome ST-GCNresting-state functional magnetic resonance imaginginterpretability module

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGastrointestinal motility and disorders · Functional Brain Connectivity Studies · Machine Learning in Healthcare

Full text

Introduction

Irritable bowel syndrome (IBS) is a functional gastrointestinal disorder characterized by abdominal pain coupled with alterations in stool form or frequency. The prevalence of the disease ranges from 5% to 10%, primarily affecting young and middle-aged individuals. For most patients, IBS commonly manifests as a cycle of recurrence and remission.^1-3^ Owing to its persistent and intermittent nature, IBS can significantly impact patients’ quality of life and occupational productivity.^4^

Majority of studies have supported that IBS patients have high level of neuroticism and exhibit neurological deficiencies.^5,6^ IBS is therefore considered as a new pathological approach as a brain-gut axis disorder.^5^ In recent years, many studies have investigated the association between IBS and various brain impairments. Based on the findings of the survey, the majority of collected articles indicate that patients with IBS often experience long-term chronic pain. This might result in the alterations in functional activity of various brain regions, along with the atrophy of grey matter and the compromised integrity of white matter in the brain.^7^

Functional magnetic resonance imaging (fMRI) stands as a pivotal modality for unravelling the intricacies of cerebral function, comprising two primary approaches: resting-state fMRI and task-based fMRI. Task-based fMRI examines the brain's activity in response to specific stimuli or cognitive challenges; however, it often incorporates confounding variables such as fluctuations in participant motivation and variations in task engagement. In contrast, resting-state fMRI presents a distinctive advantage by delineating the brain's inherent connectivity patterns within a non-demanding context. For the above reasons, resting-state functional magnetic resonance imaging has been applied in the investigation of a wide range of brain disorders such as autism, Alzheimer's disease and alcoholism. Due to its outstanding spatio-temporal properties, resting-state fMRI has demonstrated remarkable utility across diverse research domains, including neural function localization,^8^ brain network delineation,^9^ and exploration of implicated brain regions in various neurological disorders.^10^ In recent years, many researchers have started utilizing resting-state fMRI to explore the brain function of IBS. A review in 2022 summarized 22 studies conducted on IBS disease using fMRI data analysis.^3^ This review highlighted alterations in brain networks in fMRI among IBS patients, predominantly concentrating in regions implicated in visceral sensation, emotional processing and pain modulation, ascertained through a comprehensive summary of previous literature. These studies show that the chronic effects of IBS lead to changes in brain function in patients that differ from those observed in healthy individuals. This serves as a pivotal basis for the application of machine learning and deep learning in the classification of IBS using fMRI data from the brain.

In recent years, a surge in the application of supervised machine learning approaches has been observed in the classification and analysis of IBS. Researchers investigated changes in the postcentral cortex of patients with IBS by extracting the regional homogeneity (ReHo) of the postcentral cortex and the average ReHo of all voxels in the whole brain, along with the seed-based correlation based on the four seed regions: left postcentral cortex, right postcentral cortex, left insula and right insula, as inputs to a support vector machine (SVM) classifier. The classification results showed excellent performance of the ReHo of postcentral cortex and seed-based correlation map (AUC > 0.7).^11^ A project classified IBS and control groups using the resting-state functional connectivity (rs-FC) of the habenula-dlPFC and habenula-thalamus as input features for SVM classifier, achieving a classification accuracy of 71.5%.^12^ In general, the application of machine learning models for the classification of IBS does yield some effectiveness, but it also encounters many challenges. For instance, in machine learning models, researchers often rely on predefined specific brain regions or functional connectivity between them as features for the classification model. However, as the brain is an integrated whole, its various parts exhibit complex interactions, thus relying solely on the variations in a few brain regions as criteria for judgment is evidently insufficient. Furthermore, previous studies have been limited by small sample sizes, with the largest sample database involving only 112 participants, including both IBS and control.^12^ Additionally, it is apparent that the accuracy results obtained from machine learning are relatively low, failing to accurately differentiate between IBS patients and control groups.

Deep learning's principal divergence from traditional machine learning lies in the process of feature extraction. While conventional machine learning necessitates the manual extraction of features considered to significantly impact classification results, deep learning follows an autonomous feature extraction approach. In recent years, with more abundant computational resources available, deep learning models that gradually abstract features from input data through multi-layer neural network structures have been increasingly applied to the classification of various mental disorders, such as autism spectrum disorder,^13-17^ mild cognitive impairment,^18,19^ and attention deficit hyperactivity disorder,^20^ and many other brain diseases.^21,22^ Recent studies have employed deep learning to analyse gastrointestinal disorders through visual data. A study collected 938 images of human faeces from anonymous sources to investigate changes in faecal morphology caused by functional gastrointestinal disorders, including IBS, chronic constipation, and chronic diarrhoea. They used ResNet to classify these images into three categories: “constipation”, “normal” and “diarrhea”, achieving an average accuracy of 74.26% on a test set containing 272 images.^23^ Another one collected colonoscopy images from IBS patients (n = 35) and asymptomatic healthy subjects (n = 88) and employed Google Cloud Platform AutoML Vision, resulting in a positive predictive value, precision, and recall of the model being 81.2%, 72.6% and 72.6%, respectively. Unfortunately, to the best of our knowledge, there is currently no research on classifying IBS using resting-state fMRI data, which is also the driving force behind our study.^24^

Given that fMRI data contains rich temporal and spatial information, it is noteworthy that functional connectivity exist not only spatially between brain regions but also manifest through temporal fluctuations.^25,26^ To comprehensively characterize the spatio-temporal information of the brain, we have chosen to use the spatio-temporal graph convolutional network (ST-GCN) as the classification model. Compared to other deep learning models that primarily focus on static analysis (i.e. using static data such as functional connectivity matrices or graph theory attributes as inputs), ST-GCN can extract features sequentially from spatial and temporal dimensions, allowing it to focus on the dynamic development of the disease throughout the entire fMRI process. It better leverages the advantage of fMRI data that having a temporal dimension compared to other data format.

Although the earlier paragraph outlined several advantages of deep learning compared to traditional machine learning, due to their non-linear underlying structures, most deep learning models are commonly labelled as “black boxes”, and the interpretability and reliability of models’ predictions remain unclear. In the clinical settings, the absence of interpretability in decision-making processes is subject to critique, leading to a relatively low practical acceptance of deep learning methods in the medical field. Recently, various interpretable methods have been introduced for deep learning, such as post-hoc saliency-based methods like SHAP,^27^ gradient-weighted methods like grad-CAM,^28^ and layer-wise backpropagation techniques,^29^ etc. For example, in previous experiments involving the classification of fMRI data, BrainGNN added an interpretability module to its model, identifying the important brain regions that the model believed had the greatest impact on classification results.^16^ However, the interpretability module of BrainGNN is not suitable for ST-GCN because it fails to account for the dynamic changes in functional connectivity over time. In this study, in order to identify brain regions that have undergone functional changes due to long-term effects of IBS, we independently designed a new interpretability module based on the construction of the ST-GCN model to adapt to our own classification experiments. Ultimately, we identified brain regions highly influenced by the IBS disease.

In summary, to address the aforementioned requirements and concerns, this study employs a ST-GCN to classify IBS using a dataset of 79 IBS patients and 79 control subjects. It should be clarified that the fMRI data included in this dataset is resting-state fMRI and is cross-sectional in nature. This represents a significant increase in sample size compared to previous studies. Additionally, we have integrated an interpretability module specifically designed for ST-GCN. This module aims to fully utilize the temporal and spatial data in the rs-fMRI, with the goal of identifying brain regions that are clinically significant in detecting IBS impairments in the human brain.

Materials and methods

Participants

This study included a total of 158 subjects [79 patients with IBS-D and 79 healthy controls (HC)], 79 patients diagnosed with IBS-D (abbreviate as IBS below) were enrolled in the Department of Gastroenterology of the First Affiliated Hospital of Shantou University Medical College from September 2021 to May 2023.Patients meeting the Rome IV diagnostic criteria for IBS-D, possessed an educational background of at least 9 years, fell within the age range of 18–50 years, and had not utilized antibiotics, selective serotonin reuptake inhibitors, or opioids within the preceding 3 months were included. Exclusion criteria: (1) previous history of organic gastrointestinal diseases such as inflammatory bowel disease, colon polyp and gastrointestinal surgery; (2) people who are allergic or have taken probiotics, painkillers and other drugs in the past half month; (3) brain imaging examination found craniocerebral lesions or any neuropsychiatric diseases related to cognitive dysfunction and infectious or metabolic diseases (such as Alzheimer's disease, multiple sclerosis, epilepsy, diabetes, etc.); (4) people with serious mental spectrum diseases; (5) severe acute/chronic cardiac, liver and renal dysfunction; (6) patients with contraindications to MRI scanning. Two rheumatologists selected the patients based on above criteria. Finally, three patients were excluded for incomplete MRI data collection because they were unable to cooperate with MRI scans, and one patient was excluded from this study due to head motion angle exceeding 2° during resting-state fMRI data preprocessing analysis. Thus, 79 IBS patients were included in the final analysis; 79 HC matching the IBS-D group in age, sex and education level were enrolled at the same time.

We conducted inter-group comparisons of baseline demographic characteristics for all participants to ensure adequate comparability between the patient and control groups (Table 1). Specifically, continuous variables (e.g. age, years of education) were compared using independent samples t-tests and are presented as mean ± standard deviation. Categorical variables (e.g. gender) were analysed using Pearson's chi-square test. The detailed results of these statistical tests, including the specific test statistic values (t-value for t-tests, χ2 for chi-square test) and degrees of freedom where applicable, are fully reported in Table 1. All statistical analyses were performed with SPSS software, with the significance level set at P < 0.05. As shown in Table 1, no statistically significant differences were observed between the IBS patients and HC in age, gender, or education level (all P > 0.05), indicating well-matched demographic characteristics between the two groups.

This study was approved by the Ethics Committee of the First Affiliated Hospital of Shantou University Medical College (No. B-2021-235). All subjects were informed of relevant research matters and signed written informed consent.

Image data acquisition and preprocessing

The MRI data were performed using GE 1.5T MR scanner at The First Affiliated Hospital of Shantou University Medical College, including a T1-weighted structural scan and a 6-minute rs-fMRI: (1) High-resolution structural images used 3D T1-weighted Fast spoiled gradient recalled (FSPGR) sequences with repetition time (TR) = 1600 ms; echo time (TE) = 5.1 ms; FA = 20°; matrix size = 256×256; field of view (FOV) = 256 mm × 256 mm; slice thickness = 1.3 mm and interleaved 244 slices; (2) The rs-fMRI images were obtained using a T2* weighted gradient echo—echo plane imaging technique (GRE-EPI sequence),the imaging parameters were as follows: TR = 2000ms, TE = 45ms, FA = 90°, matrix size = 64×64, FOV = 250mm × 250mm, slice thickness = 6 mm, 20 interleaved axial slices, acquisition time = 6 min, finally, each fMRI scan obtained 180 time points.

The rs-fMRI data was preprocessed using the GRETNA toolbox (http://www.nitrc.org/projects/gretna/). The preprocessing steps including (1) eliminate the first 10 time point images; (2) slice timing correction; (3) head movement correction: data of subjects whose translation ≥2 mm and rotation ≥2° were excluded; (4) spatial standardization: spatial normalization was performed in three sequential steps. First, the motion-corrected functional images were co-registered to each subject's high-resolution T1-weighted anatomical image. Second, the individual T1-weighted image was nonlinearly normalized to the Montreal Neurological Institute (MNI152) standard template space. Finally, by applying the resulting transformation parameters to the co-registered functional data, the fMRI images were warped into MNI152 space and resampled to an isotropic voxel size of 3× 3 × 3 mm; (5) regression covariates: regression interference parameters to remove the influence of covariates such as noise and cerebrospinal fluid; (6) spatial smoothing: the space smoothing process is carried out by Gaussian kernel with half height and full width of 6mm; (7) the regression of nuisance covariates was performed to remove noisy signals from the CSF and WM, then linear detrended and filtered within the frequency 0.01–0.1Hz were conducted .

After preprocessing the image data, we used Automated Analytical Labeling to determine 90 ROIs and extracted the average BOLD signal time series of all ROIs, and conduct Pearson correlation analysis one by one. In order to make the FC values follow a more normal distribution, we further processed the data using the Fisher-z transform. Dynamic functional connectivity (dFC) matrices were generated using a sliding window approach. The resulting dFC sequences for each subject were directly fed into the Spatio-Temporal Graph Convolutional Network (ST-GCN) for end-to-end feature learning and classification. This pipeline retains the full temporal dynamics of the data, allowing the model to automatically capture and learn intrinsic spatiotemporal patterns.

Network architecture of ST-GCN

The overall network structure is clearly illustrated in Fig. 1, where three ST-GC blocks with different channel sizes were concatenated to enhance the extraction of spatial-temporal characteristics. The features convolved through three ST-GC blocks will be fed into average pooling and fully connected layers to obtain the final classification results.

Overall structure of the ST-GCN: our overall network architecture consists of three ST-GC blocks with different channel numbers and a final fully connected layer. Figure (A) illustrates the internal structure of the ST-GC block, while figures (B), (C), and (D) respectively depict the detailed network architectures of MSGCN, MSG3D, and MSTCN within the ST-GC block. In the figure, c1, c2, and c3 denote the channel sizes of the three blocks, MLP denotes a multilayer perceptron head, Conv2d and BatchNorm2d indicate standard convolution and normalization layers, respectively.

Definition of spatio-temporal graph

We consider the time series of each participant as a spatio-temporal graph $[eqn]$ . From a spatial perspective, it is regarded as a spatial graph $[eqn]$ at each time point, the nodes of which are fully connected, each node represents a brain region, with the node features being the blood oxygen concentration signals of the corresponding brain regions at that time. The edge weights of the spatial graph $[eqn]$ are consistently constructed across all participants, the time series data of all the participants are concatenated along the time dimension, resulting in a long time series for each brain region. Then this long time series was utilized to calculate the Pearson's correlation between brain regions according to brain atlas Automated Analytical Labeling-90. In other words, the edge weight matrix obtained through this method is not influenced by time. Therefore, during our temporal convolution operations, the graph's edges are not taken into account; instead, we rely solely on the features of the nodes over time. This adjacency matrix is then utilized in multi-scale spatial graph convolution operators to incorporate spatial information into the graph.

Spatial-temporal graph convolution (ST-GC) block

Based on a study by Gadgil,^30^ the formulation of a ST-GC block is as follows:

[eqn]

where $[eqn]$ represents a node, t denotes the time point, and T represents the width of the time window. Subscript i represents the number of brain regions and N represents the total number of brain regions in the brain atlas. $[eqn]$ represents a normalization factor. $[eqn]$ denotes the set of neighbouring nodes to $[eqn]$ , encompassing all adjacent nodes both in space and time domain. $[eqn]$ denotes the neighbouring nodes of $[eqn]$ . $[eqn]$ represents the input node features, where $[eqn]$ denotes the number of input node feature channels, and $[eqn]$ represents the output node features, $[eqn]$ denotes the number of output node feature channels. Through the formulation (1), the features of the nodes within $[eqn]$ are sequentially convolved by spatial convolutional kernel $[eqn]$ and temporal convolution kernel $[eqn]$ in turn, finally aggregating the features of the neighbouring nodes in the set $[eqn]$ .

Fig. 1(A) illustrates the internal structure of a ST-GC Block, where the input spatio-temporal graph data passes through two ways: one performing sequential multi-scale spatial graph convolutions and multi-scale temporal convolutions, and the other conducting spatial-temporal fusion convolutions. Subsequently, after feature fusion and a multi-scale temporal convolution module, the data flows into the next ST-GC block. Fig. 1(B) demonstrates the structure of multi-scale spatial graph convolution, where we compute eight scales of adjacency matrices based on an original adjacency matrix A, representing the relationships between different brain regions at various distances. After obtaining these adjacency matrices, we perform multi-scale graph convolution on the input spatial graph and output the results. Fig. 1(C) illustrates the process of spatial-temporal fusion convolutions. The structure of this block is similar to multi-scale spatial graph convolution, after multi-scale graph convolutions, the data undergoes a 3D convolution block to further fuse spatio-temporal information. Fig. 1(D) showcases the structure of multi-scale temporal convolutions, where features at different time scales are integrated by setting different dilation values.

Multi-scale spatial graph convolution network (MSGCN)

The MSGCN consists of modules for graph convolution with input data and adjacency matrices at multiple scales, as well as multilayer perceptron. The classic spectral graph convolution method was applied for multi-scale convolution of the spatial graph, referred as $[eqn]$ , using a specific convolution kernel $[eqn]$ . To minimize computational complexity, we adopted the strategy outlined in the GCN, which involves using the Chebyshev Polynomials Approximation. This approach limits the convolution kernel to polynomials of the eigenvalues, denoted by $[eqn]$ , of the Laplacian matrix corresponding to the edge weight matrix W. This Laplacian matrix has been normalized through the Symmetrically Normalized Graph Laplacian technique to produce a symmetric matrix. The filtering operation is represented as the multiplication of the kernel and the signal transformed by the graph Fourier transform, followed by altering the output channel number through a multi-layer perceptron. Meanwhile, by adjusting the number of terms in the polynomial, i.e. the highest degree of $[eqn]$ in the polynomial, we modify the spatial graph convolution's receptive field (in this experiment, 8 different spatial scales are utilized) to enable more extensive range and deeper level information propagation and feature aggregation, thereby enhancing the feature extraction and representation capability in the GCN. The formulation of the spatial graph convolution transformed into spectral graph convolution is as follows:

[eqn]

where $[eqn]$ represents the input features and its dimensions, $[eqn]$ represents the output features and its dimensions, $[eqn]$ denotes the matrix of eigenvectors of the normalized graph Laplacian $[eqn]$ , $[eqn]$ is the identity matrix with dimensions consistent with the number of brain regions, and D is the degree matrix. Through the spectral graph convolution, we have altered the channel numbers of the features of nodes within different neighbourhood ranges in the input spatial graph, enabling a richer integration of information from adjacent nodes within the spatial domain.

Multi-scale temporal convolution network (MSTCN)

MSTCN is composed of multi-scale time-dimensional convolution modules and common 2D convolution modules. Following the adjustment of the node feature channel numbers through multi-scale spatial convolution, we perform temporal convolution on the convoluted graph at multiple scales. We utilize one-dimensional convolutional kernels to convolve the feature vectors from every single brain region along the temporal dimension. At each scale, we modify the dilation rates or the size of the temporal kernel to progressively widen the field of view of the temporal domain convolutional kernel along the temporal dimension. Additionally, we incorporate residual connections to address the gradient vanishing problem and improve training efficiency. Finally, by concatenating the results of multiple temporal convolutions at different scales, we obtain a representation of fMRI that integrates multi-scale spatial features and multi-scale temporal features.

Multi-scale Spatio-temporal Convolution Network (MSG3D)

MSG3D is a module for spatio-temporal feature fusion. In addition to sequential convolutions in both spatial and temporal domains on fMRI data, the model also incorporates spatio-temporal fusion convolutions on fMRI data. This block replicates and concatenates the original adjacency matrix A to generate a $[eqn]$ adjacency matrix, and the undergoes the same multi-scale operations as in MSGCN. This process yields multi-scale adjacency matrices shared by adjacent three time points, enabling the aggregation of spatial graphs from the three time points for graph convolutions. Subsequently, a 3D convolution is applied to the feature matrix after multi-scale spatio-temporal graph convolutions to further integrate the spatio-temporal information from these three time points.

Training details and hyperparameters

In this project, we conducted a five-fold-cross-validation on rs-fMRI from 79 IBS patients and 79 control subjects to classify the disease status. In our five-fold cross-validation experiment, we allocated 60% of the data as training data, 20% as validation data, and the remaining 20% as testing data. By inspiration of a study by Gadgil,^30^ we experimented with different window sizes, ranging from 30 to 160, to explore the optimal classification results. The window size refers to the dynamic sliding window analysis, with the unit in TRs. We fixed the step size to 1 and randomly selected the starting position of the sliding window in each experiment to ensure a consistent number of windows across all trials. To enhance the model's generalization ability, we employed a voting mechanism during both model training and testing: We divided each subject's rs-fMRI data into multiple time segments based on different window sizes, performed spatiotemporal feature extraction and classification on the rs-fMRI data of each time segment, and ultimately determined the subject's classification result by majority voting of the classification results from each time segment. We empirically set the spatial scale K_s to 8; for the multi-scale temporal graph convolution module, the time scale K_T was set to 4, as this optimally combines higher-order spatial features with temporally distant features. Additionally, in this experiment, we only used an NVIDIA RTX A6000 graphics card to complete all related experiments.

Comparative experiments

Compare with deep learning models

In the comparative experiment, we applied representative deep learning models including graph convolutional network (GCN), graph attention network (GAT), graph isomorphism network (GIN), BrainGNN, BolT, TapNet and TodyNet, all of which have demonstrated superior performance in public datasets.^13,16^ GCN^31^ is a conventional deep learning model for graph-structured data, operates by performing convolutions on graph data and utilizes the adjacency matrices and feature tensors for information propagation and feature extraction. By updating the representation of each node through the merging of neighbouring node's feature, GCN can capture the local and global topology information within the graph. GAT^32^ incorporates attention mechanisms into graph convolutional network, enabling the automatic learning of relationships between input node features, as opposed to independently updating the features of each node as done in traditional graph convolutions. GIN^33^ is a graph convolution model designed based on the injective function property of graph neural networks, with its essence lying in the belief that a powerful GNN can map isomorphic graphs to the same representation. BrainGNN,^16^ a novel application of graph convolutional networks, employs novel ROI-aware convolutional (Ra-Conv) and ROI-aware pooling (R-pool) layers, specifically designed for rs-fMRI data. BolT^34^ partitions time-series data into multiple windows and captures local features by computing attention between selected window and adjacent windows, and progressively extends local features to global features as the overlapping range of windows increases, ultimately applying them to classification. TodyNet^35^ explores the dynamic spatiotemporal relationships in multivariate time series by constructing dynamic graphs. This model obtains multiple graph structures by slicing the time series into shorter segments and initializes the vertex and edge information within each segment. These segments are then input into the spatiotemporal graph model, which is based on graph isomorphism networks, for iterative processing. TapNet^36^ first embeds the multivariate time series into a one-dimensional space using LSTM and one-dimensional convolution. This embedded representation is then input into a transformer to learn prototype vectors for each class. Finally, classification is performed by comparing the embedded one-dimensional space vectors with the class prototype vectors. We employed the aforementioned deep learning models to classify our IBS dataset and compared the best classification results with ST-GCN.

Compare with machine learning models

Apart from the deep learning models, several machine learning models were also employed for comparative experimentation. To reduce the features and enhance the predictive power in the comparative experiment, we first employed dimensionality reduction methods on the data. These methods included principal component analysis (PCA), locally linear embedding (LLE), factor analysis (FA) and multidimensional scaling (MDS). For each dimensionality reduction method, different parameters were set, such as different numbers of principal components or latent factors, to achieve varying degrees of dimensionality reduction. For the PCA method, three pipelines were created: pca, pca2 and pca3, which were set to retain all principal components, half of the principal components, and one-third of the principal components, respectively. The treatment for LLE was similar to that of PCA. In the context of using the FA method for dimensionality reduction, procedures akin to those used in PCA were executed. Moreover, this approach was augmented by incorporating pipelines where the number of components was specifically set to 1000 and 2000. The approach for the MDS was the same as that for FA. Consequently, we established 16 different dimensionality reduction pipelines.

In this comparative experimentation, we employed eight different machine learning methods: linear SVM, Gaussian naive Bayes, random forest, logistic regression, lasso linear regression, k-nearest neighbours, ridge regression and multi-layer perceptron. For each of the method, exploration was conducted to discover the best classification performance by adjusting different parameters. For instance, in linear SVM, regularization was performed using both l1 and l2 penalty, and in the multi-layer perceptron, different hidden layers and solvers for weight optimization were set. Finally, the dimensionality reduction methods and machine learning methods were combined in various permutations, resulting in a total of 208 different machine learning classifiers. These classifiers were applied to our IBS dataset using a five-fold cross-validation method to obtain the final classification results.

Interpretability module

We extract the weights corresponding to the best classification results from the batch during the testing process, as well as a feature tensor $[eqn]$ that integrates spatial and temporal domains prior to entering the final fully connected layer for classification. The weights and the feature tensor $[eqn]$ are input into the interpretability module for analysis, the process of which is detailed in Fig. 2. Subsequently, we will provide a detailed description of this process.

Interpretability module overview: this figure illustrates the necessary inputs for the interpretability module and the explanation process. In the figure, c1, c2 and c3 denote the channel sizes of blocks 1, 2 and 3, respectively.

Initially, we extract the weights from the last fully connected layer of the neural network, denoted as FC_Weights, identifying the weights with the highest absolute values and marking their positions. We posit that these marked weights contribute most significantly to the classification outcome. Subsequently, we utilize the indices of the top-ranked weights from the previous step to extract what we consider the important channels from the feature tensor $[eqn]$ . This results in a feature vector with dimensions (B, c3, WindowSize, top_N), containing only the important channels. Next, we select the brain regions of interest from these important channels. Given that prior to entering the fully connected layer, the model flattens and averages the last two dimensions of the convolved feature tensor $[eqn]$ , we seek the elements within the dimensions (WindowSize, N) that contribute most significantly to the average value, marking their respective positions. We then count the number of marked elements in each column of the last two dimensions as a measure of the importance of that brain region, which we refer to as the importance score for each brain region. Finally, we compile and aggregate the importance scores for each brain region across all subjects in that batch, resulting in a ranked list of important brain regions.

Experiments and results

Classification experiment and results on ST-GCN

Fig. 3 illustrates the classification accuracy and corresponding standard deviations under varying window sizes (30–160). It is observed that the selected window size of 140 yields the optimal classification performance, achieving an accuracy of 83.51% in five-fold cross-validation. The average accuracy across all window sizes stands at 79.03%, which significantly surpasses previous studies. However, the experimental performance was less satisfactory when the time window was either too large or too small.

Classification accuracy across window sizes: this figure presents the scatter distribution and standard deviation of average classification accuracy under different window sizes. When the window size reached 140, we achieved the optimal classification accuracy of 83.51%. Each individual data point represents the classification accuracy obtained from one fold of the five-fold cross-validation, computed at the subject level (N = 158 subjects in total; 79 IBS patients and 79 HC). For each window size, five data points correspond to the five cross-validation folds, and the central marker denotes the mean accuracy across folds, with error bars indicating the standard deviation (STD).

Comparative results

Results with deep learning models

We conducted five-fold cross-validation experiments on seven additional deep learning models (namely GCN, GAT, GIN, BrainGNN, BolT, TodyNet, TapNet), aside from ST-GCN, while striving to optimize their hyperparameters for optimal classification performance. Table 2 displays the best classification accuracy, along with corresponding sensitivity, specificity and precision values, for these eight models. Comparative findings reveal that the classification performance of ST-GCN far exceeds that of other deep learning models, and it exhibits superior stability. We also plotted the ROC curves (Fig. 4) for the best training results of these eight deep learning models to demonstrate the stability of ST-GCN.

ROC curves of eight deep learning models: for the eight deep learning models, we plotted the optimal classification accuracy for each and found that the ST-GCN model exhibited the best classification performance. ROC curves were generated based on subject-level predictions from the test folds of five-fold cross-validation (N = 158 subjects). AUC denotes the area under the ROC curve, reflecting the model's overall discriminative ability.

Results with machine learning models

In the comparative experiments with machine learning models, we employed 13 different methods with the best dimensionality reduction method respectively, and show the results are present in Table 3. The first column denotes the abbreviation of the utilized pipeline, while the second column represents the average accuracy of all reduction methods for each pipeline. The third column displays the accuracy of the optimal reduction method for each pipeline. Additionally, the fourth column indicates the dimensionality reduction method employed when the best accuracy was achieved.

The classifiers in the first column are: (1) Gaussian naive Bayes; (2) random forest; (3)clinear SVM with l1 penalty and ANOVA feature selection method; (4)clinear SVM with l2 penalty and ANOVA feature selection method; (5)ck-nearest neighbours; (6) logistic regression with l2 penalty; (7) 10 hidden layers multi-layer perceptron; (8) 10 hidden layers multi-layer perceptron with Adam solver; (9) 5 hidden layers multi-layer perceptron; (10) 10 hidden layers multi-layer perceptron with Adam solver; (11) ridge regression; (12) linear SVM with l1 penalty; (13) linear SVM with l2 penalty.

In our machine learning pipelines, employing a multi-layer perceptron (MLP) classifier which configuring 10 hidden layers, yielded the highest average accuracy of 58.55%. Additionally, utilizing LLE with half the size of the training dataset as the reduced feature dimension, both 10-hidden-layer and 5-hidden-layer MLP classifiers achieved the highest accuracy of 70%. Comparatively, traditional machine learning methods exhibited significantly lower average accuracy compared to the ST-GCN model.

Results of ablation experiments

We conducted ablation experiments on all submodules within the ST-GC blocks simultaneously to verify the necessity of each submodule. Specifically, we performed ablation on MS-G3D, MS-GCN, and three MS-TCNs, resulting in a total of five distinct ablation experiments for validation. Table 4 presents the results of the five ablation experiments conducted using five-fold-cross-validation.

Results of different hyperparameters

Firstly, we sought the optimal hyperparameters by adjusting the spatial scale parameters of the MS-GCN and MS-G3D modules (they are denoted as gcn_scales and g3d_scales in Table 5 respectively), i.e. modifying the receptive field in the spatial convolution process. Subsequently, we adjusted the temporal scale parameters (which denoted as tcn_scales in Table 5) by progressively increasing the dilation range of the temporal convolutional kernel to observe the classification performance under different temporal receptive fields. Table 5 shows the classification results obtained under different hyperparameter configurations.

Interpretability results

Interpretability experiment and results

With the five-fold cross-validation, we selected the five folds with a window size of 140 (optimal classification accuracy window size) and extracted their weight matrices and final feature tensors, which were then averaged separately and input into our interpretability module to statistically assess the importance of each brain region. Figs 5 and 6 shows the top 20 brain regions identified by our interpretability module as being strongly associated with IBS, and we also show the importance scores for each brain region in the figure. Results revealed that for the Inferior Parietal Lobule (IPL.R) and the Inferior Frontal Orbital part (ORBinf.R) are significantly higher than those of other brain regions. The Postcentral Gyrus (PCG.R), Middle Frontal Orbital part (ORBmid.R), and Superior Medial Frontal Orbital part (ORBsupmed.L) are in the second tiers. Subsequently, the following regions are ranked: Paracentral Lobule (PCL.R), Superior Frontal Orbital part (ORBsup.R), Caudate nucleus (CAU.R), Rectal gyrus (REC.R), Anterior and Posterior Cingulate Gyrus (ACG.R), Paracentral Lobule (PCL.L), Superior Parietal Lobule (SMG.L), Inferior Occipital Gyrus (IOG.R), Pallidum (PAL.R), Lingual Gyrus (CUN.L), Angular Gyrus (ANG.R), Supplementary Motor Area (SMA.L), Thalamus (THA.L), Hippocampus (HIP.L) and Supplementary Motor Area (SMA.R). To visually assess the significance of different brain regions, we performed visualization processing on the outcomes derived from the interpretability module. For displaying cortical regions, we utilized the SurfStat toolbox (http://www.math.mcgill.ca/keith/surfstat),^37^ and for subcortical regions, we employed the Enigma Toolbox.^38^

Brain region importance scores (Automated Analytical Labelling; AAL90 atlas): the importance score for each brain region was derived from the averaged feature-weight contributions across five cross-validation folds, using the optimal window size (140). Each data point represents the aggregated importance score of a given brain region across all subjects (N = 158), obtained by averaging region-wise scores over folds and subjects.

Visualization of important cortical and subcortical regions: this figure presents the visualization of important brain regions, with abbreviations for significant areas indicated in the figure; (A) represents crucial brain regions located on the cerebral cortex, while (B) denotes regions situated beneath the cerebral cortex. Cortical and subcortical visualizations represent the relative importance scores derived from the interpretability module, averaged across all subjects (N = 158) and five cross-validation folds. Warmer colours indicate higher relative importance in distinguishing IBS patients from healthy controls.

Experiment only on important brain regions

To further validate the significance of our interpretability module, we extracted data from the top 20 significant regions identified by interpretability module and conducted classification experiments by inputting the time series data of these 20 significant regions into the ST-GCN model separately. Following the same experimental procedures as the whole-brain experiment, we performed five-fold-cross-validation on these data with optimal window size 140. In order to compare with the experiments described above, we also randomly selected 20 brain regions from all brain regions and conducted the same experiment. (The comparative experiment will be repeated 10 times with optimal window size 140, and statistical analysis will be conducted. The brain regions selected each time will be random and mutually independent.)

The results showed that the five-fold average classification accuracy of the 20 significant regions’ data exceeded that of the 20 random regions, the top 20 significant regions exhibited an average accuracy that was close to 3% higher than the 20 random regions in window size 140 (P < 0.001), and was comparable to the average classification accuracy of the whole-brain experiment in several window sizes. This demonstrates the validity of the conclusions drawn by the interpretability module. Table 6 shows the average accuracy and other binary classification attributes of five-fold-cross-validation for classification using the top 20 significant regions’ and random 20 regions’ data in time window size 140.

Comparison with the perturbation-based methods (PBM)

To further validate the reasonability of our interpretability module, we conducted a general linear model with gender, age and educational years as covariates between the node attributes derived from brain graphs of the patient group and the corresponding IBS-SSS (IBS symptom severity scale) scores of the patients. Specifically, we first constructed a graph for each IBS patient based on their functional connectivity matrix, the elements of the connectivity matrix is served as edge weights. Next, we computed node attributes of the graph for each patient, including degree centrality, closeness centrality, and clustering coefficient. These graph metrics were treated as independent variables, while the patients’ IBS-SSS scores were used as the dependent variable, with the above-mentioned covariates (gender, age and education) included in the general linear model. For each correlation analysis, we used only one node attribute from a specific brain region as the independent variable. Finally, we conducted a statistical analysis of the correlations between the node attributes of the ten most significant brain regions—identified by our interpretability module—and the patients’ IBS-SSS scores.

Through comparison, we found that IPL.R, ORBinf.R and REC.R appeared in the top 10 important brain regions of both interpretability methods (Table 7). Additionally, within the groups of the top 20 brain regions identified by both methods, 9 regions were consistently represented in the results of both approaches. Therefore, it is evident that various interpretative methods may highlight different brain regions, and a standardized approach for evaluating these methods is currently lacking. Nonetheless, the overlapping regions identified across different techniques may represent areas of greater significance for IBS and warrant closer attention.

Correlation analysis

To further validate the reasonability of our interpretability module, we conducted a general linear model with gender, age and educational years as covariates between the node attributes derived from brain graphs of the patient group and the corresponding IBS-SSS (IBS symptom severity scale) scores of the patients. Specifically, we first constructed a graph for each IBS patient based on their functional connectivity matrix, the elements of the connectivity matrix is served as edge weights. Next, we computed node attributes of the graph for each patient, including degree centrality, closeness centrality, and clustering coefficient. These graph metrics were treated as independent variables, while the patients’ IBS-SSS scores were used as the dependent variable, with the above-mentioned covariates (gender, age and education) included in the general linear model. For each correlation analysis, we used only one node attribute from a specific brain region as the independent variable. Finally, we conducted a statistical analysis of the correlations between the node attributes of the ten most significant brain regions—identified by our interpretability module—and the patients’ IBS-SSS scores.

Our findings indicate a strong correlation between these three node attributes and the IBS-SSS scores of the patient group. We will discuss the analysis results of these three node attributes in turn. First, among the top ten brain regions, the node degree of IPL.R, ORBinf.R, ORBmid.R, ORBsup.R, CAU.R and ACG.R demonstrated significant positive correlations with the IBS-SSS scores (FDR corrected p's < 0.05). Next, the closeness centrality of IPL.R, ORBinf.R, ORBmid.R, ORBsup.R and CAU.R also exhibited significant positive correlations with the IBS-SSS scores (FDR corrected p's < 0.05). Finally, the clustering coefficient of IPL.R, ORBinf.R, PCG.R, ORBmid.R, CAU.R and ACG.R displayed significant positive correlations with the IBS-SSS scores (FDR corrected p's < 0.05).

In addition to performing correlation analysis between the important brain regions identified by our interpretability module and the corresponding IBS-SSS (IBS symptom severity scale) scores of patients, we also conducted identical experiments on the important brain regions derived from the aforementioned PBM approach. Results demonstrate that both our interpretability module and PBM identified a broad range of brain regions showing significant correlations with IBS-SSS scores (Tables 8 and 9). However, the brain regions identified by the interpretability module showed more consistently significant correlation with IBS-SSS scores across different graph-theoretical metrics compared to those identified by PBM. Specifically, four brain regions—IPL.R, ORBinf.R, ORBmid.R and CAU.R—demonstrated strong correlations across all three graph-theoretical metrics with IBS-SSS scores, a consistency not observed in the PBM results, which enhances in the reliability of the findings from our interpretability module.

Discussion

Overall model performance

Currently, there is a lack of research utilizing deep learning for classifying fMRI data for IBS. Primarily, this is due to the limited sample size, and employing deep learning models with large parameters poses a high risk of overfitting. Through a series of experiments, supported by a large amount of data, we have demonstrated the classification effectiveness of STGCN on our IBS dataset. The model achieved an accuracy of 83.51% at a window size of 140, surpassing the accuracy of all previous classification methods while ensuring model stability. We believe that the primary benefit of the STGCN compared to traditional machine learning techniques and other deep learning models is its remarkable ability to extract and integrate spatio-temporal features from time series data for effective representation. This highlights the critical role that advanced spatio-temporal features play in enhancing the classification accuracy of brain time series data.

The optimal window size for classification was determined to be 140. While this window size is relatively large, we hypothesize that the underlying reasons may be as follows: As a disorder involving brain-gut axis interactions, the neural activity patterns associated with IBS may not manifest as transient rapid changes but rather as spatiotemporal dynamics that evolve over several minutes, representing relatively slow processes. A longer analysis window may be more conducive to capturing such stable, macroscale neurodynamic features related to pathophysiological mechanisms, thereby providing more discriminative information for the ST-GCN model and ultimately enhancing classification performance. Although this window length presents limitations in practical applications, our experimental results suggest that neural biomarkers of IBS may exist on a longer time scale than conventionally assumed—a finding that carries intrinsic scientific significance.

Furthermore, in IBS research, classification is not the primary objective, researchers are more interested in identifying biomarkers that can differentiate IBS patients from healthy subjects. Present studies often adopt a hypothesis-driven approach, whereby problematic brain regions are hypothesized and features related to these regions, such as ReHo and functional connectivity, are extracted. Subsequently, machine learning models are constructed to gauge the correlation of these extracted features with IBS through classification. Despite the capabilities of existing approaches, they fall short in fully leveraging the rich spatio-temporal information available in fMRI data. In our research, we utilize the ST-GCN to thoroughly exploit the spatio-temporal characteristics of data pertinent to IBS, without presuming pathological areas. This data-driven deep learning classification approach significantly enhances classification accuracy and, by designing a comprehensive interpretability module, enables the identification of brain regions contributing significantly to classification accuracy. Because of this interpretability module, our study offers comprehensive exploration of IBS.

Functional MRI data are extensively utilized in brain research, not just for its high spatial resolution but also for its ability to capture temporal variations, offering insights into the brain's dynamic activities. However, the complex spatial and temporal information in fMRI data poses a challenge for deep learning models to fully comprehend the contributions of specific brain regions over time. In this study, after fusing spatio-temporal features of fMRI data using ST-GCN, we incorporate an interpretability module designed to reverse infer neural network parameters, tackling this complexity. The brain regions pinpointed by this module have shown highly significant relevance to IBS as confirmed by our external experimental validations and literature reviews, thereby validating the efficacy of our interpretability approach.

Although we classified our IBS dataset using a uniform window size and achieved optimal classification results, an important factor has been overlooked: individual differences. The individual differences arising from factors such as gender, age, educational years, pain severity and subtype of the IBS disease cannot be ignored, even though these subjects were collected from the same medical institution. Technically, we look forward to future research that explores specific window size and optimal hyperparameters for different subgroups of IBS patients and even individual IBS patient. Clinically, we also look forward to future studies that incorporate subtype separations on IBS based on their clinical or imaging performance, this would believed to be given more nuanced information about the neuropathological factors towards the disease.

Clinical significance

We believe that the deep learning model applied in this study holds significant clinical implications, as it can assist physicians in improving the diagnostic accuracy of IBS, thus enhancing patient treatment outcomes. Additionally, the model can effectively analyse complex patterns in brain rs-fMRI data through its interpretability module, aiding clinicians in identifying biomarkers associated with IBS, thereby facilitating earlier and more accurate diagnoses.

To translate this model into a practical diagnostic tool, we propose several steps. First, we recommend conducting multi-centre clinical data collection to comprehensively validate the model's stability and generalizability, ensuring its adaptability to diverse patient populations. Upon successful validation, we will aim to develop a user-friendly interface that seamlessly integrates into existing clinical workflows, allowing healthcare professionals to efficiently input patient data and obtain diagnostic results. We also aim to design a website where users can easily upload data and automatically receive diagnostic results. This website will protect the privacy of users’ data and assist users in making decisions. By bridging advanced machine learning techniques with clinical practice, we aspire to enhance the efficiency of the diagnostic process and ultimately improve the quality of care for patients with IBS.

Brain regions altered by IBS

Using the brain region rankings provided by the interpretability module, we compare these identified regions to those reported in previous literature. We found that many brain regions undergoing changes belong to the default mode network, for example, the Postcentral Gyrus (PCG.R), the Anterior and Posterior Cingulate Gyrus (ACG.R) and the hippocampus (HIP.L), suggesting that the persistent pain caused by IBS is reflected in DMN. This finding is well supported by previous studies, for example, some researchers summarized that the DMN, as one of the most widely studied resting-state networks, may be most significantly affected by chronic pain.^7^ A study found that the average functional connectivity of the DMN in IBS patients was negatively correlated with IBS-SSS (IBS symptom severity scale), which is a scoring system used for assessing the severity of symptoms in patients with IBS.^4^ Compared to the healthy control group, the functional connectivity between DMN subregions in IBS patients was weakened, partly explaining the phenomenon of disordered visceral sensation in IBS patients. It is believed that chronic pain induced by IBS does indeed alter the DMN.

Among the top-ranking brain regions, the Inferior Parietal Lobule (IPL.R) is a part of the parietal lobe of the brain, primarily involved in processing sensory information and responding to external stimuli. This brain region is primarily engaged in attention and emotion regulation processes and is also associated with the regulation of the autonomic nervous system. A study suggests that gastrointestinal disturbances caused by IBS may lead to lowered anxiety and pain thresholds, causing patients to focus more on negative outcomes.^39^ Therefore, we posit that changes in IPL.R, as a key brain region involved in attention and emotion regulation processes, due to the influence of IBS are understandable. In addition to the Inferior Parietal Lobule (IPL.R), the Inferior Frontal Orbital part (ORBinf.R), Middle Frontal Orbital part (ORBmid.R), and Superior Medial Frontal Orbital part (ORBsupmed.L) all belong to the frontal lobe of the brain and play important roles in emotion regulation, with the latter two being closely related to social behaviour. In this study, researchers found that psychological and social factors significantly influence the mechanisms regulating visceral sensitivity in the brains of IBS patients. The Postcentral Gyrus (PCG.R) is a part of the parietal lobe of the brain and is involved in the processing of bodily sensations.^39^ One study discovered that this brain region shows significant activation during visceral stimulation, indicating a higher likelihood of alterations under the chronic effects of IBS.^40^ The Paracentral Lobule (PCL.R) is responsible for sensory-motor control, body positioning, and certain higher cognitive functions. The Superior Frontal Orbital part (ORBsup.R) is located above the frontal lobe and also has a significant impact on emotion cognition and regulation. Through meta-analysis of previous related studies, a review revealed that in IBS patients, the brain regions showing activation or deactivation are mainly associated with visceral sensation (PoCG, ACC, INS), emotional processing (hippocampus) and pain processing (frontal lobe, parietal lobe, SMA, thalamus, hippocampus, cerebellum, caudate nucleus and PCUN).^3^ Our study also found that patients with IBS exhibit related anomalies in a wide range of frontal and parietal brain regions. From the overview of the aforementioned top 7 ranked brain regions, it is evident that these regions are mainly located in the parietal and frontal cortical areas of the brain, which is also confirmed by the review.^3^ One of the essential functions of the parietal cortical areas is the processing of sensory information, while the frontal lobe is a crucial region involved in emotion regulation. Therefore, we have reason to believe that the results obtained from the interpretability module are indeed strongly correlated with a wide range of frontal and parietal regions, which is well aligned with the prior IBS findings.

Next, we explore the brain regions that are ranked slightly lower in our interpretable output list. Through the summary of previous related research, we observed that these brain regions, such as the caudate nucleus, anterior and posterior cingulate gyrus, SMA, thalamus, and hippocampus, have been mentioned in previous studies multiple times, thereby validating the validity of the interpretability module. For example, a study used functional connectivity density (FCD) to investigate changes in the overall brain functional connectivity patterns in IBS patients. The experimental results showed that compared to the healthy control group, IBS patients showed reduced long- and short-range FCD in the bilateral anterior cingulate cortex (aMCC), decreased short-range FCD in the caudate nucleus, and increased long-range FCD in the right SMA.^41^ In a study, it was found that patients with IBS showed higher positive resting-state functional connectivity between the amygdala and the adjacent hippocampus, as well as motor areas.^4^ Some authors demonstrated that IBS patients exhibited abnormal functional connectivity in brain regions associated with the frontal-insular system and the somatosensory-motor network, especially the insular cortex and SMASMA, explaining the vicious cycle between negative emotions and gastrointestinal symptoms in IBS.^42,43^ In a study, abnormal brain regions were observed in functional imaging of IBS patients, such as cortical thinning in the anterior cingulate cortex and increased grey matter in the thalamus.^41^ A voxel-based analysis of the amplitude of low-frequency fluctuation (fALFF) in the fALFF map of IBS and HC was compared, revealing changes in fALFF values in the left hippocampus.^44^ In a research, the analysis of ReHo in patients showed an increase in ReHo in the thalamus and a decrease in ReHo in the anterior cingulate cortex compared to the control group.^45^ Furthermore, it was found that resting-state functional connectivity between the right habenula and the right thalamus was significantly reduced in IBS patients.^12^

In addition to validation through prior literature review, we conducted an external validation to further confirm the efficacy of the interpretability module. For this, we utilized the top 20 regions identified by the whole brain model to predict IBS, comparing the outcomes against those obtained using 20 randomly selected brain regions. The findings demonstrated that the chosen 20 regions significantly outperformed the random regions in accurately classifying IBS and control participants, suggesting a more pronounced connection between the selected regions and IBS abnormalities.

Furthermore, we conducted PBM and correlation analyses on the brain regions identified by the interpretability module. For the correlation analysis results, we analysed them sequentially based on the different attributes of the nodes. The degree of a node indicates the number of edges connected to that node, a higher degree suggests that the node has more connections with other nodes, potentially representing greater influence or higher levels of activity. According to the results of the correlation analysis, the IPL.R, which is involved in processing sensory information and responding to external stimuli, primarily participates in attention and emotional regulation processes. The subsequent regions, including ORBinf.R, ORBmid.R and ORBsupmed.L, are also critical areas for emotional regulation. Notably, the above-mentioned brain regions exhibited a significant positive correlation with the IBS-SSS scores, indicating that increased activity in these regions is associated with greater severity of symptoms in IBS patients. The clustering coefficient of a node measures the tightness of connections among its neighbours, a high clustering coefficient suggests that the neighbours of a node are likely interconnected. Within this node attribute context, ORBinf.R, ORBmid.R and PCG.R, which show a significant positive correlation with IBS-SSS, are important regions for emotional regulation in the brain. In contrast, IPL.R, CAU.R and ACG.R are involved in visceral perception and pain perception, suggesting that the tighter the connections between these brain areas and surrounding regions, the higher the severity of symptoms in IBS patient.

Limitations and future directions

While ST-GCN has greatly increased the predictive power of IBS classification using brain fMRI data, it is still insufficient to fully utilize the rich information contained within the data. Specifically, concerning the ST-fMRI-GCN model, we hold the view that its handling of temporal features is overly simplistic, missing out on the opportunity to fully leverage the dynamic correlations present in fMRI data. In future research, we believe that the following aspects can be improved: (1) Our study lacks longitudinal trials covering the progression of IBS. Longitudinal studies are necessary to track neural changes from different stages, allowing for a better understanding of how the onset and progression of IBS affect the organization of large-scale brain networks. (2) In the ST-fMRI-GCN, the model parameters, such as window size, spatial scale and temporal scale, are manually selected. Although numerous experiments have been conducted to maximize the classification accuracy, this method consumes a significant amount of time and manpower. In future experiments, we aim to improve the model by integrating methods that enable it to adaptively choose its hyperparameters, streamlining the optimization process. (3) In our interpretability module, we can identify brain regions that have been altered by IBS, but we cannot discover changes in brain functional connectivity influenced by IBS, nor can we determine if the altered brain regions are more active or inhibited. In future research, we aim to refine and enhance the interpretability module to improve its functionality. (4) Currently, despite the fact that the sample size of the IBS dataset utilized in our study significantly exceeds that of prior research, it appears challenging to meet the data requirements necessitated by deep learning models, and increase the risk of overfitting. This dataset has been collected over a span of more than three years and is still in the process of ongoing collection. We anticipate the opportunity to evaluate our model on a larger dataset in the future. (5) Due to the privacy concerns associated with IBS data, we were only able to obtain data from the collaborating hospital, which has limited the validation of the model's generalizability and applicability. In future efforts, we intend to seek additional collaborative hospitals to establish a multi-site dataset that will facilitate the validation of our model.

Conclusion

In this paper, we apply ST-GCN to our own IBS dataset, achieving the highest average classification accuracy of 83.51%. After the model training, we deduce through the interpretability module that brain regions such as the IPL.R, ORBinf.R, PCG.R, ORBmid.R and ORBsupmed.L are highly correlated with the classification of this disease. We found that multiple brain regions, such as ORBinf.R, PCG.R and ORBmid.R, were involved in the regulation of emotions. Most of these regions are located in the parietal and frontal lobes of the brain, with many of them belonging to the DMN.

Bibliography45

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Li J, He P, Lu X, et al A resting-state functional magnetic resonance imaging study of whole-brain functional connectivity of voxel levels in patients with irritable bowel syndrome with depressive symptoms. J Neurogastroenterol Motil. 2021;27(2):248–256.33795543 10.5056/jnm 20209 PMC 8026363 · doi ↗ · pubmed ↗
2Ma X, Li S, Tian J, et al Altered brain spontaneous activity and connectivity network in irritable bowel syndrome patients: A resting-state f MRI study. Clin Neurophysiol. 2015;126(6):1190–1197.25454279 10.1016/j.clinph.2014.10.004 · doi ↗ · pubmed ↗
3Yu Z, Liu L-Y, Lai Y-Y, et al Altered resting brain functions in patients with irritable bowel syndrome: A systematic review. Front Hum Neurosci. 2022;16:851586.35572000 10.3389/fnhum.2022.851586 PMC 9105452 · doi ↗ · pubmed ↗
4Qi R, Liu C, Ke J, et al Abnormal amygdala resting-state functional connectivity in irritable bowel syndrome. AJNR Am J Neuroradiol. 2016;37(6):1139–1145.26767708 10.3174/ajnr.A 4655 PMC 7963559 · doi ↗ · pubmed ↗
5Balmus I-M, Ciobica A, Cojocariu R, Luca A-C, Gorgan L. Irritable bowel syndrome and neurological deficiencies: Is there a relationship? The possible relevance of the oxidative stress status. Medicina (B Aires). 2020;56(4):175.
6Padhy SK, Sahoo S, Mahajan S, Sinha SK. Irritable bowel syndrome: Is it “irritable brain” or “irritable bowel”? J Neurosci Rural Pract. 2015;6(4):568–577.26752904 10.4103/0976-3147.169802 PMC 4692018 · doi ↗ · pubmed ↗
7Farmer MA, Baliki MN, Apkarian AV. A dynamic network perspective of chronic pain. Neurosci Lett. 2012;520(2):197–203.22579823 10.1016/j.neulet.2012.05.001PMC 3377811 · doi ↗ · pubmed ↗
8Liu M, Backer RA, Amey RC, Splan EE, Magerman A, Forbes CE. Context matters: Situational stress impedes functional reorganization of intrinsic brain connectivity during problem-solving. Cerebral Cortex. 2021;31(4):2111–2124.33251535 10.1093/cercor/bhaa 349 · doi ↗ · pubmed ↗