MDIC3: Matrix decomposition to infer cell-cell communication
Yi Liu, Yuelei Zhang, Xiao Chang, Xiaoping Liu

TL;DR
MDIC3 is a new tool that can infer how cells communicate with each other using single-cell data, without needing prior knowledge of ligand-receptor pairs.
Contribution
MDIC3 is the first unsupervised method for cell-cell communication inference that works across species and does not rely on ligand-receptor databases.
Findings
MDIC3 outperforms existing methods in inferring cell-cell communication in both human and mouse data.
MDIC3 successfully infers cell-cell communication in zebrafish and human skin samples.
MDIC3 separates gene and cell information from expression data without using prior ligand-receptor knowledge.
Abstract
Crosstalk among cells is vital for maintaining the biological function and intactness of systems. Most existing methods for investigating cell-cell communications are based on ligand-receptor (L-R) expression, and they focus on the study between two cells. Thus, the final communication inference results are particularly sensitive to the completeness and accuracy of the prior biological knowledge. Because existing L-R research focuses mainly on humans, most existing methods can only examine cell-cell communication for humans. As far as we know, there is currently no effective method to overcome this species limitation. Here, we propose MDIC3 (matrix decomposition to infer cell-cell communication), an unsupervised tool to investigate cell-cell communication in any species, and the results are not limited by specific L-R pairs or signaling pathways. By comparing it with existing methods…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMolecular Communication and Nanonetworks · Single-cell and spatial transcriptomics · Wireless Body Area Networks
Introduction
Crosstalk among cells is a fundamental function that leads multicellular organisms to accomplish complex biological tasks.1 For example, during early embryonic development, the cell’s differentiation and ultimate fate are controlled by communication among neighboring cells.2 In developed organisms, intercellular communications coordinate the activities of multiple cells for complex organismal processes such as immune response, growth, and homeostasis.2 Cell-cell communication involves interactions between cells of different types, and it extensively relies on interactions between secreted ligands and cell-surface receptors.2 Single-cell RNA sequencing (scRNA-seq) technologies can be used to detect the gene expression of each cell, leading to the discovery of cellular information at an unprecedented resolution level, thus greatly facilitating the development of intercellular communication research.3^,^4^,^5^,^6 On the basis of scRNA-seq, the systematic deciphering of the intercellular crosstalk mediated by ligand-receptor (L-R) interactions quickly became a research focus.3^,^4^,^7^,^8^,^9^,^10
To date, many methods have been developed for cell-cell communication, such as CellPhoneDB,7 CellChat,4 and iTALK.8 All of these methods investigate intercellular crosstalk on the basis of the expression intensity of specific L-R pairs, such as their coexpression (sum, mean, or product),2^,^7 differential expression8^,^11^,^ and expression correlation.12 However, relying on prior information such as L-R signals to investigate cell-cell communications is a common feature of existing cell-cell communication inference, and it also limits the application of these existing methods to other species without specific L-R pairs or signaling pathways. Thus, a strategy for identifying the overall cell-cell communication potential is still lacking. To date, the majority of existing L-R interaction databases are dominated by human databases, which were manually curated on the basis of experimental studies or literature. One of the representative human L-R databases was compiled and expanded by Ramilowski et al.,2 which presents the first large-scale map of cell-cell communication between 144 human primary cell types.2 Subsequently, a more comprehensive human L-R database called CellPhoneDB was built, which considered the subunit architecture for both ligands and receptors.7 More recently, an L-R database in both mice and humans was published by Jin et al.,4 which also contains the pairs of L-Rs with corresponding signaling pathways. Although there are many L-R databases, they cannot guarantee that an L-R database contains all of the discovered L-R information. In addition, the ongoing exploration of unknown L-R interactions means that the update of the L-R databases never stops.
In this paper, we propose a new method, MDIC3 (matrix decomposition to infer cell-cell communication), to reveal cell-cell communication through the cooperative analysis of gene regulatory networks (GRNs) and matrix decomposition on the basis of scRNA-seq data. Importantly, our method is independent of prior knowledge, meaning that the inference of cell-cell communication is no longer limited to specific species and does not rely on predefined L-R interactions or signaling pathways. MDIC3 can investigate intercellular communication networks from individual cells to cell types on the basis of existing cell classification labels. We showed the overall capabilities of MDIC3 by applying it to both mouse and human scRNA-seq datasets and evaluated its reliability and effectiveness using a large body of literature. We applied MDIC3 to analyze the communication networks during the early embryonic development of mice and found that the dominant communication signaling during the developmental stages from embryonic day 13.5 (E13.5) to E14.5 changes from dermal cells to epidermal cells. Furthermore, we used MDIC3 for two datasets of lesional (LS) and nonlesional (NL) skin from atopic dermatitis (AD) patients to compare the communication networks in different disease conditions. Our findings indicate that inflammatory phenomena may be present in both LS and NL skin, but the communications among inflammatory cells are more active in LS skin.
Results
Overview of MDIC3
MDIC3 is used to explore cell-cell communication from single-cell expression data. MDIC3 only requires a single-cell gene expression matrix when determining the cell-cell communication network, instead of any prior knowledge such as L-R pairs. Cell-cell communication can be regarded as a special regulatory relationship between cells. The core of MDIC3 lies in determining the regulatory relationships among cells on the basis of the regulatory relationships among genes. The regulatory relationships among genes can be depicted by the GRN, which can be determined from the single-cell gene expression profiles.
MDIC3 is based on the idea of matrix decomposition and tries to separate information about the genes and cells contained in the expression profile matrix. Usually, the single-cell gene expression profile with genes and cells can be regarded as a matrix with size , and the matrix can be further decomposed into three submatrices (Equation 4). The left submatrix is the size of , the middle submatrix is the size of , and the right submatrix is the size of (Figure 1). The left submatrix is used to represent the regulatory relationships among genes and can be replaced by the GRN adjacency matrix with size . The middle submatrix is used to connect the relationship between genes and cells and can be replaced by the singular matrix of the matrix with size . The middle submatrix represents the association information between genes and cells, as the singular matrix can extract the features of the matrix . On the basis of the single-cell expression matrix , the left submatrix , and the middle submatrix , we can obtain the right submatrix by solving the pseudoinverse matrix (Equation 6). As the single-cell expression profile should contain the regulation information among genes and the crosstalk information among cells, the resolved right submatrix reflects the regulatory relationships among cells, i.e., cell-cell communication (Figure 1; see methodology section). In fact, cell-cell communication can be regarded as a weighted directed graph/network among cells.Figure 1. Overview of MDIC3MDIC3 infers cell-cell communication from the input single-cell gene expression dataset with genes and cells. First, the gene expression matrix can be decomposed into three submatrices, including a left submatrix with size , a middle submatrix with size , and a right submatrix with size . Then, the middle submatrix can be obtained by calculating the singular matrix of the gene expression matrix . Next, the gene regulatory network (GRN) adjacency matrix is used to replace the left submatrix , where the GRN can be inferred on the basis of the input gene expression matrix and represented by an adjacency matrix. Finally, the right submatrix can be resolved on the basis of the above three matrices, including the input gene expression matrix , the left submatrix represented by the GRN adjacency matrix, and the middle submatrix represented by the singular matrix. The resolved right submatrix was used as the cell-cell communication adjacency matrix, and the cell-cell communication network was further investigated.
Investigating cell-cell communication in human lupus nephritis
Lupus nephritis (LN) is an autoimmune disease involving extensive cellular communication and is characterized by the infiltration of macrophages, T cells (TCs), and B cells into the kidneys.13^,^14^,^15^,^16^,^17 Hence, we used MDIC3 to investigate intercellular communication in LN using a published human single-cell dataset13 (Table S1). This dataset contains five kinds of cell groups that can be further divided into 22 cell types (Table S1), including the macrophage group (CM0, CM1, CM2, CM3, and CM4), T cell group (CT0a, CT0b, CT1, CT2, CT3a, CT3b, CT4, CT5a, CT5b, and CT6), B cell group (CB0, CB1, CB2a, CB2b, and CB3), dividing cell group (CD0), and epithelial cell group (CE0).
A total of 242 intercellular communications were identified among the 22 cell types by MDIC3, some of which are related to known L-R signaling. By comparing the results from MDIC3 with other tools (CellChat, CellPhoneDB, and iTALK), we found that some intercellular communications can be determined by both our algorithm and the other three tools. For example, the communications from CM4 to both CM2 and CM3 (Figure 2A) may be related to the expression of CCL2-CCR2 (Figure 2B) L-R signaling and are involved in LN.18 CCL2, a ligand with high expression in CM4, interacts with its receptor CCR2, which is highly expressed in both CM2 and CM3. Therefore, the communication from CM4 to both CM2 and CM3 can also be supported by CCL2-CCR2 L-R signaling. It has been reported that CCL2, a macrophage chemokine, cooperates with its receptor CCR2 to promote the recruitment of macrophages to sites of inflammation.18^,^19^,^20 Studies have shown that the inhibition of CCL2 may be a novel strategy for the treatment of LN.20 In addition, the CCL2-CCR2 signaling may also participate in the communication of CM4 to CB2b, which was detected by MDIC3, CellChat, CellPhoneDB, and iTALK (Figures 2A and S1), because of the high expression of the receptor CCR2 in CB2b cells (Figure 2B).Figure 2. Case study on the human lupus nephritis dataset(A) Heatmap showing the communication results among cell types determined by MDIC3. Black “I” indicates the communication mentioned in this main text. The row represents the “source” of a communication, and the column represents the “target” of the communication. For example, the black “I” in row “CB3” and column “CT0a” indicate intercellular communication from CB3 to CT0a.(B) Violin plot shows the expression of the CCL2-CCR2 ligand-receptor gene pair across different cell types in human lupus nephritis.(C) Heatmap shows the expression of the CD40-CD40LG, CX3CL1-CX3CR1, and CXCL10-CXCR3 ligand-receptor gene pairs across different cell types in human lupus nephritis.
It is worth noting that we also identified some intercellular communications that were determined only by our algorithm but did not appear in the results of all other tools. For example, communications from CE0 to CM0, CT1, and CT2 can be detected by MDIC3, CellChat, and iTALK (Figures 2A, S1A, and S1C) but not by CellPhoneDB (Figure S1B). This intercellular communication is consistent with the expression trends of CX3CL1-CX3CR1 signaling, which plays a pathological role in inflammatory kidney disease21^,^22 (Figure 2C). The ligand CX3CL1 exhibits high expression trends in CE0, while the receptor CX3CR1 shows high expression trends in CM0, CT1, and CT2. CX3CL1 has been reported to be produced mainly by renal tubular epithelial cells, and its receptors, such as CX3CR1, can be found on leukocytes, including macrophages and TCs.21^,^22^,^23^,^24 This indicates that epithelial cells may be involved in the infiltration of immune cells associated with nephritis, including macrophages and TCs.21^,^22^,^23^,^24 MDIC3 also examines communication from CE0 to CD0, which was also detected by CellChat and iTALK (Figures S1A and S1C) but not by CellPhoneDB (Figures 2A and S1B). The intercellular communication of CE0 to CD0 is possibly related to the expression of CX3CL1-CX3CR1 signaling, as we found not only that the ligand CX3CL1 can be expressed by CE0 but also that the receptor CX3CR1 can be expressed by CD0 (Figure 2C).
Additionally, MDIC3 can also reveal known intercellular communication between B cells and TCs, such as the communication from CD40-expressing CB3 to CD40LG-expressing CT0a, CT0b, and CT3b, which are involved in the immune regulation of LN25^,^26^,^27^,^28 (Figures 2A and 2C). This communication can also be identified by CellChat and CellPhoneDB (Figures S1A and S1B) but not by iTALK (Figure S1C).
Moreover, we also identified communication from CB3 to CT5a, which appeared in the results of CellChat and CellPhoneDB (Figures S1A and S1B) but did not appear in the results of iTALK (Figures 2A and S1C). This communication is mediated by CXCL10-CXCR3 L-R signaling. The signaling pathway was found to be significantly enriched in LN patients through biological experiments and immunofluorescence,18^,^29 with high expression of the ligand CXCL10 in CB3 and the receptor CXCR3 in CT5a (Figure 2C). Furthermore, MDIC3 identified crosstalk among TCs, such as CT6 to CT0a and CT6 to CT5a (Figure 2A), which can also be found by CellChat (Figure S1A) but cannot be found by CellPhoneDB and iTALK (Figures S1B and S1C). This crosstalk was driven by CXCL10-CXCR3 signaling (Figure 2C), as the ligand CXCL10 has high expression in CT6, and the receptor CXCR3 also has significant expression in CT0a and CT5a.
Investigating cell-cell communication in human LS skin
AD is an inflammatory skin disease involving skin immune and barrier abnormalities.30^,^31^,^32 There are a variety of signaling pathways for immune-inflammatory responses and complex chemokine signaling changes in the AD skin microenvironment. We applied MDIC3 to investigate cell-cell communication in the human AD LS skin scRNA-seq dataset30 (Table S1). Our findings revealed intercellular communications from the inflammatory FIB to immune cells (dendritic cells [DCs] and TCs) (Figure S2A; Table S1), which is in concordance with a previous study.30 The inflammatory FIB may interact with immune cells to regulate lymphoid cell organization and type 2 inflammation.30 Moreover, AD was reported to be associated with the expression of periostin (POSTN), and cells expressing POSTN have been reported to have direct signaling with immune cells.33 As expected, the cells expressing POSTN were predominantly present in the inflammatory FIB (Figure 3A).Figure 3. Case study on the human lesional skin dataset, human islet dataset, and mouse brain dataset(A) Violin plot shows the expression of POSTN across different cell types in human lesional skin.(B) Heatmap shows the expression of selected ligand-receptor gene pairs under the COMPLEMENT signaling pathway in human lesional skin, including C3-(ITGAM+ITGB2) and C3-(ITGAX+ITGB2). Inflame FIB, inflammatory fibroblasts; cDC, conventional dendritic cell; inflame DC, inflammatory dendritic cell, LC, Langerhans cell; inflame TC, inflammatory T cell; NKT, natural killer T cell.(C) Communication network among cell types in human islets inferred by MDIC3. The direction of the arrow indicates the direction of signal transference. The edge width represents the communication strength.(D) Heatmap showing the expression of ligand-receptor signaling (IGF2-IGF1R, GCG-GLP1R, SST-SSTR2, INS-INSR) across different cell types in different human islet cells.(E) Communication network among cell types in the mouse brain inferred by MDIC3. The direction of the arrow indicates the direction of signal transference. The edge width represents the communication strength.(F) Dot plot depicting the percent expression and average expression of selected ligand-receptor gene pairs (Ptn-Ngc, Cx3cl1-Cx3cr1, Apoe-Lrp1) in different mouse brain cell types. S1PNs, S1 pyramidal neurons; CA1PNs, CA1 pyramidal neurons; INs, interneurons; OLs, oligodendrocytes; AC, astrocytes; endo, vascular endothelial cells.
DC groups, including inflammatory DCs, cDC1s, cDC2s, and Langerhans cells (LCs), represent one of the immune cell groups (Table S1). MDIC3 identified communication from inflammatory FIB to inflammatory DC (Figure S2A), a finding consistent with other methods (Figures S2B–S2D) and supported by the CCL19-CCR7 L-R signaling (Figure S2E), as the ligand CCL19 and its receptor CCR7 were only highly expressed in inflammatory FIB and inflammatory DC, respectively (Figure S2E). The communication from inflammatory FIB to inflammatory DC through CCL19-CCR7 L-R signaling is also in agreement with the immunofluorescence staining result in the original study.30 Moreover, MDIC3 reveals communication from inflammatory FIB to other cell types of the DC cell group, such as inflammatory FIB to cDC1, inflammatory FIB to cDC2, and inflammatory FIB to LC (Figure S2A). These intercellular communications can be explained in the pathways known to be involved in immune and inflammatory responses.4 For example, the expression of L-R pairs from the COMPLEMENT4 pathway, such as C3 to (ITGAM+ITGB2) and C3 to (ITGAX+ITGB2), was found to contribute to the communication from inflammatory FIB to cDC1, cDC2, and LC (Figure 3B). Moreover, we observed that the communications from inflammatory FIB to cDC2s and LCs can also be inferred by CellChat, CellPhoneDB, and iTALK. Additionally, the communication from inflammatory FIB to cDC1 appeared in the results of CellPhoneDB and iTALK but not in the results of CellChat (Figures S2B–S2D).
The TC group, including inflammatory TC and natural killer T (NKT) cells, represents another immune cell group (Table S1). MDIC3 inferred the communication from inflammatory FIB to inflammatory TC (Figure S2A), a finding consistent with the original analysis.30 However, the communication signal from the inflammatory FIB to the inflammatory TC can be identified by CellPhoneDB (Figure S2C) and iTALK (Figure S2D) but not by CellChat (Figure S2B). Notably, paracrine effects from inflammatory FIB to NKT cells were reported in the results of MDIC3 (Figure S2A) and iTALK (Figure S2D) but not in the results of CellChat (Figure S2B) or CellPhoneDB (Figure S2C). This paracrine effect can be explained by previous immunofluorescence staining,34 which showed that CXCR4-expressing NKT cells can interact with CXCL12-expressing FIB, contributing to the development of allergic skin inflammation in AD patients.34 Here, the ligand CXCL12 was expressed only by inflammatory FIB, while NKT cells indeed expressed the receptor CXCR4 (Figure S2F).
Investigating cell-cell communication in human islets
We used the human islet scRNA-seq dataset to further demonstrate the effect of MDIC3.35 The five different types of endocrine cells can receive signals from the internal and external environment by autocrine and paracrine mechanisms or contact each other by membrane-bound molecules to form a complex intercellular communication network to complete biological functions, such as maintaining glucose homeostasis and regulating the state of surrounding cells.36^,^37^,^38 MDIC3 revealed autocrine and paracrine effects focused on endocrine cells, especially cell-cell communication in alpha, beta, and delta cells (Figures 3C and S3A). We compared the findings of MDIC3 with those of other methods (CellChat, CellPhoneDB, and iTALK) (Figures S3B–S3D).
As a result, MDIC3 successfully detected autocrine signaling in beta cells (Figure 3C), which is consistent with existing research showing that beta cells can regulate insulin-mediated gene expression and protein translation in an autocrine manner.39 Furthermore, autocrine signaling in beta cells may be mediated by IGF2-IGF1R L-R signaling, with the ligand IGF and receptor IGF1R being highly expressed in beta cells (Figure 3D). Indeed, previous studies have elucidated that insulin-like growth factor 2 (IGF2) is produced and secreted by human beta cells and is an autocrine activator of insulin-like growth Factor 1 receptor (IGF1R) signaling in beta cells.40 Autocrine factors have important regulatory effects on the proliferation and function of beta cells.41^,^42 Notably, autocrine signaling in beta cells was uniquely identified by MDIC3 but not by CellChat, CellPhoneDB, or iTALK (Figures S3B–S3D). This indicates that MDIC3 exhibits greater accuracy than other methods in identifying the autocrine signaling of beta cells. Moreover, the communications from beta cells to ductal cells were detected by MDIC3, CellChat, and CellPhoneDB (Figures 3C, S3B, and S3C) but not by iTALK (Figure S3D). This observation may be associated with the expression of IGF2-IGF1R L-R signaling (Figure 3D), attributed to the significant secretion of receptor IGF1R in ductal cells.
In addition, MDIC3 was able to identify paracrine communication from delta cells to alpha cells (Figure 3C). Our analysis revealed high expression of somatostatin (SST) in delta cells and the receptor SSTR2 in alpha cells (Figure 3D). This suggests that the communication from delta cells to alpha cells may be mediated by SST-SSTR2 L-R signaling. Furthermore, our findings are in concordance with previous studies demonstrating that SST, secreted by delta cells,43 can target alpha cells through the receptor SSTR2 to inhibit glucagon secretion.43^,^44^,^45^,^46^,^47^,^48 Notably, the paracrine communication from delta cells to alpha cells was not identified by CellChat, CellPhoneDB, and iTALK (Figures S3B–S3D).
MDIC3 also predicted bidirectional paracrine communication between beta cells and alpha cells (Figure 3C). Previous studies have indicated that alpha cells and beta cells are spatially close, and when islets are dispersed into individual cells, most beta cells remain attached to alpha cells.36^,^37 This close spatial proximity suggests that beta cells and alpha cells may be directly exposed to each other’s secretions, enabling their paracrine communication.36^,^49 This direct communication plays a role in controlling the release of biosynthetic and secretory products, as well as cell survival.50 For example, research has shown that ligand molecules (GCG and INS) secreted by alpha and beta cells can act as paracrine signals.49^,^51^,^52^,^53^,^54^,^55 We found that the receptor GLP1R of GCG and the receptor INSR of INS were indeed significantly expressed on beta cells and alpha cells, respectively. Therefore, bidirectional communication may be mediated by GCG-GLP1R and INS-INSR L-R signaling (Figure 3D). Consequently, both spatial proximity and the expression of L-R pairs support our prediction of bidirectional paracrine communications. However, none of the bidirectional communications were predicted by CellChat, CellPhoneDB, or iTALK (Figures S3B–S3D).
Investigating cell-cell communication in E14.5 mouse skin
MDIC3 can analyze intercellular communication in any species because it does not require any prior knowledge, such as L-R databases. To illustrate the broad application of MDIC3, we showed the ability of MDIC3 to identify cell-cell communications in mouse datasets. As CellPhoneDB and iTALK contain L-R information only for humans, they cannot be used to investigate cell-cell communication in mouse data. Therefore, we compared the intercellular communications identified by MDIC3 with those from CellChat, which provides a mouse L-R database. Embryonic mouse skin development involves the proliferation and growth of numerous epidermal and dermal cells and the activation and transduction of various signals that drive the entire developmental process. We analyzed the scRNA-seq dataset of E14.5 embryonic mouse skin,56 which included dermal cells and epidermal cells.4 In this mouse dataset, we found some known communications between dermal cells (e.g., FIB-A, FIB-B, FIB-P, DC) and epidermal cells (e.g., basal, basal-P).
Both MDIC3 and CellChat can investigate autocrine communication for basal-P and paracrine communication from basal-P to FIB-A, FIB-B, FIB-P, and DC (Figures S4A and S4B). It has been reported that WNT signaling, generated by the epidermis, is needed for fibroblast proliferation during hair follicle development and drives the crosstalk between the epidermis and dermis through epidermal autocrine and epidermis-to-dermis paracrine signaling to mediate the role of hair follicle development.57^,^58 This evidence supports the results showing that epidermal cells, such as basal-P, not only engage in autocrine communication but also in paracrine communication with some dermal cells, such as FIB-A, FIB-B, FIB-P, and DC (Figures S4A and S4B). In addition, we analyzed Wnt6-(Fzd2+Lrp6) L-R signaling, which has a high contribution rate to WNT signaling according to CellChat, and found that its expression can also support paracrine communication from basal-P cells to FIB-A, FIB-B, FIB-P, and DCs (Figure S4C). In other words, basal-P cells exhibit high expression of the ligand Wnt6, while FIB-A, FIB-B, FIB-P, and DC cells show high expression of the receptors Fzd2 and Lrp6.
Both MDIC3 and CellChat also found that some epidermal cells (e.g., basal, basal-P) and some dermal cells (e.g., FIB-B, FIB-P, DC) responded to one fibroblast population, FIB-A, suggesting a paracrine effect in FIB-A. Additionally, we also observed an autocrine manner in FIB-A (Figures S4A and S4B), consistent with reports on the autocrine communication of FIB-A.59^,^60^,^61^,^62^,^63^,^64^,^65 The FGF pathway is a known fibroblast growth signaling pathway and is crucial for skin morphogenesis.59^,^60^,^61^,^62^,^63^,^64^,^65 Furthermore, the FGF pathway plays a regulatory role in hair follicle development, and its activation is essential for hair growth.59^,^60^,^61^,^62^,^63^,^64^,^65 We found that the ligand Fgf7 can promote the growth and proliferation of epidermal cells under the FGF pathway65 and participates in paracrine communication from FIB-A to epidermal cells (e.g., basal, basal-P) through its receptor Fgfr2, as well as autocrine communication of FIB-A and paracrine communication from FIB-A to some other dermal cells (e.g., FIB-B, FIB-P, DC) through the receptor Fgfr1 (Figure S4D). These observations further underscore the accuracy of MDIC3.
Investigating cell-cell communication in the mouse brain
As the second case study for mouse cell-cell communication, we used a scRNA-seq dataset from a juvenile mouse,66 including neuron cells (S1PNs, CA1PNs, and INs), vascular endothelial cells (endo), and glial cells represented by microglia.66 Communication among these cells is critical for the development and function of a normal brain, and their dysregulation can result in neurological disorders.67^,^68^,^69^,^70^,^71^,^72
MDIC3 found communication from INs to microglia (Figure 3E), which was not observed in the results of CellChat (Figure S4E). Communication from neuron cells to microglia was reported primarily through Cx3cl1-Cx3cr1 signaling,73^,^74 which regulates the apoptosis, proliferation, transcription, and migration of microglia.75^,^76^,^77 We analyzed Cx3cl1-Cx3cr1 signaling and found that it is consistent with the known specific localization of this signaling74; that is, Cx3cl1 is abundantly expressed on INs, while its receptor Cx3cr1 is restricted to microglia (Figure 3F).
Additionally, MDIC3 identified communications from microglia to S1PNs, CA1PNs, and INs (Figure 3E), which is consistent with the critical role for microglia in promoting neurogenesis to regulate neural differentiation and synaptic structure,78^,^79^,^80^,^81^,^82 yet these interactions were not identified by CellChat (Figure S4E). Indeed, Apoe is an important secreted protein in the brain, and our analysis revealed its high expression in microglia (Figure 3F). Previous research has reported that Apoe is enriched in microglia and overlaps with the signal of the receptor Lrp1 by immunofluorescence in the brain,67 while the receptor Lrp1 is indeed abundantly expressed on S1PNs, CA1PNs, and INs (Figure 3F). Interestingly, bidirectional communications between INs and microglia were identified by MDIC3 (Figure 3E) but not by CellChat (Figure S4E). However, the bidirectional communications between neurons and glial cells are the basis for the function of the central nervous system and have been reported by previous studies.75^,^82^,^83
Furthermore, MDIC3 can also identify the autocrine communication of S1PNs and paracrine communication from S1PNs to INs (Figure 3E), which were not detected in the results of CellChat (Figure S4E). These communications may be related to Ptn, another important neuromodulator in the brain, which was found to be highly expressed in S1PNs (Figure 3F). Meanwhile, the receptor Ngc,84^,^85 known for promoting neurite outgrowth,84^,^85 was found to be highly expressed in both S1PNs and INs (Figure 3F). Notably, the Ptn-Ngc signal84^,^85 was not documented in the CellChat L-R database, indicating that methods based on the L-R database may lack the ability to capture new signaling pathways for cell-cell communication.
Moreover, MDIC3 also showcases the communications from neuron cells (S1PNs, CA1PNs, and INs) to endo (Figure 3E), which is consistent with the role of neurons in directing vascular endothelial cell development.67^,^86^,^87 For example, communication from INs to endo may be driven by the vascular endothelial growth factor (VEGF) pathway, such as Vegfb-Flt1 L-R signaling (Figure S4F). However, the communications from neuron cells to endo cells were not fully identified by CellChat (Figure S4E).
Joint learning of E13.5 and E14.5 mouse skin
We used MDIC3 to detect communication signaling changes at different stages of mouse development, specifically focusing on the E13.5 and E14.5 mouse skin scRNA-seq datasets.56 We first compared the ratio of cell types in E13.5 and E14.5 mouse skin (Figure 4A). The ratio of spinous epithelial cells (spinous) in E13.5 mouse skin was only 1.45%, while it significantly increased to 11.46% in E14.5 mouse skin (11.46%). In addition, two cell types (DC and pericytes) were found only in E14.5 mouse skin (Figure 4A). By examining the sending and receiving of communication signals (Figure 4B), we found that DCs and pericytes can both receive communication signals from different dermal and epidermal cell types, thus participating in E14.5 mouse skin cell-cell communications. In particular, DCs and pericytes also send communication signals, which are exclusively received by spinous epidermal cells (spinous) (Figure 4B). Notably, spinous exhibited extensive proliferation at E14.588 and showed enriched cellular communication signals specific to E14.5 (Figure 4B). Similarly, the proliferative basal cells (basal-P) also displayed abundant E14.5-specific cellular communication signals (Figure 4B). With the change in mouse embryonic development, the gradual activation of the two epidermal cell types (spinous and basal-P) is consistent with the evident epidermis stratification and differentiation program and the signs of basal differentiation around E14.5.88^,^89 Furthermore, MDIC3 revealed E14.5-specific bidirectional paracrine communication between spinous and basal-P (Figure 4B). This E14.5-specific bidirectional paracrine effect may be related to the spatial proximity of proliferative basal cells and spinous epidermal cells around E14.5.88Figure 4. Joint learning at different development stages and in different disease conditions(A) Comparison of the ratio of different cell types (Table S1) in embryonic day 13.5 (E13.5) and E14.5 mouse skin. FIB-A, fibroblast type A; FIB-B, fibroblast type B; FIB-P, proliferative fibroblasts; DC: dendritic cells; ENDO, endothelial cells; MELA, melanocyte; immune, immune cells; MYL, myeloid cells; muscle, muscle cells; basal, basal cells; basal-P, proliferative basal cells; spinous, spinous epithelial cells.(B) Comparison of cell-cell communication at the two time points. Significant communications that appeared in both E13.5 and E14.5 are marked with large gray dots. Significant communications that only appeared in E14.5 are marked with large blue dots. Significant communications that only appeared in E13.5 are marked with large yellow dots. Communications that appeared at neither E13.5 nor E14.5 are marked with small red dots.(C) Visualization of connectivity-summary networks by the top ten communication strength edges on the basis of E13.5. The dashed ovals highlight communication communities centered around different cell types.(D) Visualization of connectivity-summary networks by the top ten communication strength edges on the basis of E14.5. The dashed ovals highlight communication communities centered around different cell types.(E) Comparison of cell-cell communication in human lesional (LS) and nonlesional (NL) skin datasets. Significant communications that appeared in both NL and LS are marked with large gray dots. Significant communications that only appeared in LS are marked with large blue dots. Significant communications that only appeared in NL are marked with large yellow dots. Communications that appeared in neither NL nor LS are marked with small red dots. Inflame FIB, inflammatory fibroblasts; cDC, conventional dendritic cell; inflame DC, inflammatory dendritic cell; LC, Langerhans cell; inflame TC, inflammatory T cell; NKT, natural killer T cell.(F) Visualization of LS-specific communications. The dashed ovals highlight communication communities centered around different cell types.(G) Visualization of NL-specific communications. The dashed ovals highlight communication communities centered around different cell types.
At E13.5, dermal cells accounted for a large proportion of the cell number, and most cell-cell communication occurred among different dermal cell types, with fibroblasts (FIB-A, FIB-B, and FIB-P) showing high activity (Figures 4A and 4B). Intriguingly, over a one-day period, the communications among dermal cells and epidermal cells significantly increased, suggesting that epidermal cells gradually participate in cell-cell communication between E13.5 and E14.5 (Figure 4B). This finding may be related to the development of mouse hair follicles,90 as previous studies showed that vibrissae development progressed dramatically from E13.5 to E14.5.89 Histology analysis revealed the presence of primary follicles and stratified epithelium structures in the dorsal skin of E16.5 mice.91 We also found that 41 cell-cell communications occurred among different cell types at both E13.5 and E14.5 (Figure 4B). These communications are essential for both time points of embryonic mouse skin development, but these communications may not be critical at specific time points.
Finally, we examined the top 10 communication strength edges on the basis of E13.5 and E14.5 communication networks among cell types and visualized them as two connectivity-summary networks (Figures 4C and 4D). We found different communication communities for both E13.5 and E14.5 (Figures 4C and 4D). For E13.5, we observed a community centered around dermal cells, represented by muscle cells (muscle), which dominated the outgoing communication signals at E13.5 (Figure 4C). Interestingly, these communications from muscle to other cell types also occurred in E14.5 (Figure 4B), suggesting that the communication activity of muscle may gradually decrease in later mouse development, possibly related to mouse abdominal wall development during mouse embryonic development.92 Instead, E14.5 exhibited two communication communities with epidermal cells as centers, represented by spinous epidermal cells (spinous) and proliferative basal cells (basal-P), respectively, and they dominated the outgoing communication signals at E14.5 (Figure 4D).
In general, our joint analysis using MDIC3 and the E13.5 and E14.5 mouse skin datasets indicate that in the early stage of mouse development, cellular communications may be dominated by dermal cells, while with the development of mouse embryos, epidermal communication signals may increase. Taken together, MDIC3 enables a comprehensive assessment of cellular communication at different timescales.
Joint learning of LS and NL human skin
We also applied MDIC3 to analyze cell-cell communications in two different AD conditions. One dataset originates from LS (disease) skin of AD patients, while the other is derived from the NL (normal) skin of AD patients. Both datasets encompass 7 cell types (Table S1). Compared with NL skin, the three inflammatory cell types (inflame FIB, inflame DC, and inflame TC) exhibit a greater number of cellular communications in LS skin (Figure 4E) potentially linked to the presence of inflammation in the lesioned skin. AD lesions are characterized by expanded inflammatory cells that may interact with immune cells to regulate lymphoid cell organization and type 2 inflammation.30 Notably, we identified LS-specific communication in which the signal was sent from inflammatory FIB and received by inflammatory DC (Figure 4E). This finding is consistent with the experimental results reported in the original study.30
Furthermore, we found that 8 cell-cell communications appeared in both NL skin and LS skin among different immune cell types (inflame DC, inflame TC, cDC1, cDC2, LC, and NKT) (Figure 4E), suggesting that cellular communications in different AD conditions are always closely linked to the immune response. Notably, we noticed that 14 communication signals related to inflammatory cells (inflame FIB, inflame DC, and inflame TC) appeared in both NL skin and LS skin (Figure 4E). These signals suggest that not only inflammatory phenomena exist in LS skin but also that there may be some inflammatory phenomena without obvious inflammatory phenotypes in NL skin. This finding has been mentioned in previous reports that NL skin already shows signs of subclinical inflammation.93^,^94 Examining LS-specific communications and NL-specific communications (Figures 4F and 4G), we found that inflammatory cell types (inflame FIB, inflame DC, and inflame TC) were highly active in LS skin, while NL skin was dominated by communications among immune cell types. In fact, inflammatory cells and immune cells are both greatly activated in AD.93^,^95^,^96^,^97
Together, MDIC3 can not only find signal changes in disease conditions but also identify potential signals common to different disease conditions. The joint analyses performed in mice and humans show the ability of MDIC3 to analyze cellular communication across different species.
Comparison of MDIC3 with other cell-cell communication inference methods
We compared MDIC3 with other known methods (i.e., CellChat, CellPhoneDB, and iTALK) to assess their accuracy and effectiveness in identifying intercellular communications in both human and mouse species. We separately applied these four methods to investigate cell-cell communications for five human islet endocrine cell types (alpha, beta, delta, epsilon, and PP cells).35 MDIC3, CellChat, CellPhoneDB, and iTALK identified 12, 8, 2, and 7 intercellular communications among the five cell types (Figure 5A; Table S2), respectively. We observed that MDIC3 identified 10 particular communications that were not detected by other methods (Figure 5B), iTALK and CellChat separately identified 3 and 4 particular communications for the five cell types, while CellPhoneDB did not identify any particular communication among the five cell types (Figure 5B). This means that MDIC3 can identify more particular communications than other methods. By literature research, we found that 83.33% (10 of 12) of the communications identified by MDIC3 are supported by literature (Figure 5A; Table S2), a ratio significantly higher than those of other methods: CellChat (12.5%), CellPhoneDB (0%), and iTALK (42.86%) (Figure 5A; Table S2). To evaluate the accuracy, we depicted the accuracy of identified cell-cell communications using receiver operating characteristic (ROC) curves for the four methods. We found that the area under the ROC curve (AUC) of MDIC3 was 0.801, which was obviously higher than that of the other three methods: CellChat (0.247), iTALK (0.449), and CellPhoneDB (0.417) (Figure 5C).Figure 5. Comparison of the predictive performance of MDIC3 with other methods(A) Comparison of literature support (Table S2) for the results of intercellular communication among endocrine cells in the human pancreatic islet dataset predicted by five methods (MDIC3, CellChat, CellPhoneDB, iTALK, and the two-sided self-representation model). The horizontal ordinate represents the five methods, and the vertical ordinate is the number of intercellular communications from the five methods. The orange bar represents the number of intercellular communications with literature support, and the green bar represents the number of intercellular communications without literature support from the PubMed database (https://pubmed.ncbi.nlm.nih.gov).(B) UpSetR plot of predicted cellular communication results among endocrine cells on the human pancreatic islet dataset from four methods (MDIC3, CellChat, CellPhoneDB, and iTALK). The horizontal bar graph in the bottom left represents the total number of cell-cell communications detected by different methods. Dots are used to indicate the corresponding method on the left. If cell-cell communications were detected only by one method, only one gray dot was darkened, and the number of detected cell-cell communications is shown in the bar graph form above. The intersection of the cell-cell communication results obtained by different methods is shown by multiple black dots connected by lines, and the number of the intersection of intercellular communication results is represented in the bar graph form above.(C) ROC curves of the results of four methods (MDIC3, CellChat, CellPhoneDB, and iTALK) on intercellular communications with literature support (Table S2) among endocrine cells in the human pancreatic islet dataset.(D) ROC curves of the four methods (MDIC3, CellChat, CellPhoneDB, and iTALK) on intercellular communications with literature support (Table S3) on the human lesional skin dataset.
Then, we used the human AD scRNA-seq dataset with seven cell types to compare the results of MDIC3 with other tools (CellChat, CellPhoneDB, and iTALK).30 MDIC3, CellChat, CellPhoneDB, and iTALK separately identified 27, 23, 22, and 15 intercellular communications among the seven cell types (Figure S5A; Table S3). MDIC3 identified 9 particular communications from 27 communications, which were not identified by other methods (Figure S5B). By literature research, we found that 51.85% (14 of 27) of communications identified by MDIC3 can be supported by literature, a ratio higher than the literature-supported communications from CellChat (43.48%) and CellPhoneDB (31.82%) (Figure S5A; Table S3). Additionally, 53.33% (8 of 15) of communications identified by iTALK are supported by literature (Figure S5A), slightly higher than the literature-supported ratio of MDIC3 (Figure S5A). However, MDIC3 identified 27 communications, which is more than the 15 communications from iTALK. Therefore, 14 of 27 communications were supported by literature from MDIC3, and only 8 of 15 communications were supported by the iTALK method (Figure S5A; Table S3). In addition, compared with the ROC curves of the other three methods, MDIC3 achieved the largest AUC (MDIC3, 0.641; CellChat, 0.523; CellPhoneDB, 0.393; iTALK, 0.589) (Figure 5D).
Finally, we selected scRNA-seq mouse data, including nine mouse brain cell types,66 and compared the cell type communications inferred from MDIC3 and CellChat. We found that MDIC3 and CellChat identified 43 and 34 intercellular communications among the nine cell types, respectively (Figure S5C). MDIC3 and CellChat separately identified 27 and 18 specific communications, respectively, and 16 overlapping communications were identified by both methods (Figure S5D). By literature search, we found that 79.07% (34 of 43) of communications identified by MDIC3 were supported by literature, a ratio significantly higher than the 55.88% (19 of 34) literature-supported ratio from the CellChat method (Figure S5C; Table S4). In addition, MDIC3 achieved a higher AUC than CellChat (MDIC3, 0.711; CellChat, 0.38) (Figure S5E).
Furthermore, we compared the results of MDIC3 with a two-sided self-representation model. The two-sided self-representation model identified 13 intercellular communications among five human islet endocrine cell types (alpha, beta, delta, epsilon, and PP cells),35 which is slightly higher than MDIC3. However, when both methods have 10 results that could be supported by the literature, only 2 results from MDIC3 had no supported literature, while there were 3 results from the two-sided self-representation model that were not supported by the literature (Figure 5A). Using the human AD scRNA-seq dataset,30 the two-sided self-representation model identified 25 intercellular communications, with 44% (11 of 25) of these communications being supported by literature (Figure S5A). However, the literature-supported ratio was significantly lower than the 51.85% (14 of 27) achieved by MDIC3 (Figure S5A). Finally, with the scRNA-seq mouse brain dataset,66 the two-sided self-representation model found only 29 intercellular communication results that could be supported by the literature, with 12 intercellular communication results lacking literature support. In contrast, MDIC3 found 34 intercellular communication results that could be supported by the literature and 9 that lacked literature support (Figure S5C).
In addition to comparisons at the single-cell data level, we also considered the cell type communications enriched in spatial proximity. We used a mouse brain 10X Visium spatial dataset including 1,073 spots. We first deconvoluted the mouse brain 10X Visium spatial dataset using Seurat (https://satijalab.org/seurat/articles/spatial_vignette.html) to predict the proportion of the eight cell types in each spot. Then, the cell type with the largest proportion of each spot was considered as the label of the spot (Figure S6A), and this label was used to investigate potential communications between spots using MDIC3 and CellChat. We assumed that the communications between two cell types with neighboring spots were considered true communication, and the AUCs from MDIC3 and CellChat were 0.572 and 0.432 (Figure S6B; Table S5), respectively. It is worth noting that some of the communications judged to be false positives may be due to multiple cell types mixing in spatially proximate spots. Therefore, these existing methods may not be suitable for the evaluation of cell-cell communications for spatial datasets.
Investigating cell-cell communication in 4 hpf zebrafish embryos
To better demonstrate the applicability to any species of MDIC3, we applied MDIC3 to zebrafish species. Cell-cell communication may be accomplished through contact among different cell types or mediated by proteins and may occur early during cell type determination.98 We used the 4 hpf (hours post fertilization) zebrafish embryo scRNA-seq dataset99 to explore cell-cell communications during early zebrafish embryonic development. There are four cell types detected in the 4 hpf zebrafish embryo, namely, epiblast, mesoderm, germline, and enveloping layer (EVL).
Epiblast cells play an important role during early zebrafish embryonic development, with the ability to generate epidermal and neural ectoderm.100 MDIC3 found communication between epiblasts and mesoderm, which is consistent with a previous report.98 The interactions between the initial cells of the mesoderm and epiblast determine the neuroectoderm and promote the development of the neuroectoderm.98 MDIC3 also found that the significant cell-cell communications at the 4 hpf zebrafish embryonic stage were centered around epiblast cells (Figure S7). Bidirectional paracrine signaling exists among the epiblast and mesoderm, germline, and EVL. This phenomenon of cell-cell communication centered around epiblast cells may be related to the formation of intercellular bridges during the early stages of zebrafish embryonic development.101 Caneparo et al.101 found that intercellular bridges join a significant fraction of epiblast cells in the zebrafish embryo, reaching several cell diameters in length and spanning across different regions of the developing embryos, and the rate of transfer of membrane proteins along the intercellular bridges is fast enough to mediate cell-cell communication during gastrulation and neurulation. In addition, the communication between EVL and epiblasts may be related to the spatial location of the two different cell types. EVL is the outer epithelial monolayer that protects the embryo and surrounds the entire embryo, including epiblast cells.98^,^101 This proximity in spatial location is probably the main reason for the existence of communication between EVL and epiblast cells.
Overall, we applied the MDIC3 algorithm to organisms beyond mice and humans, demonstrating that the MDIC3 algorithm has a wide range of applications for different biological species.
Identifying key L-R pairs from cell-cell communication
To further explain the mechanisms of cell-cell communications predicted by MDIC3, we identified the specific L-R pairs that participated in corresponding cell-cell communications.
We identified L-R pairs involved in communication between the inflammatory FIB and the inflammatory DC cell type in the human AD LS skin cell dataset30 using MDIC3 and CellChat. MDIC3 identified 2 L-R pairs, while CellChat identified 3 L-R pairs, and the results of MDIC3 are all included in the results of CellChat (Table S6). CCL19-CCR7 is the L-R pair involved in communication between the inflammatory FIB and the inflammatory DC, identified by immunofluorescence experiments in the original human LS skin cell dataset literature.30 In addition, CXCL12-CXCR4 is the L-R pair that plays a role in inflammatory diseases of the skin, including AD,102 and CXCL12-CXCR4 signaling is often associated with inflammatory cells.103 Regarding the L-R pair MIF-(CD74+CXCR4), there are no reports in the literature on its involvement in the communication process of inflame FIB and inflame DC. In addition, as the ligand MIF is significantly expressed in more than half of the cell types, it may not be a ligand gene with cell type specificity. Therefore, MIF-(CD74+CXCR4) may not be a specific L-R pair for communication between inflame FIB and inflame DC.
We also identified the L-R pairs significantly involved in different cell-cell communications under the E14.5 mouse skin dataset56 using MDIC3 and CellChat. We explored L-R pairs involved in cellular communication between the epidermal cell type basal-P and the dermal cell type FIB-B. MDIC3 identified 17 L-R pairs, while CellChat identified 8 L-R pairs. Four L-R pairs were identified to overlap between the two methods (Table S7). We found that 12 of the 13 pairs from MDIC3-specific results were WNT signaling LR pairs, which is consistent with literature reports that epidermal cells control and induce hair follicle development through WNT signaling to coordinate signaling crosstalk between the epidermis and dermis.57 In addition, WNT signaling has also been reported to be essential for fibroblast proliferation in hair follicle development.58 The 4 pairs of CellChat-specific results were dominated by the ligands Ptn and Mdk; however, the genes Ptn and Mdk were significantly expressed in more than half of the cell types, and thus, they may not be ligand genes with cell type specificity. Furthermore, we explored L-R pairs involved in autocrine communication of the epidermal cell type basal. MDIC3 identified 34 L-R pairs, CellChat identified 9 L-R pairs, and there was no overlapping pair between the results from MDIC3 and CellChat and the results (Table S8). We found that 24 of the 34 L-R pairs resulting from MDIC3 were WNT signaling L-R pairs, which may be related to the experimentally discovered involvement of the WNT signaling pathway in autocrine secretion in the basal layer of the epidermis.104 Additionally, biological experiments have shown that WNT autocrine signaling from epidermal cells promotes induced hair follicle development.58 In contrast, 8 of the 9 pairs of L-R results from CellChat resulted in ligand signaling by the genes Ptn and Mdk (Table S8), which are significantly expressed in more than half of the cell types and therefore may not be ligand genes with cell type specificity.
The main purpose of applying cell-cell communication analysis is to explain cell functions through L-R pairs. By comparison with CellChat, MDIC3 can not only investigate cell-cell communication without prior knowledge but also identify important and specific L-R pairs in cellular communication.
Discussion
In prior research, the single-cell gene expression matrix was decomposed by the one-sided self-representation model to reveal gene regulatory information. However, different variants of the one-sided self-representation model show that single-cell gene expression data not only contain regulatory information among genes but also contain regulatory information among cells. In fact, the expression of a gene in a cell is typically controlled by two pathways. One path is that the gene can be directly regulated by other upstream genes or factors in the same cell. Another path is that the cell can be affected or mediated by other cells in the same tissue or organ, and the gene expression in the cell can be indirectly regulated by other cells. Therefore, a single-cell expression matrix should contain both regulatory information among genes and crosstalk information among cells. Matrix decomposition or factorization can be used to reveal the discipline of regulation or crosstalk among cells from the expression matrix.
MDIC3 is a new method to investigate cell-cell communication by matrix decomposition with a GRN. These are three main advantages of MDIC3 compared with other existing methods. First, the MDIC3 method is based on matrix decomposition for inferring cell-cell communications and does not require any prior biological information, such as the L-R database. Consequently, MDIC3 is not affected by the quality of existing L-R information. In contrast, other methods, such as CellChat, CellPhoneDB, and iTALK, investigate cell-cell communications on the basis of L-R coexpression, making the final results more sensitive to the completeness and accuracy of L-R information (Table S9). Second, MDIC3 does not rely on prior information, allowing it to be applied to any species. However, most other methods investigate cell-cell communications on the basis of L-R information, limiting their applicability to species for which L-R information is available (Table S9). Existing L-R studies are mainly focused on humans, and the collection and establishment of L-R bioinformatics databases are also mainly focused on humans. Although there are a few L-R databases for other species, such as the CellChatDB L-R database including mice, establishing L-R databases for all species is challenging. Therefore, most existing methods are limited by the L-R information collection of species. Third, MDIC3 infers that cell-cell communications do not depend on specific L-R signaling and at the global cell level. In contrast, CellChat, CellPhoneDB, and iTALK investigate cell-cell communications at the local cell level on the basis of specific L-Rs or pathways (Table S9). MDIC3 is in line with the biological nature of cell-cell communication, as cell-cell communication is the result of interactions between individual cells via multiple L-Rs and pathways.
Current studies on intercellular communication mainly rely on gene coexpression of L-R pairs in different cells.1 However, building a complete and accurate L-R database is a challenging task. To date, there are still many L-R pairs waiting to be discovered, including some receptors lacking known ligands, such as DR6 (TNFRSF21), RELT, TROY, and NGFR from the TNF receptor family,105^,^106 and some ligands without receptors, such as IL17D.105^,^107 In addition, with the development of intercellular communication research, more and more information other than L-R pairs has been taken into account, such as the secretion of metabolites.108^,^109 Therefore, existing methods are unable to investigate cell-cell communications with high completeness and accuracy until a comprehensive L-R database is established. To our knowledge, MDIC3 is a rare cell-cell communication algorithm that is not limited to the dependence on biological prior information of specific species. The output of MDIC3 provides a global view of cellular crosstalk at the cell level and can be interpreted as all the cellular communication pairs across all known and unknown L-R pairs, rather than focusing on any specific L-R pairs. The cell-cell communications from MDIC3 avoid the effect of unknown L-R pairs and are in line with the biological nature. Similarly, GraphFP,110 like MDIC3, does not rely on L-R databases in its cell-cell interaction prediction process, and its results also provide an overall perspective. However, GraphFP is designed for predicting cell-cell interactions on the basis of a large amount of dynamic time-series data, making it more suitable for the analysis of time-series data with multiple time points and may be more advantageous when exploring changes in cell-cell interactions at different time points. More details about the relationship between the two types of approaches can be found in Note S1.
The successful performance of MDIC3 is attributed to three aspects. First, MDIC3 uses GRNs to extract gene regulatory information. The GRN can usually be inferred by gene expression data without prior knowledge, but the accuracy of the GRN can affect the results of cell-cell communication for MDIC3. Therefore, an efficient and accurate method to investigate GRNs from gene expression data is an important tool for this study. Here, we used the GNIPLR111 algorithm (gene network inference on the basis of projection and lagged regression) to investigate GRNs with high accuracy. In fact, GRN can cover most of the known and unknown gene regulatory information, among which the secretion of L-R and the activity of downstream factors are also included in the results of gene regulation. In addition, it should be noted that MDIC3 is not limited by GNIPLR, and any tool that infers regulatory networks can be used for MDIC3, and the quality of the regulatory network may affect communication inference results. Second, it is worth noting that most existing methods analyze the interaction among cell types, but it cannot be ignored that the life activities of multicellular organisms depend on the mutual communication between a large number of individual cells. Communications between cell types inferred by MDIC3 are based on the global integration of the communication network between individual cells. This derivation in a more detailed dimension greatly enhances the plausibility of the MDIC3 results and provides an effective tool for studying the interactions between individual cells. Considering the high drop-out effect in each single cell, we also explored whether different gene coverage affects the prediction results of MDIC3 (Note S2). We found that the MDIC3 algorithm is stable and robust, and the results from different gene coverages are also acceptable (Table S10). Finally, after decomposing the original gene expression data, the obtained singular matrix extracts the association information between genes and cells, which is an important hub connecting the relationship between genes and cells. Although there may be some missing information in the right submatrix when the number of cells is greater than the number of genes, we believe that adjusting according to the method of cell sampling can be considered a reasonable processing method.
For more benchmarking approaches, we calculated whether the global communication results of MDIC3 were significantly correlated with the expression of all L-R pairs from different L-R databases of different algorithms (Note S3). By applying this approach to the human LS skin dataset, we observed a significant correlation between the MDIC3 results and the overall expression of L-R pairs (Note S3). We also benchmarked how well the matrix decomposition actually works by calculating the mean squared error (MSE) between the reconstructed matrix and the original single-cell gene expression matrix (Note S4). Our findings indicated that the MSEs for the human LN dataset, the E14.5 mouse skin dataset, and the mouse brain dataset were 0.22, 0.25, and 0.24, respectively (Note S4).
MDIC3 is species-independent for analyzing cell-cell communication. To validate its effectiveness, we applied MDIC3 to five recently published scRNA-seq datasets13^,^30^,^35^,^56^,^66 from two species, human and mouse. As the communication network of individual cells is difficult to verify, we therefore demonstrated MDIC3’s ability to investigate communication networks among cell types. Ultimately, we found many inferences that were largely consistent with known biological conclusions. Furthermore, we compared the inference ability of MDIC3 with other existing methods in a unified manner and found that some known biological communication conclusions can be inferred by MDIC3 but do not appear in the results of other methods. Then, we also found some known L-R pairs that were not included in the existing L-R database. This observation not only underscores MDIC3’s ability to investigate more comprehensive results compared with other methods but also reveals the challenges associated with constructing and updating L-R databases. Furthermore, we evaluated the performance of MDIC3 using a spatial dataset from mice. We also extended the application of MDIC3 to zebrafish, demonstrating its potential for studying cell communication in diverse species. We recorded the computational cost of MDIC3 in both simulation datasets and real datasets (Note S5), and we considered that the computational costs of MDIC3 are acceptable for a personal computer (Table S11 and Table S12).
Experimental procedures
Resource availability
Lead contact
The lead contact for questions about this paper is Xiaoping Liu, who can be reached at [email protected].
Materials availability
No unique materials were generated from this study.
Data and code availability
This paper analyzes existing, publicly available data. The human LN dataset can be accessed via ImmPort repository (https://www.immport.org/shared/home): SDY997.13 The human LS skin dataset can be obtained via Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) database accession number GEO: GSE147424.30 The human NL skin dataset can be obtained via GEO: [GSE147424](GSE147424).30 The human islet dataset can be downloaded via GEO: GSE116753.35 The E14.5 mouse skin dataset can be obtained via GEO: GSE122043.56 The E13.5 mouse skin dataset can be obtained via GEO: GSE122043.56 The mouse brain dataset can be obtained via GEO: GSE60361.66 The 04hpf zebrafish embryo dataset can be obtained via GEO: GSM3067189.99 The mouse brain 10X Visium spatial dataset can be obtained from the 10X Genomics website: https://www.10xgenomics.com/resources/datasets/mouse-brain-serial-section-1-sagittal-anterior-1-standard-1-0-0. MDIC3 is publicly available as a Python package. Source code and tutorials have been deposited at the GitHub repository (https://github.com/LYxiaotai/MDIC3). All original code has been deposited at figshare (https://doi.org/10.6084/m9.figshare.24486046.v1) and is publicly available as of the date of publication.112 Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Methodology
Examination of the cell-cell communication network using MDIC3
Given a single-cell gene expression matrix with size as input, the matrix contains genes and cells. Previous studies113^,^114^,^115^,^116 have focused on decomposing the matrix by using the one-sided self-representation model to investigate gene regulatory relationships. The formula is as follows:
where matrix is of size .
Equation 1 is based on single-cell gene expression data, and the matrix with size represents the GRN adjacency matrix among the genes, which means that the single-cell gene expression data itself contain gene regulation information.
Similarly, the matrix can also be decomposed by another one-sided self-representation model:
where the matrix with size in Equation 2 is able to represent the regulatory relationship among the cells. Considering that cell-cell communication can be regarded as regulatory among cells, the matrix may represent the adjacency matrix of the cell-cell communication network, which suggests that the single-cell gene expression data itself also contain cell regulation information.
However, using Equation 2 directly may be limited by the requirement for time-series data.
Notably, by combining the Equation 1 and Equation 2, single-cell gene expression data not only contain regulatory information among genes but also contain regulatory information among cells. Using matrix decomposition can reveal the discipline of regulation from the expression matrix. Therefore, we combined Equation 1 and Equation 2 to obtain the two-sided self-representation model for factorizing the single-cell gene expression matrix :
However, Equation 3 uses the original expression matrix as the middle submatrix, and the original single-cell expression matrix may always contain other noise.
Here, we propose a matrix decomposition-based method called MDIC3 to investigate the cell-cell communication network only from a single-cell gene expression matrix without relying on any prior knowledge, such as L-R pairs. Different from Equation 2, the input single-cell gene expression matrix in the MDIC3 model is not constrained by time-series data. Different from Equation 3, MDIC3 uses the singular matrix to better extract the information containing the relationships between genes and cells.
The cell-cell communication network can be described as a weighted directed graph that connects individual cells (Figure 1). The cell-cell relationships in the weighted directed graph consist of two parts: communication direction and communication strength. Specifically, an arrow from point to point in the weighted directed graph represents that the communication signal direction is sent from cell and received by cell , and the weight of the arrow from point to point represents the communication strength when the signal is sent from cell and received by cell .
Cell-cell communication represents a special type of regulatory relationship among cells. MDIC3 aims to investigate the regulatory relationship among cells on the basis of the regulatory relationships among genes and considers that cell-cell communication information can be derived from the single-cell gene expression profile itself. MDIC3 factorizes the matrix into three matrices , , and (Figure 1):
First, the matrix is the left submatrix with a size of , denotes the number of genes in single-cell expression matrix , and this matrix denotes the relationship among the genes. MDIC3 uses the GRN adjacency matrix of matrix as the left submatrix in Equation 4.
Then, the matrix is the middle submatrix with dimension , and this matrix connects the relationships between genes and cells. Considering that the singular matrix has the ability to extract the features of the matrix , MDIC3 uses the singular value matrix of matrix as the middle submatrix in Equation 4. The singular matrix corresponding to matrix can be obtained by decomposing matrix using the singular value decomposition method.
Finally, the matrix is the right submatrix with a size of , denotes the number of cells of matrix , and this matrix reflects the relationship among the cells.
On the basis of the adjacency matrix of the GRN, the single-cell expression matrix , and the singular value matrix , MDIC3 can obtain the cell-cell communication network by solving the right submatrix of Equation 4:
Here, Equation 5 is only applicable when the number of cells and the number of genes are equal and matrix is invertible. However, in most cases, the number of genes and the number of cells are not equal, which means that and is a nonsquare matrix. In addition, even if is a square matrix, its inverse matrix does not always exist.
Therefore, we adopt the pseudoinverse matrix to solve the matrix :
Each item in indicates how individual cells send to or receive from signals of others, and matrix is the cell-cell communications inferred from Equation 6.
To be more specific, item represents the value in the th row and the th column in matrix , when , represents the autoregulation of cell , and when , represents the signals between cells and . The sign of represents the communication direction between the cell and , and denotes the communication strength between the th and th cell. That is, if , the communication signal is sent from cell to cell ; if , the communication signal is sent from cell to cell . The final communication between cell and cell is determined jointly by and .
In summary, MDIC3 can investigate intercellular relationships on the basis of intergenic relationships from single-cell expression profiles, so we can uncover the network of cell communication among individual cells without using prior knowledge such as L-R databases.
Construction of GRNs (optional)
The GRN is a type of biological network that reflects the regulatory relationships among genes.117 The MDIC3 method uncovers the intercellular relationships among cells on the basis of the GRN, which can be determined from the single-cell expression profiles. Here, we use the GNIPLR111 algorithm, taking the single-cell gene expression matrix with size as input to investigate its GRN (Note S6). The GNIPLR algorithm projected gene data twice using the LASSO projection algorithm and the linear projection approximation to produce a linear and monotonous pseudotime series and then determined the direction of regulation in combination with lagged regression analyses.111 Each node of the GRN represents a gene, and each edge from gene to gene indicates that gene can regulate gene in the network (Figure 1). The GRN can be expressed as an adjacency matrix with size , where is the th row and th column in and the value of represents the regulatory strength from gene to gene (Figure 1).
However, it should be noted that any tool capable of inferring GRNs can potentially replace the GNIPLR algorithm. GRNs with varying quality can impact the inference results of cell-cell communication from MDIC3, yet most existing GRN tools yield similar results regarding cell-cell communication from MDIC3, albeit with subtle differences.
We used two regulatory network inference tools (GENIE3118 and GRNBoost119) to replace the GRN from GNIPLR and compared the cellular network from MDIC3 under the three tools for regulatory network inference tools (GNIPLR, GENIE3, and GRNBoost) in the human LS skin cell dataset and the human islet endocrine cell dataset. The results of the cell-cell communication network from the three regulatory networks show differences, albeit subtle, among the three GRNs (Table S13). MDIC3 performs better when using the GRN from the GNIPLR algorithm. We believe that MDIC3 is sufficiently robust, and users can also provide their own GRN results for use in MDIC3 inference.
The method of identifying L-Rs from cell-cell communication
We extracted L-R pairs from cell-cell communication and validated the importance of the L-R pairs in the literature. Considering that the CellChatDB L-R database contains both human and mouse L-R information, we obtained human and mouse L-R genes from CellChatDB.
We used a simple method to identify key L-R pairs from cell-cell communication. We can let the gene set contained in the gene expression data be and the gene set contained in the L-R information extracted from CellChatDB be . We first match the gene set and the gene set and then obtain the gene set , .
Then, genes in the gene set were further screened to identify genes with cell type expression specificity: if a gene is significantly expressed in more than half of the cell types, we consider it a universally expressed gene and conclude that it does not possess cell type-specific expression. Therefore, we removed the gene from the gene set . Through these filtering steps, we obtain a new gene set .
Next, on the basis of the gene set , we aim to identify L-R pairs involved in the communication from cell type A (containing cells) to cell type B (containing cells) in the following steps:
- (1)For a specific cell type, genes whose zero expression values are in more than 70% of cells in a cell type were removed.
- (2)For a pair of L-R gene pairs L-R, the ligand gene expression levels of the cells contained in cell type A were sorted in descending order, and the sorted result was recorded as . The receptor gene expression levels of the cells contained in cell type B were sorted in descending order, and the sorted result were recorded as 2 .
- (3)The correlation between the expression of ligand gene and the expression of receptor gene was calculated. If , Pearson’s correlation coefficient and significant p value between and were calculated. If , Pearson’s correlation coefficient and significant p value between and were calculated.
- (4)The Bonferroni correction method was used to adjust the p values obtained in (3) for significant receptor pairs. The L-R pairs with corrected p values less than 0.01 were considered significant.
Comparison with other common methods
L-R pairs are used by existing methods as input, but unlike existing methods, MDIC3 only uses L-R pairs for validation and comparison. The analysis of L-R pairs covered in the paper is mainly used for validation of the results and comparison with other tools. To compare with the other three existing tools, we selected L-R pairs that have been reported to be relevant to the biological context of the dataset. We used three human datasets, i.e., the human LN dataset,13 human LS skin dataset30 and human islet dataset35 (Table S1), to compare the results of MDIC3 with three other well-known methods (CellChat,4 CellPhoneDB,7 and iTALK8). As only CellChat provided the mouse L-R database, we only compared the results of MDIC3 with CellChat for two scRNA-seq mouse datasets: the E14.5 mouse skin dataset56 and the mouse brain dataset66 (Table S1). Furthermore, we compared the results of MDIC3 with CellChat for a mouse brain 10X Visium spatial dataset.
We use the AUC to evaluate the accuracy of MDIC3 and other tools. First, we aggregated the communication results from the three other methods (CellChat, CellPhoneDB, and iTALK). As the inferred communications of most existing methods are limited to specific L-R pairs, most of them only output the communication strength on the basis of specific L-R pairs, except CellChat. Accordingly, we calculated each cell-cell communication by aggregating all the communication probabilities under all L-R pairs between the two cells, referring to the process of CellChat. The three methods were used by their default parameters and the L-R database. More details for obtaining the aggregated communication results from other methods can be found in Note S7. Then, we consider the mean value of all cell type communication results as the threshold and calculate the threshold for each method. On the basis of the thresholds of each method, the communication results can be divided into two groups: communications above the threshold value were placed into the communication group (marked as “1” in Tables S2–S4), while those below the threshold were categorized as noncommunication (marked as “0” in Tables S2–S4). We then considered cellular communications supported by the literature as positive samples and those not supported by the literature as negative samples, treating the results as ground truth. Finally, the true positive rate and false-positive rate were calculated by comparison with the ground truth. Therefore, the performance evaluation of the communication results became a dichotomous problem and was evaluated using the AUC. The rules for selecting the literature are as follows: (1) the interaction between two cell types had been reported in the literature, and (2) the existing L-R had been mentioned in the corresponding cells.
We compared the AUC of MDIC3 with CellPhoneDB and iTALK by using the human LS skin scRNA-seq dataset30 and human islet scRNA-seq dataset.35 Additionally, we compared the AUC of MDIC3 with CellChat using the human LS skin scRNA-seq dataset,30 human islet scRNA-seq dataset,35 and mouse brain scRNA-seq dataset.66 Furthermore, we compared the AUC of MDIC3 with the two-sided self-representation model using both human and mouse scRNA-seq datasets (Note S8).
Moreover, we also compared the AUC of MDIC3 with CellChat using a mouse brain 10X Visium spatial dataset. We assumed that communication exists only between spatially proximate cell types, where two neighboring spots with main cell types are considered to communicate with each other. If cells are not spatially proximate but are predicted to communicate, that would be a false positive. If cells are spatially proximate but not predicted to communicate, that would be a false negative.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Armingol E.Officer A.Harismendy O.Lewis N.E.Deciphering cell-cell interactions and communication from gene expression Nat. Rev. Genet.22202171883316896810.1038/s 41576-020-00292-x PMC 7649713 · doi ↗ · pubmed ↗
- 2Ramilowski J.A.Goldberg T.Harshbarger J.Kloppmann E.Lizio M.Satagopam V.P.Itoh M.Kawaji H.Carninci P.Rost B.Forrest A.R.R.A draft network of ligand-receptor-mediated multicellular signalling in human Nat. Commun.6201578662619831910.1038/ncomms 8866 PMC 4525178 · doi ↗ · pubmed ↗
- 3Zhang Y.Liu T.Hu X.Wang M.Wang J.Zou B.Tan P.Cui T.Dou Y.Ning L.Cell Call: integrating paired ligand-receptor and transcription factor activities for cell-cell communication Nucleic Acids Res.492021852085343433144910.1093/nar/gkab 638PMC 8421219 · doi ↗ · pubmed ↗
- 4Jin S.Guerrero-Juarez C.F.Zhang L.Chang I.Ramos R.Kuan C.H.Myung P.Plikus M.V.Nie Q.Inference and analysis of cell-cell communication using Cell Chat Nat. Commun.12202110883359752210.1038/s 41467-021-21246-9PMC 7889871 · doi ↗ · pubmed ↗
- 5Jin S.Mac Lean A.L.Peng T.Nie Q.sc Epath: energy landscape-based inference of transition probabilities and cellular trajectories from single-cell transcriptomic data Bioinformatics 342018207720862941526310.1093/bioinformatics/bty 058PMC 6658715 · doi ↗ · pubmed ↗
- 6Yuan G.C.Cai L.Elowitz M.Enver T.Fan G.Guo G.Irizarry R.Kharchenko P.Kim J.Orkin S.Challenges and emerging directions in single-cell analysis Genome Biol.182017842848289710.1186/s 13059-017-1218-y PMC 5421338 · doi ↗ · pubmed ↗
- 7Vento-Tormo R.Efremova M.Botting R.A.Turco M.Y.Vento-Tormo M.Meyer K.B.Park J.E.Stephenson E.Polański K.Goncalves A.Single-cell reconstruction of the early maternal-fetal interface in humans Nature 56320183473533042954810.1038/s 41586-018-0698-6PMC 7612850 · doi ↗ · pubmed ↗
- 8Yuanxin W.Ruiping W.Shaojun Z.Shumei S.Changying J.Guangchun H.Michael W.Jaffer A.Andy F.Linghua W.i TALK: an R Package to Characterize and Illustrate Intercellular Communication Preprint atbio Rxiv 2019
