Integrative RNA-Seq and TCGA-BRCA Analyses Highlight the Role of LINC01133 in Triple-Negative Breast Cancer
Leandro Teodoro Júnior, Henrique César de Jesus-Ferreira, Mari Cleide Sogayar, Milton Yutaka Nishiyama-Jr.

TL;DR
This study explores how the lncRNA LINC01133 influences gene expression in triple-negative breast cancer, suggesting it may play a role in tumor progression.
Contribution
The study identifies LINC01133 as a novel lncRNA associated with TNBC progression through integrative RNA-Seq and TCGA-BRCA analyses.
Findings
LINC01133 knockout in TNBC cells leads to dysregulation of genes involved in cell adhesion, EMT, and ECM remodeling.
Lower LINC01133 levels in TCGA-BRCA tumors correlate with expression shifts in ECM/EMT-related genes.
Key DEGs like ITIH5, GLUL, and CACNB2 are linked to migration, invasion, and ECM remodeling in TNBC.
Abstract
Background: Triple-negative breast cancers (TNBCs) are among the most aggressive breast tumors, due not only to the absence of clinically functional biomarkers used in other molecular subtypes, but also their marked heterogeneity and pronounced migratory and invasive behavior. The search for new molecules of interest for risk prediction, diagnosis and therapy stems from the class of long non-coding RNAs (lncRNAs), which often display context-dependent (“dual”) functions and tissue specificity. Among them, lncRNA LINC01133 stands out for its dysregulation across cancer, although its molecular role in TNBC remains unclear. Methods: In the present study, we used the human TNBC cell line Hs578T to generate a cell panel comprising the parental line (Hs578T_wt), the control line (Hs578T_ctr), and the LINC01133 knockout line (Hs578T_ko). Subsequently, we performed bulk RNA-Seq to identify…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8- —CAPES
- —CNPq
- —FAPESP
- —INCT-Regenera
- —Productivity Award Process
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCancer-related molecular mechanisms research · Cancer Mechanisms and Therapy · Ferroptosis and cancer prognosis
1. Introduction
Breast tumors are a multifaceted group of pathologies characterized by their high incidence and lethality worldwide, representing the second most common and fourth most lethal cancer in 2022 (excluding non-melanoma skin tumors) [1]. The challenges posed by their treatment are correlated with their high heterogeneity, primarily defined by the existence of four classic molecular subtypes, classified by the expression of three main molecules, which are also used in clinical medicine as biomarkers and therapeutic targets of ubiquitous interest [2], namely: the estrogen receptor (ER, Luminal A subtype), the progesterone receptor (PR, Luminal A/B subtypes), and the HER2 protein (HER2+ subtype) [3,4]. The former two are linked to hormonal therapies, while the HER2 protein, which is associated with cell proliferation processes, is the primary target of the HER2+ subtype in anti-HER2 targeted therapy [5]. TNBC, unlike the other subtypes mentioned, presents a significant clinical challenge due to the lack of clinically active expression of those three biomarkers, endowing this subtype with an inherently more aggressive character, both due to its silent and significantly proliferative tumorigenic process and its aggressiveness in more advanced stages, its marked migratory rate, and high rate of recurrence with extremely poor prognosis [6].
However, beyond the usual characteristics of TNBC, there is also great difficulty in identifying potential molecules of interest due to its high internal complexity [7]. This is due to the existence of several tumor subclasses that present not only a differential expression of molecules that can render TNBC more inflammatory, invasive and/or secretory, but also intratumoral dynamics and tumor microenvironment that overactivate all pathways linked to tumor progression, further complicating its prediction, diagnosis, and prognosis [8,9].
Heterogeneity and complexity underscore the urgent need to identify new translational biomarkers that resolve TNBC heterogeneity and illuminate mechanisms associated with its malignancy grade, serving as biomarkers and/or guides for new therapeutic strategies in TNBC [10].
Among the main classes of biomolecules focused in Oncology are the lncRNAs, which emerged two decades ago as versatile regulators of the primary mechanisms of gene expression and tumor-associated phenotypes [11,12]. Their functions are as diverse as the number of lncRNAs of interest discovered in recent years, acting through mechanisms that include chromatin remodeling [13], epigenetics modulation [14], post-transcriptional and post-translational regulation [15,16], miRNA sequestration (acting as sponges) [17], and protein scaffolding that modulates signaling pathways [18]. Furthermore, lncRNAs can also have dual functions, displaying the capacity to act both as tumor suppressor molecules and oncogenic molecules, depending on cellular context [19,20].
Many lncRNAs have been associated with numerous normal and pathological functions, including the well-known lncRNAs: HOTAIR, MALAT1, TUG1, and FOXCUT, which exhibit diverse actions across tumor types [21]. More recently, LINC01133 has emerged as a molecule of interest due to its prominent context-dependent activities, particularly attributed to its different isoforms, which exhibit diverse interaction capacities [22,23,24].
Molecular and structural analyses indicate that LINC01133 has multiple annotated transcriptional isoforms and contains conserved regions predicted to form internal loops and stable secondary structures, which function as platforms for interaction with both proteins and small RNAs, suggesting a range of modulatory functions that may be of interest to different tumors [23]. Functionally, LINC01133 has been associated with regulatory processes of critical tumor progression pathways, mediating processes linked to EMT, cell migration and adhesion, as well as invasion and distant metastasis.
In breast tumors, recent studies have linked increased LINC01133 expression to tumor stem cell proliferation and modulation of tumor proliferation pathways, such as the PI3K/Akt and Wnt/β-catenin pathways [25]. Other studies refer to LINC01133 as a tumor suppressor, correlating gene knockdown assays with increased invasion and metastasis, and involving protein regulators such as EZH2 and effector protein molecules such as SOX4 [26]. Some studies correlate the modulatory action of LINC01133 on factors such as KLF4 in the tumor microenvironment with a role in cell differentiation mechanisms in the basal-like breast tumor subtypes [27]. Given the presented landscape, integrative approaches that combine the comparative study of LINC01133 expression with comparative analyses of patient cohorts may be particularly informative, providing relevant insights into the dynamics of LINC01133 expression in TNBC.
This study focused on the human Hs578T TNBC cell line (HTB-126, ATCC), which, unlike the human TNBC MDA-MB-231 cell line (HTB-26) that widely used in breast cancer studies, has been reported to exhibit lower expression of genes linked to proliferation pathways, such as EGF and TGF-β, in addition to exhibiting less clonal heterogeneity [28]. In this study, the LINC01133 gene was knocked out by transfection using the CRISPR/Cas9 system to define its transcriptional impact on pathways related to tumorigenesis and tumor progression. From RNA-Sequencing assays, transcriptomic analysis, and pathway enrichment, DEGs and associated ontological pathways were identified, which were integrated with basal-like cases from the TCGA-BRCA aiming to evaluate concordance and translational relevance linked to the DEGs between parental and the LINC01133 knockout lineage. This comparative strategy aims to reconcile in vitro perturbation with patient data and refine hypotheses about how LINC1133 may modulate TNBC biology.
2. Materials and Methods
2.1. Cell Lineages
Four cell lines were used in this study. The human MCF10A wild-type cell line (ATCC, Manassas, VA, USA) served as a non-tumor control in the knockout stability assay. The Hs578T wild-type cell line (Hs578T_wt, ATCC, Manassas, VA, USA) was used as the parental cell line in biomolecular assays and served as the basal cell line for the generation of genetically modified (GMO) cell lines. The control GMO Hs578T cell line (Hs578_ctr) was generated by inserting closed and empty plasmid vectors. The knockout cell line (Hs578T_ko) used in transcriptomic assays was generated by the same method, but with the insertion of plasmid vectors containing gRNAs and the Cas9 sequence for knockout in the canonical exonic region of LINC01133. The MCF10A cell line was cultured in DMEM/F12 1:1 culture medium (Gibco, Grand Island, NY, USA), supplemented with 20 ng/mL hydrocortisone (Sigma-Aldrich, St. Louis, MO, USA), 10 µM cholera toxin (Sigma-Aldrich), and 5% fetal equine serum (FES, Sigma-Aldrich, St. Louis, MO, USA). Hs578T cell lines were cultured in DMEM High Glucose culture medium (Gibco, Grand Island, NY, USA), supplemented with 10% fetal bovine serum (FBS, Athena, Campinas, São Paulo, Brazil). All culture media were also supplemented with the streptomycin (10 mg/L, Sigma-Aldrich, St. Louis, MO, USA) and ampicillin (25 mg/L, Sigma-Aldrich, St. Louis, MO, USA) antibiotics.
2.2. Generation of GMO Cell Lines and Confirmation of Knockout in Hs578T_ko
Genetically modified cell lines Hs578T_ctr and Hs578T_ko were generated through lentiviral transfection of the 293T cell line (produced using pREV, pTAT, and pHGPM2 vectors) and subsequent transduction into Hs578T_wt. To this end, the lentiviral plasmids pL-CRISPR.EF.GFP (AddGene #57818, Watertown, MA, USA), which expresses the GFP reporter gene, and p-lentiCRISPR-V2 (AddGene #52961, Watertown, MA, USA), which expresses the puromycin resistance gene, as well as the Cas9 enzyme, were used. For the Hs578T_ctr cell line, closed and empty plasmids were added, i.e., without gRNAs. For the Hs578T_ko cell line, gRNAs were integrated into plasmids by digesting both at the BsmBI restriction site, followed by the use of the T4 DNA ligase system to ligate the gRNAs to the plasmids.
Plasmids with four different types of gRNAs were generated (5′ end—gRNA #1: Primer up CACCGTCAACACACATTGCCTCGGG, primer down AAACCCCG AGGCAATGTGTGTTGAC; gRNA #2: Primer up CACCGGTACCATGGCTCTGC CGTGT, primer down AAACACACGGCAGACCCATGGTACC; 3′ end—gRNA #3: Primer up CACCGCAACTCAAACCGTACCACCT, primer down AAACAGGTG GTACGGTTTGAGTTGC; gRNA #4: Primer up CACCGATAAGGCTACCTAGGTGGTA, primer down AAACTACCACCTAGGTAGCCTTATC, designed using the online CRISPR MIT software, v. 2.0, Massachusetts, MA, USA). In total, four GMO cell lines with knockout attempts were generated: Hs578T_ko gRNA 1 + 3; Hs578T_ko gRNA 2 + 3; Hs578T_ko gRNA 1 + 4; and Hs578T_ko gRNA 2 + 4, with definitive knockout occurring in the Hs578T_ko 1 + 3 cell line, which was used in the stability analyses and transcriptomic assays.
The knockout confirmation was performed using biomolecular qRT-PCR assays (to select a cell line effectively knocked out for LINC01133 and to verify the stability of the knockout between cell passages p0 and p25) and genomic PCR with fractionation and visualization through an agarose gel. For the qRT-PCR assay, the RNeasy Mini Kit (QIAGEN) was used for total RNA extraction, followed by cDNA synthesis with the SuperScript III enzyme (Invitrogen, Grand Island, NY, USA). The qRT-PCR was performed via the Viia7 Real-Time PCR System thermocycler (Applied Biosystems, Carlsbad, CA, USA) using the SYBR^®^ Green Master Mix reagent (Applied Biosystems, Carlsbad, CA, USA). The GAPDH, HPRT1, and HMBS genes were used as reference genes. The data obtained were analyzed using GraphPad Prism (v. 10.6, Boston, MA, USA), through FoldChange based on the Hs578T_wt cell line.
For the genomic PCR assay, the Ilustra Tissue & Cells GenomicPrep Mini Spin Kit (GE Healthcare, Chicago, IL, USA) was used for genomic DNA extraction and subsequent amplification in a Life Touch TC-96 thermocycler (Bioer, Hangzhou, China). The amplification product was visualized on a 1% agarose gel (Sigma-Aldrich, St. Louis, MO, USA) using a control primer for the CHD7 gene. Other verification assays were performed and are extensively described in Jesus-Ferreira, H.C., 2021 [29].
2.3. RNA-Sequencing
For RNA-Seq, total RNA was extracted using the TRIzol reagent (Invitrogen, Grand Island, NY, USA), according to the manufacturer’s protocol. The purity and integrity of total RNA were verified using an Agilent 2100 Bioanalyzer with the RNA 6000 Nano assay. Total RNA was quantified using the Quant-iT™ RiboGreen^®^ RNA (Invitrogen, Grand Island, NY, USA).
All samples were prepared using the Illumina TruSeq RNA Library Prep Kit v2, Set A (Illumina, San Diego, CA, USA). For each biological replicate, three technical replicates were generated and subsequently multiplexed for sequencing. Briefly, mRNA was isolated with dT-oligo and purified, followed by fragmentation by heating at 94 °C in fragmentation buffer for 4 min. Double-stranded cDNA was synthesized, end-repaired, and A-tailed. Sequencing adapters were then ligated to the cDNA fragments, according to the manufacturer’s protocol. The cDNA fragments were enriched after 15 cycles of PCR amplification. The quality control of the library was evaluated by size distribution of the cDNAs measured by the 2100 Bioanalyzer with DNA1000 assay (Agilent Technologies, Waldbronn, Germany). An ABI StepOnePlus Real-Time PCR System was used for quantification of the cDNA libraries before sequencing in the Illumina HiSeq 1500 System, using a rapid paired-end flowcell in 302 cycles of 2 × 151 bp paired-end strategy.
2.4. RNA-Seq Data Processing and Quantification
For demultiplexing and separation of FASTQ files corresponding to each sample used in library preparation, raw paired-end reads generated on an Illumina HiSeq 1500 platform, with a mean quality score (Q) ≥ 30, were processed using CASAVA software (v. 1.8.2, Illumina, San Diego, CA, USA) [30]. An in-house pipeline was applied for pre-processing of the raw FASTQ files, including the removal of potential contaminants such as PhiX and UniVec sequences (ftp://ftp.ncbi.nlm.nih.gov/pub/UniVec/, accessed on 21 January 2026), using Bowtie2 (v. 2.2.5, MD, USA) with the parameters --very-sensitive-local and -k 1 [31]. Quality control filtering included the removal of low-quality reads, reads shorter than 40 bp, poly A/T/N tails, adapters, and primer sequences using Trimmomatic software (v. 0.39, Usadel Lab, Aachen, Germany) [32] with the parameters:
High-quality paired-end reads were aligned to the Homo sapiens reference genome (GRCh38.p13, Genome Reference Consortium, RefSeq Assembly GCF_000001405.39, NCBI:GCA 000001405.28) [33] using HISAT2 (v. 2.2.1, Toronto, ON, Canada) [34], considering a minimum intron length of 30 bp and restricting alignments to concordant, uniquely mapped read pairs (--no-mixed, --no-discordant). Gene-level read was quantified using the featureCounts function from the Rsubread Bioconductor package (v. 2.0.2, Melbourne, Australia), with the following parameters:
Differential gene expression analysis was conducted using raw gene counts as input, applying the internal normalization based on size factors implemented in the DESeq2 framework. Counts per million (CPM) values were calculated exclusively for exploratory and descriptive analyses, accounting for differences in library size. Additionally, gene expression abundance was estimated as FPKM and TPM values for visualization and descriptive expression profiling (Figure 1).
DEGs were identified using the DESeq2 Bioconductor package (v. 1.22.2, EMBL, Heidelberg, Germany) [35], based on raw gene-level read counts obtained from featureCounts. Genes with very low expression were filtered out prior to differential expression analysis by requiring a minimum of 10 reads in at least two biological replicates. Differential expression was assessed using the negative binomial generalized linear model implemented in DESeq2, with internal normalization based on size factors. Genes were considered significantly differentially expressed if they presented an adjusted p-value ≤ 0.05 and an absolute log2FoldChange (|log2FC|) ≥ 1.5. Transcriptomic comparisons included: Hs578T_ko vs. Hs578T_ctr (ko_vs_ctr), Hs578T_ko vs. Hs578T_wt (ko_vs_wt), and Hs578T_wt vs. Hs578T_ctr (wt_vs_ctr). The raw data are available at NCBI under BioProject PRJNA1222397 and BioSamples SAMN46782027–SAMN46782035, and the complete DEGs sheets are available in Supplementary Tables S1–S3.
2.5. Enrichment Analysis
All enrichment and statistical analyses were performed in R (RStudio, Posit, 2026.0.1+392, R software v. 4.6, Boston, MA, USA). Differential expression results (log2FC, raw and p-values), were filtered using FDR ≤ 0.05 and |log2FC| ≥ 1.5.
To distinguish transcriptional changes mainly driven by LINC01133 loss from those related to vector-associated background effects, a dominance-based filtering strategy was applied [36,37,38,39]. Genes differentially expressed in either ko_vs_ctr or ko_vs_wt comparisons were first pooled. For each gene, the maximum absolute log2FC observed in the knockout-related comparisons was contrasted with the corresponding effect detected between Hs578T_wt and Hs578T_ctr. Genes were retained as KO-dominant when the magnitude of expression changes in the knockout condition exceeded that observed in wt_vs_ctr by at least 0.5 log2FC units. Genes not detected as differentially expressed between Hs578T_wt and Hs578T_ctr were assigned a baseline WT–CTR effect of zero. Below, the mathematical scheme of KO-dominant delimitation is shown, where g denotes a given gene:
so,
for each g ∈ G_KO_, the dominant knockout association effect was defined by
The vector-associated background effect was defined as:
Otherwise,
*DE: differentially expressed.
Downstream transcriptomic analyses were performed integrating differential expression results generated by DESeq2 with visualization and functional enrichment tools. All comparison DEGs identified from raw count-based analysis were used as input for all subsequent analyses.
A Venn diagram (VennDiagram::venn.diagram() (v. 1.7.3)) [40] was used to represent overlaps among DEG sets obtained from pairwise transcriptomic comparisons (ko_vs_ctr, ko_vs_wt, and wt_vs_ctr). Only statistically significant DEGs passing the defined thresholds were included in the overlap analyses.
The primary heatmap was constructed to visualize expression patterns of selected DEGs across comparisons (pheatmap::pheatmap() (v. 1.0.13)) [41]. Prior to heatmap generation, gene expression values were transformed using regularized logarithmic transformation and subsequently scaled up by z-score normalization across genes, allowing for a comparison of relative expression patterns between experimental conditions. Hierarchical clustering of genes and samples was performed using distance metrics and linkage methods consistent with standard transcriptomic visualization assays.
Volcano plots were generated based on log2FC and statistical significance values derived from DESeq2 results, enabling the simultaneous visualization of effect size and significance across all tested genes. Genes were color-coded according to predefined thresholds of |log2FC| ≥ 1.5 and p-value ≤ 0.05.
Functional enrichment analyses were conducted using the KO-dominant DEG list as input for Gene Ontology (GO) enrichment, including the Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) categories, as well as pathway enrichment analyses (clusterProfiler package) [42]. The major analysis was based on all KO-dominant DEGs (265 DEGs), only upregulated KO-dominant DEGs (103 DEGs), and only downregulated KO-dominant DEGs (162 DEGs) for a better interpretation of the molecular pathways. Enrichment significance was evaluated using ORA based on a hypergeometric distribution, with multiple testing correction applied to control the false discovery rate. Enriched terms were ranked according to adjusted p-values, gene counts, and enrichment scores and visualized using bar plots and dot plot packages to highlight the most significantly affected biological functions and pathways. Quantitative enrichment data are provided in Supplementary Tables S1–S3.
2.6. TCGA-BRCA Comparative Analysis
To investigate whether the transcriptional program associated with the LINC01133 knockout corresponded to expression patterns observed in human tumors, an integrative and translational analysis was conducted using public TCGA-BRCA data via the TCGAbiolinks package [43]. The initial dataset comprised a total of n = 1231 samples from the TCGA-BRCA project (ID: TCGA-BRCA; dbGaP Study Accession: phs000178; Project Name: Breast Invasive Carcinoma), including 1098 tumor samples and 133 normal breast tissue samples, delimited primarily by “gene_id”, “tpm_unstranded”, and “fpkm_unstranded” [44]. For the purposes of this study, analyses were restricted to primary solid tumor samples. Normal tissue samples were excluded prior to molecular subtyping and all downstream analyses. Because subtype labels were not provided, molecular subtyping was attributed using the Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype (AIMS) algorithm [45] (BioConductor v. 3.22, R package v. 1.42.0). Each sample was assigned to an intrinsic breast tumor subtype (Luminal A, Luminal B, HER2+, Normal-like, and Basal-like). Considering the Hs578T in vitro model used in the initial transcriptomics, only the basal-like subset was carried forward, resulting in n = 199 samples for comparative analyses (raw data are available in Supplementary Tables S1–S3).
The RNA-Seq data, previously normalized by the TCGA consortium in TPM format, were transformed by log2(TPM + 1) to stabilize the variance. Then, the expression of each gene was standardized by z-score, calculated per gene across all samples, allowing for relative comparisons between patients regardless of global differences in gene abundance. This approach enabled the characterization of the continuous, heterogeneous distribution of LINC01133 expression in human tumors without assuming binary states of presence or absence of the transcript.
Based on the previously obtained KO-dominant DEGs, the 30 most upregulated and 30 most downregulated genes were selected based on their absolute log2FC values. These gene sets were defined exclusively from the cellular model, without any information from patient data, ensuring independence between the experimental and translational stages.
For the translational analysis, the expression of the selected genes was extracted from the basal-like TCGA-BRCA samples and normalized to gene-level z-scores. The samples were ordered by LINC01133 expression, allowing for visualization of possible transcriptional gradients associated with different levels of this lncRNA in human tumors. Heatmaps were generated separately for the upregulated and downregulated gene sets in the knockout, with hierarchical clustering applied only to the rows (genes), preserving the column ordering based on LINC01133 expression levels. This strategy allowed us to assess whether genes that were modulated to an extreme degree in the cell model showed coordinated or diffuse expression patterns across the LINC01133 expression spectrum in tumors.
In addition, to explore intertumoral heterogeneity in a more refined way, the TCGA-BRCA basal-like samples were stratified according to LINC01133 expression percentiles, and mean normalized expression z-scores were calculated for each gene within these strata. From these data, trend analyses were performed, including the identification of gene clusters with similar behaviors along the LINC01133 expression gradient. This approach allowed us to distinguish subsets of genes whose expression showed a monotonic, inverse, or non-linear association with LINC01133 levels in tumors.
3. Results
3.1. Generation of Hs578T_ko Cell Line
Genetically modified cell lines were generated by lentiviral transduction, as described in the Section 2, using the CRISPR/Cas9 system, as illustrated in Figure 2A. 293T cells were used to produce viral particles containing either a LINC01133-targeting a CRISPR/Cas9 vector or an empty control vector, both of which were subsequently used to transduce wild-type Hs578T cells.
The efficiency of LINC01133 knockout was initially evaluated by qRT-PCR gene expression analysis, revealing a significant reduction in transcriptional levels of LINC01133 in the Hs578T_ko cell lines when compared to the Hs578T_wt and Hs578T_ctr cell lines (Figure 2B), especially in the Hs578T_ko 1 + 3 cell line, which became the only one used due to the highest KO efficiency observed. Importantly, KO lines showed substantially lower residual expression, consistent with effective transcript depletion. In contrast, the Hs578T_ctr cell line exhibited a higher transcriptional expression level than the Hs578T_wt cell line, indicating that the observed reduction is mainly due to gene editing and not to the transduction process.
The stability of the LINC01133 knockout in the Hs578T_ko cell line was evaluated by qRT-PCR after successive subcultures, up to passage 25, as shown in Figure 2C. Hs578T_ko cells showed persistently low levels of LINC01133 over time, confirming the functional effectiveness of the knockout and stability of transcript depletion.
In addition, genome editing was confirmed by PCR using primers internal and external to the LINC01133 target locus (Figure 2D). The amplification pattern observed in the knockout clones was consistent with editing at the genomic locus, corroborating the knockout expression and stability data.
3.2. Identification of Comparison DEGs and KO-Dominant DEGs
RNA-Seq analysis revealed significant transcriptomic alterations associated with LINC01133 loss. Initially, three paired comparisons were performed: ko_vs_ctr, ko_vs_wt, and wt_vs_ctr. To distinguish alterations predominantly assignable to knockout from those related to vector background effects, a filtering strategy based on the dominance effect was applied. In this framework, ko_vs_ctr was treated as the primary contrast, while wt_vs_ctr was used to quantify vector-associated background effects.
Genes were classified as knockout dominant when they showed differential expression in at least one comparison involving knockout and when the absolute magnitude of the effect in knockout exceeded that observed between WT and control by at least 0.5 log2FC units. Genes absent as DEGs in the wt_vs_ctr comparison were considered to have a baseline effect equal to zero for effect-size comparison purposes (importantly, “baseline = 0” refers to the absence of detectable differential expression at the specified thresholds, not literal zero expression).
Based on these criteria, 265 KO-dominant DEGs associated with LINC01133 loss were identified, with one transcript, namely AC127496.7, not identified in downstream analysis. The heatmap shown in Figure 3 demonstrates that this set of genes clearly distinguishes the Hs578T_ko samples from the wild-type and control conditions, indicating a specific transcriptomic state induced by the knockout, with well-defined clusters in the ko_vs_ctr comparison, both in upregulated and downregulated genes, and showing strong convergence with those selected by the KO-dominant filter.
In Figure 3, the Venn diagram shows partial overlap between sets of DEGs across comparisons, indicating that only a fraction of the DEGs is shared across all conditions, which is expected given that WT vs. CTR captures vector/transduction-related effects and KO comparisons capture both knockout-dependent and background components. Accordingly, KO-dominant filtering was used to enrich for changes most plausibly attributable to LINC01133 loss.
The volcano plots illustrate the distribution of DEGs across comparisons, highlighting the broad amplitude and greater magnitude of effects in analyses involving knockouts. The fourth volcano plot in Figure 3 shows the dominant DEGs in Hs578T_ko, with genes exhibiting consistent, quantitatively superior alterations compared to those observed in the comparison between Hs578T_wt and Hs578T_ctr ones, including genes previously associated with cell adhesion, signaling, and structural organization, such as ITIH5, PDX1, and MFAP4 genes, respectively.
3.3. Enrichment Results
Functional enrichment analysis of the 265 differentially expressed dominant genes associated with LINC01133 knockout showed consistent alterations across multiple functional layers, including BP, CC, MF, and signaling pathways (Figure 4). To better characterize these alterations, the results were evaluated separately for the total set of dominant DEGs and for the upregulated and downregulated gene subsets (defined in ko_vs_ctr with FDR ≤ 0.05 and |log2FC| ≥ 1.5).
3.3.1. Functional Enrichment Considering All Dominant DEGs
The Biological Processes analysis of the total set of dominant DEGs demonstrated significant enrichment for terms related to the regulation of cell signaling, transport and secretion of molecules, modulation of cell communication, and cytoskeleton organization. Terms associated with the regulation of synaptic transmission and the control of neurotransmitter release were also observed, which likely reflects enrichment of generic vesicle trafficking/exocytosis modules rather than literal neuronal differentiation, indicating the activation of transcriptional programs linked to intercellular communication and vesicular dynamics.
In the CC, dominant DEGs showed a strong association with structures of the plasma membrane and its specializations, including the synaptic membrane, postsynaptic regions, stress fibers, actin bundles, and actin-rich cell projections. Terms related to the ECM and extracellular space were also significantly enriched, suggesting remodeling of the cell–microenvironment interface.
The MF analysis showed enrichment for activities related to actin binding, microfilament motor activity, interaction with ECM components, binding to G protein-coupled receptors, and modulation of ion-dependent signals, particularly calcium. These findings indicate coordinated changes in structural and signaling functions.
The pathway analysis identified enrichment of routes related to calcium signaling, via cGMP–PKG, cell adhesion, amino acid metabolism, and pathways commonly associated with neoplastic processes. Taken together, these results indicate that dominant DEGs are distributed across multiple interconnected cellular pathways. The integrated summary of the three ontologies reinforced the functional convergence of dominant DEGs in processes related to cellular structural organization, cell–cell communication, and interaction with the extracellular microenvironment.
3.3.2. Functional Enrichment of Upregulated Dominant DEGs in Hs578T_ko
When considering only genes with increased expression, the BP analysis revealed an enrichment of terms associated with cell communication and signaling, regulation of inflammatory processes, and vesicular formation and secretion.
In the CC, the upregulated genes showed a predominant association with endomembrane, secretory vesicle, extracellular space, and vesicular trafficking-related structures. The presence of terms associated with the ECM suggests increased expression of genes involved in cell–environment interactions.
The MF enriched in this set included intratumoral/ECM interaction and interface activities via neurexin and spectrin, enzymatic activities involved in protein processing via carboxypeptidases, and interaction with ECM proteins. These results indicate activation of functions related to secretion, signaling, and microenvironment remodeling, represented by genes such as ANK2, NEO1, and ASPN, respectively, which converge with the data from the pathway analysis, with a main enrichment in pathways associated with cell adhesion.
3.3.3. Functional Enrichment of Dominant Downregulated DEGs in Hs578T_ko
Analysis of genes with reduced expression in the Hs578T_ko condition revealed a distinct functional profile. In BP, terms related to cell signaling, vesicular release, and cell transport were enriched, as was the phospholipase C signaling pathway, associated with cell proliferation.
In the CC, downregulated genes showed a significant association with processes linked to actin and cell junctions, focal adhesion, and membrane regions associated with cell anchoring, consistent with the MF, which were enriched in microfilament activities and in functions associated with the regulation of cell adhesion.
Pathway analysis of the dominant downregulated DEGs showed enrichment for pathways related to calcium signaling (including the cGMP–PKG axis), cell adhesion, amino acid metabolism, and signaling pathways commonly associated with neoplastic processes. These enriched pathways converge with those identified in the global DEG enrichment analysis, indicating that downregulated genes substantially contribute to the overall functional landscape observed in Hs578T_ko cells. Mutually, these findings highlight a strong association between dominant downregulated DEGs and the broader pathway enrichment profile, emphasizing their relevance within interconnected cellular signaling and structural networks.
To further characterize the functional characteristics of the differentially expressed dominant genes associated with LINC01133 knockout, an integrated gene/function mapping analysis was performed, as shown in Figure 5. This approach allows for simultaneous visualization of the individual contributions of dominant genes to the enriched and analyzed ontologies, as well as the quantitative distribution of these genes across the enriched terms.
In the alluvial diagrams, located to the left of each panel, the direct connection between specific genes and the enriched functional terms can be observed, demonstrating that multiple genes contribute in a shared manner to different functional categories. These diagrams show that several dominant DEGs are simultaneously associated with multiple processes or pathways, indicating functional overlap among the transcriptomic programs altered by LINC01133 loss.
In BP, the diagrams reveal that the dominant DEGs are associated with processes regulating cell signaling, intercellular communication, and transport and secretion (PDX1, GLUL, CACNB2). Individual genes appear connected to multiple terms, reflecting their participation in processes such as the regulation of synaptic transmission, control of vesicle release, and cellular response to stimuli (SNAP25, RIMS2). The adjacent dot plot shows the distribution of enriched terms according to gene ratio, number of associated genes (count), and statistical significance. It was observed that the terms with the highest gene ratio and the highest number of genes were mainly related to signaling and cellular dynamics, reinforcing their functional relevance in the set of dominant DEGs.
In the CC panel, the diagrams show a wide distribution of dominant genes among compartments associated with the plasma membrane, ECM, actin fibers, cellular projections, and structures related to vesicular trafficking (ANK2, SNAP25, RIMS2). The recurring presence of genes associated with specialized membrane regions and the extracellular space suggests coordinated alterations in structures involved in cell–microenvironment interactions, as evidenced by a dot-plot showing a greater number of DEGs involved in ECM and EMT (CPA3, COL6A6, COL4A6, ASPN, ITIH5).
In MF, a significant influence of a convergent gene group is particularly evident in functions linked to cell morphology, cytoskeleton, and actin (CACNB2, ANXA8, MFAP4), in addition to genes linked to pathways related to cAMP and cell proliferation through regulation of the MAPK/ERK pathway (GUCYA1, ADCY1).
Among the enriched pathways associations, those related to cell adhesion molecules stand out. In this context, NRXN3 and PTGER3 showed negative log_2_FC values, whereas the NEO1 gene showed a positive log_2_FC. The joint occurrence of these profiles suggests a reorganization of cell adhesion processes, in which the loss of components that stabilize cell–cell interactions occurs in conjunction with the activation of molecules linked to cellular plasticity.
3.4. TCGA-BRCA/KO-Dominant DEGs Analysis
To assess the translational relevance of the transcriptomic signature associated with the LINC01133 knockout, the KO-dominant DEGs were integrated with gene expression data from primary TNBC tumors in the TCGA-BRCA cohort (Figure 6 and Figure 7).
Figure 6A shows the distribution of LINC01133 expression in basal-like/TNBC samples (n = 199), ordered according to transcript expression levels. A wide range of LINC01133 levels was observed among the tumors analyzed, allowing the samples to be stratified into low, intermediate, and high expression groups. This heterogeneity provides a basis for the comparative analysis of gene expression associated with the knockout along the LINC01133 gradient.
Figure 6B presents heatmaps showing the expression of the 30 most upregulated and 30 most downregulated dominant genes in the Hs578T_ko condition, evaluated in TNBC samples from the TCGA-BRCA cohort. The samples were organized according to increasing levels of LINC01133 expression. A more homogeneous and balanced profile was observed for upregulated genes, with samples showing higher LINC01133 expression, unlike downregulated genes, which presented a more heterogeneous profile.
However, heatmap analysis can lead to misinterpretations when dealing with data with subtle gradients or high intertumoral variability. Thus, clustering and quantitative expression trends for DEGs based on LINC01133 expression strata may support a more thorough analysis.
Figure 7 quantitatively explores the relationship between LINC01133 expression levels and the average expression of dominant genes associated with knockout in TNBC tumors. Samples from the TCGA-BRCA cohort were stratified by LINC01133 expression percentile, enabling the assessment of the gradual variation in genes associated with knockout along this gradient. Gene expression values were z-score normalized at the gene level (log_2_(TPM + 1)), so trends reflect relative, not absolute, expression differences.
In the panel corresponding to upregulated DEGs, two clusters with an inhibitory trend were observed, considering normalized mean expression amplitudes above 0.4 as significant and with moderate-to-high action in the phenotype linked to the associated molecular pathway. The analysis of downregulated DEGs also revealed two clusters with an inhibitory trend.
In contrast, for each DEG group, a cluster with a tendency toward expression was observed; however, the more significant effect was linked to the downregulated DEG group, associated with cell proliferation and migration. Notably, the strongest gradient effects were observed among the downregulated KO-dominant genes, which included candidates linked to cell adhesion, signaling, and migration-related programs. Overall, these results support heterogeneous but detectable transcriptional patterning of KO-associated genes across basal-like tumors, consistent with a concentration-dependent LINC01133-associated gradient rather than a binary on/off state.
4. Discussion
In this study, we investigated, in an integrated manner, the impact of LINC01133 knockout on the in vitro transcriptional profile of TNBC, combining a genetically controlled cell model with functional analysis and integration with human tumor data from the TCGA-BRCA cohort. The results indicate that the loss of LINC01133 does not induce isolated transcriptomic alterations, but rather is associated with coordinated reprogramming of multiple molecular networks related to cellular organization, interaction with the ECM, intracellular signaling, and cell/microenvironment communication. Taken together, these findings support the hypothesis that LINC01133 plays a transcriptional modulatory role, dependent on context and expression levels, moving away from a binary classification as a strictly oncogenic lncRNA or tumor suppressor.
The robust validation of LINC01133 knockout, confirmed at the genomic, transcriptional, and functional levels (Figure 2), was fundamental to ensure that the observed alterations predominantly reflect the loss of this lncRNA rather than technical effects associated with lentiviral transduction [36,39]. The higher expression of LINC01133 observed in the Hs578T_ctr cell line compared to Hs578T_wt reinforces the importance of analytical strategies capable of discriminating background effects. In this context, the effect of the dominance-based approach adopted in this work represents an important methodological advance, since it allows for the identification of genes whose expression alterations are quantitatively associated with the loss of LINC01133, even when basal effects are present. Nonetheless, we recognize that dominance filtering is an operational definition and should be interpreted alongside primary ko_vs_ctr results.
The KO-dominant gene definition identified 265 DEGs that clearly and consistently discriminated Hs578T_ko cells from wild-type and control conditions (Figure 3). The pattern observed in the heatmap and volcano plots indicates that LINC01133 knockout is related to a characteristic transcriptomic state, with convergent changes in genes related to cell adhesion, structural organization, and signaling. The low overall overlap between the DEG sets across the different comparisons suggests distinct contributions of KO-dependent and vector-associated effects, and underscores the importance of using KO vs. CTR as the primary comparison for mechanistic interpretation. Accordingly, we avoided interpreting low overlap as evidence against reproducibility and instead treated it as a reflection of different perturbations captured by each contrast.
The functional analysis of these dominant DEGs revealed a robust convergence in processes linked to cellular architecture and the cell/microenvironment interface, including actin cytoskeleton organization, focal adhesion, ECM remodeling, and calcium- and G protein-coupled receptor-dependent signaling [46,47]. These processes are particularly relevant given TNBC’s heterogeneity, since this tumor subtype exhibits high phenotypic plasticity and a strong dependence on dynamic interactions with the microenvironment to sustain migration and invasion [48,49]. Therefore, the set of dominant DEGs associated with the loss of LINC01133 points to a structural and functional reorganization consistent with cellular states that are more permissive to EMT and metastasis.
From a mechanistic perspective, ECM remodeling emerges as a central axis of this reprogramming [50]. Genes such as ITIH5, ASPN, COL4A6, COL6A6, and MFAP4, identified as dominant DEGs, play important roles in the composition, organization, and rigidity of the ECM [51,52,53,54]. Alterations in the ECM not only enable cell migration but also directly modulate intracellular signaling pathways. The loss of LINC01133, by being associated with altered expression of these genes, can shift the balance toward a more organized matrix or a matrix permissive to invasion, favoring more aggressive cellular states without necessarily inducing a complete EMT, but rather highly adaptive hybrid states [55,56].
Furthermore, the association of dominant DEGs with the actin cytoskeleton and focal adhesions indicates a potential transcriptional involvement of LINC01133 in processes related to cell anchoring and traction forces. This is supported by the downregulation of genes involved in cytoskeletal stabilization and cell–cell junctions, suggesting that LINC01133 depletion weakens the structural restraints on cell mobility. In the context of TNBC, in which cells can transition between collective and single-cell migration modes, such deregulation of structural integrity could facilitate rapid switches between migratory modes, thereby enhancing invasive potential. Notably, this invasive phenotype may be further amplified by the upregulation of genes (e.g., CPA3) with metallocarboxypeptidase activity, which may contribute to key modifications in the ECM (Figure 4) [57,58].
The results of the separate analyses of dominant, upregulated, and downregulated DEGs showed associations with shifts in the balance between antagonistic cellular programs. The increased expression of genes associated with secretion, vesicular trafficking, and intercellular communication suggests the activation of pathways that favor the release of pro-migratory factors, ECM remodelers, and inflammatory mediators. Given that the secretion of matrix proteins, proteases, and cytokines is an essential component of the emergence of highly invasive and pre-metastatic phenotypes [59,60,61], there may be a functional link between LINC01133 knockout and a general remodeling of the tumor microenvironment.
Conversely, reduced expression of genes linked to cell–cell adhesion, cytoskeletal organization, and control of proliferative signaling suggests a loss of structural and regulatory brakes. This pattern points to LINC01133 acting as a negative modulator of circuits that, when disinhibited, favor plasticity and phenotypic reprogramming. This plasticity is a key factor in therapeutic resistance and adaptation to diverse microenvironments, reinforcing the potential relevance of LINC01133 in tumor progression [62,63,64].
The integrated gene/function analysis deepens this model by showing that several dominant DEGs act as multifunctional genes in integrated networks. Genes such as SNAP25 and RIMS2, traditionally associated with vesicular release in neuronal systems, emerged here as potential regulators of secretion and intercellular communication in TNBC [65,66,67]. The presence of these genes among the dominant DEGs supports that the loss of LINC01133 can reactivate transcriptional programs usually restricted to specialized cellular contexts, conferring adaptive advantages to tumor cells.
Beyond structural components, LINC01133 knockout also affects key signaling hubs. Genes such as CACNB2, ADCY1, and GUCY1A1, which connect the cytoskeleton to calcium, cAMP, and cGMP signaling [68], are predominantly altered. This specific pattern suggests that LINC01133 may act as a transcriptional brake on these critical second-messenger pathways. Consequently, its loss may hypersensitize TNBC cells to extracellular and mechanical cues, potentially amplifying signaling flux through central oncogenic pathways such as MAPK/ERK and PI3K/AKT [69]. However, because our data are transcriptomic, direct changes in pathway activity should be validated at the protein/functional level (e.g., phospho-ERK/AKT assays). This functional convergence on major TNBC drivers could, in part, explain the pronounced migratory and survival phenotypes observed.
The concurrent upregulation of the PDX1 transcription factor and the MFAP4 matrix protein reveals a coordinated program linking cell-intrinsic signaling with microenvironmental remodeling. PDX1’s ectopic activation is likely associated with the transcriptional reprogramming of metabolic and stress-response pathways to support a tumor modulation state [70,71,72]. In parallel, MFAP4 reshapes the ECM and integrin signaling to facilitate cytoskeletal reorganization and adhesion dynamics. Critically, the loss of LINC01133 appears to synchronize these usually distinct processes, transcriptional control and structural alteration, thereby creating an integrated cellular response that permissively enables migration and invasion.
In contrast, downregulated genes such as PTGER3 indicate a selective reorganization of prostaglandin-mediated signaling. As a PGE2 receptor, PTGER3 modulates pathways involving cAMP and MAPK, thereby impacting cell proliferation and inflammatory responses within the tumor microenvironment [73,74,75].
Integration with primary tumor data from TCGA-BRCA supports the notion that these effects are not artifacts of the cellular model. The heterogeneity observed in LINC01133 levels in human TNBC, associated with gradual variations in the expression of dominant DEGs, suggests that LINC01133 acts as a continuous, not binary, regulator of the tumor molecular state [76]. In a subtype characterized by extreme intratumoral heterogeneity, this concentration-dependent regulation may explain the coexistence of cell subpopulations with different migratory, proliferative, and metabolic capacities within the same tumor (Figure 6), as schematically illustrated in Figure 8:
In this context, the observation of non-linear trends and the formation of distinct functional clusters along the LINC01133 gradient reinforces the idea that different levels of this lncRNA can stabilize intermediate cellular states. These hybrid states combine epithelial and mesenchymal characteristics and are associated with greater metastatic potential and therapeutic resistance. Therefore, quantitative modulation of LINC01133 may represent a fine-tuning mechanism for these cellular trajectories, influencing tumor progression without the need for additional genetic alterations.
It is important to emphasize that the translational analyses between the KO-dominant DEGs obtained in vitro and the clinical data from TCGA-BRCA were designed with a strictly translational and exploratory purpose, aiming to evaluate the coherence and potential clinical relevance of the transcriptional signatures derived from the cellular model when projected in a heterogeneous human tumor context [77,78]. At no point is a direct causal relationship assumed between LINC01133 expression and the tumor patterns observed, since factors such as cell composition, tumor microenvironment, clonal heterogeneity, and post-transcriptional regulation can significantly influence expression profiles in vivo.
Taken together, these data support a model in which LINC01133 acts as a possible transcriptional modulator of tumor plasticity in TNBCs, integrating structural, mechanical, and signaling signals to regulate networks associated with migration, EMT, and ECM remodeling. This modulatory function, dependent on context and expression level, offers a mechanistic explanation for the seemingly contradictory roles attributed to LINC01133 across different tumors and highlights the importance of considering lncRNAs as dynamic regulators of cellular states rather than as functionally one-dimensional elements.
5. Conclusions
The integrative transcriptomic analysis conducted in this study indicates that loss of the long non-coding RNA LINC01133 is associated with extensive cellular transcriptional reprogramming in TNBC. Knockout of this lncRNA in the Hs578T cell line triggered alterations encompassing ECM remodeling, proteolytic imbalance, and activation of metabolic and secretory pathways, collectively resulting in a tumor microenvironment that may be more permissive to invasion and migration. The functional convergence between the in vitro model and basal-like/TNBC tumors from the TCGA-BRCA cohort underscores the translational relevance of the findings, suggesting that the absence of LINC01133 favors a highly aggressive and pro-invasive phenotype. Therefore, LINC01133 emerges as a potentially informative regulator of the cell–microenvironment interface, acting as a molecular brake whose loss may remove critical barriers to tumor modulation by migration and invasion mechanisms. Taken together, these results highlight LINC01133 and its target genes as promising candidates for further mechanistic validation and integrative in vitro/in vivo studies.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1World Health Organization International Agency for Research on Cancer Cancer Over Time Available online: https://gco.iarc.fr/overtime/(accessed on 25 November 2025)
- 2Loi S. Molecular Analysis of Hormone Receptor Positive (Luminal) Breast Cancers—What Have We Learnt?Eur. J. Cancer 2008442813281810.1016/j.ejca.2008.09.01218990560 · doi ↗ · pubmed ↗
- 3Perou C.M. Sørile T. Eisen M.B. Van De Rijn M. Jeffrey S.S. Ress C.A. Pollack J.R. Ross D.T. Johnsen H. Akslen L.A. Molecular Portraits of Human Breast Tumours Nature 200040674775210.1038/3502109310963602 · doi ↗ · pubmed ↗
- 4Bergeron A. Bertaut A. Beltjens F. Charon-Barra C. Amet A. Jankowski C. Desmoulins I. Ladoire S. Arnould L. Anticipating Changes in the HER 2 Status of Breast Tumours with Disease Progression—Towards Better Treatment Decisions in the New Era of HER 2-Low Breast Cancers Br. J. Cancer 202312912213410.1038/s 41416-023-02287-x 37120672 PMC 10307899 · doi ↗ · pubmed ↗
- 5Yang T. Kang L. Li D. Song Y. Immunotherapy for HER-2 Positive Breast Cancer Front. Oncol.202313109798310.3389/fonc.2023.109798337007133 PMC 10061112 · doi ↗ · pubmed ↗
- 6Zagami P. Carey L.A. Triple Negative Breast Cancer: Pitfalls and Progress NPJ Breast Cancer 202289510.1038/s 41523-022-00468-035987766 PMC 9392735 · doi ↗ · pubmed ↗
- 7Lehmann B.D. JovanovićB. Chen X. Estrada M.V. Johnson K.N. Shyr Y. Moses H.L. Sanders M.E. Pietenpol J.A. Refinement of Triple-Negative Breast Cancer Molecular Subtypes: Implications for Neoadjuvant Chemotherapy Selection P Lo S ONE 201611 e 015736810.1371/journal.pone.015736827310713 PMC 4911051 · doi ↗ · pubmed ↗
- 8So J.Y. Ohm J. Lipkowitz S. Yang L. Triple Negative Breast Cancer (TNBC): Non-Genetic Tumor Heterogeneity and Immune Microenvironment: Emerging Treatment Options Pharmacol. Ther.202223710825310.1016/j.pharmthera.2022.10825335872332 PMC 9378710 · doi ↗ · pubmed ↗
