# Large-scale sequencing study of de novo regulatory Tandem Repeats (TRs) identifies new ASD (Autism Spectrum Disorders) candidate genes integrating gene expression mapping, brain scRNA-seq and organoid models

**Authors:** Maria Cristina Rodriguez Fontenla, Pablo Carballo-Pacoret, Sara Dominguez-Alonso, Javier Gonzalez-Peñas, Mara Parellada, Celso Arango, Angel Carracedo

PMC · DOI: 10.21203/rs.3.rs-8374597/v1 · Research Square · 2026-02-13

## TL;DR

This study identifies new autism candidate genes by analyzing de novo tandem repeats in non-coding regulatory regions, combining targeted sequencing with brain single-cell RNA-seq and cortical organoid expression data.

## Contribution

The study introduces a novel integrative approach combining de novo tandem repeats with gene expression and brain cell data to uncover ASD candidate genes.

## Key findings

- The gene ECHS1 was identified as a strong autism candidate through multiple lines of evidence.
- De novo tandem repeats in non-coding regions were linked to altered gene expression in brain cells.
- Integrating genetic and transcriptomic data improves the detection of autism risk genes missed by traditional methods.

## Abstract

In this study, we performed an integrative analysis of de novo tandem repeats (TRs) to uncover missing heritability that may be hidden in 85,394 active cis-regulatory elements (cCREs) from ENCODE. We applied targeted sequencing and a robust bioinformatic pipeline to a Spanish cohort of 200 ASD trios. For the integrative analysis, we also used data from 1,637 ASD simplex quad families from the Simons Simplex Collection (SSC). We then incorporated multiple layers of functional annotation, including predicted transcription factor (TF) binding sites, gene mapping based on physical proximity and expression correlation, pathogenicity scoring, single-cell RNA-seq data from human brain in ASD cases and controls, and cortical organoid expression data.
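To make the notion of a de novo TR in a trio concrete, the following is a minimal sketch of a Mendelian-consistency check on repeat-unit counts. It is not the authors' pipeline, which the abstract does not describe in detail. The function names, the `tolerance` parameter, and the genotype representation are assumptions made for illustration; a real workflow would also rely on a dedicated TR genotyper, read support, and coverage filters.

```python
from typing import Tuple

# Hypothetical representation: repeat-unit counts of the two alleles at one TR locus.
Genotype = Tuple[int, int]


def is_mendelian(child: Genotype, mother: Genotype, father: Genotype,
                 tolerance: int = 0) -> bool:
    """Return True if one child allele can be inherited from each parent.

    `tolerance` (an assumption of this sketch) is the allowed difference in
    repeat units, to absorb small genotyping errors.
    """
    def matches(allele: int, parent: Genotype) -> bool:
        return any(abs(allele - p) <= tolerance for p in parent)

    c1, c2 = child
    return ((matches(c1, mother) and matches(c2, father)) or
            (matches(c2, mother) and matches(c1, father)))


def is_candidate_de_novo(child: Genotype, mother: Genotype, father: Genotype,
                         tolerance: int = 0) -> bool:
    """Flag a locus as a candidate de novo TR when inheritance cannot explain it."""
    return not is_mendelian(child, mother, father, tolerance)


# Illustrative genotypes only, not data from the study.
print(is_candidate_de_novo(child=(10, 14), mother=(10, 10), father=(12, 12)))
print(is_candidate_de_novo(child=(10, 12), mother=(10, 11), father=(12, 12)))
```

The check only uses allele lengths; it cannot distinguish a true de novo expansion from a genotyping error in a parent, which is one reason specialized detection strategies are needed.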

Together, our analyses identified multiple ASD-relevant candidate genes supported by convergent lines of evidence. Notably, ECHS1 emerged as a strong candidate, affected by several de novo TRs in both the Spanish cohort and the SSC. It was also identified as the most significantly associated gene through expression-based gene mapping (T-Gene) and showed consistent differential expression in excitatory neurons of the cerebral cortex at the single-cell level along with increased expression in late-stage cortical organoids.
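The idea of "convergent lines of evidence" can be illustrated with a simple tally of how many evidence layers support each gene. This is a hedged sketch, not the scoring scheme used in the paper: the layer names, the `rank_by_convergence` helper, and the placeholder genes `GENE_A` and `GENE_B` are hypothetical. Only the ECHS1 entries mirror the evidence types listed in the abstract.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def rank_by_convergence(evidence: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Count supporting evidence layers per gene, most supported first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(gene for genes in evidence.values() for gene in genes)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical layers; GENE_A and GENE_B are placeholders, not study results.
evidence = {
    "de_novo_tr_spanish_cohort": {"ECHS1", "GENE_A"},
    "de_novo_tr_ssc": {"ECHS1", "GENE_B"},
    "tgene_expression_mapping": {"ECHS1", "GENE_A"},
    "scrna_excitatory_neurons": {"ECHS1"},
    "cortical_organoids": {"ECHS1", "GENE_B"},
}

ranking = rank_by_convergence(evidence)
for gene, n_layers in ranking:
    print(f"{gene}\t{n_layers}")
```

An unweighted count treats every layer as equally informative; a real analysis might weight layers by effect size or statistical significance.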

These findings underscore the value of integrating genetic and transcriptomic information to improve the identification of potential risk genes for ASD, particularly within non-coding regions. Our approach also highlights the importance of identifying complex genetic variation, such as de novo TRs, which is typically missed in conventional exome or whole-genome analyses and requires specialized bioinformatic strategies for accurate detection and interpretation.

## Linked entities

- **Genes:** ECHS1 (enoyl-CoA hydratase, short chain 1) [NCBI Gene 1892]

## Full-text entities

- **Genes:** FLI1 (Fli-1 proto-oncogene, ETS transcription factor) [NCBI Gene 2313] {aka BDPLT21, EWSR2, FLI-1, SIC-1}, NRG1 (neuregulin 1) [NCBI Gene 3084] {aka ARIA, GGF, GGF2, HGL, HRG, HRG1}, ODC1 (ornithine decarboxylase 1) [NCBI Gene 4953] {aka BABS, NEDBA, NEDBIA, ODC}, NR4A1 (nuclear receptor subfamily 4 group A member 1) [NCBI Gene 3164] {aka GFRP1, HMR, N10, NAK-1, NGFIB, NP10}, FUOM (fucose mutarotase) [NCBI Gene 282969] {aka C10orf125, FUCU, FucM}, FMR1 (fragile X messenger ribonucleoprotein 1) [NCBI Gene 2332] {aka FMRP, FRAXA, POF, POF1}, C9orf72 (C9orf72-SMCR8 complex subunit) [NCBI Gene 203228] {aka ALSFTD, DENND9, DENNL72, FTDALS, FTDALS1}, RERE (arginine-glutamic acid dipeptide repeats) [NCBI Gene 473] {aka ARG, ARP, ATN1L, DNB1, NEDBEH}, THNSL2 (threonine synthase like 2) [NCBI Gene 55258] {aka SOFAT, THS2, TSH2}, NRF1 (nuclear respiratory factor 1) [NCBI Gene 4899] {aka ALPHA-PAL}, KTN1 (kinectin 1) [NCBI Gene 3895] {aka CG1, KNT, MU-RMS-40.19}, CALY (calcyon neuron specific vesicular protein) [NCBI Gene 50632] {aka DRD1IP, NSG3}, WIPF2 (WAS/WASL interacting protein family member 2) [NCBI Gene 147179] {aka WICH, WIRE}, ATG101 (autophagy related 101) [NCBI Gene 60673] {aka C12orf44}, THRA (thyroid hormone receptor alpha) [NCBI Gene 7067] {aka AR7, CHNG6, EAR7, ERB-T-1, ERBA, ERBA1}, PRRC2A (proline rich coiled-coil 2A) [NCBI Gene 7916] {aka BAT2, D6S51, D6S51E, G2}, KLF9 (KLF transcription factor 9) [NCBI Gene 687] {aka BTEB, BTEB1}, F2R (coagulation factor II thrombin receptor) [NCBI Gene 2149] {aka CF2R, HTR, PAR-1, PAR1, TR}, ECHS1 (enoyl-CoA hydratase, short chain 1) [NCBI Gene 1892] {aka ECHS1D, SCEH, mECH, mECH1}, PRAP1 (proline rich acidic protein 1) [NCBI Gene 118471] {aka PRO1195, UPA}, PBX1 (PBX homeobox 1) [NCBI Gene 5087] {aka CAKUHED}, GLI3 (GLI family zinc finger 3) [NCBI Gene 2737] {aka ACLS, GCPS, GLI3-190, GLI3FL, PAP-A, PAPA}, ZNF384 (zinc finger protein 384) [NCBI Gene 171017] {aka CAGH1, CAGH1A, CIZ, ERDA2, NMP4, NP}, UBE2K 
(ubiquitin conjugating enzyme E2 K) [NCBI Gene 3093] {aka E2-25K, HIP2, HYPG, LIG, UBC1}, CND (Corneal dermoids) [NCBI Gene 8231], FARP1 (FERM, ARH/RhoGEF and pleckstrin domain protein 1) [NCBI Gene 10160] {aka CDEP, FARP1-IT1, GLCC1, PLEKHC2, PPP1R75}, AOPEP (aminopeptidase O (putative)) [NCBI Gene 84909] {aka AP-O, APO, C90RF3, C9orf3, DYT31, ONPEP}, TM4SF1 (transmembrane 4 L six family member 1) [NCBI Gene 4071] {aka H-L6, L6, M3S1, TAAL6}, CASZ1 (castor zinc finger 1) [NCBI Gene 54897] {aka CAS11, CST, SRG, ZNF693, dJ734G22.1}, F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, PHLDB2 (pleckstrin homology like domain family B member 2) [NCBI Gene 90102] {aka LL5b, LL5beta}, SOCS3 (suppressor of cytokine signaling 3) [NCBI Gene 9021] {aka ATOD4, CIS3, Cish3, SOCS-3, SSI-3, SSI3}, CTCF (CCCTC-binding factor) [NCBI Gene 10664] {aka CFAP108, FAP108, MRD21}, TMEM131L (transmembrane 131 like) [NCBI Gene 23240] {aka KIAA0922}, EWSR1 (EWS RNA binding protein 1) [NCBI Gene 2130] {aka EWS, EWS-FLI1}, PTGES3 (prostaglandin E synthase 3) [NCBI Gene 10728] {aka P23, TEBP, cPGES}, KLHL32 (kelch like family member 32) [NCBI Gene 114792] {aka BKLHD5, KIAA1900, UG0030H05, dJ21F7.1}
- **Diseases:** Huntington's Disease (MESH:D006816), Mental Disorders (MESH:D001523), schizophrenia (MESH:D012559), Autism (MESH:D001321), dystonia (MESH:D004421), ASD (MESH:D000067877), genetic disorders (MESH:D030342), seizures (MESH:D012640), Fragile X syndrome (MESH:D005600), arrhythmia (MESH:D001145), ALS (MESH:D000690), neonatal death (MESH:D066087), ataxia (MESH:D001259), encephalopathies (MESH:D001927), Leigh syndrome (MESH:D007888), NDDs (MESH:D002658), SPANISH COHORT (MESH:C564143), neurological impairments (MESH:D009422), communication deficits (MESH:D003147), DSM-IV-TR (MESH:D006011), neuronal defects (MESH:D009410), dementia (MESH:D003704)
- **Chemicals:** isoleucine (MESH:D007532), valine (MESH:D014633)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** EXT_9_L6 — Homo sapiens (Human), Chronic myelogenous leukemia, BCR-ABL1 positive, Cancer cell line (CVCL_SM61)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12919168/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12919168/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/PMC12919168/full.md

---
Source: https://tomesphere.com/paper/PMC12919168