A retroelement-derived mammalian ARC protein exhibits selective RNA recognition and nucleic acid chaperone functions

Julita Gumna-Mikina; Angelika Andrzejewska-Romanowska; Maciej Antczak; Ewa Tykwińska; Marta Szachniuk; Katarzyna Pachulska-Wieczorek

PMC · DOI: 10.1093/nar/gkag207 · March 9, 2026


Open Access

TL;DR

This study reveals how the ARC protein interacts with RNA, showing it can selectively bind and reshape RNA structures, which is important for neuronal communication.

Contribution

The paper presents the first detailed in vitro analysis of ARC-RNA interactions and identifies specific RNA motifs and domains involved in binding.

Findings

1. ARC binds RNA through specific GC-rich motifs and near stable helices in the 5′ region of Arc mRNA.
2. Positively charged regions of ARC's matrix and capsid domains enhance RNA binding cooperativity.
3. ARC functions as a nucleic acid chaperone, destabilizing RNA structures locally.

Abstract

Activity-regulated cytoskeleton-associated protein (ARC) is an RNA-binding protein that also serves as a central hub for neuronal protein–protein interactions. It is essential for intercellular signaling and contributes to synaptic plasticity. ARC includes Gag-like sequences of Ty3/Gypsy retrotransposons and retains the ability to self-assemble into capsid-like structures containing Arc mRNA. Here, we employ an integrative approach to provide the first detailed in vitro analysis of ARC–RNA interactions. Using quantitative binding assays, RNA structure mapping, and ribonucleoprotein (RNP) footprinting, complemented by extensive computational analyses, we identified Arc mRNA regions specifically and non-specifically bound by ARC, as well as ARC amino acid residues involved in RNA interactions. We show that ARC recognizes RNA sequence and structure. A specific GC-rich motif is common to…


Figures


MST analysis of ARC binding to individual transcripts and the effect of the competitor on ARC–RNA interaction. (A) The scheme presents Arc mRNA transcripts used in the studies. (B) The heatmap shows EC50 values for ARC–RNA complexes without competitive RNA and in the presence of 150-, 250-, and 350-fold molar excesses of total yeast RNA. (C) The bar plot shows the fold change [mean ± standard error of the mean (SEM)] in EC50 values across ARC–RNA complexes at increasing excess of total yeast RNA. (D) The heatmap of EC50 values and bar plot of the fold change of the EC50 values (mean ± SEM) in the presence of competing RNA for ARC protein complexes with F1 transcript and its variants.

SHAPE-based comparison of Arc mRNA in unbound and bound states. (A) SHAPE reactivity distributions with medians. (B) Gini index distributions with medians. Significance was computed by the Wilcoxon rank-sum test; *P-value < 0.05; ***P-value < 0.001. (C) Profiles of the median SHAPE (upper plot), Gini index distributions (upper middle plot), Pearson correlation (lower middle plot), and ΔSHAPE (lower plot), smoothed with a 55 nt sliding window. The gray shading indicates regions with no data. Below, a conservation sequence score calculated with a 55 nt sliding window for Arc mRNA is presented. (D) Logo of the RNA sequence present across the determined ARC-binding sites generated with MEME Suite.

Analysis of ARC-binding sites within the 5′ end of Arc mRNA. (A) SHAPE-based prediction of the Arc mRNA 5′ region structure: arc diagrams of predicted MFE structure and base-pairing probabilities (see scale); median SHAPE reactivity profile with respect to the global median reactivity and Shannon entropy profile smoothed in a 55 nt sliding window. Shadings mark lowSS regions. Red and navy blue boxes represent regions of significant changes identified in the ΔSHAPE analysis. (B) 2D structure model of +1–607 Arc mRNA. Regions protected by protein are marked in red. The lowSS region is marked in purple. The start codon is marked in green. (C) 3D model of +1–607 Arc mRNA predicted with RNAComposer. Nucleotides identified in the ΔSHAPE analysis as ARC-binding sites are emphasized as red spheres. The lowSS region is marked in purple.

Analysis of ARC-binding sites within F1, F1ΔhUTR, and F1ΔUTR transcripts. (A) Arc diagrams of predicted MFE structure and base-pairing probabilities (see scale); median SHAPE reactivity profile with respect to the global median reactivity, Shannon entropy profile, and ΔSHAPE profile smoothed in a 55 nt sliding window. Shadings mark lowSS regions. The red and navy blue boxes below represent regions of significant changes identified in the ΔSHAPE analysis for Arc mRNA. (B) Structural context of the ARC-binding Site 3 identified for Arc mRNA within F1, F1ΔhUTR, and F1ΔUTR transcripts. (C) 3D models of F1, F1ΔhUTR, and F1ΔUTR RNA predicted using RNAComposer. Nucleotides identified in the ΔSHAPE analysis as ARC-binding sites are emphasized as red spheres. The lowSS region is marked in purple. The nucleotide numbering in the figure follows the full-length Arc mRNA sequence.

Results of analysis of the RNA-binding domain of the ARC protein. (A) The amino acid sequence of ARC, with the MA-like domain (green), CA-like domain (purple), and oligomerization site (cyan) highlighted. The red arrow indicates the beginning of the ARCΔMA protein sequence. Red shading indicates amino acid residues predicted to interact with RNA in the docking of ARC–RNA complexes. (B) The heatmap presents EC50 values for ARCΔMA–RNA complexes calculated from binding curves obtained in experiments without competitive RNA and in the presence of total yeast RNA, fitted to the Hill equation. (C) The bar plot shows the fold change (mean ± SEM) of EC50 values for ARCΔMA protein complexes with the tested transcripts at 25-, 50-, 100-, and 150-fold excess of total yeast RNA. (D) The electrostatic potential mapped on the full-length ARC protein surface, calculated using the APBS Electrostatics Plugin in PyMOL (two-perspective view). Blue indicates regions of positive potential (up to +5), whereas red depicts negative potential values (up to −5). (E) The model of the ARC–RNA (binding Site 3) complex obtained with the HADDOCK 2.4 web server. Intermolecular interaction sites are marked in red. Amino acid residues defined as necessary for oligomerization are marked in cyan.

Nucleic acid chaperone activity of ARC compared with NCp9 Gag3 protein. The graphs present averaged data from at least three independent annealing experiments for each protein. The error bars represent SDs. The annealing assays of TAR(−) DNA/TAR(+) DNA (A) or TAR(−) DNA/TAR RNA (B) performed in the presence of increasing protein concentration (0, 15.625, 31.25, 62.5, 125, 250, and 500 nM). Lanes denoted as “C” are protein-free control samples, and the next lanes contain increasing amounts of NCp9 or ARC. The strand displacement assays for DNA (C) and RNA (D) performed in the presence of increasing protein concentrations (0, 0.625, 1.25, 2.5, and 5 µM). Lanes denoted as “no D− comp” are control samples without adding unlabeled D−, and the next lanes contain increasing amounts of NCp9 or ARC. The time-course strand displacement assays for DNA (E) and RNA (F) substrates, performed with 5 µM protein and without protein. Representative polyacrylamide gels are presented on the right.

Protein-induced Ty3 5′–5′ dimerization and Ty3 3′ RNA–tRNAiMet annealing assays. The graphs present the percentages of dimerized Ty3 RNA (A) and bound tRNAiMet (B) at increasing ARC or NCp9 concentrations. Representative agarose gels are presented below. Lanes denoted as “C” are protein-free control samples, and the next lanes contain increasing amounts of ARC protein. The data for NCp9 were previously presented in Andrzejewska-Romanowska et al. [69].

Funding

- National Science Centre (DOI: 10.13039/501100004281)
- Institute of Bioorganic Chemistry (DOI: 10.13039/501100020811)


Taxonomy

Topics: RNA Research and Splicing · Nuclear Structure and Function · Neurogenesis and neuroplasticity mechanisms

Full text

Introduction

A single neuron establishes ~1000 synapses, crucial for transmitting information through neurotransmitters. Activity at these synapses can initiate communication with the nucleus, triggering the transcription of immediate early genes, including the mammalian activity-regulated cytoskeleton-associated gene (ARC, also known as activity-regulated gene 3.1 or Arg3.1). Once transcribed, Arc mRNA is transported to the dendrites, where it accumulates at sites of synaptic activity, and is translated to the 45 kDa ARC protein [1]. ARC is essential to various forms of synaptic plasticity, including synaptic scaling, long-term potentiation (LTP), and long-term depression (LTD) [2, 3]. Extensive research over the years highlights that ARC is involved in learning and memory processes, with implications for various neurological disorders such as Alzheimer’s disease, Angelman syndrome, schizophrenia, and autism [4, 5]. Beyond its role in the nervous system, ARC regulates the immune response and modulates the movement of skin-migratory dendritic cells in response to inflammation and T-cell activation [6, 7].

The ARC gene contains Gag-like sequences of the Ty3/Gypsy retrotransposon family [8]. Mammalian ARC homologs were identified in Drosophila, but they originate from independent domestication events within the Ty3/Gypsy lineage [9]. A significant breakthrough was the demonstration that both ARC and dARC1 can self-assemble into virus-like capsids that encapsidate their own mRNA and can be transferred between cells via extracellular vesicles (EVs). This was originally observed in primary mouse hippocampal neuronal cultures [9], and in vivo at the fly larval neuromuscular junction [10]. Subsequent mammalian studies have provided additional support for intercellular ARC signaling in vivo, highlighting ARC-EV-mediated communication in peripheral tissues [11] and activity-related interneuronal ARC redistribution in the mouse hippocampus [12]. The most recent work further implies that LTP-inducing stimuli promote IRSp53-dependent ARC-EV biogenesis, and that Arc mRNA is translated in recipient neurons, thereby reducing surface AMPA receptor levels and weakening synaptic strength [13]. IRSp53 is also implicated in human immunodeficiency virus 1 (HIV-1) Gag-driven budding, efficient particle assembly, and release [14, 15]. The transfer of Arc mRNA in virus-like capsids also occurs between human glioma cells, which may influence tumor progression and synaptic plasticity in cancer patients [16].

The structure of ARC capsids resembles that of Ty3 virus-like particles (VLPs), as well as the capsid structure of HIV-1 virions [17]. The mammalian ARC features a positively charged N-terminal domain (NTD) and a negatively charged C-terminal domain (CTD) [18]. The CTD exhibits homology to the capsid (CA) domain of retroelement Gag polyproteins [19, 20]. ARC proteins self-associate into various oligomeric forms via positively charged antiparallel helical coils located within the NTD, with a crucial seven amino acid residue oligomerization motif [21–23]. Nevertheless, interactions between ARC proteins alone are insufficient for forming capsids, and RNA is indispensable for proper assembly [9]. The involvement in RNA binding has been suggested for the computationally predicted retrovirus-like matrix (MA) domain in the NTD [8], although this has not been experimentally verified. In retroviruses, the MA domain of Gag facilitates membrane binding and can also contribute to RNA interactions [24–27]. Like the retroviral MA, the NTD of ARC can bind to phospholipid membranes [28]. The process of selecting and packaging mRNA into ARC capsids is poorly understood. Research involving Escherichia coli lysate suggests that the ARC protein displays minimal specificity for any particular RNA and instead encapsidates abundant RNAs based on their stoichiometry [9]. On the other hand, in vivo studies have shown that assembly of dARC1 capsids depends on the 3′-untranslated region (UTR) of darc1 mRNA, and the lack of this sequence inhibits intercellular darc1 mRNA transfer [10]. However, a significant distinction between dARC1 and mammalian ARC is the presence of a single zinc finger motif in dARC1, which is vital for RNA interactions [10].

Our research aimed to comprehensively investigate the RNA binding specificity of the rat ARC protein to better understand how ARC selects Arc mRNA from a large pool of other cellular RNAs. Our results demonstrate that under cell-free conditions, ARC exhibits a binding preference for its own mRNA over other RNAs. We identified the 5′ region of the Arc mRNA coding sequence (CDS) as essential for maintaining binding specificity. Two ARC-binding sites were also found in the 5′ UTR, but they are probably not involved in specific interactions. We showed that all ARC-binding sequences are enriched for guanine residues (∼40%) and exhibit high conservation at CDS positions. Moreover, we identified a GC-rich 10 nt RNA motif that is consistently present across all ARC-binding sequences. We also observed that ARC is influenced by RNA structure, with binding occurring adjacent to highly stable, solvent-exposed helices. Experiments using a truncated form of ARC, combined with computational modeling of ARC–RNA complexes, revealed that both MA-like and CA-like domains contribute to RNA binding activity. Importantly, we demonstrate that ARC acts as a highly effective nucleic acid chaperone (NAC) capable of inducing local relaxation of Arc mRNA structure.

Materials and methods

DNA, RNA, and protein substrates

DNA templates for in vitro transcription of Arc RNAs (Arc mRNA 1–2926, F1 nt 1–551, F2 nt 650–1176, F3 nt 1075–1652, F4 nt 1678–2260, F5 nt 2262–2926, F1ΔhalfUTR nt 82–551, F1ΔUTR nt 198–748) were obtained by PCR amplification from plasmid pBluescript-SKII-ArcUTRs (generously provided by Dr Jason D. Shepherd, University of Utah) containing the rat Arc mRNA sequence (accession Z46925.1). The short Arc RNAs encompassing independent long-range structural domains were determined based on in silico secondary structure prediction of Arc mRNA (Supplementary Fig. S1). The DNA template for Saccharomyces cerevisiae 18S rRNA (nt 1–576) was amplified from cDNA synthesized from total yeast RNA. The DNA templates for in vitro transcription of Ty3 5′ RNA (nt 1–429) and 3′ RNA (nt 4624–5052) were obtained by PCR amplification from plasmid pDLC201 containing a full-length Ty3 genome sequence [29]. All primers are listed in Supplementary Table S1. Transcripts were synthesized using SP6 or T7 MAXIscript transcription kits (Invitrogen) according to the manufacturer’s protocols, with the addition of Gp32 protein (Eurx) to the Arc RNA transcription reactions. DNase-treated transcripts were purified using the Monarch RNA Cleanup Kit (New England BioLabs). The transcript’s integrity was monitored by agarose gel electrophoresis under denaturing conditions or by high-performance liquid chromatography (HPLC). The templates for transcription of unmodified yeast tRNA_i_^Met^ and HIV-1 trans-acting responsive element (TAR) RNA were generated by PCR, and RNA was synthesized using the MEGAshortscript T7 Transcription Kit (Invitrogen). RNA was purified by denaturing gel electrophoresis (8 M urea) in a 1× TBE buffer, eluted from the gel matrix, and concentrated by ethanol precipitation. RNA 3′-end labeling with a fluorescent dye was conducted for 24 h at 4°C in an 18 μl reaction. 
The reaction mixture included 20 U of T4 RNA ligase (Thermo Fisher Scientific Inc.), 1× T4 RNA Ligase Buffer, 20 μM ATP, 40 μM pCp-Cy5 or pCp-Cy3 (Jena Bioscience), and 50 pmol of RNA. The labeled RNA was purified using a Monarch RNA Cleanup Kit (New England BioLabs). The purified RNAs were stored at −20°C.

The 21D+/R+ oligonucleotides [Cy5, 5′-ATGTGGAAAATCTCTAGCAGT-3′], the complementary 21D−/R− oligonucleotides [Cy3, 5′-ACTGCTAGAGATTTTCCACAT-3′], TAR(−) DNA [Cy5, 5′-GGG TTC CTT GCT AGC CAG AGA GCT CCC GGG CTC GAC CTG GTC TAA CAA GAG AGA CC-3′], and TAR(+) DNA [5′-GGT CTC TCT TGT TAG ACC AGG TCG AGC CCG GGA GCT CTC TGG CTA GCA AGG AAC CC-3′] corresponding to the TAR sequence of HIV-1MAL were obtained from Merck (Germany).

Dr Jason D. Shepherd, University of Utah, provided plasmid pGEX-6p1-Arc containing the open reading frame of full-length rat ARC (accession NP_062234.1). Plasmid pGEX-6p1-ArcΔMA for ARC-derived protein (amino acids 141–396 of full-length rat ARC) expression was generated by GenScript (USA). Plasmids were transformed into E. coli BL21(DE3)pLysS cells (Invitrogen).

The Ty3 NCp9 protein (57 amino acids: TVRTRRSYNKPMSNHRNRRNNNPSREECIKNRLCFYCKKEGHRLNECRARKASSNRS) was prepared by chemical synthesis and purified by HPLC (GenScript, USA). NCp9 stocks were reconstituted at 0.8 mg ml^−1^ in an NC storage buffer [20 mM HEPES, pH 8.0, 150 mM NaCl, 5 mM dithiothreitol (DTT), 0.15 mM ZnCl_2_, and 10% glycerol], aliquoted, and stored at −80°C.

Expression and purification of recombinant proteins

The starter bacterial cultures were grown overnight at 37°C in Luria–Bertani (LB) medium supplemented with ampicillin and chloramphenicol. The fresh overnight culture was diluted 1:100 into 6 liters of LB medium and grown in an orbital shaker at 37°C and 150 rpm to an OD_600_ of 0.6–0.7. After adding isopropyl-β-d-thiogalactopyranoside (IPTG; 1 mM), the cultures were shifted to 16°C for 18–20 h. Cells were pelleted by centrifugation at 4000 g for 10 min at 4°C and resuspended in lysis buffer [50 mM HEPES, pH 7.0, 1 M NaCl, 5 mM β-mercaptoethanol, 1 mM DTT, 1% Tween-20, 0.5 mg ml^−1^ lysozyme, and protease inhibitor cocktail (Roche)]. The cell suspension was sonicated with 60 pulses of 3 s each, with a 17 s pause after each pulse. Debris was removed by centrifugation at 30 000 g for 30 min at 4°C. The nucleic acids were precipitated from the supernatant by slowly adding a 5% poly(ethyleneimine) (PEI) solution, pH 7.9, until it reached a final concentration of 0.45%. Then, the mixture was incubated at 4°C for 30 min and centrifuged at 35 000 g for 30 min at 4°C. This nucleic acid removal step was repeated twice. The cleared supernatants were passed through a 0.45 μm filter and then incubated with pre-equilibrated Glutathione Sepharose 4 Fast Flow (Cytiva) in a gravity flow column for 3 h at 4°C. The Sepharose column was washed with 12 volumes of wash buffer (50 mM HEPES, pH 7.0, 1 M NaCl, 5 mM β-mercaptoethanol, 1 mM DTT, 1.2% Tween-20) and 5 column volumes of wash buffer without detergent. Glutathione S-transferase (GST)–ARC or GST–ARCΔMA was eluted from Sepharose using a buffer containing 50 mM reduced l-glutathione. Subsequently, the elution buffer was exchanged for protein storage buffer (50 mM HEPES, pH 7.0, 1 M NaCl, 5 mM β-mercaptoethanol, 1 mM DTT), then the protein was concentrated, aliquoted, and stored at −80°C.

The GST protein was also expressed alone in the E. coli BL21(DE3)pLysS strain (Invitrogen). The overnight culture was used to inoculate a large-scale 3 liter culture of LB medium that was grown to an OD_600_ of 0.6–0.7. Following the addition of IPTG (1 mM), the culture was further grown at 37°C for 4–5 h. GST protein was purified according to the protocol developed for ARC proteins, omitting the PEI addition step.

Dynamic light scattering

Dynamic light scattering (DLS) measurements were performed using a Zetasizer Nano instrument (Malvern Instruments Ltd, UK). Each sample was measured five times, with a 30 s delay between measurements, and each measurement consisted of 12 instrument runs (12 × 10 s). Prior to the experiment, the samples were filtered and then tested at 150 mM NaCl (binding buffer) or 1 M NaCl (protein storage buffer). Data were analyzed as intensity distributions; however, mass distributions were additionally calculated to estimate the relative abundance of particle populations of different sizes.

Microscale thermophoresis

Cy5-labeled RNA was denatured in water by heating at 90°C for 2 min, placed on ice for 3 min, then adjusted to 40 nM with binding buffer (30 mM HEPES, pH 7.0, 150 mM NaCl, 10 mM DTT, 2 mM MgCl_2_, 0.1% Pluronic F-127), and incubated at 37°C for 15 min. GST–ARC was dissolved in the binding buffer, and a series of sixteen 2-fold dilutions was prepared using the same buffer. Each protein dilution was mixed with one volume of labeled RNA, leading to a final RNA concentration of 20 nM and final protein concentrations ranging from 0.000275 to 9 μM (from 0.00055 to 18 μM for ARCΔMA). For competition experiments, unlabeled total yeast RNA (Invitrogen) was folded separately and combined with Cy5-labeled RNA before being added to protein dilutions. Following incubation at 37°C for 15 min, the samples were loaded into standard Monolith NT.115 Capillaries (NanoTemper Technologies) according to the manufacturer’s instructions. Microscale thermophoresis (MST) was measured using a Monolith NT.115 instrument (NanoTemper Technologies) set to 22°C. Instrument parameters were adjusted to 40–100% LED (depending on the labeled RNA used in the reaction) and a medium MST power setting. The obtained data were analyzed with MO.Affinity Analysis software (version 2.3, NanoTemper Technologies) using the signal from an MST on time of 5 s. The dissociation constant was then determined using a Hill model to fit the curve.
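The sixteen-point 2-fold titration and the Hill model mentioned above can be illustrated with a minimal sketch. It assumes the standard Hill form for fraction bound; the function names `dilution_series` and `hill_fraction_bound` and the example EC50 and Hill coefficient are hypothetical and are not taken from MO.Affinity Analysis or from the measured data.

```python
def dilution_series(top_um: float, n_points: int = 16, factor: float = 2.0) -> list[float]:
    """Return n_points concentrations (in µM), each `factor`-fold lower than the previous one."""
    return [top_um / factor**i for i in range(n_points)]


def hill_fraction_bound(conc_um: float, ec50_um: float, hill_n: float) -> float:
    """Standard Hill equation: fraction bound = c^n / (EC50^n + c^n)."""
    return conc_um**hill_n / (ec50_um**hill_n + conc_um**hill_n)


# Full-length ARC: the highest final concentration stated in the text is 9 µM.
series = dilution_series(9.0)
lowest = series[-1]  # equals 9 / 2**15 µM

# Illustrative parameters only (hypothetical, not values from the study):
# at c = EC50 the Hill equation gives half-maximal binding for any Hill coefficient.
half = hill_fraction_bound(0.5, ec50_um=0.5, hill_n=2.0)
```

Fifteen successive halvings of 9 μM reproduce the quoted lower bound of about 0.000275 μM, and the same scheme starting at 18 μM gives the ARCΔMA range.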

SHAPE-based footprinting

In vitro-transcribed RNA samples (10 pmol) were denatured by heating at 90°C for 2 min in water, followed by a 3 min incubation on ice. Next, a folding buffer (30 mM HEPES, pH 7.0, 2.5 mM MgCl_2_, 75 mM NaCl) was added, and RNA was incubated for 15 min at 37°C. Folded RNA samples were diluted 2.5× with 30 mM HEPES, pH 7.0. Subsequently, 1000 pmol of GST–ARC or protein storage buffer in a total volume of 16 μl was added to a 146 μl reaction, and samples were incubated for 15 min at 37°C. Each reaction was divided into two tubes and treated with NAI in dimethyl sulfoxide (DMSO) at a final concentration of 100 mM or with DMSO alone. Reactions were incubated for 15 min at 37°C and quenched by adding one volume of 1 M DTT. RNA was purified using TRI Reagent solution (Invitrogen) and the Monarch RNA Cleanup Kit (New England Biolabs). RNA was eluted with 20 µl of water. Detection of 2′-O-adducts and data processing were performed as described below.

Amplicon SHAPE-MaP

Reverse transcription was performed as described previously [30]. In brief, 10 µl of RNA was mixed with 1 μl of the corresponding 2 μM reverse primer. Primers were annealed at 65°C for 5 min, and then the mixture was cooled to 4°C, followed by the addition of 8 μl of 2.5× MaP buffer (125 mM Tris, pH 8.0, 187.5 mM KCl, 15 mM MnCl_2_, 25 mM DTT, and 1.25 mM dNTPs) and incubation at 42°C for 2 min. After adding 1 μl of SuperScript II reverse transcriptase (Invitrogen), 20 μl of the total reaction was incubated at 42°C for 3 h. Then, the enzyme was heat-inactivated at 70°C for 15 min. The generated cDNA was purified using the ZR DNA Sequencing Clean-Up Kit (Zymo Research). Then, 1 μl of 4 M NaOH was added to each cDNA sample and incubated at 95°C for 5 min to degrade the RNA. Next, the reaction was cooled on ice, and 2 μl of 2 N HCl was added to neutralize it. cDNA was again purified using the ZR DNA Sequencing Clean-Up Kit. One-tenth of purified cDNA was used as a template for PCR with amplicon-specific primers and NEBNext Ultra II Q5 Master Mix (New England BioLabs). For +1–2926 Arc mRNA, four primer pairs for overlapping amplicons were designed covering the +1–748 region. The primers contained 8 nt indexes for demultiplexing. For F1, F1ΔhUTR, and F1ΔUTR transcripts, a single primer pair was used. All primers are listed in Supplementary Table S2. The double-stranded DNA (dsDNA) amplicons were purified with a PCR/DNA Clean-Up Purification Kit (Eurx) and visualized on a 1.2% agarose gel. The accurate concentration of dsDNA amplicons was measured using a Qubit dsDNA High Sensitivity assay and Qubit™ 4 Fluorometer (Invitrogen). The dsDNA amplicons from experiments with +1–2926 Arc mRNA were pooled in equimolar proportions.
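As a quick check of the reverse-transcription mix described above (10 µl RNA, 1 µl primer, 8 µl of 2.5× MaP buffer, and 1 µl enzyme, 20 µl in total), the final 1× buffer composition can be computed as sketched below. The dictionary and function names are hypothetical, and the stock values are simply those listed for the 2.5× buffer.

```python
# Stock (2.5x) MaP buffer composition in mM, as listed in the protocol.
MAP_BUFFER_2_5X_MM = {
    "Tris": 125.0,
    "KCl": 187.5,
    "MnCl2": 15.0,
    "DTT": 25.0,
    "dNTPs": 1.25,
}


def final_concentrations(stock_mm: dict[str, float],
                         stock_volume_ul: float,
                         total_volume_ul: float) -> dict[str, float]:
    """Dilute each stock component by the ratio stock volume / total reaction volume."""
    return {
        name: conc * stock_volume_ul / total_volume_ul
        for name, conc in stock_mm.items()
    }


# 8 µl of 2.5x buffer in a 20 µl reaction corresponds to a 1x working concentration.
final = final_concentrations(MAP_BUFFER_2_5X_MM, stock_volume_ul=8.0, total_volume_ul=20.0)
```

Under these volumes the working buffer comes out at 50 mM Tris, 75 mM KCl, 6 mM MnCl_2_, 10 mM DTT, and 0.5 mM dNTPs.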

Illumina sequencing

The dsDNA amplicons were processed toward downstream library preparation by Novogene (UK) or Macrogen Europe (The Netherlands). Sequencing was performed on the Illumina NovaSeq 6000 system, outputting 2 × 150 or 2 × 250 paired-end datasets. To obtain sufficient sequencing depth in some cases, 2–3 rounds of library sequencing were performed.

MaP analysis

A quality assessment was performed using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). All raw sequencing data were analyzed using the ShapeMapper 2 (version 2.3) pipeline [31] and aligned to the rat Arc mRNA sequence (accession Z46925.1). The read-depth threshold setting of 5000 was used as a quality control benchmark. All libraries passed the three quality control checks of ShapeMapper 2. The SHAPE reactivities for the two replicates were averaged.
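The replicate-averaging step can be sketched as follows. This is an illustration under stated assumptions, not the ShapeMapper 2 implementation: it assumes per-nucleotide reactivity and read-depth lists are already available for each replicate, and that a position is reported as missing (`None`) when either replicate lacks data or falls below the 5000-read threshold. The function name `average_replicates` is hypothetical.

```python
from typing import Optional, Sequence

MIN_DEPTH = 5000  # read-depth threshold quoted in the text


def average_replicates(rep1: Sequence[Optional[float]],
                       rep2: Sequence[Optional[float]],
                       depth1: Sequence[int],
                       depth2: Sequence[int],
                       min_depth: int = MIN_DEPTH) -> list[Optional[float]]:
    """Average two SHAPE reactivity profiles position by position.

    Assumption: a position is kept only if both replicates have a value
    and both meet the read-depth threshold; otherwise it is set to None.
    """
    averaged: list[Optional[float]] = []
    for r1, r2, d1, d2 in zip(rep1, rep2, depth1, depth2):
        if r1 is None or r2 is None or d1 < min_depth or d2 < min_depth:
            averaged.append(None)
        else:
            averaged.append((r1 + r2) / 2)
    return averaged


# Toy data (hypothetical): position 2 fails the depth filter, position 3 has no data.
profile = average_replicates(
    [0.2, 1.0, None], [0.4, 0.5, 0.3],
    [6000, 4000, 9000], [7000, 8000, 9000],
)
```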

RNA structure modeling

The SuperFold software [30], based on algorithms from RNAstructure [32], was used to predict the RNA minimum-free energy (MFE) structure, compute base-pairing probabilities, and define lowSS regions, incorporating SHAPE reactivities as pseudo-energy constraints. A default value of a maximum pairing distance of 600 nt was imposed to force the prediction of local base pairs. The folding slope and intercept parameters were set to 1.8 and –0.6 kcal mol^–1^, respectively. The sensitivity and positive predictive value (PPV) for the obtained MFE models were calculated using the scorer function (implemented in RNAstructure). RNA structures were visualized with StructureEditor (RNAstructure package) [32] and VARNA [33].
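The slope and intercept quoted above enter the folding calculation through the widely used SHAPE pseudo-free-energy term, ΔG_SHAPE = m·ln(reactivity + 1) + b. The sketch below evaluates that term for a single nucleotide; it is an illustration rather than SuperFold or RNAstructure code, and the function name and the choice to give missing data a zero contribution are assumptions.

```python
import math
from typing import Optional

SLOPE_KCAL_MOL = 1.8       # folding slope stated in the text
INTERCEPT_KCAL_MOL = -0.6  # folding intercept stated in the text


def shape_pseudo_energy(reactivity: Optional[float],
                        slope: float = SLOPE_KCAL_MOL,
                        intercept: float = INTERCEPT_KCAL_MOL) -> float:
    """Pseudo-free energy (kcal/mol) for one nucleotide: slope * ln(reactivity + 1) + intercept.

    Assumption: positions without data (None) contribute no pseudo-energy.
    """
    if reactivity is None:
        return 0.0
    if reactivity <= -1.0:
        raise ValueError("reactivity must be greater than -1 for the logarithm to be defined")
    return slope * math.log(reactivity + 1.0) + intercept


# Reactivity at which the term changes sign (bonus for pairing below, penalty above).
crossover = math.exp(-INTERCEPT_KCAL_MOL / SLOPE_KCAL_MOL) - 1.0
```

With these parameters an unreactive nucleotide receives a stabilizing term of −0.6 kcal mol^–1^, and the term becomes a penalty once reactivity exceeds roughly 0.4.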

RNA 3D structure models were generated using RNAComposer [34–36], with up to 100 candidate structures predicted per target via its automated pipeline, guided by experimentally curated secondary structures. The corresponding 2D structure for each model was extracted using RNApdbee 2.0 [37], followed by selecting up to five preliminary candidates based on two criteria: the highest agreement between predicted and input 2D structures and the lowest total energy values computed with XPLOR-NIH [38, 39]. Structures displaying steric clashes or topological issues (entanglements of structure elements flagged by RNAspider [40]) were excluded from further analysis. For the most energetically favorable 3D structure candidate, secondary structure elements, such as internal and multi-branched loops, were cataloged, and missing motifs were identified by querying specialized repositories: RNA FRABASE [41] and RNAloops [42]. A specialized remodeling protocol was employed to integrate these motifs, combining template-based replacement of poorly resolved regions and energy minimization to refine the hybrid structure. This iterative refinement produced a set of up to 10 candidate 3D models, from which the final structure was selected based on the absence of topological anomalies and the most favorable energy profile. We evaluated standard stereochemical quality metrics [38, 39, 43] and 2D–3D consistency measures (INF; [44]) for the resulting 3D models. Visualizations of RNA 3D structures were prepared using PyMOL Molecular Graphics System (Version 3.0, Schrödinger, LLC).

Statistical analysis

To test for global effects of using different RNAs and the presence of competing RNA on EC_50_ (half-maximal effective concentration) values, extra sum-of-squares F-tests on nested non-linear models were used: (i) RNA main effect test; (ii) competitor main effect test; and (iii) RNA × competitor interaction test. The analyses were performed in Python (version 3.13.2) using pandas (version 2.2.3), SciPy (version 1.15.1), Matplotlib (version 3.10.0), and lmfit (version 1.3.4). Statistical analyses of SHAPE-MaP data, including Pearson’s correlation, Wilcoxon rank sum test, and Gini index calculation, were computed with GraphPad Prism 8 (GraphPad Software) and Python (version 3.13.2) using pandas (version 2.2.3), NumPy (version 2.2.2), SciPy (version 1.15.1), Statsmodels (version 0.14.4), and Matplotlib (version 3.10.0). Statistical significance was defined as a P-value < 0.05 (* represents P-value < 0.05; ** represents P-value < 0.01, and *** represents P-value < 0.001). The ΔSHAPE framework [45] was used to detect statistically significant sites of RNA–protein interactions.
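The extra sum-of-squares F-test compares a reduced (nested) model with a fuller one by asking whether the drop in residual sum of squares justifies the extra parameters. A minimal sketch of the statistic is given below; the function name is hypothetical, the degrees of freedom are residual degrees of freedom (data points minus fitted parameters), and the numbers in the example are invented for illustration.

```python
def extra_sum_of_squares_f(rss_reduced: float, df_reduced: int,
                           rss_full: float, df_full: int) -> float:
    """F = [(RSS_reduced - RSS_full) / (df_reduced - df_full)] / (RSS_full / df_full).

    rss_*: residual sum of squares of each fitted model.
    df_*:  residual degrees of freedom (n data points - n fitted parameters).
    """
    if df_reduced <= df_full:
        raise ValueError("the reduced model must have more residual degrees of freedom")
    if rss_full <= 0:
        raise ValueError("the full model must have a positive residual sum of squares")
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    denominator = rss_full / df_full
    return numerator / denominator


# Hypothetical example: the fuller model uses two extra parameters.
f_stat = extra_sum_of_squares_f(rss_reduced=12.0, df_reduced=30, rss_full=8.0, df_full=28)
```

The P-value is then the upper tail of the F distribution with (df_reduced − df_full, df_full) degrees of freedom, which in SciPy could be obtained with `scipy.stats.f.sf`.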

Sequence conservation analysis

For a set of homologous Arc mRNA sequences from organisms belonging to various taxonomic orders (Primates, Rodentia, Carnivora, Artiodactyla, Perissodactyla, and Chiroptera), multiple sequence alignment (MSA) was performed using the Clustal Omega web server hosted by EMBL-EBI [46] under default parameters. The resulting alignment (.aln) and associated log were downloaded and processed locally. This .aln file was parsed by our custom Python script, which utilizes Biopython (version 1.85) to map alignment columns to reference coordinates and compute the per-position Shannon entropy across the alignment. Entropy values were smoothed using a 55 nt sliding window, exported into a .csv file, and rendered as a one-row heatmap via Matplotlib (version 3.10.0).
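The conservation score described above can be sketched in plain Python. This is not the authors' script: it assumes the aligned sequences are already loaded as equal-length strings, treats the gap character as an ordinary symbol, keeps only columns where the reference (first) sequence has a nucleotide, and truncates the smoothing window at the sequence ends. All function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the symbols in one alignment column."""
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in Counter(column).values())


def per_position_entropy(aligned: list[str], ref_index: int = 0) -> list[float]:
    """Entropy per reference position; columns that are gaps in the reference are skipped."""
    reference = aligned[ref_index]
    entropies = []
    for pos, column in enumerate(zip(*aligned)):
        if reference[pos] == "-":
            continue  # no reference coordinate for this column
        entropies.append(column_entropy("".join(column)))
    return entropies


def sliding_mean(values: list[float], window: int = 55) -> list[float]:
    """Centered moving average; the window is truncated at both ends (an assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Toy alignment (hypothetical sequences); the fourth column is a gap in the reference.
toy = ["AAA-C", "AAC-G", "ACAGU", "ACC-A"]
entropy = per_position_entropy(toy)
smoothed = sliding_mean(entropy, window=3)
```

Low entropy marks conserved positions, so a fully invariant column scores 0 bits and a column with all four nucleotides equally represented scores 2 bits.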

Molecular docking

The tertiary structure of the ARC protein was predicted using AlphaFold 3 [47]. The electrostatic surface potential of the ARC protein was calculated using the APBS Electrostatics Plugin [48] within PyMOL following standard protocols. Electrostatic potentials were mapped onto the solvent-accessible surface using a −5 to +5 k_B_T/e color space (red to blue).

For RNA–protein complex docking, we used the HADDOCK 2.4 web server [49, 50] under its standard “Expert” protocol. To improve sampling and reduce conformational complexity, input structures comprised the predicted structure of the protein and four distinct RNA fragments derived from the RNAComposer-modeled structure of Arc mRNA Domain 1. Fragments were selected based on the ΔSHAPE analysis data to cover putative binding regions, and each fragment was submitted in a separate docking run. Based on surface potential calculations (APBS), amino acid residues 331–338 were indicated as “active”, with neighboring residues automatically assigned as “passive”. “Active” nucleotides were defined based on footprinting results; automatic assignment of “passive” RNA residues was disabled. Clustering was performed on backbone root mean square deviation (RMSD) with a 20 Å cut-off, and clusters were ranked by their HADDOCK score. The top cluster from each docking, comprising 40–79 structures (cluster size ≥ 4), showed consistent interfacial contacts and the lowest average HADDOCK score. Representative models from these clusters were selected for further analysis and visualization using the PyMOL Molecular Graphics System (Version 3.0, Schrödinger, LLC), along with a script from PyMOLWiki for identifying interface residues.

Strand displacement assay

The Cy5-labeled 21D+ and Cy3-labeled 21D− oligonucleotides (or 21R+ and 21R− for RNA) were incubated at 37°C in annealing buffer (50 mM Tris, pH 8.0, 3 mM MgCl_2_, and 1 mM DTT) for 30 min (50 nM each for one reaction). Then, unlabeled D− or R− was added in a 10-fold molar excess, and reactions were incubated at 37°C in the absence or presence of protein (5 µM) for up to 30 min for DNA and 45 min for RNA. At the indicated time points, 10 µl aliquots were removed and mixed with 5 µl of stop solution [20% glycerol, 20 mM EDTA, pH 8.0, 0.2% sodium dodecyl sulfate (SDS), and 0.4 mg ml^−1^ yeast tRNA] to denature protein and induce its release from the oligonucleotides. For assays with increasing protein concentration, the reactions were incubated at 37°C for 5 min for DNA and 45 min for RNA. The samples were analyzed by native polyacrylamide gel electrophoresis (PAGE; 15% w/v) in 0.5× TBE at 4°C using DNApointer (Biovectis).

TAR annealing assay

The substrates were heated separately at 95°C for 2 min, then chilled on ice for 3 min, and subsequently incubated at 37°C for 10 min after the addition of buffer A (20 mM HEPES, pH 7.5, 30 mM NaCl, 0.1 mM MgCl_2_, and 5 mM DTT). Following this, the Cy5-labeled antisense TAR(−) DNA oligonucleotide (60 nM) and complementary unlabeled RNA or sense DNA (60 nM) were mixed and, after adding 1 µl of protein or protein storage buffer, 10 µl of the total reaction was incubated at 37°C for 15 min. After that, 5 µl of stop solution (20% glycerol, 20 mM EDTA, pH 8.0, 0.2% SDS, and 0.4 mg ml^−1^ yeast tRNA) was added. The samples were analyzed by native PAGE (8% w/v) in 0.5× TBE at 4°C using DNApointer (Biovectis).

Ty3 RNA dimerization and tRNA_i_^Met^ annealing assays

Cy3-labeled Ty3 RNA (0.5 pmol) was refolded in a buffer containing 40 mM Tris–HCl, pH 8.0, and 130 mM KCl. The RNA was heated at 95°C for 3 min, then slowly cooled to 60°C, placed on ice for 2 min, and subsequently incubated at 37°C for 30 min after the addition of 4 mM MgCl_2_. For assays in the presence of tRNA_i_^Met^, Cy5-labeled tRNA was folded separately under equivalent conditions. Ty3 RNA was combined with tRNA_i_^Met^ at a 1:1 molar ratio prior to protein addition. RNAs were incubated with increasing protein concentrations at 37°C for 30 min. The reactions were stopped by incubation with a quenching solution (1% SDS, 5 mM EDTA) at room temperature for 5 min. For experiments with the GST–ARC protein, quenching was preceded by a 15 min incubation at room temperature with proteinase K (Thermo Scientific) and a 100-fold excess of yeast tRNA (Invitrogen). The samples were extracted with phenol/chloroform, and 15 μl of aqueous phase was mixed with 3 μl of 25% Ficoll 400. RNA was resolved on a 1.3% agarose gel in 0.5× TB at room temperature.

Data processing for gel-based assays

The gels were quantified by imaging using an Amersham Typhoon 5 Biomolecular Imager with ImageQuantTL v10.1 software (Cytiva). The obtained data were analyzed using GraphPad Prism 8 (GraphPad Software) and OriginPro 8.5 (OriginLab). In all cases, at least three independent experiments were performed, and the data presented are representative of the whole. The reproducibility of the experiments was assessed by standard deviation (SD).

Results

RNA binding specificity of ARC and identification of Arc mRNA regions involved in interactions with ARC

To characterize the RNA binding properties of rat ARC, we used MST. This highly sensitive technique measures binding affinities by monitoring changes in molecular motion within a temperature gradient upon complex formation [51]. The MST assays were performed using recombinant rat ARC protein fused to GST (GST–ARC; hereafter referred to as “ARC”) because removal of the tag significantly reduced protein stability and caused precipitation at < 1 M salt at neutral pH. GST alone showed no detectable RNA binding in the MST assay (Supplementary Fig. S2). DLS measurements showed that ARC was predominantly monomeric in the 1 M salt storage buffer, but shifted to ∼12-mers and larger assemblies under the MST assay conditions (Supplementary Fig. S3).

The MST assays were conducted under conditions similar to those used in RNA binding studies of retroelement Gag proteins [52–55]. Initially, we tested five transcripts of similar length (F1–F5), representing distinct regions of rat Arc mRNA, to localize those that most strongly contribute to ARC association and specificity (Fig. 1A). A transcript derived from yeast 18S rRNA was used as a control. The fluorescently labeled RNAs were titrated with increasing concentrations of ARC, yielding measurable thermophoresis changes. Using the Hill equation, we estimated the average EC_50_ value for each ARC–RNA complex (Fig. 1B; Supplementary Table S3). We observed a significant main effect of RNA (F = 20.33, P-value = 1.1 × 10^−16^), indicating that the apparent EC_50_ differs among RNAs. We found that ARC binds all Arc mRNA fragments with high affinity, and the binding signal reached a plateau at ∼1100 nM protein concentration in each case (Supplementary Fig. S4). Surprisingly, the lowest EC_50_ value was observed for 18S rRNA, with the binding plateau reached at 560 nM ARC.
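To make the role of the Hill equation concrete, the following minimal sketch shows how an EC_50_ and a Hill coefficient can be recovered from a titration. It is an illustration only and not the fitting procedure used in the study: the function names, the synthetic concentrations, and the choice of a linearized Hill plot (instead of a nonlinear fit to normalized MST signal) are all assumptions.

```python
import math

def hill(conc, ec50, n_h):
    """Fraction bound at protein concentration `conc` (same units as `ec50`)."""
    return conc**n_h / (ec50**n_h + conc**n_h)

def fit_hill_linearized(concs, fractions):
    """Estimate (ec50, n_h) from the Hill plot.

    Uses logit(theta) = n_h * ln(c) - n_h * ln(ec50), so every fraction
    must lie strictly between 0 and 1. Ordinary least squares on the line.
    """
    xs = [math.log(c) for c in concs]
    ys = [math.log(f / (1.0 - f)) for f in fractions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n_h = sxy / sxx                      # slope = Hill coefficient
    intercept = mean_y - n_h * mean_x    # intercept = -n_h * ln(ec50)
    ec50 = math.exp(-intercept / n_h)
    return ec50, n_h

# Synthetic, noise-free titration (hypothetical values in nM, not study data).
concs = [25, 50, 100, 200, 400, 800]
fractions = [hill(c, ec50=150.0, n_h=2.0) for c in concs]
ec50_fit, n_h_fit = fit_hill_linearized(concs, fractions)
```

With real, noisy data a nonlinear least-squares fit is usually preferable, because the logit transform amplifies errors near 0 and 1.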

MST analysis of ARC binding to individual transcripts and the effect of the competitor on ARC–RNA interaction. (A) The scheme presents Arc mRNA transcripts used in the studies. (B) The heatmap shows EC50 values for ARC–RNA complexes without competitive RNA and in the presence of 150-, 250-, and 350-fold molar excesses of total yeast RNA. (C) The bar plot shows the fold change [mean ± standard error of the mean (SEM)] in EC50 values across ARC–RNA complexes at increasing excess of total yeast RNA. (D) The heatmap of EC50 values and bar plot of the fold change of the EC50 values (mean ± SEM) in the presence of competing RNA for ARC protein complexes with F1 transcript and its variants.

To further investigate the RNA binding specificity of ARC, we employed competition assays using S. cerevisiae total RNA as a binding competitor. Using MST, we examined how efficiently the competing RNA inhibited complex formation between the fluorescently labeled transcripts and the protein (Fig. 1B, C; Supplementary Table S3). We found that the competitor significantly affects each ARC–RNA complex (P-value = 1.1 × 10^−16^), as evidenced by increased EC_50_ values, and the magnitude of this effect depends on the RNA tested (P-value = 9.66 × 10^−6^). The largest EC_50_ fold change was measured for 18S rRNA, indicating much lower ARC binding specificity for non-cognate RNA. Among the Arc transcripts, F1, F3, and F4 exhibited the smallest increases in EC_50_ at a 150-fold competitor excess. When the competitor excess was increased to 250-fold, F1 and F4 continued to show the smallest and comparable EC_50_ increases. However, at the highest level of competitor excess (350-fold), F1 remained the least affected by competition, whereas other Arc mRNA fragments exhibited >30-fold increases in EC_50_.
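The competition readout reduces to a ratio of EC_50_ values with and without competitor. The sketch below illustrates that arithmetic under stated assumptions: the RNA labels and all EC_50_ numbers are hypothetical placeholders, not values from Supplementary Table S3, and the helper name `ec50_fold_change` is introduced only for this example.

```python
def ec50_fold_change(ec50_no_competitor, ec50_with_competitor):
    """Fold change in EC50 caused by competitor RNA (values > 1 mean weaker binding)."""
    if ec50_no_competitor <= 0:
        raise ValueError("reference EC50 must be positive")
    return ec50_with_competitor / ec50_no_competitor

# Hypothetical EC50 values (nM), keyed by fold molar excess of competitor RNA.
ec50 = {
    "cognate":     {0: 200.0, 150: 600.0,  350: 1800.0},
    "non_cognate": {0: 100.0, 150: 2500.0, 350: 9000.0},
}

# Fold change relative to the competitor-free measurement for each RNA.
fold = {
    rna: {
        excess: ec50_fold_change(values[0], value)
        for excess, value in values.items()
        if excess != 0
    }
    for rna, values in ec50.items()
}

# The RNA whose EC50 rises least at the highest excess is the most competition-resistant.
least_affected = min(fold, key=lambda rna: fold[rna][350])
```

In this toy data set the non-cognate RNA has the lower starting EC_50_ yet the larger fold change, which mirrors the pattern described for 18S rRNA: a low EC_50_ alone does not imply specific binding.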

The lowest competitor sensitivity of ARC–F1 RNA complexes raises the possibility that the determinants of specific recognition are located, at least in part, within the first 551 nt of Arc mRNA. Driven by these findings, we tested two additional Arc mRNA fragments: F1ΔhalfUTR (F1ΔhUTR) and F1ΔUTR. They partially overlap F1, with F1ΔhUTR lacking the 5′-proximal half of the 5′ UTR, while F1ΔUTR begins with the start codon and includes an additional 197 nt of coding sequence to match the length of F1 (Fig. 1A). The EC_50_ values determined for ARC complexes with these transcripts were slightly higher than for F1, indicating a slight drop in the binding strength (Fig. 1D; Supplementary Table S3). In the competition experiments, we observed that F1ΔhUTR was even more resistant to competition than F1 across the tested competitor range, whereas F1ΔUTR was more sensitive. Collectively, these findings suggest that nucleotides essential for mediating specific interactions with ARC are most likely located in the +82–551 region of Arc mRNA, while the 5′-proximal half of the 5′ UTR may contribute competitor-sensitive, less specific contacts. It is also conceivable that deletions within the 5′ UTR induce alterations in RNA structure, and some of the observed effects may stem from structural changes rather than sequence-encoded determinants alone. This hypothesis is examined in the subsequent sections of this study.

For all tested transcripts, the Hill coefficient (n_H_) was >1 (Supplementary Table S3), showing that RNA binding by ARC in vitro is cooperative regardless of RNA type. Nevertheless, in the experiments without the competitor, the calculated n_H_ was lower for 18S rRNA (1.5) than for Arc transcripts (1.9–2.5). The n_H_ is not a direct measure of the number of binding sites but rather a measure of the extent of cooperativity [56, 57]. Thus, this observation suggests weaker cooperativity in 18S rRNA binding, reflecting weaker interactions between ARC molecules during RNA binding. In competitive binding, the n_H_ value decreased for all Arc transcripts, indicating that the presence of competing RNA reduced overall binding cooperativity.

Identification of ARC-binding sequences within the 5′ end of Arc mRNA

To determine the RNA sequence and structure determinants of ARC binding in the 5′ region of Arc mRNA, we utilized the SHAPE-MaP technique (Selective 2′-Hydroxyl Acylation analyzed by Primer Extension and Mutational Profiling), which combines RNA chemical modifications with mutational profiling and massively parallel sequencing [30, 58]. We used conditions optimized for MST assays for RNA folding and the formation of ribonucleoprotein (RNP) complexes. The structural mapping was performed using an in vitro transcript of 2926 nt, corresponding to nearly full-length Arc mRNA (lacking the final 79 nt at the 3′ terminus), in two states: RNA alone and in the complex with ARC. The SHAPE-MaP datasets obtained had a median effective depth of > 400 000 reads per site and were highly reproducible, with reactivity profiles strongly correlated between replicates in both states (unbound state r = 1; bound state r = 1; Supplementary Fig. S5).

We observed a high global Pearson correlation coefficient of 0.97 between states, indicating that protein binding does not significantly remodel the overall Arc mRNA structure. The spread and overall shape of the SHAPE reactivity distributions were largely unchanged between the two RNA states (Fig. 2A). Analysis of the Gini index revealed that protein binding led to more intermediate values at the expense of extreme values (Fig. 2B). Next, we performed a sliding window analysis to distinguish random fluctuations from more systematic, local changes in SHAPE reactivity (Fig. 2C). As expected from the global similarity, the obtained profiles were largely consistent across states but pinpointed several sites with differences. To assess their statistical significance, we applied the ΔSHAPE framework [45], which quantifies the difference between the unbound and bound states. Positive ΔSHAPE values indicate nucleotides that interact with protein and are protected from modification (reduced reactivity in the bound state), while negative ΔSHAPE values indicate locally increased structural flexibility of RNA due to protein binding (enhanced reactivity in the bound state). We determined four ARC-binding sequences in the analyzed region of Arc mRNA, ranging from short 11–19 nt stretches located in the 5′ UTR (Sites 1 and 2) to longer segments exceeding 50 nt within the CDS (Sites 3 and 4). Notably, unlike Sites 1 and 2, those from the CDS co-localize well with highly conserved nucleotide blocks (Fig. 2C), underscoring their likely functional importance. Beyond that, we identified two regions where protein binding significantly relaxed RNA structure, thereby increasing SHAPE reactivity.
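The sign convention for ΔSHAPE described above can be sketched in a few lines. This is a simplified illustration, not the published ΔSHAPE implementation: it omits the statistical filters that the framework applies, and the 3 nt smoothing window, the 0.3 threshold, the function names, and the reactivity values are all assumptions chosen for the example.

```python
def delta_shape(unbound, bound):
    """Per-nucleotide difference: unbound minus bound reactivity.

    Positive values mean lower reactivity in the bound state (protection);
    negative values mean higher reactivity in the bound state (enhancement).
    """
    return [u - b for u, b in zip(unbound, bound)]

def moving_average(values, window=3):
    """Centered moving average; windows are truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def classify(smoothed, threshold=0.3):
    """Label each position using an arbitrary, illustrative threshold."""
    labels = []
    for d in smoothed:
        if d >= threshold:
            labels.append("protected")
        elif d <= -threshold:
            labels.append("enhanced")
        else:
            labels.append("unchanged")
    return labels

# Toy reactivity profiles (hypothetical, not data from the study).
unbound = [0.1, 0.9, 1.0, 0.8, 0.1, 0.1, 0.2, 0.1]
bound   = [0.1, 0.2, 0.1, 0.2, 0.1, 0.7, 0.9, 0.8]
labels = classify(moving_average(delta_shape(unbound, bound)))
```

Here the first stretch is labeled as protected (a candidate binding site) and the last stretch as enhanced (a candidate region of structural relaxation), matching the two classes of change reported in the text.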

SHAPE-based comparison of Arc mRNA in unbound and bound states. (A) SHAPE reactivity distributions with medians. (B) Gini index distributions with medians. Significance was computed by the Wilcoxon rank-sum test; *P-value < 0.05; **P-value < 0.001. (C) Profiles of the median SHAPE (upper plot), Gini index distributions (upper middle plot), Pearson correlation (lower middle plot), and ΔSHAPE (lower plot), smoothed with a 55 nt sliding window. The gray shading indicates regions with no data. Below, a conservation sequence score calculated with a 55 nt sliding window for Arc mRNA is presented. (D) Logo of the RNA sequence present across the determined ARC-binding sites generated with MEME Suite.

A comprehensive analysis of the protein-binding sequences revealed an overall nucleotide composition of 38% G, 22% C, 20% A, and 20% U. The full-length Arc mRNA shows a more balanced distribution: 27% G, 31% C, 22% A, and 19% U. Using MEME Suite (Multiple EM for Motif Elicitation) [59], we identified a 10 nt motif, CWCGWRYGGA (e-value 0.01), which is consistently present across the determined binding sequences (Fig. 2D). Cytosine predominates at the first position, while guanine is notably enriched at several other positions, alongside cytosine. These findings suggest that the ARC protein recognizes GC-rich segments, potentially augmented by uracil or adenine at specific positions.
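As an illustration of how the degenerate consensus can be read, the sketch below expands the IUPAC codes in CWCGWRYGGA (W = A or U, R = A or G, Y = C or U) into a regular expression and computes nucleotide composition. It is a simplification: MEME reports a probabilistic motif, whereas a consensus match is all-or-nothing. The toy sequence and the function names are hypothetical and were constructed for this example.

```python
import re
from collections import Counter

# Only the IUPAC symbols that occur in the reported consensus are mapped here.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "W": "[AU]", "R": "[AG]", "Y": "[CU]",
}

def motif_to_regex(motif):
    """Compile an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    pattern = "".join(IUPAC_RNA[symbol] for symbol in motif)
    return re.compile("(?=(" + pattern + "))")

def find_motif(seq, motif):
    """Return 0-based start positions of every consensus match in `seq`."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq)]

def composition(seq):
    """Percentage of each nucleotide in an RNA sequence."""
    counts = Counter(seq)
    return {nt: 100.0 * counts[nt] / len(seq) for nt in "ACGU"}

# Toy sequence containing two consensus instances (hypothetical, not Arc mRNA).
toy = "GGCACGAGCGGAUUCUCGUACGGAGG"
hits = find_motif(toy, "CWCGWRYGGA")
percent = composition(toy)
```

A position weight matrix scan would be the closer analogue of the MEME output; the regex form is shown only because it makes the degenerate positions explicit.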

The structural context of ARC-binding sites

In the next step, we used the SHAPE-MaP datasets as folding constraints to predict the SHAPE-directed merged MFE secondary structure model and base-pairing probabilities of the 5′ region of the Arc mRNA in both unbound and bound states using the SuperFold pipeline [30]. In accordance with model-free analysis (Fig. 2A–C), the MFE model of the protein-bound Arc mRNA was nearly indistinguishable from that of the unbound state (Fig. 3; Supplementary Fig. S6). The PPV (percentage of modeled base pairs in the accepted structure) and the sensitivity (percentage of accepted base pairs modeled correctly) were equal to 99.89% and 99.78%, respectively. This confirmed that ARC binding induces only a very subtle local rearrangement of the Arc RNA structure. We noticed that nucleotides 9–602 form a stable self-folding domain (Domain 1) with 79% of base pairings occurring with a high probability (≥ 80%) (Fig. 3A). Within Domain 1, we identified a highly stable and structured region, characterized by low SHAPE reactivity and low Shannon entropy (lowSS region), located 35 nt downstream of the start codon (Fig. 3A, B). A second lowSS region was found directly downstream of Domain 1. ΔSHAPE analysis showed that ARC-binding sequences are located in close proximity to (Sites 2 and 4) or overlap with (Site 3) the identified lowSS regions. Surprisingly, we also found that ARC binding results in significant increases in SHAPE reactivity within lowSS regions. This suggests partial structural relaxation of stable RNA motifs upon ARC binding. Analysis of protein-binding sequences, combined with the secondary structure model of the 5′ end of Arc mRNA, revealed that ARC has no apparent preference for single- or double-stranded RNA regions (Fig. 3B).
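The two structure-comparison metrics defined above can be computed directly from sets of base pairs. The sketch below follows the definitions in the text; which state served as the accepted structure is not restated here, so treating one structure as "accepted" and the other as "modeled" is an assumption, as are the function name and the toy hairpin.

```python
def ppv_sensitivity(modeled_pairs, accepted_pairs):
    """Return (PPV, sensitivity) in percent for two lists of base pairs.

    PPV         = shared pairs / modeled pairs
    sensitivity = shared pairs / accepted pairs
    Each pair is a tuple of 1-based nucleotide positions; order within a pair is ignored.
    """
    modeled = {tuple(sorted(p)) for p in modeled_pairs}
    accepted = {tuple(sorted(p)) for p in accepted_pairs}
    if not modeled or not accepted:
        raise ValueError("both structures must contain at least one base pair")
    shared = len(modeled & accepted)
    return 100.0 * shared / len(modeled), 100.0 * shared / len(accepted)

# Toy hairpin (hypothetical): the modeled structure lacks one accepted pair.
accepted = [(1, 20), (2, 19), (3, 18), (4, 17), (8, 14)]
modeled = [(20, 1), (2, 19), (3, 18), (4, 17)]
ppv, sensitivity = ppv_sensitivity(modeled, accepted)
```

In this toy case every modeled pair is present in the accepted structure (PPV of 100%), while one accepted pair is missing from the model (sensitivity of 80%). Values near 100% for both metrics, as reported for Arc mRNA, indicate nearly identical pairing patterns.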

Analysis of ARC-binding sites within the 5′ end of Arc mRNA. (A) SHAPE-based prediction of the Arc mRNA 5′ region structure: arc diagrams of predicted MFE structure and base-pairing probabilities (see scale); median SHAPE reactivity profile with respect to the global median reactivity and Shannon entropy profile smoothed in a 55 nt sliding window. Shadings mark lowSS regions. Red and navy blue boxes represent regions of significant changes identified in the ΔSHAPE analysis. (B) 2D structure model of +1–607 Arc mRNA. Regions protected by protein are marked in red. The lowSS region is marked in purple. The start codon is marked in green. (C) 3D model of +1–607 Arc mRNA predicted with RNAComposer. Nucleotides identified in the ΔSHAPE analysis as ARC-binding sites are emphasized as red spheres. The lowSS region is marked in purple.

Using the SHAPE-directed MFE secondary structure of unbound Arc mRNA Domain 1 as the input, we generated an appropriate 3D model using RNAComposer (Fig. 3C) [34–36]. The resulting model was physically plausible and internally consistent: it contained no steric clashes or geometric outliers and preserved the input base-pairing pattern (Supplementary Table S4). We found that the nucleotides of the lowSS region are arranged in a U-shaped tertiary architecture comprising two roughly coaxial solvent-exposed helices connected by a horizontally oriented bridging helix. The ARC-binding sites are located on either side of the U-shaped motif and at the base of one of the coaxial helices. This binding geometry probably requires local loosening of the U-shaped helical scaffold to provide steric access for ARC, consistent with the ARC-dependent increases in SHAPE reactivity observed within the helices forming the motif (Figs 2C and 3A).

Identification of ARC-binding sites critical for specific recognition of Arc mRNA

To assess how the protein-protected sequences in Arc mRNA contribute to ARC binding specificity, we also performed footprinting experiments on F1, F1ΔhUTR, and F1ΔUTR RNAs. As we observed for Arc mRNA, SHAPE reactivity distributions revealed minimal changes upon protein binding for F1ΔhUTR and F1ΔUTR (Supplementary Fig. S7), and the overall correlation between unbound and bound states was high (r = 0.92 for F1ΔhUTR, r = 0.82 for F1ΔUTR). As with the Arc mRNA, the Gini index revealed greater differences, with median values changing by 0.02 upon protein binding. F1 also showed a high overall correlation (r = 0.86), but with larger shifts in both SHAPE reactivity and Gini index distributions. MFE secondary structure predictions revealed that F1 and F1ΔhUTR adopt nearly identical structures in unbound and bound states (F1: PPV = 97.59%, sensitivity = 93.10%; F1ΔhUTR: PPV = 97.92%, sensitivity = 98.60%), whereas F1ΔUTR shows a more divergent folding between states (PPV = 64.29%, sensitivity = 66.26%) (Supplementary Fig. S8). We found that all three transcripts contain a lowSS region that largely overlaps the lowSS identified in Domain 1 of Arc mRNA. F1ΔUTR extends beyond Domain 1 and contains an additional lowSS region that corresponds to the structured region immediately downstream of Domain 1 in Arc mRNA.
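For readers unfamiliar with the Gini index as a measure of how unevenly SHAPE reactivity is distributed, a minimal sketch of the standard formula follows. How the index was windowed and how negative reactivities were handled in the study is not specified in this section, so the requirement for non-negative input and the example values are assumptions.

```python
def gini(values):
    """Gini index of non-negative values: 0 for a uniform profile, near 1 when a few values dominate.

    Uses the sorted-rank form G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

# Hypothetical windows of reactivities, not data from the study.
uniform_window = [1.0, 1.0, 1.0, 1.0]   # evenly reactive
skewed_window = [0.0, 0.0, 0.0, 1.0]    # one highly reactive nucleotide
g_uniform = gini(uniform_window)
g_skewed = gini(skewed_window)
```

A shift toward intermediate reactivities at the expense of extreme ones lowers the index, which is why small changes in its median can accompany protein binding even when the overall reactivity distribution looks unchanged.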

Using ΔSHAPE analysis, we identified regions with statistically significant changes in SHAPE reactivity upon protein binding (Fig. 4A). For F1, we identified seven short ARC-binding segments, including two that partially overlap with Sites 2 and 3 defined in Arc mRNA. In contrast, each truncated variant of F1 contained a single primary protein-binding site: in F1ΔhUTR, it partially overlaps with Site 3, whereas in F1ΔUTR, it aligns with Site 4 identified for Arc mRNA. Notably, despite the presence of the Site 3 sequence in F1ΔUTR, we observed no ΔSHAPE values consistent with ARC binding at this locus, suggesting that the local structural context disfavors ARC association. Consistent with this interpretation, 2D structure models indicated that the structural context of the Site 3 sequence differs across these three analyzed transcripts and from that observed in the corresponding region of Arc mRNA (Figs 3B and 4B). However, only that characteristic of F1ΔUTR is associated with the loss of the ARC binding at Site 3. When the structural context is perturbed, ARC can utilize Site 4 as an alternative, underscoring that structure and accessibility, not sequence alone, govern site usage. However, given that MST assays showed that F1ΔUTR interactions with ARC are less specific than those of F1 and F1ΔhUTR, which both engage Site 3, our data indicate that Site 3 is necessary for the specific recognition of Arc mRNA by ARC. Across all three transcripts, we observed ARC-induced relaxation of RNA structure within lowSS regions. In the case of F1ΔUTR, an additional site with a destabilized structure was detected adjacent to the protein-binding site, but not directly within a lowSS region.

Analysis of ARC-binding sites within F1, F1ΔhUTR, and F1ΔUTR transcripts. (A) Arc diagrams of predicted MFE structure and base-pairing probabilities (see scale); median SHAPE reactivity profile with respect to the global median reactivity, Shannon entropy profile, and ΔSHAPE profile smoothed in a 55 nt sliding window. Shadings mark lowSS regions. The red and navy blue boxes below represent regions of significant changes identified in the ΔSHAPE analysis for Arc mRNA. (B) Structural context of the ARC-binding Site 3 identified for Arc mRNA within F1, F1ΔhUTR, and F1ΔUTR transcripts. (C) 3D models of F1, F1ΔhUTR, and F1ΔUTR RNA predicted using RNAComposer. Nucleotides identified in the ΔSHAPE analysis as ARC-binding sites are emphasized as red spheres. The lowSS region is marked in purple. The nucleotide numbering in the figure follows the full-length Arc mRNA sequence.

The 3D models of unbound F1, F1ΔhUTR, and F1ΔUTR RNAs demonstrated solid stereochemistry and preserved the original base-pairing pattern (Fig. 4C; Supplementary Table S4). The model of F1ΔhUTR showed that the lowSS region forms the top of the solvent-exposed helix, and the protein-binding site is located at its base. In the F1ΔUTR model, the lowSS region adopts a U-shaped tertiary architecture similar to that identified in the Arc mRNA, with different nucleotide positions involved in its formation, according to the Arc mRNA sequence. Protein-binding sites are located immediately adjacent to this structural motif. F1 exhibits a somewhat different 3D arrangement: the lowSS region also forms a helix, but multiple other helices are exposed, consistent with additional binding sites unique to F1.

Identification of the RNA-binding regions in ARC

The involvement of ARC protein regions in RNA binding remains unknown. To investigate the role of the predicted MA-like domain [8] in the ARC–RNA interactions, we engineered a truncated variant of the rat ARC protein called ARCΔMA. This variant features a deletion of 140 amino acid residues from the N-terminus (Fig. 5A). Using MST, we analyzed the binding affinity of ARCΔMA for the F1 RNA and 18S rRNA and compared it with MST data obtained for the full-length ARC. The EC_50_ values for ARCΔMA complexes with RNAs were, on average, 4.3-fold higher than those for the ARC, indicating a decrease in the binding affinity (Fig. 5B; Supplementary Fig. S9; Supplementary Table S5). Nevertheless, the binding specificity was maintained as the EC_50_ increased significantly less for the ARCΔMA–F1 complexes than for those with 18S rRNA (Fig. 5B, C; Supplementary Fig. S9; Supplementary Table S5). The n_H_ values were >1 across all experiments with ARCΔMA (Supplementary Table S5) but lower than those calculated for ARC–RNA complexes, suggesting a diminished binding cooperativity. Similar to ARC, we observed a drop in n_H_ with increasing concentrations of competing RNA. Such a decrease in n_H_ value was not detected for 18S rRNA. Collectively, these data demonstrate that RNA binding and the specificity of these interactions are mediated not solely by the MA but also by other regions of ARC.

Results of analysis of the RNA-binding domain of the ARC protein. (A) The amino acid sequence of ARC, with the MA-like domain (green), CA-like domain (purple), and oligomerization site (cyan) highlighted. The red arrow indicates the beginning of the ARCΔMA protein sequence. Red shading indicates amino acid residues predicted to interact with RNA in the docking of ARC–RNA complexes. (B) The heatmap presents EC50 values for ARCΔMA–RNA complexes calculated from binding curves obtained in experiments without competitive RNA and in the presence of total yeast RNA, fitted to the Hill equation. (C) The bar plot shows the fold change (mean ± SEM) of EC50 values for ARCΔMA protein complexes with the tested transcripts at 25-, 50-, 100-, and 150-fold excess of total yeast RNA. (D) The electrostatic potential mapped on the full-length ARC protein surface, calculated using the APBS Electrostatics Plugin in PyMOL (two-perspective view). Blue indicates regions of positive potential (up to +5), whereas red depicts negative potential values (up to −5). (E) The model of the ARC–RNA (binding Site 3) complex obtained with the HADDOCK 2.4 web server. Intermolecular interaction sites are marked in red. Amino acid residues defined as necessary for oligomerization are marked in cyan.

To further investigate the ARC region involved in interactions with RNA, we estimated the ARC 3D model using AlphaFold 3 [47] and found that it was consistent with the SAXS-based model [28] and the crystallographic structures of ARC truncated variants [19, 20, 60]. Next, we calculated the electrostatic surface potential of ARC using APBS [48] and determined four distinct, positively charged regions (patches I–IV; Fig. 5D) as potential RNA-binding regions by analogy to other Gag and Gag-like proteins [25, 61, 62]. Patches I–III are located within the predicted MA-like domain. In contrast, patch IV is located on the surface of the CA-like domain and is the only one retained in the ARCΔMA protein. Since ARCΔMA retained the ability to specifically bind RNA in MST assays, we implicated patch IV as a candidate for RNA binding.

Next, we employed the HADDOCK 2.4 web server [49, 50] to model the ARC–RNA complex structure (Fig. 5E). To improve accuracy, docking was performed based on the 3D model of Arc mRNA Domain 1, using four distinct RNA fragments and one protein molecule (Supplementary Fig. S10). Each RNA fragment was designed to harbor a single defined protein-binding site. The active residues involved in the interaction with ARC were proposed based on findings from SHAPE-MaP and analysis of the protein’s surface electrostatic potential. In each case, ∼200 structures were obtained and grouped into several clusters, with one cluster leading in scoring (Supplementary Table S6). Analysis of the top-ranked docking clusters indicates that, in its optimal conformation, ARC binds RNA via patches I and IV, both of which are exposed on the same side of the protein (Fig. 5E). These findings complement our conclusions from MST assays, indicating that RNA binding can occur across both the MA and non-MA regions.

ARC as a nucleic acid chaperone

Gag proteins from retroviruses and retrotransposons exhibit NAC activity, which is necessary for facilitating essential RNA–RNA interactions during replication [63]. Here, we performed the first comprehensive biochemical characterization of the ARC’s NAC activity employing standardized in vitro NAC assays with short DNA and RNA oligonucleotides [25, 61, 64–66]. Since NAC activity is generally sequence independent, standard DNA/RNA substrate models are often used to compare various NAC proteins. The applied approach enabled us to test the ability of ARC to promote nucleic acid strand annealing and strand exchange in pre-formed duplexes. As a control, we used NCp9, a 57 amino acid NAC protein representing the nucleocapsid (NC) domain of the Gag protein of the yeast Ty3 retrotransposon [67–69].

To assess ARC’s NAC activity in strand annealing, we employed a gel-based assay involving Cy5-labeled 56 nt oligonucleotides of the HIV-1 TAR sequence: TAR RNA or TAR (+)DNA and complementary TAR (−)DNA [25, 61, 64, 66]. The TAR oligonucleotides form a stable hairpin, and the mechanism of their annealing was studied in detail [70]. Spontaneous hybridization of complementary TAR strands in vitro is extremely slow, but is strongly accelerated by HIV-1 Gag or other NAC proteins [71]. We found that ARC displays high efficiency (up to ∼90%) in accelerating the formation of DNA/DNA and DNA/RNA TAR duplexes, comparable with NCp9 (Fig. 6A, B).

Nucleic acid chaperone activity of ARC compared with NCp9 Gag3 protein. The graphs present averaged data from at least three independent annealing experiments for each protein. The error bars represent SDs. The annealing assays of TAR(−) DNA/TAR(+) DNA (A) or TAR(−) DNA/TAR RNA (B) performed in the presence of increasing protein concentration (0, 15.625, 31.25, 62.5, 125, 250, and 500 nM). Lanes denoted as “C” are protein-free control samples, and the next lanes contain increasing amounts of NCp9 or ARC. The strand displacement assays for DNA (C) and RNA (D) performed in the presence of increasing protein concentrations (0, 0.625, 1.25, 2.5, and 5 µM). Lanes denoted as “no D− comp” are control samples without adding unlabeled D−, and the next lanes contain increasing amounts of NCp9 or ARC. The time-course strand displacement assays for DNA (E) and RNA (F) substrates, performed with 5 µM protein and without protein. Representative polyacrylamide gels are presented on the right.

Next, we explored the NAC activity of ARC by testing its ability to facilitate strand exchange in pre-formed nucleic acid duplexes (strand displacement assays) in both the DNA/DNA and RNA/RNA systems (Fig. 6C–F). We used complementary 21 nt DNA (D+ and D−) or RNA (R+ and R−) oligonucleotides labeled with Cy3 or Cy5 to form the initial duplex. Strand exchange was initiated by adding an unlabeled competing strand (D− comp or R− comp) and protein. In contrast to the RNA duplex, the strand exchange in the DNA duplex can occur spontaneously, even in the absence of NAC protein (up to 55%). We found that both ARC and NCp9 accelerated strand displacement in DNA and RNA duplexes. Notably, lower ARC concentrations were needed for efficient exchange (Fig. 6C, D). In the time-course experiments, we found that kinetically, ARC acts comparably with NCp9 in DNA-based assays (0.0147 versus 0.0162 pmol min^−1^) but slightly more slowly in experiments with RNA (0.0079 versus 0.0102 pmol min^−1^) (Fig. 6E, F). Taken together, these data indicate that ARC displays robust NAC activity.

Additionally, we tested whether ARC, as a Gag-derived protein, can mimic the function of retroelement Gag. To that end, we used established in vitro assays that had previously validated NCp9 activity, including Ty3 genomic RNA (gRNA) dimerization and the annealing of initiator tRNA^Met^ (tRNA_i_^Met^), a primer for reverse transcription, to the 3′ end of Ty3 gRNA (3′ Ty3 RNA) [63, 69]. In contrast to NCp9, which induced a transition of ∼40% of 5′ Ty3 RNA into dimeric form [69], we detected a maximum dimerization efficiency of just 5% in the presence of ARC (Fig. 7A). In assays mimicking primer tRNA annealing, tRNA_i_^Met^ binding fluctuated from 20% in the absence of protein to 25% with ARC, indicating its negligible activity in this context (Fig. 7B). Although these retroelement-specific RNA–RNA interactions are promoted by Gag/NC NAC activity, their effective execution additionally depends on the recognition of cognate cis-acting elements within gRNA [63, 72]. ARC produced only a marginal increase in reaction yield consistent with inefficient recognition of these Ty3-specific cis-acting RNA elements.

Protein-induced Ty3 5′–5′ dimerization and Ty3 3′ RNA–tRNA_i_^Met^ annealing assays. The graphs present the percentages of dimerized Ty3 RNA (A) and bound tRNA_i_^Met^ (B) at increasing ARC or NCp9 concentrations. Representative agarose gels are presented below. Lanes denoted as “C” are protein-free control samples, and the next lanes contain increasing amounts of ARC protein. The data for NCp9 were previously presented in Andrzejewska-Romanowska et al. [69].

Discussion

In this work, we gained a multidimensional view of ARC–RNA interactions by integrating quantitative binding measurements, structural probing, computational analyses, and NAC activity assays. We provide a detailed characterization of the interaction between mammalian ARC and its encoding mRNA, thereby addressing a significant knowledge gap about this important protein–RNA complex. The data presented herein can contribute to elucidating the mechanism by which ARC selects its own RNA from the extensive array of cellular RNAs.

The ARC protein is suggested to bind its own mRNA with low specificity since ARC capsids contain Arc mRNA and other cellular RNAs [9]. However, diverse cellular RNAs are also encapsulated with VLPs of retrotransposons and virions of retroviruses, and the amount of non-homologous RNAs is usually higher than that of specifically packaged genomic RNA [73–77]. Thus, the presence of RNAs other than Arc mRNA in virus-like capsids does not exclude the specific ARC–Arc mRNA interactions. To identify ARC’s intrinsic RNA binding preferences independent of a cellular RNP context, we examined ARC binding under controlled, cell-free conditions by employing recombinant ARC and transcripts encompassing various regions of Arc mRNA, along with non-cognate RNA. We show that ARC binds Arc transcripts and heterologous RNA with high but similar affinities. However, binding to heterologous RNA is much more sensitive to competing RNA, indicating ARC’s preference for its own mRNA. Our results highlight that the specificity of ARC–RNA interactions is driven by 5′-proximal sequences of the evolutionarily conserved coding region of the Arc mRNA. PEG10, another mammalian Gag-derived protein, also specifically binds to highly conserved segments in its own mRNA, but they are located in the UTRs [78]. The UTRs of the Arc mRNA are less evolutionarily conserved than the CDS and, in addition, Arc pre-mRNA from vertebrate lineages contains two conserved introns in the 3′ UTR [79]. Any processing heterogeneity could increase sequence variability and potentially affect local RNA structure, which is relevant given the structure dependence of ARC binding. Collectively, these observations may reflect a broader principle whereby Gag-like RBPs target and bind highly conserved sequences to maintain cross-species functionality.

While the CDS seems more relevant to ARC binding specificity, we found two additional contact sites in the 5′ UTR of Arc mRNA. In contrast to the overall Arc mRNA sequence composition, all identified ARC-binding sites are enriched for guanines and share a short, GC-rich motif, typically initiating with single-stranded nucleotides. By integrating ΔSHAPE results with 2D and 3D modeling of Arc mRNA and its fragments, we showed that ARC-responsive nucleotides cluster around a compact structural scaffold formed by stable, solvent-exposed helices. Although we did not identify a single characteristic RNA structural motif bound by ARC, our data highlight that local RNA structure around binding sites strongly influences their recognition by ARC. Moreover, SHAPE-based RNA structure predictions indicate that the 5′ UTR provides an optimal structural context for Site 3, located within the CDS, which appears to be essential for specific ARC–RNA interactions. Another noteworthy observation is ARC’s capability to alternate between binding sites, albeit without preserving binding specificity. Given the presence of an internal ribosomal entry site (IRES) in the 5′ UTR of Arc mRNA [80], allowing cap-independent translation, the location of stable RNA motifs close to the start codon is particularly intriguing. ARC binding guided by highly structured RNA near the translation initiation site might act as a steric barrier that facilitates precise ribosome positioning. The in vivo structure of Arc mRNA, as well as the sequence and structural determinants of ARC binding, remain unknown. Nevertheless, the high probability of base pairings identified in vitro within Domain 1, combined with the high guanine content of ARC-binding sites, implies that the cellular environment may not substantially alter the structure of this Arc mRNA region. This hypothesis requires validation through in vivo studies.
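To make the notion of guanine enrichment concrete, the following minimal sketch compares the nucleotide composition of a candidate binding site with that of a background sequence. It is an illustration only and not the analysis pipeline used in this study: the function names `base_fractions` and `g_enrichment` are hypothetical, and both sequences are invented placeholders rather than Arc mRNA sequences.

```python
from collections import Counter


def base_fractions(seq: str) -> dict:
    """Return the fraction of A, C, G and U in an RNA sequence.

    DNA-style input is accepted: T is converted to U.
    """
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "ACGU"}


def g_enrichment(site: str, background: str) -> float:
    """Ratio of the guanine fraction in a site to that in the background.

    Values above 1 mean the site is richer in G than the background.
    """
    background_g = base_fractions(background)["G"]
    if background_g == 0:
        raise ValueError("background contains no guanines")
    return base_fractions(site)["G"] / background_g


# Placeholder sequences (assumptions, NOT taken from Arc mRNA).
background = "AUGC" * 5   # 5 G out of 20 nt
site = "GGCGGAGC"         # 5 G out of 8 nt

if __name__ == "__main__":
    print(base_fractions(site))
    print(g_enrichment(site, background))
```

A real analysis would use the actual transcript as background and would need a statistical test to decide whether an observed ratio is meaningful for sites this short.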

In contrast to many retroviral Gag proteins and dARC1, mammalian ARC lacks a canonical NC domain, which is typically responsible for specific RNA binding. Computational analyses have instead implicated the basic MA-like domain as a potential RNA-binding region in mammalian ARC [8]. We show that deletion of the MA-like domain decreases RNA binding affinity. Nevertheless, ARCΔMA retains preferential binding to Arc mRNA, indicating that amino acid residues outside the MA-like domain are also involved in the specific interactions with Arc mRNA. We further corroborate these findings through docking models that predict RNA contacts involving amino acid residues from both the MA-like and CA-like domains. The two basic RNA-binding patches we identified coincide with annotated functional modules: patch I interfaces with endocytic partners and overlaps the nuclear-retention segment, whereas patch IV lies in the C-terminal capsid-like domain that contributes to intermolecular ARC–ARC contacts and encompasses a nuclear localization signal [18, 21, 81, 82]. This suggests that RNA binding constitutes one of several activities executed by these regions, with effects that are likely to be context dependent.

RNA has been shown to facilitate the formation of larger ARC assemblies, reinforcing the idea that interactions between ARC and RNA can influence the shift between different oligomeric states [21]. Our predictions suggest that ARC binding to RNA through the MA and CA domains exposes an α-helix containing a previously defined [21] oligomerization site. The absence of the oligomerization element in ARCΔMA may also contribute to the weakened RNA binding by limiting cooperative ARC–ARC interactions, as we observed in MST assays. Notably, our study indicates enhanced binding cooperativity in the presence of Arc mRNA, which supports previous research showing that Arc mRNA serves as a more effective scaffold than non-cognate RNA for the assembly of higher-order structures [21].
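Binding cooperativity of the kind discussed above is commonly summarized with a Hill coefficient. The sketch below shows the generic Hill relationship between protein concentration and fraction of RNA bound. It is offered as an assumption-laden illustration and is not necessarily the model fitted to the MST data in this study; the function name `hill_fraction_bound` and all numerical values are hypothetical.

```python
def hill_fraction_bound(conc: float, kd: float, n: float) -> float:
    """Fraction of RNA bound according to the Hill equation.

    conc : free protein concentration (same units as kd)
    kd   : apparent dissociation constant (concentration at half saturation)
    n    : Hill coefficient; n > 1 indicates positive cooperativity
    """
    if conc < 0 or kd <= 0 or n <= 0:
        raise ValueError("conc must be >= 0; kd and n must be > 0")
    if conc == 0:
        return 0.0
    return conc**n / (kd**n + conc**n)


if __name__ == "__main__":
    kd = 100.0  # illustrative value only, arbitrary concentration units
    for n in (1.0, 2.0):
        # A higher Hill coefficient gives a steeper transition around kd.
        low = hill_fraction_bound(50.0, kd, n)
        high = hill_fraction_bound(200.0, kd, n)
        print(f"n={n}: bound at 0.5*kd = {low:.3f}, at 2*kd = {high:.3f}")
```

In this description, half saturation always occurs at `conc == kd`, and a larger `n` only sharpens the transition, which is why cooperativity and affinity are reported as separate parameters.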

We show that rat ARC displays NAC activity in vitro. Standard assays revealed that it exhibits robust RNA destabilization and annealing activities. Consistently, in vitro SHAPE-MaP indicates ARC-dependent relaxation of stable RNA helices. Although the functional importance of this newly discovered activity for ARC remains to be established, one can speculate that ARC-mediated remodeling may increase local RNA accessibility for interactions with other cellular factors. For example, ARC-induced structural relaxation of the stable RNA motifs that we mapped in the vicinity of the AUG start codon might facilitate RBP binding or affect translation efficiency by modulating ribosome translocation.

Recently, the idea of using the ARC protein to create virus-like capsids for medical purposes has emerged. It has been proposed that ARC capsids could be engineered to deliver therapeutic RNA to neurons across the blood–brain barrier, where the RNA could serve as a template for producing specific proteins [83]. Effective packaging of therapeutic RNA into ARC capsids requires its association with a fragment of Arc mRNA that can act as a packaging element. Initial studies used the 5′ UTR of Arc mRNA, following the example of retroviruses [83, 84]. We believe that the results of this study may facilitate the rational design of packaging RNA sequences. Nevertheless, prior validation of these candidate packaging elements in cells will be critical.

Supplementary Material

gkag207_Supplemental_File

Bibliography

1. Steward O, Wallace CS, Lyford GL et al. Synaptic activation causes the mRNA for the IEG Arc to localize selectively near activated postsynaptic sites on dendrites. Neuron. 1998;21:741–51. 10.1016/S0896-6273(00)80591-7. PMID: 9808461.
2. Zhang H, Bramham CR. Arc/Arg3.1 function in long-term synaptic plasticity: emerging mechanisms and unresolved issues. Eur J Neurosci. 2021;54:6696–712. 10.1111/ejn.14958. PMID: 32888346.
3. Chen Y, Wang X, Xiao B et al. Mechanisms and functions of activity-regulated cytoskeleton-associated protein in synaptic plasticity. Mol Neurobiol. 2023;60:5738–54. 10.1007/s12035-023-03442-4. PMID: 37338805.
4. Wilkerson JR, Albanesi JP, Huber KM. Roles for Arc in metabotropic glutamate receptor-dependent LTD and synapse elimination: implications in health and disease. Semin Cell Dev Biol. 2018;77:51–62. 10.1016/j.semcdb.2017.09.035. PMID: 28969983. PMCID: PMC5862733.
5. Epstein I, Finkbeiner S. The Arc of cognition: signaling cascades regulating Arc and implications for cognitive function and disease. Semin Cell Dev Biol. 2018;77:63–72. 10.1016/j.semcdb.2017.09.023. PMID: 29559111. PMCID: PMC5865643.
6. Zhang XW, Huck K, Jahne K et al. Activity-regulated cytoskeleton-associated protein/activity-regulated gene 3.1 (Arc/Arg3.1) enhances dendritic cell vaccination in experimental melanoma. Oncoimmunology. 2021;10:1920739. 10.1080/2162402X.2021.1920739. PMID: 34026332. PMCID: PMC8128181.
7. Ufer F, Vargas P, Engler JB et al. Arc/Arg3.1 governs inflammatory dendritic cell migration from the skin and thereby controls T cell activation. Sci Immunol. 2016;1:eaaf8665. 10.1126/sciimmunol.aaf8665. PMID: 28783680.
8. Campillos M, Doerks T, Shah PK et al. Computational characterization of multiple Gag-like human proteins. Trends Genet. 2006;22:585–9. 10.1016/j.tig.2006.09.006. PMID: 16979784.