Gene model for the ortholog of Roc1a in Drosophila persimilis
Megan E. Lawson, Kelsey Gammage, Calvin Dexel, Betty Duan, Elena M. Zerkin, Lindsey J. Long, Melinda A. Yang, Chinmay P. Rele, Laura K. Reed

TL;DR
This paper presents a gene model for the Roc1a ortholog in Drosophila persimilis as part of a study on the evolution of the IIS pathway.
Contribution
Provides a new gene model for Roc1a in Drosophila persimilis using a standardized annotation protocol.
Findings
Identified the ortholog of Roc1a in the Drosophila persimilis genome assembly.
Contributed to a dataset for studying IIS pathway evolution in Drosophila.
Abstract
Gene model for the ortholog of Regulator of cullins 1a (Roc1a) in the May 2011 (Broad dper_caf1/DperCAF1) Genome Assembly (GenBank Accession: GCA_000005195.1) of Drosophila persimilis. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.
Figure 1 (image not reproduced; panels A-D are cited in the Description).
- National Institutes of Health (United States), https://ror.org/01cwqze88
- National Science Foundation (United States), https://ror.org/021nxhr62
Taxonomy
Topics: Gene Regulatory Network Analysis
Description
We propose a gene model for the *D. persimilis* ortholog of the *D. melanogaster* *Regulator of cullins 1a* (*Roc1a*) gene. The genomic region of the ortholog corresponds to the uncharacterized protein LOC6600097 (RefSeq accession XP_002025205.2) in the May 2011 (Broad dper_caf1/DperCAF1) Genome Assembly of *D. persimilis* (GenBank Accession: GCA_000005195.1; Drosophila 12 Genomes Consortium et al., 2007). This model is based on RNA-Seq data from *D. persimilis* (PRJNA388952; Yang et al., 2018) and *Roc1a* in *D. melanogaster* using FlyBase release FB2023_02 (GCA_000001215.4; Larkin et al., 2021; Gramates et al., 2022; Jenkins et al., 2022).
**Synteny**
The reference gene, *Roc1a*, occurs on chromosome X in *D. melanogaster* and is flanked upstream by *CG13367* and *suppressor of sable* (*Su(sable)*) and downstream by *Histone methyltransferase 4-20* (*Hmt4-20*) and *SKP1-related A* (*SkpA*). The *tblastn* search of *D. melanogaster* Roc1a-PA (query) against the *D. persimilis* (GenBank Accession: GCA_000005195.1) Genome Assembly (database) placed the putative ortholog of *Roc1a* within scaffold super_26 (CH479193.1) at locus LOC6600097 (XP_002025205.2), with an E-value of 3e-48 and a percent identity of 74.53%. Furthermore, the putative ortholog is flanked upstream by LOC6600098 (XP_002025206.2) and LOC6600094 (XP_002025207.1), which correspond to *CG13367* and *CG5815* in *D. melanogaster* (E-value: 3e-129 and 3e-176; identity: 69.63% and 59.41%, respectively, as determined by *blastp*) (Figure 1A; Altschul et al., 1990). The putative ortholog of *Roc1a* is flanked downstream by LOC6600100 (XP_026846708.1) and LOC6600099 (XP_002025203.1), which correspond to *Hmt4-20* and *SkpA* in *D. melanogaster* (E-value: 5e-34 and 2e-117; identity: 52.03% and 98.77%, respectively, as determined by *blastp*). The putative ortholog assignment for *Roc1a* in *D. persimilis* is supported by the following evidence: the synteny of the genomic neighborhood is almost entirely conserved, with only one upstream gene not matching (*CG5815* in place of *Su(sable)*), and all of the *BLAST* search results indicate high-quality ortholog matches.
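The synteny comparison can be illustrated with a short Python tally. This is only a sketch and not part of the GEP protocol: the helper name `synteny_matches` and the dictionary layout are assumptions made for illustration, while the gene names and locus IDs are those reported in the text.

```python
# Flanking genes of Roc1a in D. melanogaster, as listed in the text.
DMEL_NEIGHBORS = {
    "upstream": ["CG13367", "Su(sable)"],
    "downstream": ["Hmt4-20", "SkpA"],
}

# D. persimilis flanking loci mapped to their best D. melanogaster blastp match.
DPER_NEIGHBORS = {
    "upstream": {"LOC6600098": "CG13367", "LOC6600094": "CG5815"},
    "downstream": {"LOC6600100": "Hmt4-20", "LOC6600099": "SkpA"},
}


def synteny_matches(reference, target):
    """Return (matched, total) flanking genes shared by reference and target.

    Hypothetical helper: gene order within each side is ignored.
    """
    matched = 0
    total = 0
    for side, ref_genes in reference.items():
        target_hits = set(target[side].values())
        total += len(ref_genes)
        matched += sum(1 for gene in ref_genes if gene in target_hits)
    return matched, total


matched, total = synteny_matches(DMEL_NEIGHBORS, DPER_NEIGHBORS)
print(f"{matched} of {total} flanking genes conserved")
```

With the neighbors listed in the text, the tally is three of four, which corresponds to the single non-matching upstream gene.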
**Protein Model**
*Roc1a* in *D. persimilis* has one unique protein-coding isoform (Roc1a-PA and Roc1a-PD; Figure 1B), encoded by mRNA isoforms Roc1a-RA and Roc1a-RD, which differ in their UTRs and each contain three CDSs. Relative to the ortholog in *D. melanogaster*, the CDS number is conserved for these isoforms. However, *D. melanogaster* also has a third isoform with two CDSs, Roc1a-RC, that is not present in *D. persimilis* (see “Special characteristics of the protein model”). The sequence of Roc1a-PA in *D. persimilis* has 96.30% identity (E-value: 5e-76) with the protein-coding isoform Roc1a-PA in *D. melanogaster*, as determined by *blastp* (Figure 1C). Coordinates of this curated gene model (Roc1a-PD, Roc1a-PA) are stored by NCBI at GenBank/BankIt (accessions **BK064560** and **BK064561**, respectively). These data are also archived in the CaltechDATA repository (see “Extended Data” section below).
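For readers unfamiliar with how a percent identity is derived from a pairwise alignment, the sketch below counts identical columns over the alignment length. The function name `percent_identity` and the toy sequences are hypothetical; they are not the Roc1a sequences, and *blastp* itself reports this value directly.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns holding identical residues.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Gap columns count toward the alignment length but never as identities.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


# Toy example (hypothetical peptides): 4 of 5 columns are identical.
print(percent_identity("MSKLT", "MSRLT"))
```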
**Special characteristics of the protein model**
mRNA isoform Roc1a-RC in *D. melanogaster* is similar to the other two isoforms, Roc1a-RA and Roc1a-RD. The difference in coding region between these isoforms is that Roc1a-RA and Roc1a-RD have three CDSs whereas Roc1a-RC has two CDSs, with its first CDS (FlyBase ID: 1_2094_0) spanning the length of the first two CDSs of Roc1a-RA and Roc1a-RD (FlyBase IDs: 2_2094_0 and 3_2094_0) combined (all CDS IDs based on FlyBase release FB2023_02; GCA_000001215.4; Larkin et al., 2021). In *D. melanogaster*, there are no in-frame stop codons for any of the isoforms, including in the longer first CDS of Roc1a-RC. However, in *D. persimilis*, there is an in-frame stop codon in the first CDS of the potential Roc1a-PC isoform that prematurely stops translation of this isoform. This is highlighted in Figure 1D, where the white portion of the CDSs of Roc1a-RC indicates the presence of in-frame stop codons. This, in combination with the lack of RNA-Seq data and TransDecoder transcript predictions supporting the existence of Roc1a-PC, suggests that Roc1a-PC is likely absent from *D. persimilis* (Figure 1D). This special characteristic of Roc1a-PC is similar to the observations described for *D. eugracilis* by Lawson et al. (2025).
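The in-frame stop codon argument can be made concrete with a small scan over a coding sequence. This is a minimal sketch under stated assumptions: the function name `inframe_stop_positions` and both toy sequences are hypothetical, and the sequences are not taken from the *D. persimilis* assembly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def inframe_stop_positions(cds, phase=0):
    """Return 0-based offsets of stop codons read in the given frame.

    `phase` is the offset of the first complete codon (0, 1, or 2).
    Stop triplets that fall out of frame are ignored.
    """
    seq = cds.upper()
    return [
        start
        for start in range(phase, len(seq) - 2, 3)
        if seq[start:start + 3] in STOP_CODONS
    ]


# Toy CDS read as ATG GCA TAA GGC: the third codon is an in-frame stop.
print(inframe_stop_positions("ATGGCATAAGGC"))
# Toy CDS read as ATG GCA TTA AGC: the TAA at offset 7 is out of frame.
print(inframe_stop_positions("ATGGCATTAAGC"))
```

A stop reported anywhere before the final codon of a candidate CDS would truncate translation, which is the kind of evidence cited against Roc1a-PC.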
Methods
Detailed methods including algorithms, database versions, and citations for the complete annotation process can be found in Rele et al. (2023). Briefly, students use the GEP instance of the UCSC Genome Browser v.435 (https://gander.wustl.edu; Kent et al., 2002; Navarro Gonzalez et al., 2021) to examine the genomic neighborhood of their reference IIS gene in the *D. melanogaster* genome assembly (Aug. 2014; BDGP Release 6 + ISO1 MT/dm6). Students then retrieve the protein sequence for the *D. melanogaster* reference gene for a given isoform and run it using *tblastn* against their target *Drosophila* species genome assembly on the NCBI BLAST server (https://blast.ncbi.nlm.nih.gov/Blast.cgi; Altschul et al., 1990) to identify potential orthologs. To validate the potential ortholog, students compare the local genomic neighborhood of their potential ortholog with the genomic neighborhood of their reference gene in *D. melanogaster*. This local synteny analysis includes at minimum the two upstream and downstream genes relative to their putative ortholog.

They also explore other sets of genomic evidence using multiple alignment tracks in the Genome Browser, including BLAT alignments of RefSeq Genes, Spaln alignment of *D. melanogaster* proteins, multiple gene prediction tracks (e.g., GeMoMa, Geneid, Augustus), and modENCODE RNA-Seq from the target species. A detailed explanation of how these lines of genomic evidence are leveraged by students in gene model development is given in Rele et al. (2023). Genomic structure information (e.g., CDSs, intron-exon number and boundaries, number of isoforms) for the *D. melanogaster* reference gene is retrieved through the Gene Record Finder (https://gander.wustl.edu/~wilson/dmelgenerecord/index.html; Rele et al., 2023). Approximate splice sites within the target gene are determined by *tblastn* searches using the CDSs from the *D. melanogaster* reference gene. Coordinates of CDSs are then refined by examining aligned modENCODE RNA-Seq data, and by applying paradigms of molecular biology such as identifying canonical splice site sequences and ensuring the maintenance of an open reading frame across hypothesized splice sites.

Students then confirm the biological validity of their target gene model using the Gene Model Checker (https://gander.wustl.edu/~wilson/dmelgenerecord/index.html; Rele et al., 2023), which compares the structure and translated sequence from their hypothesized target gene model against the *D. melanogaster* reference gene model. At least two independent models for a gene are generated by students under mentorship of their faculty course instructors. Those models are then reconciled by a third independent researcher mentored by the project leaders to produce the final model. Note: comparison of 5' and 3' UTR sequence information is not included in this GEP CURE protocol.
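Two of the refinement checks named in the Methods, canonical splice site sequences and maintenance of an open reading frame, can be sketched as follows. This is an illustration only and not the Gene Model Checker: the function names and toy sequences are hypothetical, GT/AG is used as the canonical donor/acceptor pair, and the sketch assumes the last CDS segment includes the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_canonical_splice_sites(intron):
    """True if the intron begins with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def keeps_open_reading_frame(cds_segments):
    """Check that the spliced CDS segments form one uninterrupted reading frame.

    Assumes the segments are given in transcript order and that the final
    segment ends with the stop codon (an assumption of this sketch).
    """
    coding = "".join(cds_segments).upper()
    if len(coding) < 6 or len(coding) % 3 != 0:
        return False
    codons = [coding[i:i + 3] for i in range(0, len(coding), 3)]
    starts_with_atg = codons[0] == "ATG"
    ends_with_stop = codons[-1] in STOP_CODONS
    no_internal_stop = not any(c in STOP_CODONS for c in codons[:-1])
    return starts_with_atg and ends_with_stop and no_internal_stop


# Toy three-CDS model: the joined sequence reads ATG GCA TTT GGA TAA.
print(has_canonical_splice_sites("GTAAGTTTTCAG"))
print(keeps_open_reading_frame(["ATGGC", "ATTT", "GGATAA"]))
```

Shifting a hypothesized splice boundary by one nucleotide changes the joined length so that it is no longer a multiple of three, which is how a frame error shows up in this sketch.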
References
- 1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215: 403-410. DOI: 10.1016/S0022-2836(05)80360-2. PMID: 2231712.
- 2. Bocca SN, Muzzopappa M, Silberstein S, Wappner P. 2001. Occurrence of a putative SCF ubiquitin ligase complex in Drosophila. Biochem Biophys Res Commun 286: 357-364. DOI: 10.1006/bbrc.2001.5394. PMID: 11500045.
- 3. Buzzati-Traverso AA, Scossiroli RE. 1955. The obscura group of the genus Drosophila. Adv Genet 7: 47-92. PMID: 13258372.
- 4. Carson HL. 1951. Breeding sites of Drosophila pseudoobscura and Drosophila persimilis in the transition zone of the Sierra Nevada. Evolution 5: 91-96.
- 5. Dobzhansky T, Epling C. 1944. Taxonomy, geographic distribution, and ecology of Drosophila pseudoobscura and its relatives. Publs Carnegie Instn 554: 1-46.
- 6. Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, Bhutkar A, Blanco E, Bosak SA, Bradley RK, Brand AD, Brent MR, Brooks AN, Brown RH, Butlin RK, Caggese C, Calvi BR, Bernardo de Carvalho A, Caspi A, Castrezana S, Celniker SE, Chang JL, Chapple C, Chatterji S, Chinwalla A, Civetta A, C
- 7. Gramates L Sian, Agapite Julie, Attrill Helen, Calvi Brian R, Crosby Madeline A, dos Santos Gilberto, Goodman Joshua L, Goutte-Gattat Damien, Jenkins Victoria K, Kaufman Thomas, Larkin Aoife, Matthews Beverley B, Millburn Gillian, Strelets Victor B, Perrimon Norbert, Gelbart Susan Russo, Agapite Julie, Broll Kris, Crosby Lynn, dos Santos Gil, Falls Kathleen, Gramates L Sian, Jenkins Victoria, Longden Ian, Matthews Beverley, Seme Jolene, Tabone Christopher J, Zhou Pinglei, Zytkovicz Mark, Brown Nick, Antonazzo Giulia, Attrill Helen, Garapati Phani, Goutte-Gatta
- 8. Grewal SS. 2008. Insulin/TOR signaling in growth and homeostasis: a view from the fly world. Int J Biochem Cell Biol 41: 1006-1010. DOI: 10.1016/j.biocel.2008.10.010. PMID: 18992839.
