Gene model for the ortholog of eIF4E1 in Drosophila yakuba
Bailey Lose, Jeremy Girard, Josephine Hayes, Lane Weast, Natalie Minkovsky, Sarah Justice, Jack A. Vincent, James J. Youngblom, Lindsey J. Long, Chinmay P. Rele, Laura K Reed

TL;DR
This paper describes the gene model for eIF4E1 in Drosophila yakuba as part of a study on the evolution of the IIS pathway.
Contribution
Provides a new gene model for eIF4E1 in Drosophila yakuba for evolutionary studies of the IIS pathway.
Findings
The eIF4E1 ortholog was identified in the Dyak_CAF1 genome assembly.
The gene model contributes to a dataset for studying IIS pathway evolution in Drosophila.
Abstract
Gene model for the ortholog of eukaryotic translation initiation factor 4E1 (eIF4E1) in the Dyak_CAF1 Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.
Figure 1 (image not reproduced in this version)
- National Science Foundation (United States) (https://ror.org/021nxhr62)
- National Institutes of Health (United States) (https://ror.org/01cwqze88)
Description
We propose a gene model for the D. yakuba ortholog of the D. melanogaster eukaryotic translation initiation factor 4E1 (eIF4E1) gene. The genomic region of the ortholog corresponds to the uncharacterized protein LOC6532576 (RefSeq accession XP_015049441.1) in the Dyak_CAF1 Genome Assembly of D. yakuba (GenBank Accession: GCA_000005975.1; Drosophila 12 Genomes Consortium et al., 2007). This model is based on RNA-Seq data from D. yakuba (SRP006203; Graveley et al., 2011) and on the eIF4E1 gene model in D. melanogaster from FlyBase release FB2022_04 (GCA_000001215.4; Larkin et al., 2021; Gramates et al., 2022; Jenkins et al., 2022).
Eukaryotic translation initiation factor 4E1 (eIF4E1) encodes a component of the eIF4F cap-binding complex, which is essential for cap-dependent translation of mRNA, and binds the 7-methylguanosine cap structure of mRNA in Drosophila (Lachance et al., 2002; Lavoie et al., 1996). The protein product of eIF4E-3, a paralog of eIF4E1, is specifically required during spermatogenesis in Drosophila (Hernández et al., 2012).
**Synteny**
The reference gene, eIF4E1, occurs on chromosome 3L in D. melanogaster and is flanked upstream by CG4022 and Cuticular protein 67B (Cpr67b) and downstream by CG4080 and Heat shock protein 27 (Hsp27). A tblastn search of D. melanogaster eIF4E1-PB (query) against the D. yakuba Genome Assembly (GenBank Accession: GCA_000005975.1; database) placed the putative ortholog of eIF4E1 within scaffold chromosome 3L (CM000159.2) at locus LOC6532576 (XP_015049441.1), with an E-value of 1e-77 and a percent identity of 65.56%. The putative ortholog is flanked upstream by LOC6532574 (XP_015049438.1) and LOC6532575 (XP_002093319.1), which correspond to CG4022 and Cpr67b in D. melanogaster (E-value: 0.0 and 7e-170; identity: 90.08% and 98.46%, respectively, as determined by blastp; Figure 1A; Altschul et al., 1990). It is flanked downstream by LOC6532577 (XP_015049442.1) and LOC6532578 (XP_002093322.1), which correspond to CG4080 and Hsp27 in D. melanogaster (E-value: 0.0 and 2e-132; identity: 96.63% and 89.72%, respectively, as determined by blastp). The putative ortholog assignment for eIF4E1 in D. yakuba is supported by the following evidence: the genes surrounding the eIF4E1 ortholog are orthologous to the genes at the same locus in D. melanogaster, and local synteny is completely conserved, as supported by the E-values and percent identities above. We therefore conclude that LOC6532576 is the correct ortholog of eIF4E1 in D. yakuba (Figure 1A).
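The synteny argument above reduces to a simple check: each D. yakuba flanking locus must map, through its blastp match, to the D. melanogaster gene on the same side of eIF4E1. The Python sketch below encodes the loci and correspondences reported in this section. The helper names are hypothetical, and each side is compared as a set, so the sketch does not test gene order within a side.

```python
# Illustrative local-synteny check (hypothetical helper names; data from this section).
# Each side (upstream/downstream) is compared as a set, so gene order within a
# side is not tested.

# Genes flanking eIF4E1 in D. melanogaster.
DMEL_FLANKS = {
    "upstream": {"CG4022", "Cpr67b"},
    "downstream": {"CG4080", "Hsp27"},
}

# D. yakuba loci flanking LOC6532576, mapped to their blastp-matched D. melanogaster gene.
DYAK_FLANKS = {
    "upstream": {"LOC6532574": "CG4022", "LOC6532575": "Cpr67b"},
    "downstream": {"LOC6532577": "CG4080", "LOC6532578": "Hsp27"},
}


def synteny_conserved(reference: dict[str, set[str]], target: dict[str, dict[str, str]]) -> bool:
    """True if, on each side, the target loci map exactly onto the reference flanking genes."""
    return all(set(target[side].values()) == genes for side, genes in reference.items())


print(synteny_conserved(DMEL_FLANKS, DYAK_FLANKS))  # True

# A neighborhood with upstream and downstream exchanged fails the check.
swapped = {"upstream": DYAK_FLANKS["downstream"], "downstream": DYAK_FLANKS["upstream"]}
print(synteny_conserved(DMEL_FLANKS, swapped))  # False
```

In practice the mapping values would come from reciprocal BLAST results rather than being typed in by hand; the point here is only the shape of the comparison.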
**Protein Model**
eIF4E1 in D. yakuba has two unique protein-coding isoforms, eIF4E1-PB (identical to eIF4E1-PA, eIF4E1-PD, eIF4E1-PE, eIF4E1-PF, eIF4E1-PG, eIF4E1-PH, and eIF4E1-PI) and eIF4E1-PC (Figure 1B). The mRNA isoforms eIF4E1-RB (eIF4E1-RA, eIF4E1-RD, eIF4E1-RE, eIF4E1-RF, eIF4E1-RG, eIF4E1-RH, eIF4E1-RI) and eIF4E1-RC contain five CDSs. Relative to the ortholog in D. melanogaster, the CDS number and protein isoform count are conserved. The sequence of eIF4E1-PB in D. yakuba has 93.49% identity (E-value: 1e-142) with the protein-coding isoform eIF4E1-PB in D. melanogaster, as determined by blastp (Figure 1C). Minor gaps in the dot plots of eIF4E1-PB (Figure 1C) and eIF4E1-PC (Figure 1D) represent a region of lower sequence similarity, highlighted by red circles, including a short indel of two amino acids in the second exon of both isoforms. Coordinates of this curated gene model are stored by NCBI at GenBank/BankIt (accessions BK059542, BK059543, BK059544, BK059545, BK059546, BK059547, BK059548, BK059549, and BK059550). These data are also archived in the CaltechDATA repository (see "Extended Data" section below).
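A dot plot marks every pair of positions at which two sequences share an identical short word, so aligned regions appear as diagonals and an indel appears as a break where the diagonal shifts sideways. The Python sketch below illustrates the idea with exact k-residue word matches. It is not the tool used to produce Figure 1C and 1D, and the toy sequences are hypothetical: the second one lacks two residues to mimic a two-amino-acid indel.

```python
# Minimal word-match dot plot (illustrative sketch; not the Figure 1 tool).
# A dot at (i, j) means the k-residue word starting at position i of seq_a
# is identical to the k-residue word starting at position j of seq_b.


def dot_plot(seq_a: str, seq_b: str, k: int = 3) -> set[tuple[int, int]]:
    """Return all (i, j) positions where length-k words of seq_a and seq_b match."""
    # Index every word of seq_b by its start positions.
    words_b: dict[str, list[int]] = {}
    for j in range(len(seq_b) - k + 1):
        words_b.setdefault(seq_b[j:j + k], []).append(j)
    # Look up each word of seq_a in that index.
    dots: set[tuple[int, int]] = set()
    for i in range(len(seq_a) - k + 1):
        for j in words_b.get(seq_a[i:i + k], []):
            dots.add((i, j))
    return dots


def diagonal_offsets(dots: set[tuple[int, int]]) -> set[int]:
    """Offsets i - j of the diagonals the dots fall on; a shift signals an indel."""
    return {i - j for i, j in dots}


# Hypothetical toy proteins: seq_b lacks the two residues "HW" present in seq_a.
seq_a = "MKVLPHWRSTEQ"
seq_b = "MKVLPRSTEQ"

dots = dot_plot(seq_a, seq_b)
print(sorted(dots))                    # [(0, 0), (1, 1), (2, 2), (7, 5), (8, 6), (9, 7)]
print(sorted(diagonal_offsets(dots)))  # [0, 2]
```

The diagonal offset is 0 before the deletion and 2 after it, so the sideways jump equals the indel length; the words spanning the deletion find no match, which shows up as a short gap in the diagonal.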
Methods
Detailed methods, including algorithms, database versions, and citations for the complete annotation process, can be found in Rele et al. (2023). Briefly, students use the GEP instance of the UCSC Genome Browser v.435 (https://gander.wustl.edu; Kent et al., 2002; Navarro Gonzalez et al., 2021) to examine the genomic neighborhood of their reference IIS gene in the D. melanogaster genome assembly (Aug. 2014; BDGP Release 6 + ISO1 MT/dm6). Students then retrieve the protein sequence of a given isoform of the D. melanogaster reference gene and search it with tblastn against the genome assembly of their target Drosophila species on the NCBI BLAST server (https://blast.ncbi.nlm.nih.gov/Blast.cgi; Altschul et al., 1990) to identify potential orthologs. To validate a potential ortholog, students compare its local genomic neighborhood with the genomic neighborhood of the reference gene in D. melanogaster. This local synteny analysis includes, at minimum, the two upstream and two downstream genes relative to the putative ortholog. They also explore other lines of genomic evidence using multiple alignment tracks in the Genome Browser, including BLAT alignments of RefSeq Genes, Spaln alignments of D. melanogaster proteins, multiple gene prediction tracks (e.g., GeMoMa, Geneid, Augustus), and modENCODE RNA-Seq from the target species. How students leverage these lines of genomic evidence in gene model development is described in detail in Rele et al. (2023).

Genomic structure information (e.g., CDSs, intron-exon number and boundaries, number of isoforms) for the D. melanogaster reference gene is retrieved through the Gene Record Finder (https://gander.wustl.edu/~wilson/dmelgenerecord/index.html; Rele et al., 2023). Approximate splice sites within the target gene are determined by tblastn searches using the CDSs of the D. melanogaster reference gene. CDS coordinates are then refined by examining aligned modENCODE RNA-Seq data and by applying paradigms of molecular biology, such as identifying canonical splice site sequences and ensuring that an open reading frame is maintained across hypothesized splice sites. Students then confirm the biological validity of their target gene model using the Gene Model Checker (https://gander.wustl.edu/~wilson/dmelgenerecord/index.html; Rele et al., 2023), which compares the structure and translated sequence of the hypothesized target gene model against the D. melanogaster reference gene model. At least two independent models for this gene were generated by students under the mentorship of their faculty course instructors. These models were then reconciled by a third independent researcher mentored by the project leaders to produce the final model presented here. Note: comparison of 5' and 3' UTR sequence information is not included in this GEP CURE protocol.
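Two of the checks named above, canonical splice sites and an open reading frame that survives splicing, are easy to state in code. The Python sketch below is an illustration under stated assumptions, not the Gene Model Checker: it uses 0-based, half-open CDS coordinates on a forward-strand toy sequence, requires every intron to begin with GT and end with AG, and assumes the final CDS includes the stop codon. The sequence, coordinates, and function names are hypothetical.

```python
# Illustrative sketch of two gene-model sanity checks (not the GEP Gene Model Checker).
# Assumptions: forward strand, 0-based half-open (start, end) CDS coordinates,
# GT...AG introns, and a final CDS that includes the stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def introns_canonical(genome: str, cdss: list[tuple[int, int]]) -> bool:
    """True if every intron between consecutive CDSs starts with GT and ends with AG."""
    for (_, prev_end), (next_start, _) in zip(cdss, cdss[1:]):
        intron = genome[prev_end:next_start]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True


def orf_maintained(genome: str, cdss: list[tuple[int, int]]) -> bool:
    """True if the spliced CDSs start with ATG, keep frame, and stop only at the end."""
    coding = "".join(genome[start:end] for start, end in cdss)
    if not coding.startswith("ATG") or len(coding) % 3 != 0:
        return False
    codons = [coding[i:i + 3] for i in range(0, len(coding), 3)]
    return codons[-1] in STOP_CODONS and not STOP_CODONS.intersection(codons[:-1])


# Hypothetical toy locus: CDS 1, a GT...AG intron, then CDS 2 ending in TAA.
GENOME = "ATGAAAC" + "GTAAGTTTTCAG" + "TGGGCTAA"
GOOD_MODEL = [(0, 7), (19, 27)]
SHIFTED_MODEL = [(0, 7), (20, 27)]  # second CDS starts one base too late

print(introns_canonical(GENOME, GOOD_MODEL), orf_maintained(GENOME, GOOD_MODEL))
# True True
print(introns_canonical(GENOME, SHIFTED_MODEL), orf_maintained(GENOME, SHIFTED_MODEL))
# False False
```

Shifting the start of the second CDS by one base breaks both checks: the intron no longer ends in AG, and the spliced sequence is no longer a multiple of three. Real annotation also has to handle reverse-strand genes and non-canonical (e.g., GC-AG) introns, which this sketch ignores.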
References
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215(3): 403-410. DOI: 10.1016/S0022-2836(05)80360-2. PMID: 2231712.
- Bock IR, Wheeler MR. 1972. The Drosophila melanogaster species group. Univ. Texas Publs Stud. Genet. 7(7213): 1.
- Drosophila 12 Genomes Consortium, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, et al.
- Gramates LS, Agapite J, Attrill H, Calvi BR, Crosby MA, Dos Santos G, Goodman JL, Goutte-Gattat D, Jenkins VK, Kaufman T, Larkin A, Matthews BB, Millburn G, Strelets VB, the FlyBase Consortium. 2022. FlyBase: a guided tour of highlighted features. Genetics 220(4): iyac035. DOI: 10.1093/genetics/iyac035. PMID: 35266522. PMCID: PMC8982030.
- Graveley BR, Brooks AN, Carlson JW, Duff MO, Landolin JM, Yang L, Artieri CG, van Baren MJ, Boley N, Booth BW, Brown JB, Cherbas L, Davis CA, Dobin A, Li R, Lin W, Malone JH, Mattiuzzo NR, Miller D, Sturgill D, Tuch BB, Zaleski C, Zhang D, Blanchette M, Dudoit S, Eads B, Green RE, Hammonds A, Jiang L, Kapranov P, Langton L, Perrimon N, Sandler JE, Wan KH, Willingham A, Zhang Y, Zou Y, Andrews J, Bickel PJ, Brenner SE, Brent MR, Cherbas P, Gingeras TR, Hoskins RA, Kaufman TC, Oliver B, Celniker SE. 2010. The developmental transcriptome of Drosophila melanogaster. Nature 471(7339).
- Grewal SS. 2008. Insulin/TOR signaling in growth and homeostasis: a view from the fly world. Int J Biochem Cell Biol 41(5): 1006-1010. DOI: 10.1016/j.biocel.2008.10.010. PMID: 18992839.
- Hernández G, Han H, Gandin V, Fabian L, Ferreira T, Zuberek J, Sonenberg N, Brill JA, Lasko P. 2012. Eukaryotic initiation factor 4E-3 is essential for meiotic chromosome segregation, cytokinesis and male fertility in Drosophila. Development 139(17): 3211-3220. DOI: 10.1242/dev.073122. PMID: 22833128. PMCID: PMC3413165.
- Hietakangas V, Cohen SM. 2009. Regulation of tissue growth through nutrient sensing. Annu Rev Genet 43: 389-410. DOI: 10.1146/annurev-genet-102108-134815. PMID: 19694515.
