Gene model for the ortholog of lin-28 in Drosophila simulans
Megan E. Lawson, Hannan Saeed, Cassie Tran, Simran Chhina, Jack A. Vincent, Brian Schwartz, Kellie S. Agrimson, Christopher E. Ellison, Chinmay P. Rele, Laura K Reed

TL;DR
This paper presents a gene model for the lin-28 ortholog in Drosophila simulans as part of a study on the evolution of the IIS pathway.
Contribution
The paper provides a new gene model for lin-28 in Drosophila simulans using a standardized annotation protocol.
Findings
A gene model for the lin-28 ortholog was identified in the Drosophila simulans genome.
The study contributes to a dataset for analyzing the evolution of the IIS pathway in Drosophila.
The annotation was performed using the Genomics Education Partnership protocol for undergraduate research.
Abstract
Gene model for the ortholog of lin-28 (lin-28) in the May 2017 (Princeton ASM75419v2/DsimGB2) Genome Assembly (GenBank Accession: GCA_000754195.3) of Drosophila simulans. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.
Figure 1. [Image not reproduced; panels A to C are cited in the text.]
Funding
- National Science Foundation (United States), https://ror.org/021nxhr62
- National Institutes of Health (United States), https://ror.org/01cwqze88
Taxonomy
Topics: FOXO transcription factor regulation · Genetics, Bioinformatics, and Biomedical Research
Description
We propose a gene model for the *D. simulans* ortholog of the *D. melanogaster* *lin-28* gene. The genomic region of the ortholog corresponds to the uncharacterized protein LOC27209473 (RefSeq accession XP_016030549.1) in the May 2017 (Princeton ASM75419v2/DsimGB2) Genome Assembly of *D. simulans* (GenBank Accession: GCA_000754195.3). This model is based on RNA-Seq data from *D. simulans* (SRP006203; Graveley et al., 2011) and *lin-28* in *D. melanogaster* using FlyBase release FB2023_02 (GCA_000001215.4; Larkin et al., 2021; Gramates et al., 2022; Jenkins et al., 2022).
*lin-28* (*lin-28*) is a positive regulator of the insulin signaling (Zhu et al., 2011) and JAK-STAT (Sreejith et al., 2019) pathways. *lin-28* was discovered in *Caenorhabditis elegans* by mutations that produce heterochronic shifts in cell fate specification (Ambros and Horvitz 1984). Homologs were identified later in other animals, including *Drosophila*, *Xenopus*, mouse, and human (Moss and Tang 2003). lin-28 binds to many mRNA molecules to regulate their translation or stability (Balzer and Moss 2007; Cho et al., 2012). In *Drosophila*, lin-28 binds to the *insulin-like receptor* (*InR*) mRNA and stimulates the symmetric division of intestinal stem cells in response to nutrients (Chen et al., 2015; Luhur and Sokol 2015). In mammals, LIN28, in combination with OCT4, SOX2, and NANOG, can reprogram differentiated somatic cells to pluripotency (Yu et al., 2007).
**Synteny**
The reference gene, *lin-28*, occurs on chromosome 3L in *D. melanogaster* and is flanked upstream by *Blimp-1* and *Esa1-associated factor 6* (*Eaf6*) and downstream by *Separase* (*Sse*) and *CG46320*. The *tblastn* search of *D. melanogaster* lin-28-PA (query) against the *D. simulans* (GenBank Accession: GCA_000754195.3) Genome Assembly (database) placed the putative ortholog of *lin-28* within scaffold CM002912 (CM002912.1) at locus LOC27209473 (XP_016030549.1), with an E-value of 1e-42 and a percent identity of 97.67%. Furthermore, the putative ortholog is flanked upstream by LOC27208506 (XP_016030546.1) and LOC6736898 (XP_002083751.1), which correspond to *Blimp-1* and *Eaf6* in *D. melanogaster* (E-value: 0.0 and 2e-162; identity: 98.68% and 98.22%, respectively, as determined by *blastp*; Figure 1A, Altschul et al., 1990). The putative ortholog of *lin-28* is flanked downstream by LOC6736901 (XP_002083754.1) and LOC6736902 (XP_016030550.1), which correspond to *Sse* and *CG46320* in *D. melanogaster* (E-value: 0.0 and 2e-34; identity: 95.90% and 100.00%, respectively, as determined by *blastp*). The ortholog assignment for *lin-28* in *D. simulans* is supported by the following evidence: the synteny of the genomic neighborhood is completely conserved across both species, and all *BLAST* results used to determine orthology indicate very high-quality matches.
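The synteny comparison described above can be illustrated with a minimal Python sketch. The gene names and locus identifiers are taken from the text; the function name `synteny_conserved` and the decision to accept either orientation of the neighborhood are assumptions made for illustration, and strand information is not modeled. This is not the GEP protocol's own tooling, which relies on manual inspection in the Genome Browser.

```python
# Reference neighborhood in D. melanogaster, upstream to downstream (from the text).
REFERENCE_ORDER = ["Blimp-1", "Eaf6", "lin-28", "Sse", "CG46320"]

# D. simulans loci in genomic order, each paired with its best match in
# D. melanogaster as reported in the text.
TARGET_BEST_HITS = [
    ("LOC27208506", "Blimp-1"),
    ("LOC6736898", "Eaf6"),
    ("LOC27209473", "lin-28"),
    ("LOC6736901", "Sse"),
    ("LOC6736902", "CG46320"),
]


def synteny_conserved(reference_order, target_best_hits):
    """Return True if the target's best hits appear in the same gene order
    as the reference, reading the neighborhood in either direction."""
    target_order = [hit for _locus, hit in target_best_hits]
    reference = list(reference_order)
    return target_order == reference or target_order == reference[::-1]


if __name__ == "__main__":
    print(synteny_conserved(REFERENCE_ORDER, TARGET_BEST_HITS))
```

A rearranged or missing neighbor makes the function return `False`, which in practice would prompt a closer look at the region rather than an automatic rejection of the ortholog.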
**Protein Model**
*lin-28* in *D. simulans* has one protein-coding isoform, lin-28-PA (Figure 1B). mRNA isoform lin-28-RA contains five CDSs. Relative to the ortholog in *D. melanogaster*, the CDS number is conserved, as *lin-28* in *D. melanogaster* also has only one RNA isoform with five CDSs. The sequence of lin-28-PA in *D. simulans* has 95.90% identity (E-value: 1e-139) with the protein-coding isoform lin-28-PA in *D. melanogaster*, as determined by *blastp* (Figure 1C). Coordinates of this curated gene model are stored by NCBI at GenBank/BankIt (accession **BK064527**). These data are also archived in the CaltechDATA repository (see “Extended Data” section below).
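For readers unfamiliar with the percent identity figures reported above, the following sketch shows how such a value can be computed from a pairwise alignment: identical columns divided by the total number of alignment columns, gaps included. The two aligned strings are toy sequences invented for illustration, not the lin-28-PA alignment, and `percent_identity` is a hypothetical helper rather than part of BLAST.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same residue.

    Both inputs must be rows of the same alignment (equal length), with '-'
    marking gaps. Gap columns count toward the alignment length but never
    count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # Toy alignment (hypothetical): 7 columns, 5 of them identical.
    print(round(percent_identity("MKT-LLV", "MKTALIV"), 2))
```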
Methods
Detailed methods including algorithms, database versions, and citations for the complete annotation process can be found in Rele et al. (2023). Briefly, students use the GEP instance of the UCSC Genome Browser v.435 (https://gander.wustl.edu; Kent et al., 2002; Navarro Gonzalez et al., 2021) to examine the genomic neighborhood of their reference IIS gene in the *D. melanogaster* genome assembly (Aug. 2014; BDGP Release 6 + ISO1 MT/dm6). Students then retrieve the protein sequence for the *D. melanogaster* reference gene for a given isoform and run it using *tblastn* against their target *Drosophila* species genome assembly on the NCBI BLAST server (https://blast.ncbi.nlm.nih.gov/Blast.cgi; Altschul et al., 1990) to identify potential orthologs. To validate the potential ortholog, students compare the local genomic neighborhood of their potential ortholog with the genomic neighborhood of their reference gene in *D. melanogaster*. This local synteny analysis includes at minimum the two upstream and downstream genes relative to their putative ortholog. They also explore other sets of genomic evidence using multiple alignment tracks in the Genome Browser, including BLAT alignments of RefSeq Genes, Spaln alignment of *D. melanogaster* proteins, multiple gene prediction tracks (e.g., GeMoMa, Geneid, Augustus), and modENCODE RNA-Seq from the target species. A detailed explanation of how students use these lines of genomic evidence in gene model development is given in Rele et al. (2023). Genomic structure information (e.g., CDSs, intron-exon number and boundaries, number of isoforms) for the *D. melanogaster* reference gene is retrieved through the Gene Record Finder (https://gander.wustl.edu/~wilson/dmelgenerecord/index.html; Rele et al., 2023). Approximate splice sites within the target gene are determined by running *tblastn* with the CDSs from the *D. melanogaster* reference gene as queries. Coordinates of CDSs are then refined by examining aligned modENCODE RNA-Seq data, and by applying paradigms of molecular biology such as identifying canonical splice site sequences and ensuring the maintenance of an open reading frame across hypothesized splice sites. Students then confirm the biological validity of their target gene model using the Gene Model Checker (https://gander.wustl.edu/~wilson/dmelgenerecord/index.html; Rele et al., 2023), which compares the structure and translated sequence from their hypothesized target gene model against the *D. melanogaster* reference gene model. At least two independent models for a gene are generated by students under mentorship of their faculty course instructors. Those models are then reconciled by a third independent researcher mentored by the project leaders to produce the final model. Note: comparison of 5' and 3' UTR sequence information is not included in this GEP CURE protocol.
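The refinement step described above (canonical splice sites plus a maintained open reading frame) can be sketched in a few lines of Python. This is a simplified illustration, not the Gene Model Checker: the function `check_gene_model`, the toy sequence, and its coordinates are all hypothetical, coordinates are assumed to be 0-based half-open on the plus strand, only GT...AG introns are treated as canonical, and the stop codon is assumed to be included in the last CDS.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_gene_model(genomic, cds_coords):
    """Return a list of problems found in a plus-strand gene model.

    genomic    -- genomic DNA as an uppercase string
    cds_coords -- sorted list of (start, end) pairs, 0-based half-open
    An empty list means every check passed.
    """
    problems = []

    # Each intron is the gap between two consecutive CDSs.
    for (_, prev_end), (next_start, _) in zip(cds_coords, cds_coords[1:]):
        intron = genomic[prev_end:next_start]
        if not intron.startswith("GT"):
            problems.append(f"intron at {prev_end} lacks a GT donor site")
        if not intron.endswith("AG"):
            problems.append(f"intron ending at {next_start} lacks an AG acceptor site")

    # Splice the CDSs together and check the reading frame.
    coding = "".join(genomic[start:end] for start, end in cds_coords)
    if len(coding) % 3 != 0:
        problems.append("coding length is not a multiple of three")
    usable = len(coding) - len(coding) % 3
    codons = [coding[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("model does not begin with a start codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("model does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon before the end of the model")
    return problems


# Toy locus (hypothetical): 2 nt flank, CDS1, a GT...AG intron, CDS2, 2 nt flank.
TOY_GENOMIC = "CC" + "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA" + "TT"
TOY_CDS = [(2, 8), (20, 26)]

if __name__ == "__main__":
    print(check_gene_model(TOY_GENOMIC, TOY_CDS))
```

Shifting the second CDS start by one base, for example `[(2, 8), (21, 26)]`, breaks both the acceptor site and the reading frame, so the returned list is no longer empty.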
References
1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990 Oct 5. Basic local alignment search tool. J Mol Biol 215(3): 403-410. doi: 10.1016/S0022-2836(05)80360-2. PMID: 2231712.
2. Ambros V, Horvitz HR. 1984 Oct 26. Heterochronic mutants of the nematode Caenorhabditis elegans. Science 226(4673): 409-416. doi: 10.1126/science.6494891. PMID: 6494891.
3. Balzer E, Moss EG. 2007 Apr 30. Localization of the developmental timing regulator Lin28 to mRNP complexes, P-bodies and stress granules. RNA Biol 4(1): 16-25. doi: 10.4161/rna.4.1.4364. PMID: 17617744.
4. Chen CH, Luhur A, Sokol N. 2015 Oct 15. Lin-28 promotes symmetric stem cell division and drives adaptive growth in the adult Drosophila intestine. Development 142(20): 3478-3487. doi: 10.1242/dev.127951. PMID: 26487778. PMCID: PMC4631770.
5. Cho J, Chang H, Kwon SC, Kim B, Kim Y, Choe J, Ha M, Kim YK, Kim VN. 2012 Oct 25. LIN28A is a suppressor of ER-associated translation in embryonic stem cells. Cell 151(4): 765-777. doi: 10.1016/j.cell.2012.10.019. PMID: 23102813.
6. Gramates L Sian Agapite Julie Attrill Helen Calvi Brian R Crosby Madeline A dos Santos Gilberto Goodman Joshua L Goutte-Gattat Damien Jenkins Victoria K Kaufman Thomas Larkin Aoife Matthews Beverley B Millburn Gillian Strelets Victor B Perrimon Norbert Gelbart Susan Russo Agapite Julie Broll Kris Crosby Lynn dos Santos Gil Falls Kathleen Gramates L Sian Jenkins Victoria Longden Ian Matthews Beverley Seme Jolene Tabone Christopher J Zhou Pinglei Zytkovicz Mark Brown Nick Antonazzo Giulia Attrill Helen Garapati Phani Goutte-Gatta
7. Graveley BR, Brooks AN, Carlson JW, Duff MO, Landolin JM, Yang L, Artieri CG, van Baren MJ, Boley N, Booth BW, Brown JB, Cherbas L, Davis CA, Dobin A, Li R, Lin W, Malone JH, Mattiuzzo NR, Miller D, Sturgill D, Tuch BB, Zaleski C, Zhang D, Blanchette M, Dudoit S, Eads B, Green RE, Hammonds A, Jiang L, Kapranov P, Langton L, Perrimon N, Sandler JE, Wan KH, Willingham A, Zhang Y, Zou Y, Andrews J, Bickel PJ, Brenner SE, Brent MR, Cherbas P, Gingeras TR, Hoskins RA, Kaufman TC, Oliver B, Celniker SE. 2010 Dec 22. The developmental transcriptome of Drosophila melanogaster. Nature 471(7339).
8. Grewal SS. 2008 Oct 18. Insulin/TOR signaling in growth and homeostasis: a view from the fly world. Int J Biochem Cell Biol 41(5): 1006-1010. doi: 10.1016/j.biocel.2008.10.010. PMID: 18992839.
