Gene and genome duplications have contrasting impacts on biosynthetic and flower developmental pathways in California poppy

Le-Han Rössner; Clemens Rössner; Doudou Kong; Dominik Lotz; Andrea Weisert; Yasuyuki Yamada; Fumihiko Sato; Kevin Davies; Oliver Rupp; Jörg Fuchs; Ethan A Baldwin; John Lovell; Michael R McKain; Kerrie Barry; Tomas Bruna; Jayson Talag; Jerry Jenkins; Rachel Walstead; Jane Grimwood; Jeremy Schmutz; James H Leebens-Mack; Annette Becker

PMC · DOI:10.1093/plcell/koag039 · February 20, 2026

Open Access

TL;DR

The California poppy's genome reveals how gene duplication affects biosynthesis and flower development differently.

Contribution

A haplotype-resolved genome assembly and expression atlas for California poppy, revealing contrasting duplication impacts on biosynthetic and developmental genes.

Findings

01

BIA biosynthesis genes diversified through localized duplications, with expression similarity linked to phylogenetic relatedness.

02

Carotenoid biosynthesis genes lack phylogenetic clustering, while floral regulators retained duplicates from ancient polyploidy.

03

California poppy's genomic resources provide a valuable model for comparative studies in flowering plants.

Abstract

Benzylisoquinoline alkaloids (BIAs) represent a vast group of specialized plant metabolites with diverse pharmaceutical applications, synthesized by a variety of gene families. Among the multiple plant lineages that produce BIAs, the most notable is the poppy family (Papaveraceae), with California poppy (Eschscholzia californica) emerging as a model organism. Here, we report a haplotype-resolved genome assembly, in combination with a high-density expression atlas, for California poppy. Genome analyses reveal recent diversification of BIA biosynthesis genes in poppy through localized duplications. Furthermore, we demonstrate that the degree of phylogenetic relatedness among paralogs within BIA biosynthesis-associated gene families correlates with similarities in gene expression. In contrast, gene families involved in carotenoid biosynthesis, which contributes to the intense orange petal…

Figures

Overview of the phylogenetic position of California poppy and its major alkaloid biosynthesis pathways, geographic distribution, and gene duplication history. (a) Simplified phylogeny of Ranunculales highlighting species for which whole-genome sequences are available. Eudicot Benchmarking Universal Single-Copy Orthologs (BUSCO) values on the right indicate the level of genome assembly completeness. (b) BIAs and their localization in California poppy plants. Dotted light gray lines show derived BIAs found in Papaver somniferum (opium poppy). (c) California poppy's native (western North America) and non-native (indicated with arrows) ranges. California poppy can be a problematic invasive outside its native range. (d) Ks plot for California poppy paralog pairs and putative ortholog pairs for California poppy and related species within the order Ranunculales. Numbers on tree nodes represent increasing time since the last common ancestor of orthologs sampled from California poppy and the other ranunculid species. A simplified phylogeny indicating WGDs in the Papaveraceae and in the lineage leading to Ranunculales.

Genome comparison among Ranunculales. (a) Synteny plot of Ranunculaceae and Papaveraceae showing syntenic blocks and divergence times between taxa. (b) Proportion of repetitive sequence elements (including TEs) of 5 Ranunculales genomes.

Expansion of BIA pathway elements in California poppy. (a) Simplified ML phylogenies of pathway gene families (clades including known biosynthesis genes are named accordingly, trees not drawn to scale). (b) Gene copy numbers per gene family plotted on BIA biosynthesis pathway (BIA synthesis pathway is reviewed in Hori et al. 2018), trees not drawn to scale. (c) Chromosomal position of gene copies for BIA, Carotenoid and MADS TFs (indicated by colored triangles, gray bars show gene density in 100 kb windows). (d) Duplications per evolutionary branch and average copy number per pathway component for each species.

Characterization of BIA gene family expression reveals that clustered genes with similar sequences share expression patterns. Heatmaps show expression dynamics as Z-scores of log2(TPM + 1) across tissues; the right side of each heatmap shows tissue categories. Gene names are color-coded based on their role in the BIA pathway: core pathway (blue), aerial pathway (green), root/cell culture pathway (brown). Comparative expression analysis of the most closely related homologs in Arabidopsis (At) and California poppy (Ec) for the OMT (a) and STS/CAS/CHS (b) gene families. The OMT gene family exhibits distinct tissue-specific expression, with higher expression in roots, leaves, late buds, petals/stamens, and fruits (8 and 10 to 14 days after pollination). The STS/CAS/CHS gene family members show a less distinct expression pattern. (c) Correlation of sequence similarity, expression pattern, and cluster membership in the CYP82P3 (DB10H) gene family. Phylogenetic tree (bottom, not drawn to scale) showing gene clades A–F. Clades containing previously published BIA genes are highlighted in purple. Genes located in clusters on chromosomes 2 and 4 (Fig. 3c) are shaded. Star symbols denote clades containing genes with similar expression patterns, sequences, and chromosomal positions. Abbreviations for Figs. 4 and 5: At, Arabidopsis thaliana; Ec, Eschscholzia californica; R, root; SA, shoot apex; CL, cauline leaves; YL, young leaves; RL, rosette leaves; OL, old leaves; S, stem; FL10, floral stage 10; B1, bud stage 1; FL11, floral stage 11; B2, bud stage 2; FL12, floral stage 12; B3, bud stage 3; FL13, floral stage 13; B4, bud stage 4; SE, sepal; PE, petal; ST, stamen; CA, carpel; GYN, gynoecium; SI2, silique stage 2; F3, fruit 3 days after pollination (DAP); SI3, silique stage 3; F8, fruit 8 DAP; SI4, silique stage 4; F10-14, fruit 10-14 DAP; SI5, silique stage 5; F18, fruit 18 DAP.

Tissue-specific expression of carotenoid biosynthesis orthologs and floral homeotic MADS-box orthologs in Arabidopsis and California poppy. (a) Carotenoid metabolism pathway showing enzymes and products according to their color. (b) Comparative heatmaps showing expression (Z-score of log2(TPM + 1)) of carotenoid biosynthesis genes. Gene names are linked to the heatmap by black lines. Minimum and maximum log2(TPM + 1) values are placed below the lines. The left heatmap panel shows expression in Arabidopsis, the right panel in California poppy. Gene identifiers are shown below. Genes responsible for petal pigmentation, such as the bOHASEs, are upregulated in petals in California poppy. (c) Simplified ABCDE model according to Theißen et al., 2016 indicating functionally characterized California poppy genes. Hypothesized protein complexes of homeotic gene functions are shown above. (d) Comparative heatmap as in (b) of floral homeotic MADS-box orthologs showing conserved expression patterns.

Schematic representation of evolutionary trajectories of BIA, carotenoid and floral homeotic MADS box genes, summarizing their gene duplication history and divergence in expression patterns as indicated by gray, yellow, and blue.

Tables

Table 1. Summary of statistics for the annotations of HAP1 and HAP2.

Statistic	HAP1	HAP2
Number of protein-coding genes	29,637	29,317
Number of protein-coding transcripts	55,068	54,559
Median exon length (bp)	164	164
Median intron length (bp)	184	185
Average number of exons per gene	5.3	5.3
embryophyta_odb10 BUSCO completeness	99.6%	99.3%
eudicots_odb10 BUSCO completeness	98.1%	97.8%

Funding

- U.S. Department of Energy Joint Genome Institute
- Office of Science of the U.S. Department of Energy (10.13039/100006132)
- German Network for Bioinformatics Infrastructure (10.13039/501100018929)
- ELIXIR-DE
- German Research Foundation (10.13039/501100001659)


Full text

Introduction

California poppy (Eschscholzia californica, Fig. 1) is widely recognized for its vivid golden-orange blossoms that blanket vast landscapes, and it is one of the few species whose flowers can be viewed from outer space (NASA Earth Observatory 2023). The species is native to western North America, from the Pacific Coast to the Great Basin, extending north to Oregon and south to Baja California in Mexico, and occurring from sea level to 2,000 meters above sea level (Cook 1962; Véliz et al. 2012). Due to its ecological resilience and adaptability, introduced California poppy populations have become naturalized and invasive across large areas of South America, Australia, and New Zealand in Mediterranean-type climates (Leger and Rice 2007). Its resilience to stressful environments and high fecundity also make California poppy a popular ornamental planted in pollinator gardens worldwide.

Historically, California poppy has held cultural and medicinal significance among Indigenous communities in California, largely owing to its pharmaceutically active benzylisoquinoline alkaloids (BIAs, Fig. 1b) (Anderson 2007). Herbal supplements prepared from the aboveground organs of California poppy have been shown to have analgesic, anxiolytic, and tranquilizing properties, with californidine, eschscholtzine, and isoboldine as the main BIAs (Rolland et al. 1991), supporting its traditional use to induce sleep. The roots, however, mainly accumulate cytotoxic sanguinarine, which intercalates into nucleic acids and inhibits choline acetyltransferase activity in vertebrates and insects, thereby deterring microbes and herbivores (Laines-Hidalgo et al. 2022). California poppy has been used extensively to study the regulation of biosynthesis and the physiological effects of BIAs (Yamada and Sato 2021). Yet even though the BIA biosynthetic pathway, including the enzymes active in roots and cell cultures, has been characterized (reviewed in Hori et al. 2018; Becker et al. 2023), little is known about its molecular evolution.

California poppy has a rather simple floral morphology compared with other species within Ranunculales. Its flowers consist of a single fused sepal cap, 4 petals, several whorls of stamens, and a syncarpous gynoecium composed of 2 carpels. The leaves are silvery and intricately dissected (Becker et al. 2005). Like other Papaveraceae species such as Papaver rhoeas, California poppy is highly self-incompatible. The self-incompatibility system of P. rhoeas is notable in that pollen tube growth is initiated rapidly but then arrests, followed by programmed cell death (Wang et al. 2018). Besides being a subject of biochemical and ecological research, California poppy has served as a model for non-core eudicots in evolutionary developmental (evo-devo) studies (Becker et al. 2005; Ryan and Cleland 2021; Becker et al. 2023), helping to reveal the functions of transcription factors that specify floral organ identity, gynoecium development, and plant stature (Orashakova et al. 2009; Yellina et al. 2010; Pabón-Mora et al. 2012; Lange et al. 2013; Stammler et al. 2013; Fourquin and Ferrándiz 2014; Zhao et al. 2018; Mazuecos-Aguilera et al. 2021; Lotz and Rössner 2024). The bright orange pigmentation of California poppy petals is mostly attributable to carotenoids (Barrell et al. 2010); in addition, the elongated petal epidermal cells bear prism-like ridges that focus light onto the carotenoids, contributing to the petals' shiny appearance (Wilts et al. 2018).

California poppy is a member of the order Ranunculales, which holds a pivotal position in the angiosperm phylogeny as the sister lineage to all other extant eudicots (Zuntini et al. 2024). Ranunculales is hyperdiverse in morphology and specialized metabolism (The RanOmics group 2023/2024). Notable ornamentals include bleeding heart (Dicentra spectabilis), larkspur (Staphisagria picta), columbine (Aquilegia spp.), and opium poppy (Papaver somniferum); opium poppy and other poppies are also important medicinal plants that produce morphine and codeine as well as other BIAs (reviewed in Becker et al. 2023). Within Ranunculales, the 2 largest families, Ranunculaceae and Papaveraceae (Fig. 1a), diverged approximately 115 million years ago (MYA) (Xu et al. 2022). Phylogenomic and comparative genomic analyses have implicated multiple whole genome duplications (WGDs) within Ranunculales. Some studies have inferred WGDs on the lineages leading to California poppy and opium poppy, as well as an earlier event in a common ancestor of all extant Ranunculales lineages (Cui et al. 2006; Xiang et al. 2024; Zhang et al. 2024). Other studies have implicated separate WGDs in the Papaveraceae and Ranunculaceae ancestors (Xu et al. 2022; Tian et al. 2024). Here we test these conflicting hypotheses, determine whether opium poppy and California poppy share the same Papaveraceae WGD, and assess how polyploidy may have contributed to the diversification of gene function (Fig. 1c and d).

We use our haplotype-resolved, chromosome-level genome assembly and a comprehensive gene expression atlas for California poppy to elucidate how gene and genome duplications have contributed to the diversification of BIAs, the origin of its carotenoid-based bright orange flower pigmentation, and poppy flower development. There are substantial differences in gene family size among pathways involved in BIA metabolism, carotenoid metabolism, and floral organ identity. We investigate the timing of gene family expansions and contractions relative to WGDs and speciation events. We also investigate the diversification of gene expression patterns among retained duplicate genes. We find that whereas genes contributing to the same developmental or biosynthetic network show correlated duplication-loss patterns, genes contributing to BIA metabolism, carotenoid metabolism, and floral organ identity exhibit contrasting evolutionary dynamics in gene duplication and expression.

Results and discussion

Genome sequence

The assembly lengths for haplotypes 1 and 2 (HAP1 and HAP2) were 384.3 Mb and 375.4 Mb, respectively (Tables S1 to S6; Fig. S1), and flow cytometric genome size for the sequenced California poppy accession (cultivar Aurantiaca orange) was 392 Mb/1C (Supplementary Information S1). The N50 contig length was 67 Mb and N100 was 46 Mb, indicating that we produced haplotype-resolved chromosomal-level genome assemblies (Fig. S2). Our genomes add to the previously published genomic resources available for California poppy (wild-type draft genome, Nguyen et al. 2025; cultivar Hitoezaki draft genome, Hori et al. 2018). Read coverage and assembly statistics are provided in Tables S1 and S2. Gene annotations for haplotypes 1 and 2 recovered 97.4% and 98.7% of the eudicot BUSCO genes, respectively (Table 1; Tables S3–S6).
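N50 and N100 are standard contiguity statistics. A minimal Python sketch of the general Nx calculation follows; the contig lengths are toy values, not the poppy assembly. Because N100 is, by definition, the length of the shortest sequence, an N100 of 46 Mb implies that no contig in the assembly is shorter than 46 Mb.

```python
def nx(lengths, x=50):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together cover
    at least x percent of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")
    threshold = sum(lengths) * x / 100
    covered = 0
    # Walk from the longest sequence down until x percent is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= threshold:
            return length
    # Guard against floating-point rounding leaving the threshold
    # marginally above the total: the shortest sequence is then the answer.
    return min(lengths)


# Toy contig lengths in Mb (hypothetical, not the poppy assembly).
contigs = [10, 20, 30, 40]
print(nx(contigs, 50), nx(contigs, 100))  # 30 10
```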

The HAP1 and HAP2 assemblies included 56.44% and 55.25% masked repeats, respectively (Fig. S3). Most repeat sequences are evenly distributed over the chromosomes, and their numbers are broadly similar between gene-rich and gene-poor/centromeric regions (Fig. S3a). In a few instances, however, peaks of up to 60 repeats per 10 kb that are unique to each haplotype can be detected (Fig. S3). Most of the classified repeats are Ty1/Copia and Ty3/DIRS1 LTR retroelements, and most DNA transposons are of the MULE-MuDR type (Fig. S3; Table S7). A total of 800,429 SNPs were observed when comparing the HAP1 and HAP2 genomes, with an average SNP density of 2.13 SNPs/kb; the highest SNP density was found on chromosome 5 (32.5 SNPs/kb). Comparisons of HAP1 and HAP2 also identified several small segmental duplications, translocations, and inversions, most of them on chromosome 2 (Fig. S3).
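The average SNP density is the SNP count divided by the compared sequence length in kilobases. The reported 2.13 SNPs/kb is reproduced when the 800,429 SNPs are divided by the 375.4 Mb HAP2 assembly length (the 384.3 Mb HAP1 length gives about 2.08); which denominator was used is our inference, not stated in the text. The sketch also shows a simple fixed-window count of the kind that reveals local hotspots; the 10 kb window size and the example positions are assumptions.

```python
from collections import Counter


def snp_density_per_kb(n_snps, sequence_length_bp):
    """Average SNP density in SNPs per kilobase."""
    return n_snps / (sequence_length_bp / 1_000)


def windowed_density(snp_positions, window_bp=10_000):
    """SNPs/kb in fixed, non-overlapping windows keyed by window start.

    Windows without any SNP are simply absent from the result.
    """
    counts = Counter(pos // window_bp for pos in snp_positions)
    return {idx * window_bp: n / (window_bp / 1_000) for idx, n in sorted(counts.items())}


# 800,429 SNPs over the 375.4 Mb HAP2 assembly (denominator is our assumption).
print(f"{snp_density_per_kb(800_429, 375_400_000):.2f} SNPs/kb")  # 2.13 SNPs/kb
```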

Evolution of genome structure across Ranunculales

To analyze the collinearity of Ranunculales genomes, we compared our California poppy genome assembly with published chromosome-level assemblies for Coptis chinensis (Chinese goldthread, Ranunculaceae) (Liu et al. 2021), Aquilegia coerulea (columbine, Ranunculaceae) (Filiault et al. 2018), Papaver somniferum (opium poppy, Papaveraceae) (Guo et al. 2018), and Corydalis tomentella (Papaveraceae) (Xu et al. 2022) (Fig. 2a; Figs. S2 and S4). Genome sizes range from 240 Mb/C in n = 8 chromosomes for C. tomentella to 2.7 Gb/C for opium poppy (n = 11) (Fig. 2b). Synteny between the 2 Ranunculaceae genomes, columbine and Chinese goldthread, is largely conserved (Fig. 2a). In contrast, the Papaveraceae genomes are highly rearranged. Within Papaveroideae, the lineages leading to California poppy and opium poppy diverged about 100 MYA, whereas Fumarioideae (including C. tomentella) diverged from Papaveroideae around 120 MYA (Peng et al. 2023). Interestingly, the California poppy genome shows a higher level of synteny with the C. tomentella genome than with the opium poppy genome, even though California poppy is more closely related to opium poppy. In total, we identified over 10,000 gene families (i.e., phylogenetically hierarchical orthogroups [HOGs]) shared across all Ranunculales genomes (Fig. S4). However, C. tomentella shares more HOGs exclusively with opium poppy (353) than with California poppy (170), suggesting a high rate of orthogroup loss in California poppy. The number of HOGs shared exclusively by columbine and Chinese goldthread is higher still (684), indicating lower levels of gene family loss in Ranunculaceae than in Papaveraceae.
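Counts such as the 353 HOGs shared exclusively by C. tomentella and opium poppy rest on tallying each exact combination of species present in a HOG. A minimal sketch, assuming a mapping from HOG id to the set of species with genes in that HOG; the HOG ids, species codes, and memberships are hypothetical. Exclusivity is what links these counts to gene loss: a HOG present in C. tomentella and opium poppy but absent from California poppy is counted for that pair only.

```python
from collections import Counter


def species_combination_counts(hog_species):
    """Count HOGs per exact species combination.

    hog_species maps a HOG id to the set of species that have at least one
    gene in it. Keying on the full, exact set means a HOG counts as shared
    "exclusively" by a pair only when it is absent from every other species.
    """
    return Counter(frozenset(species) for species in hog_species.values())


# Hypothetical HOG ids and memberships; species codes are made up:
# Ecal = California poppy, Psom = opium poppy, Ctom = C. tomentella,
# Acoe = columbine, Cchi = Chinese goldthread.
hogs = {
    "HOG0001": {"Ecal", "Psom", "Ctom", "Acoe", "Cchi"},
    "HOG0002": {"Ctom", "Psom"},
    "HOG0003": {"Ctom", "Psom"},
    "HOG0004": {"Ctom", "Ecal"},
    "HOG0005": {"Acoe", "Cchi"},
}
combos = species_combination_counts(hogs)
print(combos[frozenset({"Ctom", "Psom"})], combos[frozenset({"Ctom", "Ecal"})])  # 2 1
```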

At the same time, the California poppy genome includes many paralogous gene pairs shared with the other Ranunculales genomes included in our analyses. To test whether there were multiple rounds of WGD or just a single event, we carried out synteny, gene tree, and Ks analyses (Figs. 1d, 2a; Supplementary Fig. (Dataset) S5, https://doi.org/10.22029/jlupub-20142). Consistent with the findings of Xiang et al. (2024) and Zhang et al. (2024), our synteny and gene tree analyses implicate a WGD in a common ancestor of all extant Ranunculales lineages and a recent WGD on the terminal branch leading to opium poppy (Fig. 1d; Fig. S5). In addition, we identify a WGD on the terminal branch leading to California poppy, with between-homeolog synonymous divergence values (Ks) very similar to the Ks values for Papaver-Eschscholzia orthologs, suggesting that this WGD occurred shortly after the divergence of the 2 genera ∼100 MYA (Peng et al. 2023).
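The reasoning rests on Ks distributions: a WGD creates a cohort of paralogs of the same age, which appears as a peak in the Ks distribution of within-genome paralog pairs, and a paralog peak that coincides with the ortholog peak for a pair of genera places the WGD close to their divergence. Published analyses usually fit mixture models to these distributions; the sketch below is a simplified histogram-mode version with invented Ks values, not the method used in this study. The bin width and the upper Ks cutoff are assumptions.

```python
from collections import Counter


def ks_peak(ks_values, bin_width=0.05, ks_max=3.0):
    """Midpoint of the most populated Ks histogram bin.

    Non-positive values and values above ks_max (where Ks is commonly treated
    as saturated) are discarded. Ties go to the lower-Ks bin.
    """
    kept = [ks for ks in ks_values if 0 < ks <= ks_max]
    if not kept:
        raise ValueError("no Ks values in range")
    bins = Counter(int(ks // bin_width) for ks in kept)
    best_bin = max(bins, key=lambda b: (bins[b], -b))
    return (best_bin + 0.5) * bin_width


# Invented Ks values for illustration only.
paralog_ks = [0.61, 0.63, 0.64, 0.66, 1.83, 0.12]  # within-genome pairs
ortholog_ks = [0.62, 0.64, 0.63, 0.71, 0.58]       # between-genus pairs
print(round(ks_peak(paralog_ks), 3), round(ks_peak(ortholog_ks), 3))  # 0.625 0.625
```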

Because we found large size differences among the Ranunculaceae genomes despite their conserved synteny, we hypothesized that transposon activity could account for variation in genome size. Quantification of repeats showed that repeat elements make up 77.8% and 70.2% of the larger opium poppy and Chinese goldthread genomes, respectively (Fig. 2b). In contrast, repeats make up at most 56.4% of the columbine, C. tomentella, and California poppy genomes, suggesting that the propagation and retention of repeat sequences can account for some of the observed size differences among Ranunculales genomes. As expected, the LTR retrotransposon families Ty1/Copia and Gypsy/DIRS (Fig. S6, Table S7) were the most common repeat types. Interestingly, opium poppy does not have the largest fraction of LTR transposons relative to its genome size, suggesting that LTR transposon gain and loss is not the only driver of genome size variation in Ranunculales and that other, possibly Ranunculales-specific, transposon classes may play a role.
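The argument can be checked with simple arithmetic on figures given in this section: multiplying genome size by repeat fraction partitions each genome into repeat and non-repeat megabases. This sketch pairs the 2.7 Gb opium poppy genome size with its 77.8% repeat fraction and the 384.3 Mb HAP1 assembly with its 56.44% masked repeats; mixing a genome size with an assembly-based repeat fraction is an approximation on our part.

```python
def partition_genome(size_mb, repeat_fraction):
    """Split a genome size (Mb) into repeat and non-repeat megabases."""
    if not 0 <= repeat_fraction <= 1:
        raise ValueError("repeat_fraction must be between 0 and 1")
    repeat_mb = size_mb * repeat_fraction
    return repeat_mb, size_mb - repeat_mb


# Sizes and repeat fractions as given in the text.
genomes = {
    "opium poppy": (2_700.0, 0.778),
    "California poppy HAP1": (384.3, 0.5644),
}
for name, (size, fraction) in genomes.items():
    repeat_mb, other_mb = partition_genome(size, fraction)
    print(f"{name}: {repeat_mb:.0f} Mb repeats, {other_mb:.0f} Mb non-repeat")
# opium poppy: 2101 Mb repeats, 599 Mb non-repeat
# California poppy HAP1: 217 Mb repeats, 167 Mb non-repeat
```

With these inputs, repeats account for roughly 1,880 Mb of the roughly 2,320 Mb size difference (about four-fifths), while the non-repeat portion still differs by about 430 Mb, consistent with repeats explaining some, but not all, of the variation.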

The California poppy genome shows large size differences between gene families involved in BIA metabolism and those involved in flower color and organ identity

Gene models from California poppy, other Ranunculales, selected core eudicots, Oryza sativa (rice, monocot), and Amborella trichopoda were clustered into gene families (HOGs) to assess variation in gene copy numbers and the timing of gene duplication events within gene families contributing to BIA biosynthesis, carotenoid synthesis, and floral organ identity. California poppy is well known for its diverse set of specialized metabolites, especially BIAs, and the BIA biosynthesis pathway is complex (Fig. 3). Whereas the core eudicot genomes encode a very limited set of genes with homologs involved in BIA metabolism, we identified 43 HOGs associated with BIA metabolism in at least 1 of the Ranunculales species, with as many as 50 ranunculid genes in a single HOG (Fig. S7). Only 10 BIA-related HOGs are present in the Arabidopsis genome, 15 in rice, and 12 in the A. trichopoda genome, suggesting that many BIA-related orthogroups originated within Ranunculales. Gene families with members encoding enzymes at branching points of the BIA synthesis pathways, such as the berberine bridge enzyme (BBE) family, or at the end points of the pathways (DBOX, OMT, and SR families) have the most members (43, 43, 44, and 11, respectively). By contrast, several gene families encoding proteins of the core pathway, such as the tyrosine decarboxylase (TYDC), coclaurine N-methyltransferase (CNMT), and N-methylcoclaurine 3′-hydroxylase (NMCH) families, include only 4 to 5 members (Fig. 3b). Lineage specificity and HOG gene numbers for carotenoid metabolism and floral homeotic MADS-box genes are much more limited (Fig. S7), suggesting a lack of lineage-specific expansions: gene numbers per HOG range from 1 to 4 for carotenoid genes and from 1 to 8 for floral homeotic MADS-box genes. These data suggest that California poppy BIA biosynthesis depends on many genes within Ranunculales-specific HOGs, whereas California poppy petal pigment biosynthesis involves orthologs of genes also found in Arabidopsis.
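The HOG counts in this paragraph come down to tallying genes per HOG and species. A minimal sketch, assuming orthology results have been flattened to (gene, species, HOG) rows; the gene ids, HOG ids, and species codes are hypothetical, and this is not the pipeline used in the study. A HOG is treated here as lineage-restricted when it has members in the focal lineage and none in the chosen outgroups (in this toy case, codes standing for Arabidopsis, rice, and Amborella).

```python
from collections import defaultdict


def copy_number_table(assignments):
    """Tally genes per HOG and species.

    assignments: iterable of (gene_id, species, hog_id) rows.
    Returns {hog_id: {species: copy_number}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for _gene, species, hog in assignments:
        table[hog][species] += 1
    return {hog: dict(counts) for hog, counts in table.items()}


def lineage_restricted(table, lineage, outgroups):
    """HOG ids with members in the lineage but none in any outgroup species."""
    return sorted(
        hog
        for hog, counts in table.items()
        if any(sp in counts for sp in lineage)
        and not any(sp in counts for sp in outgroups)
    )


# Hypothetical rows: gene ids, HOG ids, and species codes are made up.
rows = [
    ("g1", "Ecal", "HOG_A"),
    ("g2", "Ecal", "HOG_A"),
    ("g3", "Psom", "HOG_A"),
    ("g4", "Ecal", "HOG_B"),
    ("g5", "Atha", "HOG_B"),
]
table = copy_number_table(rows)
print(table["HOG_A"])  # {'Ecal': 2, 'Psom': 1}
print(lineage_restricted(table, {"Ecal", "Psom"}, {"Atha", "Osat", "Atri"}))  # ['HOG_A']
```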

Gene family phylogenies (Fig. S5, https://doi.org/10.22029/jlupub-20142) were interrogated to quantify duplications within BIA, carotenoid, and floral development pathway-associated HOGs. The average number of gene family members involved in BIA and carotenoid biosynthesis and in floral organ identity was then plotted onto the species phylogeny (Fig. 3d). Corroborating our interpretation of gene copy number variation across HOGs, placement of duplication events on the species phylogeny reveals many more lineage-specific duplications of BIA genes than of carotenoid biosynthesis and MADS-box genes (Fig. 3d). The largest fraction of BIA metabolism gene duplications was placed on terminal branches leading to Ranunculaceae and Papaveraceae species. The smaller numbers of carotenoid and floral homeotic MADS-box gene duplications are distributed across the species phylogeny, with opium poppy exhibiting the largest number of species-specific duplications (Fig. 3d), most likely derived from a lineage-specific WGD around 8 MYA (Guo et al. 2018). Taken together, our data show only a weak tendency toward increased gene numbers across angiosperms for floral homeotic MADS-box genes, hardly any increase in carotenoid biosynthesis gene number, and a massive species-specific diversification of BIA-related genes within Ranunculales.
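Placing duplications on a species phylogeny requires reconciling gene trees with the species tree. One common, simple approach is the species-overlap rule: an internal gene-tree node is a duplication if its two child subtrees share a species, and the duplication is mapped to the species-tree clade that is the last common ancestor of all species below it. The sketch assumes rooted binary trees written as nested tuples; the trees, species codes, and gene ids are invented, and we do not claim this is the reconciliation method used in the study.

```python
def leaf_species(tree):
    """Species below a node. Leaves are strings; gene-tree leaves use
    'species|gene', species-tree leaves are bare species codes."""
    if isinstance(tree, str):
        return {tree.split("|", 1)[0]}
    left, right = tree
    return leaf_species(left) | leaf_species(right)


def find_duplications(tree, found=None):
    """Species-overlap rule on a rooted, binary gene tree: an internal node is
    a duplication when the species sets of its two children intersect.
    Returns the species set below each duplication node (pre-order)."""
    if found is None:
        found = []
    if isinstance(tree, str):
        return found
    left, right = tree
    species_left, species_right = leaf_species(left), leaf_species(right)
    if species_left & species_right:
        found.append(species_left | species_right)
    find_duplications(left, found)
    find_duplications(right, found)
    return found


def place_on_species_tree(species_tree, species_set):
    """Smallest clade of a binary species tree containing species_set (its LCA)."""
    if isinstance(species_tree, str):
        return species_tree
    for child in species_tree:
        if species_set <= leaf_species(child):
            return place_on_species_tree(child, species_set)
    return species_tree


# Hypothetical trees with made-up species codes and gene ids.
species_tree = (("Ecal", "Psom"), ("Acoe", "Cchi"))
gene_tree = ((("Ecal|g1", "Ecal|g2"), "Psom|g3"), ("Acoe|g4", "Cchi|g5"))
for dup in find_duplications(gene_tree):
    print(sorted(dup), "->", place_on_species_tree(species_tree, dup))
# ['Ecal'] -> Ecal
```

A duplication mapped to a single species leaf, as here, corresponds to a terminal-branch (species-specific) duplication. The overlap rule can misplace duplications when gene trees are poorly resolved or genes have been lost.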

Because we found high gene numbers in BIA-associated gene families of California poppy, we asked whether the BIA genes diversified through tandem duplications resulting in biosynthetic gene clusters (BGCs). Large BGCs have been described in several plants, including Brassicaceae (Liu et al. 2019), rice (Miyamoto et al. 2016), and tomato (Itkin et al. 2013; Matsuba et al. 2013); an iconic example is the STORR cluster, essential for morphine and codeine biosynthesis in opium poppy (Guo et al. 2018). Such clusters are composed of coexpressed genes from diverse gene families operating together in one biological process. However, plantiSMASH analysis showed that only 8% of all BIA-related genes were associated with BGCs. One candidate metabolic cluster at the start of chromosome 1 included a DBOX/BBE cluster adjacent to MSH-P6H-like and CNMT-like genes of unknown enzymatic function (Table S8).

To further investigate the nature of BIA-related gene duplications, we plotted their positions on the 6 chromosomes of California poppy. We defined a cluster as at least 5 BIA biosynthesis genes located next to each other, with only single genes interspersed. As expected, a large fraction of genes encoding BIA biosynthesis enzymes are located close together in 13 non-BGC clusters throughout the genome. Interestingly, however, each of these clusters is composed of members of a single gene family (Fig. 3c). This pattern is not typical of BGCs, which usually include metabolic pathway genes from multiple gene families. For example, 7 of the 8 Ranunculales-specific STS-CAS-CHS-like genes are located in a cluster on chromosome 6, and 7 of the 11 SR-like genes are located in a cluster on chromosome 3. Larger gene families, such as the DBOX/BBE-like and OMT-like genes, form more clusters: DBOX/BBE-like genes are found in 3 clusters on chromosomes 1 and 2, and OMT-like genes in clusters on chromosomes 3 and 4. These clusters are often interrupted by a few genes without sequence similarity, for example, in the STS-CAS-CHS-like gene cluster on chromosome 6 and the SR-like gene cluster on chromosome 3. In contrast, the carotenoid metabolism and floral homeotic MADS-box genes are not clustered on California poppy chromosomes (Fig. 3c). Overall, our results show that the BIA biosynthesis genes of California poppy accumulate in large clusters through single-gene tandem duplications.
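The cluster definition in this paragraph translates directly into a linear scan along each chromosome. A minimal sketch, assuming genes are given in chromosomal order and reading "only single genes interspersed" as at most 1 non-target gene between consecutive target genes; the gene ids are hypothetical. Running the scan once per gene family, rather than on all BIA genes pooled, would separate single-family tandem arrays of the kind described here from multi-family BGCs.

```python
def find_clusters(gene_order, targets, min_size=5, max_gap=1):
    """Find runs of target genes along one chromosome.

    gene_order: gene ids in chromosomal order.
    targets: set of gene ids of interest (e.g. one gene family).
    Consecutive target genes may be separated by at most max_gap
    non-target genes; runs with fewer than min_size targets are dropped.
    """
    clusters, current, gap = [], [], 0
    for gene in gene_order:
        if gene in targets:
            current.append(gene)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:  # too many interspersed genes: close the run
                if len(current) >= min_size:
                    clusters.append(current)
                current, gap = [], 0
    if len(current) >= min_size:  # a run reaching the chromosome end
        clusters.append(current)
    return clusters


# Hypothetical gene order; x1..x3 are unrelated genes.
order = ["OMT1", "OMT2", "x1", "OMT3", "OMT4", "OMT5", "x2", "x3", "OMT6", "OMT7"]
targets = {g for g in order if g.startswith("OMT")}
print(find_clusters(order, targets))
# [['OMT1', 'OMT2', 'OMT3', 'OMT4', 'OMT5']]
```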

BIA genes are transcribed in all plant organs

In addition to the genome of California poppy, we also report long-read reference transcriptomes and a high-resolution expression atlas (Table S9). The California poppy expression atlas includes 18 tissues and developmental stages, each with at least 3 replicates sequenced with at least 30 million reads (Table S10). These transcriptomes include individual floral organs at anthesis, 4 bud and 5 fruit development stages, roots, stem, and leaf tissue. Transcriptome analyses identify specifically expressed genes for all tissues/stages (Figs. S8 and S9). Weighted gene coexpression network analysis based on the expression atlas yields 17 modules that correlate with specific organs and developmental stages (Fig. S10). For example, module 1 shows a strong correlation with the latest fruit development stage, whereas module 5 correlates with the earliest stage. Overall, an association with at least 1 coexpression module could be established for most tissues and developmental stages, suggesting that subsets of the California poppy transcriptome are coexpressed in specific tissues and stages. We then examined the activity of BIA-related genes, because California poppy BIA metabolism has been characterized mainly in roots and cell culture systems, although several BIAs have also been identified in aboveground (aerial) tissues (Fig. 4; Fedurco et al. 2015; Purwanto et al. 2017; Hori et al. 2018). To compare the expression of California poppy BIA-related genes with that in Arabidopsis, we plotted the expression patterns of the California poppy BIA genes together with those of the Arabidopsis genes most similar in sequence (Fig. 4; Fig. S11). As examples, we show the large OMT gene family with 44 members (Fig. 4a) and the small family of STS-CAS-CHS-like genes with 9 members (Fig. 4b). Only a few OMT members are functionally characterized, and they are part of the core, aerial, and root pathways.
OMT genes show very specific expression patterns, with subsets expressed specifically in vegetative aboveground organs, others only in roots, and still others in specific reproductive organs or flower developmental stages. Interestingly, we find 3 OMT family members expressed only in petals and stamens, 9 only in later stages of flower development, and 7 only during fruit development.
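The heatmaps discussed here (Fig. 4) show each gene as a Z-score of log2(TPM + 1) across tissues, which highlights where a gene is most active regardless of its absolute expression level. A minimal sketch of that row scaling, assuming one gene's TPM values are given in tissue order; the use of the sample standard deviation and the TPM values themselves are assumptions.

```python
import math
from statistics import mean, stdev


def zscore_log_tpm(tpm_by_tissue):
    """Row-scale one gene's expression profile (at least two tissues):
    log2(TPM + 1) per tissue, centered on the gene's mean and divided by
    its sample standard deviation."""
    logged = [math.log2(tpm + 1) for tpm in tpm_by_tissue]
    mu, sd = mean(logged), stdev(logged)
    if sd == 0:
        return [0.0] * len(logged)  # flat profile: no tissue preference to show
    return [(value - mu) / sd for value in logged]


# Hypothetical TPM values for one gene across four tissues.
print([round(z, 2) for z in zscore_log_tpm([0, 1, 3, 7])])  # [-1.16, -0.39, 0.39, 1.16]
```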

The STS-CAS-CHS-like genes are members of the CYP719A clan of the cytochrome P450 enzyme family and comprise 9 genes in California poppy (Fig. 4b). They are involved in forming the methylenedioxy bridges found in berberine, sanguinarine, and californidine. STS, CHS, and CAS were all reported to participate in the root and cell culture pathway, while CYP719A9 and CYP719A11 are attributed to the above-ground pathway (Ikezawa et al. 2009), which is corroborated by our data. Interestingly, genes previously characterized as aerial genes are expressed mainly in vegetative above-ground organs and in carpels/early fruit development, and are absent from other floral tissues. In contrast, CAS, CHS, and STS also show strong transcript abundance during fruit development, suggesting that the root pathway genes may additionally help protect fruits from herbivory. Most genes related to BIA biosynthesis show similar tissue-/stage-specific expression patterns. Our study shows that California poppy encodes many tissue-specific BIA pathway genes, most of which are not yet characterized, providing valuable resources for further functional characterization of BIA biosynthesis genes, their regulators, and novel BIAs. Further, these results suggest that California poppy may produce tissue-specific BIA profiles via biosynthetic enzymes under the control of tissue-/stage-specific developmental regulators.

Physically clustered gene family members are often coexpressed

Because orthogroup members of BIA biosynthesis gene families are often located close to each other, we hypothesized that California poppy clades within BIA-related gene families consist of genes with similar sequences and expression patterns. We therefore assessed the degree of correlation among genomic location, phylogenetic relationships, and expression patterns for the closely related CYP82N and P subfamily genes, including DS/DB10H, MSH, and P6H (Fig. 4c). The CYP82N+P members fall into 6 distinct clades (A–F): clade D includes the DS/DB10H genes, clade E includes MSH, and clade F includes P6H. Of the 39 genes included in this phylogeny, 17 are in gene clusters. The dark gray cluster is composed of highly similar genes located on chromosome 2. All members of this cluster share strong expression in and some expression in old leaves and stems. Some genes in this cluster are also expressed in sepals, and others in late fruit development stages (Table S11; Fig. S11).
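The hypothesis that sequence relatedness tracks expression similarity can be made quantitative. The sketch below is a minimal illustration, not the authors' pipeline: it assumes each gene has a TPM vector over the same tissues, measures expression similarity as the Pearson correlation of log2(TPM + 1) profiles, and then correlates pairwise sequence identity with that expression correlation. Gene names, TPM values, and identities are hypothetical toy data.

```python
import math
from itertools import combinations


def log2_tpm(profile):
    """Transform a TPM profile to log2(TPM + 1)."""
    return [math.log2(x + 1.0) for x in profile]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (NaN if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def paralog_pairs(tpm, identity):
    """Yield (gene_a, gene_b, sequence identity, expression correlation)."""
    for a, b in combinations(sorted(tpm), 2):
        r = pearson(log2_tpm(tpm[a]), log2_tpm(tpm[b]))
        yield a, b, identity[frozenset((a, b))], r


# Hypothetical toy data: TPM over 4 tissues and pairwise protein identity.
tpm = {
    "geneA": [120.0, 3.0, 0.0, 45.0],
    "geneB": [120.0, 3.0, 0.0, 45.0],
    "geneC": [0.0, 80.0, 60.0, 1.0],
}
identity = {
    frozenset(("geneA", "geneB")): 0.95,
    frozenset(("geneA", "geneC")): 0.60,
    frozenset(("geneB", "geneC")): 0.61,
}

pairs = list(paralog_pairs(tpm, identity))
# Does higher sequence identity go with more similar expression?
trend = pearson([p[2] for p in pairs], [p[3] for p in pairs])
```

A positive `trend` means that more similar paralogs tend to share expression. With real data, identities would come from the alignments behind the gene tree, and a rank-based statistic may be more robust to skewed expression values.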

Eight of the 18 clade D genes are clustered at the far (right) end of chromosome 4 and are highlighted in medium gray in Fig. 4c. As the gene tree shows, the phylogenetic relationships and the variation in expression among genes in this cluster are more complex. Whereas DB10H, DS10H, and 3 closely related genes share expression in roots and in late-stage fruit development, the other genes in the cluster exhibit divergent expression patterns in tissues other than roots.

The light gray cluster on the left end of chromosome 4 includes 4 genes: 3 placed within clade D, which are not monophyletic, and a fourth in the sister lineage (clade C), representing an earlier duplication. Unusually for this gene family, all members of this cluster are strongly expressed in the stem and, more typically, also in the root and, albeit weakly, during fruit development (Fig. 4c). Interestingly, the California poppy genome also includes physically clustered genes with high sequence similarity but divergent expression patterns. For example, MSH (2G475200) is expressed in several tissues, whereas expression of its sister paralog (2G475100) is restricted to roots. The neighboring genes 1G502300 and 1G502400 of group B are both expressed in petals, but one is additionally expressed in stamens while the other is additionally expressed in roots. These may be examples of differential recruitment of regulatory sequences, which can lead to neofunctionalization after gene duplication.

Taken together, the data from the CYP82N and P subfamilies partially support our hypothesis: the gene cluster on chromosome 2 (dark gray in Fig. 4) shares high sequence and expression pattern similarity. Duplicate genes within this cluster may act redundantly or additively in a BIA biosynthesis step and may multiply enzyme activity to increase metabolic flux. A subset of genes within the chromosome 4 gene clusters (medium and light gray in Fig. 4) shows a similar pattern, but other members of these 2 clusters exhibit greater sequence divergence and variation in expression.

Carotenoid biosynthesis orthologs are differentially and floral homeotic MADS genes are similarly expressed in Arabidopsis and California poppy

We then asked how well expression patterns are conserved between Arabidopsis and California poppy homologs of carotenoid biosynthesis genes and floral homeotic regulators, in order to compare BIA biosynthesis gene family diversification with the diversification of genes influencing other traits. Intriguingly, and in contrast to the BIA genes, carotenoid biosynthesis orthogroups are, with few exceptions, conserved at low copy number across angiosperm evolution (Fig. 5a and b; Fig. S7). A comparative expression analysis of all genes encoding enzymes required for carotene and xanthophyll biosynthesis in Arabidopsis and California poppy reveals similar expression levels of carotenoid biosynthesis genes in vegetative organs (Fig. 5a and b). However, in later stages of flower development, genes encoding PDS, ZDS, CRTISO, CYP97A3, and CYP97B3 are more strongly expressed in California poppy. Moreover, in sepals, petals, and gynoecium, LCYE, LCY1, CYP97B3, ZEP, VDE, and NSY are expressed at higher levels in California poppy than in Arabidopsis. Most California poppy genes show expression patterns distinct from those of their Arabidopsis orthologs. Interestingly, most carotenoid pathway genes are not expressed in roots or stamens. Absence of gene expression often correlates with absence of the encoded protein, and although carotenoid biosynthesis is regulated transcriptionally and is also influenced by differential splicing, redox processes, plastid sequestration, posttranslational modifications, and substrate supply (Sun and Li 2020), enzyme availability in particular tissues plays a major role in differential carotenoid accumulation. Overall, high transcript abundance of many carotenoid biosynthesis genes may contribute more to the orange pigmentation of California poppy petals than differences in gene copy number.

Tissue-specific expression of carotenoid biosynthesis orthologs and floral homeotic MADS-box orthologs in Arabidopsis and California poppy. (a) Carotenoid metabolism pathway showing enzymes and products, with products shown in their respective colors. (b) Comparative heatmaps show expression (Z-score of log2(TPM + 1)) of carotenoid biosynthesis genes. Black lines link gene names to heatmap rows. Minimum and maximum log2(TPM + 1) values are placed below the lines. The left heatmap panel shows expression in Arabidopsis, the right panel in California poppy. Gene identifiers are shown below. Genes responsible for petal pigmentation, such as bOHASEs, are upregulated in petals in California poppy. (c) Simplified ABCDE model according to Theißen et al. (2016), indicating functionally characterized California poppy genes. Hypothesized protein complexes of homeotic gene functions are shown above. (d) Comparative heatmap, as in (b), of floral homeotic MADS-box orthologs showing conserved expression patterns.

MADS-box genes confer floral organ identity in a combinatorial way (Fig. 5c), and they also show roughly similar gene copy numbers per orthogroup (Fig. 5d; Fig. S7). However, unlike those of the carotenoid genes, expression patterns within these orthogroups are highly conserved between Arabidopsis and California poppy (Fig. 5d). Comparative expression analysis of the floral homeotic genes of California poppy and Arabidopsis reveals that most of them are most strongly expressed in developing buds and in floral organs at anthesis.

In summary, the California poppy genome and expression atlas reveal 3 different patterns of gene duplication, loss, and expression divergence, depending on the biosynthetic or developmental pathway considered (Fig. 6). BIA biosynthesis-associated gene families undergo massive duplication, with lagging divergence in expression profiles. Carotenoid biosynthesis-associated gene families are mostly retained as single-copy genes owing to duplicate loss following single-gene duplications and WGDs. At the same time, orthologous carotenoid biosynthesis genes exhibit lineage-specific expression profiles across the species phylogeny. Floral homeotic genes may slightly increase their copy number per orthogroup during WGD events, but their expression patterns remain remarkably stable across eudicot evolution.

Schematic representation of evolutionary trajectories of BIA, carotenoid and floral homeotic MADS box genes, summarizing their gene duplication history and divergence in expression patterns as indicated by gray, yellow, and blue.

Although the opium poppy genome has already been sequenced, a haplotype-resolved reference genome of California poppy, combined with a high-density expression atlas, is a valuable resource for understanding the genetic basis of Papaveraceae life history traits and of BIA metabolism. In this study, we present a haplotype-resolved assembly of the California poppy genome, which allowed us to reconstruct the duplication history of lineage-specific BIA biosynthesis genes, providing a powerful resource for future research into BIA metabolism and evolution. We show that many BIA biosynthesis genes are expressed in specific organs; for example, many are expressed in floral organs and in fruits of all stages (Fig. 4). The BIAs produced in these organs may deter herbivores and pathogens, contributing to the invasiveness of California poppy.

BIA genes duplicated massively through WGDs and small-scale duplications in the lineage leading to California poppy, while the number of BIA gene homologs in the lineage leading to Arabidopsis remained low. Further, the expression of the duplicates in California poppy diverged. Carotenoid gene numbers remained very low in both the California poppy and the Arabidopsis lineages, but expression patterns differ between California poppy and Arabidopsis orthologs. Floral homeotic MADS-box genes show duplications at low frequency, but their expression patterns are almost identical between California poppy and Arabidopsis orthologs.

We further introduce a high-density expression atlas of California poppy, covering vegetative and reproductive organs and floral developmental stages, which allows, to our knowledge for the first time, a detailed comparison of expression between Arabidopsis genes and their California poppy homologs. These datasets corroborate the previous finding that floral homeotic genes of the MADS-box gene family share expression patterns across large phylogenetic distances (Johansen et al. 2002). The molecular evolution of these genes is characterized by a moderate increase in family members during angiosperm evolution, most likely due to WGDs, combined with conserved expression patterns. A different evolutionary scenario is observed for gene families contributing to carotenoid biosynthesis. Here, the number of gene family members remains constant, suggesting purging of duplicates after WGD. Previous reports show that loss of duplicates after WGD is often associated with excision during genome fractionation (Yu et al. 2020), a scenario that also seems likely for Papaveraceae, whose genomes show only limited synteny, suggestive of genome fractionation after WGD. Interestingly, copy numbers of carotenoid biosynthesis genes remain largely constant throughout angiosperms but coincide with strong divergence in expression patterns between Arabidopsis and California poppy. However, carotenoid metabolism is regulated not only transcriptionally but also post-transcriptionally and post-translationally, and relies on protein interaction partners for enzyme complex formation, substrate availability, and chromoplast compatibility (Zhou et al. 2018, 2022). Thus, regulation of expression might be only a minor factor in controlling carotenoid biosynthesis gene activity, and selection pressure stabilizing expression patterns may be mild, allowing for the divergence we observed.
Further, our work shows that several gene families involved in poppy BIA metabolism exhibit similar duplication histories. Our phylogeny reconstructions, combined with characterization of chromosomal context, reveal that the increase in poppy BIA gene copy numbers occurred through small-scale segmental duplications after the divergence of the California poppy and opium poppy lineages. BIA gene duplicates are distributed in dense clusters and often share very similar gene expression patterns with their neighboring paralogs. Interestingly, the California poppy genome does harbor BGCs like the STORR cluster reported from opium poppy (Guo et al. 2018). Generally, we find lower copy numbers in the BIA core pathway than at branching points or the tips of the pathway, a pattern also described for the terpenoid biosynthesis pathway, where negative selection is progressively relaxed along metabolic pathways (Ramsay et al. 2009). However, evolutionary rate analyses are required to elucidate this effect in California poppy.
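This section does not state the criterion used to call a physical gene cluster. Purely as an assumption for illustration, the sketch below merges consecutive gene family members on the same chromosome while the intergenic gap is at most a hypothetical `max_gap` (100 kb by default) and keeps groups of at least 2 genes. Gene IDs and coordinates are invented.

```python
from collections import defaultdict


def physical_clusters(genes, max_gap=100_000, min_size=2):
    """Group gene family members into physical clusters (hypothetical rule).

    genes: iterable of (gene_id, chromosome, start, end) in bp.
    Members on one chromosome are merged while the gap between the current
    cluster's rightmost end and the next gene's start is <= max_gap.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, last_end = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            if current and start - last_end > max_gap:
                if len(current) >= min_size:
                    clusters.append((chrom, current))
                current = []
            if not current:
                last_end = end  # first gene of a new candidate cluster
            current.append(gene_id)
            last_end = max(last_end, end)
        if len(current) >= min_size:
            clusters.append((chrom, current))
    return clusters


genes = [
    ("g1", "Chr02", 1_000, 3_000),
    ("g2", "Chr02", 10_000, 12_000),
    ("g3", "Chr02", 500_000, 502_000),
    ("g4", "Chr04", 100, 900),
    ("g5", "Chr04", 200_000, 201_000),
]
print(physical_clusters(genes))  # [('Chr02', ['g1', 'g2'])]
```

Cluster calls are sensitive to `max_gap`: raising it to 250 kb in this toy example also groups g4 and g5, which is why any real analysis should report the threshold it used.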

Materials and methods

Plant growth and nucleic acid extraction

E. californica plants (cv. Aurantiaca orange) were grown in the greenhouse under supplemented natural daylight in 9-cm square pots with Einheitserde Classic as substrate and were fertilized every 2 weeks with liquid fertilizer. This cultivar was chosen because of its available genetic tools, such as VIGS and stable transformation, and its availability from commercial growers in many countries (Wege et al. 2007; Lotz et al. 2022). For DNA extraction, young leaf tissue from a single individual was frozen. For RNA isolation, tissues from several plants were pooled and immediately snap frozen. RNA was extracted using the NucleoSpin RNA Plant Mini kit (Macherey-Nagel, Düren, Germany) and quantified using a spectrophotometer. High-molecular-weight DNA was extracted from young leaves using the protocol of Doyle and Doyle (1987) with minor modifications. Flash-frozen young leaves were ground into a fine powder in a frozen mortar with liquid nitrogen, followed by very gentle extraction in 2% CTAB buffer (containing proteinase K, PVP-40, and β-mercaptoethanol) for 30 min to 1 h at 50 °C. After centrifugation, the supernatant was gently extracted twice with 24:1 chloroform:isoamyl alcohol. The upper phase was removed, and 1/10 volume of 3 M sodium acetate was added and gently mixed. DNA was then precipitated with isopropanol, collected by centrifugation, washed with 70% ethanol, air-dried for 20 min, and dissolved thoroughly in elution buffer at room temperature, followed by RNase treatment. DNA purity and concentration were measured with a spectrophotometer, and DNA size was validated by automated electrophoresis.

Library preparation and sequencing

Genome sequencing libraries were prepared as follows. For Illumina PCR-free sequencing, 500 ng to 1.5 µg of DNA was sheared to 350 bp on a Covaris instrument, bead cleaned, end repaired, and then bead treated to remove large and small fragments. After adenylation, adaptors were ligated (Illumina TruSeq PCR-Free DNA Library Prep Kit). The prepared libraries were quantified using KAPA Biosystems' next-generation sequencing library qPCR kit and run on a Roche LightCycler 480 real-time PCR instrument. Sequencing of the flowcell was performed on the Illumina NovaSeq sequencer using NovaSeq XP V1 reagent kits, tbd-sample dependent flowcell, following a tbd-sample dependent indexed run recipe. For PacBio HiFi sequencing, the PacBio sequencing primer was annealed to the SMRTbell template library, and sequencing polymerase was bound to it using the Sequel II Binding Kit 2.0. The prepared SMRTbell template libraries were then sequenced on a Pacific Biosciences Sequel II sequencer using tbd-sample dependent sequencing primer, 8M v1 SMRT Cells, and version 2.0 sequencing chemistry with 1 × 1,800 sequencing movie run times.

For PacBio Iso-Seq reference transcriptome sequencing, full-length cDNA was synthesized using template-switching technology with the NEBNext Single Cell/Low Input cDNA Synthesis & Amplification Module kit. The first-strand cDNA was amplified and multiplexed with NEBNext High-Fidelity 2X PCR Master Mix using barcoded cDNA PCR primers. The amplified cDNA was purified using 1.3X ProNex beads without size selection or 0.89X ProNex beads for size selection above 2 kb, and libraries of similar size were pooled at equimolar ratios in the Degree-of-Pool designated in the PacBio Multiplexing Calculator worksheet. The pooled samples were end-repaired, A-tailed, and ligated with overhang nonbarcoded adaptors using the SMRTbell Express 2.0 kit.

For Illumina sequencing with polyadenylation selection, plate-based RNA sample preparation was performed on the PerkinElmer Sciclone NGS robotic liquid handling system using Illumina's TruSeq Stranded mRNA HT sample preparation kit (Illumina, San Diego, CA, USA) with poly-A selection of mRNA, following the manufacturer's protocol with these conditions: total RNA starting material was 1 µg per sample, and 8 cycles of PCR were used for library amplification. Libraries were quantified using KAPA Biosystems' next-generation sequencing library qPCR kit and run on a LightCycler 480 (Roche, Wilmington, MA, USA). The quantified libraries were then multiplexed and sequenced on the Illumina NovaSeq 6000 sequencing platform using NovaSeq XP v1.5 reagent kits and an S4 flow cell, following a 2 × 150 indexed run protocol (Illumina, San Diego, CA, USA).

Genome sequencing

We sequenced California poppy (var. Aurantiaca Orange King Plant1.1) using a whole-genome shotgun sequencing strategy and standard sequencing protocols. Sequencing reads were collected on Illumina and PacBio platforms at the HudsonAlpha Institute in Huntsville, Alabama, USA. Illumina reads were sequenced on the Illumina NovaSeq 6000 platform, and PacBio reads on the Sequel II platform. Two 400-bp insert 2 × 150 Illumina fragment libraries (145.70×) were sequenced along with one 2 × 150 OmniC library (133.73×) (Table S1). Prior to assembly, Illumina fragment reads were screened for PhiX contamination. Reads composed of >95% simple sequence were removed. Illumina reads <50 bp after trimming for adapter and quality (q < 20) were removed. The final read set consists of 221,596,449 reads for a total of 89.1× of high-quality Illumina bases. For PacBio sequencing, a total raw sequence yield of 98.6 Gb was achieved, corresponding to 140.81× coverage (Table S2).
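Sequencing depth is total sequenced bases divided by genome size. The genome size behind the reported 140.81× is not given in this section; back-calculating from the 98.6 Gb PacBio yield gives roughly 700 Mb. That figure is an inference from the two reported numbers, not a reported value, and should be checked against the flow cytometry estimate in Supplementary Information S1.

```python
def coverage(total_bases, genome_size):
    """Mean sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, depth):
    """Genome size implied by a reported yield and depth."""
    return total_bases / depth


pacbio_bases = 98.6e9   # reported raw PacBio yield (bp)
pacbio_depth = 140.81   # reported PacBio coverage (x)

size = implied_genome_size(pacbio_bases, pacbio_depth)
print(f"implied genome size: {size / 1e6:.1f} Mb")  # implied genome size: 700.2 Mb
```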

Genome assembly and construction of pseudomolecule chromosomes

The version 1.0 assemblies were generated by assembling the PacBio CCS reads with the HiFiAsm (16.1) +HIC assembler (Cheng et al. 2021) and subsequently polished using RACON (1.14) (Vaser et al. 2017). This produced initial assemblies of both haplotypes (HAP1 and HAP2). OmniC Illumina reads from Eschscholzia californica (var. Aurantiaca Orange King Plant1.1) were separately aligned to the HAP1 and HAP2 contig sets with Juicer (1.5) (Durand et al. 2016), and chromosome-scale scaffolding was performed with 3D-DNA (build_180419) (Dudchenko et al. 2017). Using the Hi-C data, the contigs were then oriented, ordered, and joined into 6 chromosomes per haplotype. Contigs terminating in significant telomeric sequence were identified using the (TTTAGGG)n repeat, and care was taken to ensure that they were properly oriented in the production assembly. The remaining scaffolds were screened against bacterial proteins, organelle sequences, and GenBank nr and removed if found to be contaminants. Comparison of the 2 haplotypes revealed a large region on Chr01 that was present in the HAP2 chromosome but not in the HAP1 copy. CCS reads aligned to the main chromosomes showed that this region had twice the depth of the surrounding regions and was likely a large homozygous section of the chromosome. In the V1 release, the homozygous region of HAP2 Chr01 (positions 41,548,121 to 55,864,660) was duplicated into HAP1 Chr01 (positions 41,845,747 to 56,162,286). Finally, homozygous SNPs and INDELs were corrected in the HAP1 and HAP2 releases using ∼54.4× of Illumina reads (2 × 150, 400-bp insert) by aligning the reads with bwa mem (0.7.17-r1188) (Li 2013) and identifying homozygous SNPs and INDELs with GATK's UnifiedGenotyper tool (3.6-0-g89b7209) (McKenna et al. 2010).
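Telomere-terminated contigs can be recognized by tandem arrays of the plant telomere motif at their ends. The sketch below is an illustration only, not the production pipeline: it assumes the common plant convention that the C-rich form (CCCTAAA)n sits at the start of a correctly oriented chromosome and the G-rich form (TTTAGGG)n at its end, so an array on the unexpected side hints that the contig is reverse-oriented. The terminal `window` and `min_copies` values are hypothetical thresholds.

```python
import re

TELOMERE = "TTTAGGG"  # G-rich plant telomere motif


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_run(seq, motif):
    """Longest uninterrupted tandem run of `motif`, in copies."""
    pattern = re.compile(f"(?:{motif})+")
    return max((len(m.group(0)) // len(motif)
                for m in pattern.finditer(seq.upper())), default=0)


def telomere_report(seq, window=10_000, min_copies=50):
    """Flag telomeric arrays at both contig ends, by expected orientation."""
    head, tail = seq[:window], seq[-window:]
    c_rich = revcomp(TELOMERE)  # CCCTAAA
    return {
        "start_ok": longest_run(head, c_rich) >= min_copies,
        "end_ok": longest_run(tail, TELOMERE) >= min_copies,
        # Arrays on the unexpected side suggest the contig is reverse-oriented.
        "start_flipped": longest_run(head, TELOMERE) >= min_copies,
        "end_flipped": longest_run(tail, c_rich) >= min_copies,
    }


contig = "CCCTAAA" * 60 + "ACGT" * 5_000 + "TTTAGGG" * 60
report = telomere_report(contig)
```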

Gene predictions

Transcript assemblies for both haplotypes were made from 2 × 150 stranded paired-end Illumina RNA-seq reads using PERTRAN (v2.0), which performs genome-guided short-read transcriptome assembly via GSNAP (version 2019-09-12; Wu and Nacu 2010) and builds splice alignment graphs after alignment validation, realignment, and correction. Additionally, PacBio Iso-Seq CCSs were corrected and collapsed by a genome-guided correction pipeline. A repeat library was created from de novo repeats predicted by RepeatModeler2 (2.04) (Flynn et al. 2020) on the California poppy var. Aurantiaca Orange King Plant1.1 HAP1 v1.0 genome. The predicted repeats were functionally annotated with InterProScan (v5.51-85.0) (Jones et al. 2014), using the Pfam (33.1) (Mistry et al. 2021) and PANTHER (15.0) (Mi et al. 2019) databases, and any repeats with significant hits to protein-coding domains were excluded from the repeat library. Finally, this species-specific repeat library was used to soft-mask the HAP1 and HAP2 genomes with RepeatMasker (4.12) (Smit et al. 2013-2015). Putative gene loci were determined by transcript assembly alignments and/or EXONERATE (2.4.0) (Slater and Birney 2005) alignments of proteins from 20 plant species to the repeat-soft-masked HAP1 and HAP2 genomes, with up to 2-kb extension on both ends unless extending into another locus on the same strand. Gene models in each locus were predicted by a combination of homology-based predictors, and the best model for each locus was selected based on extrinsic data. The selected gene predictions were improved by PASA (2.0.2), which added UTRs, corrected splicing, and added alternative transcripts. PASA-improved gene model proteins were subjected to protein homology analysis against the above-mentioned proteomes to obtain metrics used to manually filter out low-quality models.
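Soft-masking, as done here with RepeatMasker, lower-cases repeat bases instead of replacing them with N, so downstream aligners and gene predictors can treat repeats differently without losing sequence. The sketch below mimics that effect on a toy sequence given 0-based, half-open repeat intervals, and also mirrors the earlier step of dropping repeat-library entries that hit protein-coding domains. It is not RepeatMasker's implementation; the family names and the `protein_domain_hits` set are hypothetical.

```python
def soft_mask(seq, intervals):
    """Lower-case repeat intervals (0-based, half-open) of a sequence.

    Bases are kept and only their case changes, unlike hard-masking with N.
    """
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def masked_fraction(seq):
    """Fraction of soft-masked (lower-case) bases."""
    return sum(c.islower() for c in seq) / len(seq)


def filter_repeat_library(repeats, protein_domain_hits):
    """Drop repeat consensi that hit protein-coding domains (hypothetical input)."""
    return {name: s for name, s in repeats.items() if name not in protein_domain_hits}


masked = soft_mask("ACGTACGTAC", [(2, 5), (8, 20)])
print(masked, masked_fraction(masked))  # ACgtaCGTac 0.5
```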

Functional annotation

Every peptide sequence in the dataset was analyzed with a computational pipeline that includes the standard InterProScan (v5.65-97.0) (Jones et al. 2014) suite of programs to determine protein domains and other sequence features, E2P2 (v4.0) (Chae et al. 2014; Schläpfer et al. 2017) for enzyme assignments (EC), and PathoLogic (v20.0) (Karp et al. 2021) for metabolic pathway assignments. Additional processing was used to assign genes to Eukaryotic Orthologous Groups (KOG) using a modified mutual best hit algorithm. The InterProScan results were used to assign standard InterPro protein domain associations and, from these, gene ontology (GO) terms. Protein domains inferred from these calculations were used to develop a putative gene functional assignment, which includes a count of the multiplicity of the assignment in the proteome set.
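A plain mutual (reciprocal) best hit test keeps a pair of proteins only if each is the other's highest-scoring hit. The pipeline used a modified version whose modifications are not described here, so the sketch below shows only the basic idea, assuming BLAST-style (query, subject, bit score) tuples. Protein IDs are hypothetical, and ties are resolved in favor of the first hit seen.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each protein is the other's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical (query, subject, bit score) tuples.
a_vs_b = [("p1", "k1", 300), ("p1", "k2", 120), ("p2", "k2", 250), ("p3", "k1", 200)]
b_vs_a = [("k1", "p1", 310), ("k1", "p3", 190), ("k2", "p2", 240)]
print(mutual_best_hits(a_vs_b, b_vs_a))  # [('p1', 'k1'), ('p2', 'k2')]
```

Here p3 hits k1 best, but k1's best hit is p1, so p3 gets no assignment; this one-sidedness is what the reciprocal check is meant to filter out.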

RNA seq read abundance estimation

Read abundances and TPM values were estimated from the RNA-seq libraries on the annotated transcripts of the reference genome using salmon (v1.9.0, "quant" mode, default parameters) (Patro et al. 2017). The salmon index was built with the reference genome as decoy sequences, following the salmon manual.
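TPM values normalize read counts first by transcript length and then by library size, so that each sample's TPM values sum to one million. Salmon estimates counts and effective lengths internally; the sketch below only applies the standard TPM definition, TPM_i = (c_i / l_i) / Σ_j (c_j / l_j) × 10^6, to hypothetical counts and effective lengths and is not salmon's code.

```python
def tpm(counts, eff_lengths):
    """Transcripts per million from read counts and effective lengths."""
    rates = [c / l for c, l in zip(counts, eff_lengths)]  # reads per base
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Hypothetical counts and effective lengths (bp) for 3 transcripts.
values = tpm([100, 200, 300], [1000, 2000, 1000])
```

The second transcript has twice the reads of the first but is twice as long, so both end up with the same TPM (200,000), while the third receives 600,000.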

Phylogenies reconstruction, estimation of gene copy numbers, and duplication events

Sequences were acquired via Phytozome (phytozome-next.jgi.doe.gov) BLAST search (Altschul et al. 1990; Goodstein et al. 2012) (for A. trichopoda, T. arvense*, L. sativum*, C. violacea*, A. coerulea, E. californica, and S. lycopersicum) or an in-house BLAST server (Priyam et al. 2019) (for C. tomentella, C. chinensis, P. somniferum, and M. cordata) (*use restricted under the Fort Lauderdale Accord). A. thaliana sequences acquired from TAIR (arabidopsis.org) or known protein sequences of E. californica involved in BIA biosynthesis were used as queries. Maximum-likelihood phylogenies were computed using IQTREE 2 (2.0.7) (Minh et al. 2020) from MSAs generated with MAFFT (v7.490) (Katoh and Standley 2013); because of overall high sequence conservation with few noninformative sites, no trimming was deemed necessary. All trees were evaluated and refined similarly to Roessner et al. (2024) and rooted on the closest A. trichopoda ortholog that was sister to clades including both Ranunculales and Brassicales. Duplication events per branch and total copy numbers were counted for each phylogeny (bootstrap consensus trees with a minimum of 3 bootstraps) and compiled per pathway or gene family (Supplementary Fig. S5 (Dataset); https://doi.org/10.22029/jlupub-20142). All numbers shown are minimal estimates, as our search (BLASTp algorithm) did not account for pseudogenized copies and only counted duplication events that could be clearly identified, ignoring poorly resolved parts of the topology (Supplementary Fig. S5 (Dataset) S1; https://doi.org/10.22029/jlupub-20142). All computing was carried out on de.NBI SimpleVM (de.NBI medium: 14 VCPUs, 32 GB RAM; or de.NBI large: 28 VCPUs, 64 GB RAM).

GENESPACE (1.4) (Lovell et al. 2022) was used to identify and analyze syntenic blocks. Tracks for SNPs between haplotypes and for gene density were generated using SyRi (v1.7.0) (Goel et al. 2019) and visualized in 100-kb bins with R (4.2.3). GENESPACE analyses were run on de.NBI SimpleVM (de.NBI large: 28 VCPUs, 64 GB RAM).
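SNP and gene-density tracks of this kind reduce to counting features in fixed-width windows. The authors used SyRi and R; the Python sketch below only illustrates the 100-kb binning step, assuming 0-based feature positions (SNP positions or gene start coordinates) on a single chromosome of known length.

```python
from collections import Counter

BIN_SIZE = 100_000  # 100-kb windows, as in the text


def bin_counts(positions, chrom_length, bin_size=BIN_SIZE):
    """Count features (e.g. SNPs or gene starts) per fixed-width bin.

    positions are 0-based coordinates on one chromosome; positions outside
    [0, chrom_length) are ignored. The last bin may be shorter than bin_size.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = Counter(p // bin_size for p in positions if 0 <= p < chrom_length)
    return [counts.get(i, 0) for i in range(n_bins)]


print(bin_counts([5, 99_999, 100_000, 250_000], chrom_length=250_001))  # [2, 1, 1]
```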

Ks plots for homologous gene sequence pairs were generated using Tree2GD (v1.0.40) (Chen et al. 2022) with sequence information from all 5 Ranunculales species. Multiple sequence alignments and gene trees were estimated using MAFFT (v7.490) (Katoh and Standley 2013) and IQTREE 2 (2.0.7) (Minh et al. 2020), respectively, for 5,937 syntenic HOGs that included all 7 genomes in our analyses. The resulting gene trees were scanned to place gene duplications on the species tree using PUG (v2.1) (Phylogenetic Placement of Polyploidy Using Genomes; McKain et al. 2016; https://github.com/mrmckain/PUG). Duplication nodes in the gene trees were counted for each ancestral node if the node exhibited a bootstrap support value of 80 or higher for a clade including paralogs for species in the clade above the node. Duplications on the tip lineages were counted if 2 or more paralogs for each tip species formed a clade with a bootstrap support value of 80 or higher. If a species-specific paralog clade included more than 2 paralogs (ie, suggesting multiple duplications), only 1 duplication was counted.
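The counting rules above can be restated as a small filter over candidate duplication nodes. This is a simplified sketch, not PUG's code: it assumes each candidate has already been mapped to a species-tree node and annotated with its bootstrap support and, for species-specific (tip) clades, the number of paralogs. The `Candidate` record and the node names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A putative duplication mapped onto the species tree (hypothetical record)."""
    species_node: str    # ancestral node or tip species
    bootstrap: float     # support for the paralog clade above the node
    is_tip: bool = False
    n_paralogs: int = 2  # paralogs in a species-specific clade


def count_duplications(candidates, min_bootstrap=80):
    """Count supported duplications per species-tree node."""
    counts = Counter()
    for c in candidates:
        if c.bootstrap < min_bootstrap:
            continue  # weakly supported clade: not counted
        if c.is_tip and c.n_paralogs < 2:
            continue  # a tip duplication needs at least 2 paralogs
        counts[c.species_node] += 1  # one per clade, even with >2 paralogs
    return counts


candidates = [
    Candidate("Papaveraceae", 95),
    Candidate("Papaveraceae", 70),
    Candidate("E. californica", 88, is_tip=True, n_paralogs=4),
    Candidate("E. californica", 60, is_tip=True, n_paralogs=2),
]
print(dict(count_duplications(candidates)))  # {'Papaveraceae': 1, 'E. californica': 1}
```

Counting one duplication per supported clade, even when it holds 4 paralogs, makes the totals conservative, in line with the minimal estimates described for the gene family phylogenies.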

Gene expression analysis

Differential gene expression analyses were done with DESeq2 (1.48.2) (Love et al. 2014). Organ specificity was determined by dividing each tissue's TPM value by the gene's maximum TPM value and applying a cutoff of >0.1, representing one-tenth of maximum expression. Lowly expressed genes (maximum TPM of 5 or less) were filtered out.
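Read literally, the organ-specificity rule divides each tissue's TPM by the gene's maximum TPM, calls the gene expressed in tissues where that ratio exceeds 0.1, and first discards genes whose maximum TPM does not exceed 5. The sketch below encodes that reading; the example profile is hypothetical, and the tissue codes follow the abbreviations of Figs. 4 and 5.

```python
def expressed_tissues(tpm_by_tissue, min_max_tpm=5.0, ratio_cutoff=0.1):
    """Tissues where a gene counts as expressed, or None if it is filtered out.

    Each tissue's TPM is divided by the gene's maximum TPM; tissues with a
    ratio above ratio_cutoff (one-tenth of maximum) are kept. Genes whose
    maximum TPM does not exceed min_max_tpm are treated as low expressed.
    """
    max_tpm = max(tpm_by_tissue.values())
    if max_tpm <= min_max_tpm:
        return None
    return sorted(t for t, v in tpm_by_tissue.items() if v / max_tpm > ratio_cutoff)


print(expressed_tissues({"R": 120.0, "PE": 30.0, "ST": 10.0, "CA": 0.5}))  # ['PE', 'R']
```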

For comparison of gene expression, RNA-seq data (TPM) from A. thaliana (Mergner et al. 2020) were downloaded from the EBI Expression Atlas (www.ebi.ac.uk), and equivalent tissues were selected. Heatmaps were generated with ComplexHeatmap (2.24.1) (Gu et al. 2016) using Z-scores of log2(TPM + 1) values. Clustering based on Pearson correlation was used for Figs. S9 and S10. Gene copy numbers were inferred from OrthoFinder (2.5.5) (Emms and Kelly 2019) analyses with default settings, using proteomes of A. thaliana, L. sativum, B. rapa, C. papaya, V. vinifera, S. lycopersicum, P. somniferum, E. californica, C. chinensis*, C. tomentella*, A. coerulea, O. sativa, and A. trichopoda. The dot plot was generated with ggplot2 (v4.0) (Wickham 2016). Images were edited in Inkscape (1.4.2).
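Heatmap values are per-gene Z-scores of log2(TPM + 1): each gene's log-transformed profile is centered on its mean across tissues and divided by its standard deviation. The sketch below uses the sample standard deviation, as R's `scale()` does; which convention the authors used is not stated, so this is an assumption. The profile is toy data.

```python
import math
import statistics


def zscore_log2_tpm(profile):
    """Per-gene Z-scores of log2(TPM + 1) across tissues (sample SD)."""
    logged = [math.log2(v + 1.0) for v in profile]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    if sd == 0:
        return [0.0] * len(logged)  # flat profile: no variation to scale
    return [(v - mean) / sd for v in logged]


z = zscore_log2_tpm([0.0, 1.0, 3.0, 7.0])  # log2 values 0, 1, 2, 3
```

Because each row is scaled independently, the colors show where a gene peaks relative to its own mean, not absolute expression; this is why the carotenoid heatmaps also print minimum and maximum log2(TPM + 1) values.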

Additional information on methods used for flow cytometric genome size estimation, genome and transcriptome assembly, and gene predictions can be found in Supplementary Information S1.

Accession numbers

Sequence data from this article can be found in the Phytozome and TAIR data libraries under accession numbers provided in Table S12.

Supplementary Material

koag039_Supplementary_Data

References


Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410. doi:10.1016/S0022-2836(05)80360-2. PMID: 2231712.
Anderson MK. 2007. Native American uses and management of California's Grasslands. California grasslands ecology and management. University of California Press. p. 57–67.
Barrell PJ, Wakelin AM, Gatehouse ML, Lister CE, Conner AJ. 2010. Inheritance and epistasis of loci influencing carotenoid content in petal and pollen color variants of California Poppy (Eschscholzia californica Cham.). J Hered. 101:750–756. doi:10.1093/jhered/esq079. PMID: 20631045.
Becker A, Gleissberg S, Smyth DR. 2005. Floral and vegetative morphogenesis in California Poppy (Eschscholzia californica Cham.). Int J Plant Sci. 166:537–555. doi:10.1086/429866.
Becker A, Yamada Y, Sato F. 2023. California poppy (Eschscholzia californica), the Papaveraceae golden girl model organism for evodevo and specialized metabolism. Front Plant Sci. 14:1084358. doi:10.3389/fpls.2023.1084358. PMID: 36938015. PMCID: PMC10017456.
Chae L, Kim T, Nilo-Poyanco R, Rhee SY. 2014. Genomic signatures of specialized metabolism in plants. Science. 344:510–513. doi:10.1126/science.1252076. PMID: 24786077.
Chen D, Zhang T, Chen Y, Ma H, Qi J. 2022. Tree2GD: a phylogenomic method to detect large-scale gene duplication events. Bioinformatics. 38:5317–5321. doi:10.1093/bioinformatics/btac669. PMID: 36218394.
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18:170–175. doi:10.1038/s41592-020-01056-5. PMID: 33526886. PMCID: PMC7961889.