Preliminary Identification of Putative Terpene Synthase Genes in Caryocar brasiliense and Chemical Analysis of Major Components in the Fruit Exocarp
Helena Trindade, Bruno Nevado, Raquel Linhares Bello de Araújo, Viviane Dias Medeiros Silva, Lara Louzada Aguiar, Ana Ribeiro, Julio Onesio-Ferreira Melo, Paula Batista-Santos

TL;DR
This study identifies potential genes responsible for terpene production in the pequi fruit and analyzes its chemical composition, supporting future research on its health and commercial benefits.
Contribution
The first identification of putative terpene synthase genes in Caryocar brasiliense and their classification into phylogenetic subfamilies.
Findings
Over 90% of BUSCO genes were found in the fragmented genome assembly, with 71% containing complete sequences.
Twenty-two putative terpene synthase genes were identified and classified into phylogenetic subfamilies.
Eleven chemical compounds, including a terpene, were identified in the fruit exocarp using GC-MS.
Abstract
Background: Caryocar brasiliense Camb. (Caryocaraceae) is a typical tree from the Brazilian Cerrado with commercial importance due to its edible fruit, known as pequi. This native plant holds significant economic value and is a key candidate for cropping systems. Rich in phytochemicals, such as phenolics, flavonoids, and terpenoids, it has shown notable health benefits. Methods: Considering the importance of terpenes and their biological properties, and based on the first draft genome of C. brasiliense, this study aimed to identify putative terpene synthase genes and classify them into the phylogenetic subfamilies previously identified across all plant lineages. The presence of terpenes was also verified in samples of the outer portion of the fruit by solid-phase microextraction gas chromatography-mass spectrometry. Results: Analysis of genome completeness showed that over 90% of genes…
Figure 1
Figure 2
- Universidade Federal de São João del-Rei (UFSJ-PROPE)
- Universidade Federal de Minas Gerais (UFMG)
- Portuguese funds through FCT (Fundação para a Ciência e a Tecnologia, I.P.)
- Centro de Estudos Florestais
- LEAF-Linking Landscape, Environment, Agriculture and Food
- Associate Laboratory TERRA
- cE3c (Centre for Ecology)
- funds of the Tropical College of the University of Lisbon (CTROP-ULisboa)
- Fundação para a Ciência e a Tecnologia
- Coordination for the Improvement of Higher Education Personnel (CAPES)
- National Council for Scientific and Technological Development (CNPq)
- Minas Gerais State Research Support Foundation (FAPEMIG)
- The Teaching, Research and Extension Group in Chemistry and Pharmacognosy (GEPEQF)
Taxonomy
Topics: Plant biochemistry and biosynthesis · Phytochemistry Medicinal Plant Applications · Sesquiterpenes and Asteraceae Studies
1. Introduction
The Cerrado biome is the second-largest biome of Brazil and is considered a biodiversity hotspot due to the rapid loss of biological diversity caused by agriculture, livestock, and urbanization [1]. Pequi (Caryocar brasiliense Camb.) is a typical edible fruit from the Brazilian Cerrado, where it is widespread and economically important. Pequi trees play a valuable ecological role in these ecosystems, since they are a source of food and habitat for fauna [2]. These fruits are rich in nutrients, have unique sensory characteristics, and are used in regional cuisine, for the preparation of flours, for the extraction of oils for cosmetics, and for their therapeutic properties [3,4,5].
Pequi has recently been reported to have analgesic and anti-inflammatory properties [6] and to reduce oxidative stress, inflammation, and anemia associated with aging in Swiss mice [7]. It has also been reported to have anticholinesterase and antioxidant activities, as well as to prevent memory loss in mice caused by aluminum consumption and brain lipid peroxidation [8]. Other studies with the ethanolic extract of pequi bark have shown very low toxicity in vitro and in vivo [9,10] as well as a protective effect against oxidative stress in human coronary artery endothelial cells [11], supporting its medicinal potential.
The compounds associated with positive health impacts are specialized metabolites with active principles, including phenolics, terpenes, and alkaloids [12,13]. Terpenoids are the largest group of specialized metabolites, and tens of thousands of terpenoid compounds have been identified in higher plants [14]. Chemically, they are built from isoprene units (C5) that can be assembled into backbones of various lengths. Terpene synthase enzymes can give rise to a multitude of molecules that have been pivotal to the survival and evolution of higher plants. Furthermore, these compounds have been associated with beneficial health effects, such as anti-aggregatory, antiallergic, anticoagulant, anti-inflammatory, neuroprotective, sedative, and analgesic effects [15], and other biological properties [16], including antimicrobial and antifungal activities [15,17]. Some terpenes can be considered ecological pesticides, and essential oils rich in terpenes have proven activity against several fungi [18,19]. Studies based on individual terpenes have also shown fungicidal activity against Botrytis cinerea, a plant pathogen that affects crops [20]. Terpenes previously identified in pequi fruits [21,22,23] include several monoterpenes, e.g., α-phellandrene, β-myrcene, and β-ocimene, and the diterpene geranyllinalool, to mention a few. Headspace solid-phase microextraction with gas chromatography-mass spectrometry analysis has been used to identify terpenes and other volatile compounds present in different fruits of the Cerrado, such as Eugenia dysenterica [24], Eugenia brasiliensis [25], Eugenia klotzschiana [26], and pequi peel [27]. Fruit peels have been used to extract a variety of bioactive compounds, including terpenes [28].
Mining the published genome sequencing data for Caryocar brasiliense [29] will allow the identification of genes responsible for important biological characteristics. In this study, we focused on the identification of putative genes involved in terpene biosynthesis in pequi, laying the foundation for future biotechnological approaches to improve terpene synthesis, several of which rely on yeast systems [30,31,32]. Our preliminary study should be extended, and functional gene characterization needs to be performed for full validation of gene function. These biotechnological tools are of utmost importance because terpene extraction from natural sources, as performed in the past, now raises environmental concerns and is no longer considered a viable option.
2. Materials and Methods
2.1. Sequence Retrieval and Identification of Putative Terpene Synthase Genes
In this study, genomic sequences from C. brasiliense [29] were obtained from GenBank under accession number GCA_004918865.1 (Table 1). To infer genome completeness, BUSCO v. 4.1 (Benchmarking Universal Single-Copy Orthologs) [33] was used with the Viridiplantae database. To identify the terpene synthase genes, the genome of C. brasiliense was annotated using the MAKER pipeline v2.31 [34]. Both ab initio and homology-based evidence were used, the latter obtained from the proteomes of related species in the Malpighiales order (Table 2), available from the 1KP database [35]. The resulting protein-coding genes were searched against the Pfam-A database using InterProScan v5.61 [36]. We classified as putative terpene synthase genes all genes with the best hit against the Terpene_synth_C (PF03936) or the Terpene synthase N-terminal domain (PF01397) profiles [37].
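The domain-based selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: it assumes InterProScan's tab-separated output layout (protein ID in column 1, analysis name in column 4, signature accession in column 5, e-value in column 9), and the function name `putative_tps_genes` and all example rows are hypothetical.

```python
import csv
import io

# Pfam profiles named in the text (C-terminal and N-terminal terpene synthase domains).
TPS_PFAM = {"PF03936", "PF01397"}


def putative_tps_genes(tsv_handle):
    """Return sorted IDs of proteins whose best (lowest e-value) Pfam hit is a TPS profile."""
    best = {}  # protein ID -> (e-value, Pfam accession) of the best hit seen so far
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 9 or row[3] != "Pfam":
            continue  # skip malformed rows and non-Pfam analyses
        protein, accession = row[0], row[4]
        try:
            evalue = float(row[8])
        except ValueError:
            continue  # the score column may hold "-" when no e-value is reported
        if protein not in best or evalue < best[protein][0]:
            best[protein] = (evalue, accession)
    return sorted(p for p, (_, acc) in best.items() if acc in TPS_PFAM)


# Hypothetical example rows: IDs, coordinates, and scores are made up.
example = io.StringIO(
    "geneA\tmd5\t560\tPfam\tPF03936\tTerpene synthase C-terminal\t250\t500\t1e-60\tT\t01-01-2024\n"
    "geneA\tmd5\t560\tPfam\tPF01397\tTerpene synthase N-terminal\t30\t200\t1e-40\tT\t01-01-2024\n"
    "geneB\tmd5\t400\tPfam\tPF00067\tCytochrome P450\t40\t380\t1e-80\tT\t01-01-2024\n"
)
tps_ids = putative_tps_genes(example)
print(tps_ids)
```

Keeping only the single best Pfam hit per protein mirrors the "best hit" wording in the text; a looser filter that accepts any hit to either profile would retain more candidates.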
2.2. Phylogenetic Analyses
To classify the putative terpene synthase genes of C. brasiliense into the phylogenetic subfamilies identified in previous studies [37], we obtained the unaligned sequence data containing all terpene synthase genes (longer than 350 amino acids) previously identified across green plants [37]. The newly identified terpene synthase genes from C. brasiliense (minimum length: 298 amino acids) were added to this dataset. All data were aligned using MAFFT v7.5 [38] with 1000 iterations of improvement. The best-fitting protein evolution model was identified with ModelTest-NG v0.1.7 [39,40]. The phylogenetic inference was performed using RAxML-NG v1.1 [41] with 10 random starting trees and 100 bootstrap replicates, using the best-fitting protein evolution model (JTT + G + F).
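The length cutoff applied before alignment can be illustrated with a short Python sketch. It is an assumption-laden example rather than the pipeline actually used: the helper names `read_fasta` and `filter_by_length` and the toy sequences are hypothetical, and only the 298-amino-acid threshold comes from the text.

```python
MIN_LENGTH_AA = 298  # minimum C. brasiliense protein length stated in the text


def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {sequence ID: sequence}."""
    records, current = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def filter_by_length(records, min_len):
    """Keep sequences with at least min_len residues (a trailing stop '*' is ignored)."""
    return {
        name: seq for name, seq in records.items()
        if len(seq.rstrip("*")) >= min_len
    }


# Toy input: sequence content is made up purely to exercise the filter.
toy_fasta = [
    ">long_candidate hypothetical",
    "M" * 200,
    "A" * 98,
    ">short_candidate hypothetical",
    "M" * 120,
]
kept = filter_by_length(read_fasta(toy_fasta), MIN_LENGTH_AA)
print(sorted(kept))
```

The retained sequences would then be appended to the reference dataset and passed to the aligner.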
2.3. Solid-Phase Microextraction Gas Chromatography–Mass Spectrometry
Polydimethylsiloxane/divinylbenzene (PDMS/DVB, 65 μm) fibers were employed for the solid-phase microextraction (SPME) and gas chromatography-mass spectrometry (GC-MS) analysis. Samples (1.0 g) of the outer portion of three C. brasiliense fruits were transferred, in triplicate, to 20 mL headspace vials, which were then sealed. The vials were heated in an aluminum block with a cylindrical bore placed on a heating plate. Samples were pre-heated for 5 min, after which the PDMS/DVB fiber holder was inserted into the vial and the fiber was exposed for 10 min with the temperature kept at 50 °C. The fiber was then retracted, transferred to the GC-MS injector, and exposed there for 5 min, after which it was retracted for the remainder of the run [42].
Analyses were performed on a gas chromatograph coupled with a mass spectrometer with an ion-trap type analyzer (Shimadzu Scientific Instruments, Kyoto, Japan). The split/splitless injector was operated in splitless mode and maintained at 250 °C for 5 min. Helium gas (1 mL min⁻¹ flow) was used with an HP-5 MS capillary column, 30 m × 0.25 mm × 0.25 μm (5% phenyl and 95% methylpolysiloxane) (Agilent Technologies Inc., Munich, Germany). The column was held at 40 °C for 1 min; the temperature was then increased at 12 °C min⁻¹ to 120 °C and held for 2 min, followed by increases of 15 °C min⁻¹ to 150 °C and 20 °C min⁻¹ to 245 °C, with a final hold of 2 min [26].
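As a worked check of the oven program, the total run time implied by the holds and ramps can be computed with a few lines of Python. The step encoding and the function name `program_minutes` are illustrative choices; the temperatures, rates, and hold times are those given in the text.

```python
def program_minutes(start_temp, steps):
    """Sum hold times and ramp durations for a GC oven temperature program."""
    total, temp = 0.0, start_temp
    for step in steps:
        if step[0] == "hold":            # ("hold", minutes)
            total += step[1]
        else:                            # ("ramp", rate in °C/min, target in °C)
            _, rate, target = step
            total += (target - temp) / rate
            temp = target
    return total


OVEN_STEPS = [
    ("hold", 1),          # 40 °C for 1 min
    ("ramp", 12, 120),    # 12 °C/min up to 120 °C
    ("hold", 2),          # hold 2 min
    ("ramp", 15, 150),    # 15 °C/min up to 150 °C
    ("ramp", 20, 245),    # 20 °C/min up to 245 °C
    ("hold", 2),          # final hold 2 min
]
run_time = program_minutes(40, OVEN_STEPS)
print(f"{run_time:.2f} min")  # 18.42 min
```

The result is consistent with the retention times reported later in the paper, all of which fall well within this window.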
The mass spectrometer was set to detect fragment ions between m/z 35 and 300 in 70 eV electron impact ionisation mode; the transfer line temperature was 275 °C, and the ion source temperature was 200 °C. Volatile compounds were identified based on the mass-to-charge ratio (m/z) of the sample ion fragments corresponding to each peak in the chromatogram. The mass spectra of the analytes were compared with the 2011 version of the NIST/EPA/NIH Mass Spectral Database (NIST 11; National Institute of Standards and Technology) using Xcalibur software version 2.1 (Thermo Scientific, San Jose, CA, USA), considering a level of similarity (reverse lookup index, RSI) greater than 600. The RSI is a numerical comparison factor: the higher its value, the closer the match between the sample spectrum and the NIST library spectrum. Only peaks with an RSI above 600 and a signal-to-noise ratio (S/N) above 50 decibels were selected.
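The two acceptance criteria for a peak can be expressed as a simple filter. The sketch below is illustrative only: the `Peak` record, its field names, and the example values are hypothetical, while the thresholds (RSI above 600, S/N above 50 dB) are taken from the text.

```python
from dataclasses import dataclass

RSI_MIN = 600      # library-match threshold stated in the text
SNR_MIN_DB = 50    # signal-to-noise threshold stated in the text


@dataclass
class Peak:
    retention_time: float  # minutes
    best_match: str        # name of the top library hit
    rsi: int               # match index reported by the library search
    snr_db: float          # signal-to-noise ratio in decibels


def accept(peak):
    """A peak is kept only if it passes both the RSI and the S/N thresholds."""
    return peak.rsi > RSI_MIN and peak.snr_db > SNR_MIN_DB


# Hypothetical peaks: none of these values are measurements from the study.
peaks = [
    Peak(2.10, "compound A", 850, 72.0),   # passes both criteria
    Peak(4.35, "compound B", 580, 90.0),   # fails on RSI
    Peak(7.80, "compound C", 910, 31.5),   # fails on S/N
]
selected = [p.best_match for p in peaks if accept(p)]
print(selected)
```

Requiring both conditions is the stricter reading of the selection rule; a peak with an excellent library match but weak signal is still discarded.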
3. Results and Discussion
Analysis of genome completeness using BUSCO showed that, despite a highly fragmented assembly, over 90% of BUSCO genes were found in the genome assembly of C. brasiliense, with 71% containing complete gene sequences and an additional 21% of genes present but fragmented (Table 1). We identified 33,767 protein-coding genes using the MAKER pipeline. Of these, 22 genes had homology with either (or both) of the terpene synthase Pfam-A profiles and were thus retained as putative terpene synthase genes (Table 3). A search of the NCBI Conserved Domain database [43] revealed several motifs and domains that validated the sequences as partial putative terpene synthases. The larger sequences, CbTPS19, CbTPS20, CbTPS21, and CbTPS22, with 458, 497, 561, and 729 amino acids, respectively, showed four or five of the five conserved domains characteristic of these terpene synthases (Table 3).
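The completeness figures and the share of annotated genes retained as candidates follow from simple arithmetic, sketched here in Python. The variable names are illustrative, and because the percentages in the text are rounded, the derived "missing" value is approximate.

```python
# Values reported in the text.
complete_pct = 71        # BUSCO genes with complete sequences
fragmented_pct = 21      # BUSCO genes present but fragmented
total_genes = 33767      # protein-coding genes annotated with MAKER
tps_genes = 22           # putative terpene synthase genes

found_pct = complete_pct + fragmented_pct   # BUSCO genes detected in any form
missing_pct = 100 - found_pct               # approximate, from rounded inputs
tps_share_pct = 100 * tps_genes / total_genes

print(f"BUSCO found: {found_pct}%, missing: about {missing_pct}%")
print(f"Putative TPS genes: {tps_share_pct:.3f}% of annotated genes")  # 0.065%
```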
Of the 22 genes identified, we used the ten longest (minimum length: 298 aa; Table 3) for phylogenetic inference. In the resulting phylogenetic tree, C. brasiliense terpene synthase genes clustered within the different angiosperm clades identified in previous work [37] and allowed us to classify each gene into the different phylogenetic subfamilies (Figure 1). Six genes belonged to the h/d/a/b/g subfamily: CbTPS14, CbTPS19, and CbTPS20 clustered together with accessions from Arabidopsis thaliana, while CbTPS15, CbTPS16, and CbTPS21 clustered together with accessions from Oryza sativa. The genes forming the group TPS-h/d/a/b/g are apparently involved in secondary (specialized) metabolism [37].
Considering the remaining four genes, three genes belonged to the c subfamily, namely CbTPS13, CbTPS17, and CbTPS18, and one gene, CbTPS22, was assigned to the e/f subfamilies. Both TPS-c and TPS-e/f subfamilies can be involved in gibberellin biosynthesis, but they can also give rise to numerous proteins involved in secondary metabolism.
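The subfamily assignments reported for the ten genes used in the phylogeny can be collected into a small lookup table and tallied. This Python sketch only restates the assignments given in the text; the dictionary name `SUBFAMILY` is an illustrative choice.

```python
from collections import Counter

# Subfamily assignment of each gene, as reported in the text.
SUBFAMILY = {
    "CbTPS14": "TPS-h/d/a/b/g",
    "CbTPS19": "TPS-h/d/a/b/g",
    "CbTPS20": "TPS-h/d/a/b/g",
    "CbTPS15": "TPS-h/d/a/b/g",
    "CbTPS16": "TPS-h/d/a/b/g",
    "CbTPS21": "TPS-h/d/a/b/g",
    "CbTPS13": "TPS-c",
    "CbTPS17": "TPS-c",
    "CbTPS18": "TPS-c",
    "CbTPS22": "TPS-e/f",
}

counts = Counter(SUBFAMILY.values())
for subfamily, n in counts.most_common():
    print(f"{subfamily}: {n}")
```

The tally gives six, three, and one gene for the three groups, matching the ten sequences retained for phylogenetic inference.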
The partial sequences we identified here can be used for primer design that will allow gene amplification, followed by heterologous gene expression. The resulting protein can then be used to assay enzymatic activity, with the final outcome being the identification of the respective terpene synthases [44,45].
Gas chromatography-mass spectrometry analyses revealed a diverse phytochemical profile (Table 4) comprising five carboxylic acid esters (butanoic acid ethyl ester, (E)-2-butenoic acid ethyl ester, methyl hexanoate, ethyl hexanoate, and ethyl octanoate), one ethyl ester of a carboxylic acid (identified as ethyl acetate), one α-amino acid (alanine), one α,β-unsaturated aldehyde ((E)-2-hexenal), one α,β-unsaturated carboxylic acid ester (ethyl 2-hexenoate), one monoterpene hydrocarbon ((Z)-β-ocimene), and one formate ester (ethenyl formate). Figure 2 shows the resulting chromatogram of the HS-SPME-GC-MS analyses of samples of C. brasiliense.
The α,β-unsaturated aldehyde (E)-2-hexenal (RT 5.995) functions as a critical green leaf volatile (GLV) rapidly synthesized via the lipoxygenase pathway following tissue damage [46]. The conjugated double bond system confers significant antimicrobial and antifungal properties, serving as part of the plant’s chemical defense arsenal against a broad spectrum of phytopathogens. Recent research has demonstrated its efficacy against economically important pathogens, including Botrytis cinerea and various Colletotrichum species, with minimum inhibitory concentrations in the low ppm range [47,48]. Agricultural applications have expanded to include (E)-2-hexenal-based biopesticides and plant elicitors that activate systemic acquired resistance mechanisms, potentially reducing conventional fungicide requirements by 30–40% when incorporated into integrated pest management programs [49].
Alanine (RT 1.420), an α-amino acid, plays fundamental roles in primary metabolism beyond its function as a protein building block. This compound participates centrally in transamination reactions and the alanine-glucose cycle that regulates carbon and nitrogen flux between plant tissues [50]. During environmental stress conditions, particularly drought and hypoxia, alanine accumulation serves as both a biochemical stress indicator and adaptive response, functioning as a compatible solute that maintains cellular osmotic balance without disrupting enzyme function [51].
The monoterpene hydrocarbon (Z)-β-ocimene (RT 10.150) is a significant volatile compound derived from isoprenoid biosynthesis through the methylerythritol phosphate (MEP) pathway in plastids [52]. This acyclic terpene functions prominently in tritrophic plant-herbivore-predator interactions, serving as both a herbivore deterrent and an attractant for natural enemies, including parasitoid wasps and predatory insects. Studies have demonstrated that plants under herbivore attack can increase β-ocimene emissions by up to 1000-fold, triggering defense responses in neighboring plants through volatile signaling networks [53]. The compound's conjugated diene structure confers notable antioxidant properties, with radical-scavenging activity comparable to vitamin E analogs in some assay systems [54]. β-Ocimene is also a key component of essential oil-based formulations targeting agricultural pest management, particularly in organic production systems where conventional pesticides are restricted [55].
4. Conclusions
In conclusion, this analysis based on the pequi draft genome allowed us to recover the majority of BUSCO genes, most of them as complete sequences, and to identify 22 putative terpene synthase genes. Phylogenetic analysis of ten well-supported sequences further revealed their distribution across established angiosperm clades, enabling classification into distinct terpene synthase subfamilies and providing insight into their evolutionary relationships. These preliminary results lay the foundation for further studies to fully characterize terpene synthase gene sequences in pequi and to explore their potential applications, providing a basis for future investigations that may include quantification and functional validation. Furthermore, solid-phase microextraction gas chromatography-mass spectrometry allowed the identification of significant chemical compounds, including a terpene that plays a key role in this species' metabolism and putatively has significant applications in the food, health, and agricultural industries.
References
1. Damasco G., Fontes C., Françoso R., Haidar R. The Cerrado biome: A forgotten biodiversity hotspot. Front. Young Minds 2018, 6, 22. doi:10.3389/frym.2018.00022
2. Araujo F.D. A review of Caryocar brasiliense (Caryocaraceae) - An economically valuable species of the central Brazilian cerrado. Econ. Bot. 1995, 49, 40-48. doi:10.1007/BF02862276
3. De Carvalho L.S., Pereira K.F., de Araújo E.G. Botanical features, therapeutic effects and active ingredients present in pequi (Caryocar brasiliense). Arq. Ciênc. Saúde UNIPAR 2015, 19, 147-157. doi:10.25110/arqsaude.v19i2.2015.5435
4. Pinto L.C.L., Morais L.M.O., Guimarães A.Q., Almada E.D., Barbosa P.M., Drumond M.A. Traditional knowledge and uses of the Caryocar brasiliense Cambess. (Pequi) by "quilombolas" of Minas Gerais, Brazil: Subsidies for sustainable management. Braz. J. Biol. 2016, 76, 511-519. doi:10.1590/1519-6984.22914. PMID: 27058602
5. Santos B.O., Tanigaki M., Silva M.R., Ramos A.L.C.C., Labanca R.A., Augusti R., Melo J.O.F., Takahashi J.A., de Araújo R.L.B. Development and Chemical Characterization of Pequi Pericarp Flour (Caryocar brasiliense Camb.) and Effect of in vitro Digestibility on the Bioaccessibility of Phenolic Compounds. J. Braz. Chem. Soc. 2022, 33, 1058-1068. doi:10.21577/0103-5053.20220022
6. Junior A.J., Leitão M.M., Bernal L.P.T., dos Santos E., Kuraoka-Oliveira Â.M., Justi P., Argandoña E.J.S., Kassuya C.A.L. Analgesic and Anti-inflammatory Effects of Caryocar brasiliense. Antiinflamm. Antiallergy Agents Med. Chem. 2020, 19, 313-322. doi:10.2174/1871523018666190408144320. PMID: 30961515
7. Roll M.M., Miranda-Vilela A.L., Longo J.P.F., da Agostini-Costa T.S., Grisolia C.K. The pequi pulp oil (Caryocar brasiliense Camb.) provides protection against aging-related anemia, inflammation and oxidative stress in Swiss mice, especially in females. Genet. Mol. Biol. 2018, 41, 858-869. doi:10.1590/1678-4685-gmb-2017-0218. PMID: 30507999. PMCID: PMC6415600
8. De Oliveira T.S., Thomaz D.V., Neri H.F.D.S., Cerqueira L.B., Garcia L.F., Gil H.P.V., Pontarolo R., Campos F.R., Costa E.A., Dos Santos F.C.A. Neuroprotective effect of Caryocar brasiliense Camb. leaves is associated with anticholinesterase and antioxidant properties. Oxid. Med. Cell. Longev. 2018, 2018, 9842908. doi:10.1155/2018/9842908. PMID: 30420910. PMCID: PMC6215548
