Single- and Multi-Trait Genome-Wide Association Analyses Identify the Genetic Loci and Candidate Genes for Growth Traits in Plecoglossus altivelis
Zhongyu Chang, Ao Chen, Shuo Liang, Chenling Ma, Tao Zhou, Yunfeng Zhao, Li Jiang

TL;DR
This study identifies genetic markers and candidate genes for growth in Plecoglossus altivelis, a valuable fish species, to support more effective breeding programs.
Contribution
The study integrates single- and multi-trait GWAS methods to identify stable genetic loci and candidate genes for growth traits in Plecoglossus altivelis.
Findings
Twenty-nine SNPs significantly associated with seven traits were identified through combined GWAS analyses.
Five candidate genes (LOC131530706, LOC134022516, abat, slc25a12, dnah10) were found to be stable across both single- and multi-trait GWAS methods.
The study provides molecular markers and genetic resources for improving growth traits in Plecoglossus altivelis.
Abstract
This study aimed to identify the key genes controlling growth in the economically important fish, Plecoglossus altivelis, to enable faster genetic improvement through breeding. Ayu is an anadromous teleost fish—a group characterized by bony skeletons, which includes the vast majority of farmed fish species. It is prized for its delicate texture and unique melon-like aroma. We analyzed the DNA of 426 Plecoglossus altivelis to find variations called Single-Nucleotide Polymorphisms (SNPs), which are single-letter changes in the genetic code that serve as markers for traits. Using a Genome-Wide Association Study (GWAS) approach, we scanned these SNPs to find those linked to growth. For robust results, we used two complementary software tools: GCTA (version 1.94.1), which effectively accounts for family relatedness among individuals, and GEMMA (version 0.98), which excels at analyzing…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13- —Innovation Team Program of “Research and Application of Aquatic Biological Genetic Big Data”
- —Chinese Academy of Fishery Sciences (CAFS)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic and phenotypic traits in livestock · Genetic Associations and Epidemiology · Genetic Mapping and Diversity in Plants and Animals
1. Introduction
Plecoglossus altivelis belongs to the order Osmeriformes, family Plecoglossidae, and genus Plecoglossus [1] and is widely distributed in East Asia, especially in Japan, Korea, and China [2,3,4]. It exhibits a distinctive morphology characterized by an elongated, laterally compressed body, a hook-like downward-curving snout, a large mouth, and paired anterior protrusions on the lower jaw forming a concave structure [5]. Ayu is of extremely high economic value in Japan. Aquaculture production of ayu was approximately 5000 tons in 2017, the second-largest inland aquaculture production in Japan [6]. The average annual production of ayu in China stabilized at around 60,000 tons between 2019 and 2023. The Northeast region accounts for about 45% of the market share, followed by North China with about 30%, and South China ranks third with 18% market share. Together, these three areas form the heart of China’s ayu industry. In recent years, the market for ayu has seen significant growth due to the growing consumer demand for healthy food and the improvement of people’s consumption habits of high-quality aquatic products [7]. Therefore, ayu aquaculture has emerged as a pivotal growth driver in the fisheries industry of China. With the rapid development of genome technology, the genome of Plecoglossus altivelis has been decoded. However, the assembly remains rough. At present, research on the Plecoglossus altivelis genome is only at the scaffold level (i.e., the scaffold structure for gene expression regulation), and no scholars or research teams have conducted analysis and assembly of the ayu chromosome structure. A series of systematic efforts are still required to thoroughly refine it to the chromosome level. The ayu genome is relatively small, comprising approximately 420 Mb distributed across 28 chromosomes (n = 28). Moreover, a y-linked receptor gene was mapped in ayu for its sex-determination [8]. Comparison of whole-genome resequencing mapping coverage between males and females identified male-specific regions in sex-linked scaffolds. A duplicate copy of the anti-Mullerian hormone type-II receptor gene (amhr2bY) was found within these male-specific regions [8], distinct from the autosomal copy of amhr2. These findings provide a basis for studying the sex determination mechanism of ayu.
A genome-wide association study (GWAS) [9] is a high-throughput genomic approach that identifies genetic variants associated with target traits by analyzing dense genotyping data from large cohorts. It enables genome-scale screening for genetic polymorphisms linked to diseases or complex traits within specific populations [10], leveraging the principle of linkage disequilibrium (LD), where adjacent alleles on chromosomes are co-inherited non-randomly. By detecting single-nucleotide polymorphisms (SNPs) [11], GWAS infers trait-associated loci through LD patterns [12].
As a genome-level analytical framework, GWAS facilitates the discovery of causal genetic variants underlying phenotypic traits. Integrated with molecular marker-assisted breeding [13], GWAS holds transformative potential for aquaculture. Significant advancements have been achieved in fish species such as rainbow trout (Oncorhynchus mykiss) [14], yellow croaker (Nibea albiflora) [15], and large yellow croaker (Larimichthys crocea) [16]. For instance, Tai et al. [17] identified key candidate loci and genes (e.g., igf1, gh) associated with growth traits (body weight, body length, total length, and body height) in rainbow trout via GWAS. Similarly, studies on yellow croaker [18] revealed critical SNPs and genes (e.g., mstn, gdf8) linked to growth regulation. Cui et al. [19] conducted a GWAS on yellowtail amberjack (Seriola lalandi), pinpointing growth-related SNPs and candidate genes. Ali et al. [20] reported analogous findings in rainbow trout, while Wang et al. [21] identified growth-associated genetic markers in tiger pufferfish (Takifugu rubripes), providing valuable insights for selective breeding. Beyond growth traits, GWAS has been widely applied to investigate disease resistance and stress tolerance traits in fish.
2. Materials and Methods
2.1. Experimental Population and Phenotypic Measurements
In this study, 426 Plecoglossus altivelis individuals were collected from a single, closed breeding population at the aquaculture farm of Liaoning Plecoglossus altivelis Fisheries Co., Ltd. in Dandong, Northeast China. To ensure genetic consistency, all samples originated from the same broodstock population with a shared breeding history and management protocol. To minimize the influence of close kinship, which could confound genetic association analyses, individuals were randomly selected based on available pedigree records to avoid sampling full-sib or half-sib family groups. Phenotypic characterization was performed on all individuals. The sampled population consisted of 5-month-old fish with an average body weight of 21.1 g and an average body length of 11.4 cm. Six key growth-related traits were precisely measured using standardized protocols:
Body Weight (BW): Measured using an electronic balance with a precision of 0.01 g after wiping the surface moisture of the fish body with absorbent paper.
Total Length (TL): The straight-line distance from the most anterior tip of the snout to the distal end of the caudal fin, measured with a digital caliper with a precision of 0.01 mm.
Body Length (BL): The straight-line distance from the most anterior tip of the snout to the posterior edge of the caudal peduncle, measured with a digital caliper with a precision of 0.01 mm.
Body Height (BH): The maximum vertical distance from the dorsal contour to the ventral contour of the fish body, measured at the position of the first dorsal fin ray using a digital caliper with a precision of 0.01 mm.
Eye Diameter (ED): Defined as the horizontal cross-sectional diameter of the eyeball, referring to the straight-line distance between the left and right edges of the eyeball in the horizontal direction, measured with a digital caliper with a precision of 0.01 mm.
Gonad Weight (GW): The weight of the dissected gonad tissue, measured using an electronic balance with a precision of 0.01 g after rinsing with sterile phosphate-buffered saline (PBS) and blotting surface moisture.
Sex: Determined by visual inspection combined with histological observation of gonad tissue; individuals were categorized into male, female, and undifferentiated (if applicable).
Concurrently with phenotypic data recording, a portion of the caudal fin tissue from each fish was excised using sterile scissors and immediately preserved in 2.0 mL sterile EP tubes prefilled with absolute ethanol. All samples were stored at −20 °C for subsequent DNA extraction.
Euthanasia and Tissue Sampling: Prior to tissue sampling, all fish were euthanized by immersion in a buffered tricaine methanesulfonate (MS-222, 150 mg/L) solution to ensure unconsciousness and cessation of opercular movement, in accordance with established animal welfare guidelines. Following confirmation of death, a portion of the caudal fin tissue was excised using sterile scissors for DNA extraction.
2.2. DNA Extraction, Sequencing, and Genotype Data Acquisition
DNA was extracted from the collected caudal fin tissues using the phenol–chloroform method. The quality of extracted DNA was assessed via 1% agarose gel electrophoresis, and concentrations were adjusted to 2.5 ng/μL prior to sequencing by BGI Wuhan (Wuhan, China).
2.3. Sequencing Methods
Sequencing was performed on the DNBSEQ platform of BGI Wuhan Co., Ltd., which included library construction and sequencing steps. The specific procedures are as follows:
- DNA Sample Detection
The concentration of DNA samples was measured using a fluorometer, and the integrity of DNA samples was examined via 1% agarose gel electrophoresis. Only samples that passed the detection were used for library preparation.
2.DNA Sample Fragmentation
DNA samples were fragmented by ultrasonication, and short DNA fragments meeting the length requirements were obtained by adjusting the fragmentation parameters.
3.Fragment Size Selection
The fragmented samples were subjected to fragment selection using magnetic beads to concentrate the sample bands at approximately 300–400 bp. The amount of purified DNA samples was quantified using a fluorometer.
4.End Repair, A-Tailing, and Adapter Ligation
A reaction system was prepared and incubated at an appropriate temperature for a specific duration to repair the ends of double-stranded DNA and add an adenine (A) base to the 3′ ends. An adapter ligation reaction system was then prepared and incubated at an appropriate temperature for a specific duration to ligate adapters to the DNA fragments.
5.PCR Amplification and Product Recovery
A PCR reaction system was prepared, and the reaction program was set up to amplify the ligation products. The amplified products were subjected to fragment selection using magnetic beads, and the concentration and fragment size of the PCR products were detected.
6.PCR Product Circularization
The PCR products were denatured into single strands, after which a circularization reaction system was prepared, thoroughly mixed, and incubated at an appropriate temperature for a specific duration to obtain single-stranded circular products. After digesting the uncircularized linear DNA molecules, the final library was obtained.
7.Library Detection
The concentration of the library was determined.
8.Sequencing on the Instrument
Single-stranded circular DNA molecules were amplified via rolling circle replication to form DNA nanoballs (DNBs) containing more than 300 copies. The obtained DNBs were loaded into the mesh pores on the chip using high-density DNA nanochip technology, and sequencing was performed via the Combinatorial Probe-Anchor Synthesis (CPAS) technology.
9.Data Generation and Quality Assessment
Sequencing was performed on the BGI DNBSEQ platform (Wuhan, China) using a short-fragment library construction protocol. Paired-end sequencing was conducted with a read length of PE150. The clean FASTQ data adhere to the Phred+33 quality scoring system, with Q20 scores exceeding 98% for all samples. Each sample contains more than 120,000,000 Clean Reads, equivalent to over 36 billion clean bases.
2.4. Genotype Data Acquisition
Raw sequencing data were processed for genotyping using GATK (v4.1.8.0) software. The HaplotypeCaller tool was employed to generate single-sample gVCF files, followed by joint genotyping performed with the GenotypeGVCFs tool. Post-genotyping, stringent quality control filters were applied to exclude SNP loci failing analytical criteria, thereby minimizing false-positive outcomes.
Data refinement steps included:
- (a)We performed quality control and refinement of the raw genotype data using the following multi-step pipeline:
- (b)Raw Read Filtering: Raw sequencing reads were filtered using SOAPnuke with parameters: -n0.01-20-90.5—adaMR 0.25 -polyX50 —minReadLen 150.
- (c)Variant Calling and Merging: Single-sample variant calling was performed using GATK (v4.1.8.0) HaplotypeCaller with basic quality filters applied (Genotype Quality ≥ 20, Mapping Quality ≥ 40). The gVCF files from all samples were subsequently merged using BCFtools (v1.22).
- (d)Depth- and Frequency-based Site Filtering: The merged variant set was filtered using BCFtools: (a) retaining only SNPs; (b) removing sites with a read depth (DP) < 10 (—exclude ‘INFO/DP < 10′); (c) removing sites with a minor allele frequency (MAF) < 0.01 (—min-af 0.01).
- (e)Genotype Imputation: The remaining missing genotypes in the filtered dataset were imputed using BEAGLE v4.1.
- (f)Comprehensive QC and Format Conversion: The imputed data were converted to PLINK format and subjected to stringent filtering:
- (g)Individual-level: Samples with a genotype missing rate > 0.05 were removed (—mind 0.05).
- (h)Variant-level: The following filters were applied sequentially: (a) variants with a missing rate > 0.05 (—geno 0.05); (b) variants with MAF < 0.01 (—maf 0.01); (c) variants showing significant deviation from Hardy–Weinberg equilibrium in the control group (—hwe 1 × 10^−6^).
- (i)Linkage Disequilibrium Pruning: To obtain a set of independent variants, linkage disequilibrium pruning was performed using PLINK (parameters: —indep-pairwise 50 5 0.2). The resulting high-quality genotype dataset was used for downstream association analyses.
The initial sample pool contained 426 individuals and 1,460,282 loci. Following DNA sequencing performed by the BGI Group, low-quality individuals (individuals with poor DNA quality or excessive missing genotype data) were excluded, and subsequent filtration with PLINK resulted in the retention of 555,242 high-quality single-nucleotide polymorphism (SNP) loci and 171 individuals suitable for subsequent research, which were utilized for the subsequent genome-wide association study (GWAS).
2.5. Population Genetic Analysis
Genetic diversity and population structure were assessed using genome-wide SNP data. Principal component analysis (PCA) was performed using PLINK v1.9 with the —pca option after linkage disequilibrium pruning (—indep-pairwise 50 10 0.2). Genetic diversity indices, including observed heterozygosity (Ho) and the inbreeding coefficient (F), were calculated using PLINK’s —het function. Observed heterozygosity was calculated as Ho = (N.NM − O.HOM)/N.NM, where N.NM is the number of non-missing genotypes and O.HOM is the observed number of homozygotes.
2.6. Genome-Wide Association Analysis
First, a genomic relationship matrix (GRM) based on SNP-derived genetic similarity between individuals was constructed using the genomic relationship matrix (GRM) approach [22]. This matrix enabled direct estimation of additive genetic variance for each trait from genome-wide SNP data, followed by calculation of SNP-based heritability for the seven traits. Single-trait genome-wide association study (GWAS) analysis was then performed using GCTA software.
Subsequently, single-trait GWAS was independently conducted using GEMMA software. Based on phenotypic correlation coefficients and inter-trait heritability estimates, six growth traits were grouped for multi-trait joint analysis via GEMMA. By integrating results from both software tools (GCTA and GEMMA), complementary results were obtained, enhancing the reliability of identifying candidate genes associated with these economically important traits through combined single and multi-trait approaches.
For GWAS result visualization, Manhattan plots [23] and Q-Q plots [24] were generated using R 4.4.0. Significant loci were filtered using R 4.4.0. Significant loci were filtered based on the threshold of p ≤ 5 × 10^−8^. The candidate gene screening and positional mapping were conducted. Candidate genes were mapped within 100 kb windows (50 kb upstream and downstream) flanking each significant SNP [25].
2.7. Candidate Gene Identification and Functional Annotation
Based on the ayu (Plecoglossus altivelis) reference genome (Pal_1.0) provided by the Fish Aquaculture Laboratory, Department of Marine Biosciences, Tokyo University of Marine Science and Technology, and available on NCBI, 100 kb genomic regions (50 kb upstream and downstream of each significant locus) were extracted. These regions were subjected to BLAST sequence alignment on NCBI to identify potential candidate genes [26]. Functional annotation of the identified genes was then performed, supported by literature review, to further prioritize biologically relevant candidate genes.
3. Results
3.1. Genetic Correlations Among Pairwise Growth Traits
As shown in Table 1, body weight (BW) exhibited extremely strong genetic correlations (genetic correlation, rg > 0.9) with total length (TL; rg = 0.962), body length (BL; rg = 0.974), and body height (BH; rg = 0.950). Total length (TL) also demonstrated highly coordinated genetic relationships with body length (BL; rg = 0.952) and body height (BH; rg = 0.866).
Gonad weight (GW) showed strong genetic correlations with body length (BL; rg = 0.943) and body weight (BW; rg = 0.857) but a weaker correlation with eye diameter (ED; rg = 0.498). This indicates that gonadal development may integrate both growth-related genes and reproduction-specific regulatory mechanisms, necessitating a balanced approach to optimize growth and reproductive traits in breeding programs.
In contrast, eye diameter (ED) displayed generally moderate genetic correlations with other traits, such as BW (rg = 0.448), and GW (rg = 0.498), and a high genetics correlation with TL (rg = 0.731). These results suggest that ED may be governed by distinct genetic mechanisms, warranting separate optimization strategies or treatment as a secondary trait in selective breeding. Based on the heritability estimates, a heritability heat map was generated to visualize trait-specific genetic architecture (Figure 1). The genetic correlations (Figure 1) showed very high positive genetic correlations between BW and TL, BL, BH, and GW but not with ED.
3.2. Heritability Analysis of Various Growth Traits
To clarify the genetic regulatory characteristics of the growth-economic and biological traits of ayu (Plecoglossus altivelis), this study estimated the heritability of seven traits (including body weight, total length, body length, body depth, interorbital distance, sex, and gonad weight) using R (version 4.4.0). The results, as presented in Table 2, revealed significant differences in heritability among the traits. The heritability in descending order were: body weight (0.432, SE = 0.03), sex (0.353, SE = 0.03), gonad weight (0.338, SE = 0.03), body depth (0.274, SE = 0.03), interorbital distance (0.271, SE = 0.03), total length (0.270, SE = 0.03), and body length (0.269, SE = 0.03). In addition to the visual comparison of the heritability of each trait and their 95% confidence intervals, this study also summarized the individual trait data of ayu. The mean values and standard deviations of each trait are also shown in Table 2, where 1 represents males and 0 represents females. A mean value of 0.53 indicated that the sex ratio was approximately 1:1 (mean = 0.53, coded as 1 for male, 0 for female), indicating a balanced sample.
The 95% confidence intervals of the heritability estimates are shown in Figure 2. Among these traits, body weight, as a key growth trait, exhibited high heritability (h^2^ ≥ 0.4), indicating that it is dominated by genetic factors and serves as a priority selection index for improving the growth performance of ayu (Plecoglossus altivelis). Directed selection can rapidly enhance the body weight phenotype of the cultured population. Sex and gonad weight showed moderate heritability (0.3 < h^2^ < 0.4). suggesting great potential for the genetic improvement of these two reproduction-related traits. Molecular marker-assisted selection can be integrated to optimize the sex ratio and fecundity of ayu. Total length, body length, body depth, and interorbital distance are also displayed moderate heritability (0.2 ≤ h^2^ ≤ 0.3), implying that their phenotypes are jointly regulated by genetic and environmental factors. Therefore, the selection program should be accompanied by optimization of the rearing environment (e.g., water temperature and feed formulation) to minimize environmental interference. The significance of these heritability results lies in not only quantifying the degree of genetic controllability of each trait and providing core parameters for formulating the genetic breeding program of ayu—for example, prioritizing body weight for selection with high response efficiency and adopting a synergistic strategy of “genetic selection + environmental regulation” for traits with moderate heritability—but also laying a foundation for the subsequent application of technologies such as marker-assisted breeding and genomic selection. This will help shorten the breeding cycle of ayu, improve the accuracy of selection, and ultimately promote the high-quality and efficient development of the ayu breeding industry.
3.3. Population Genetic Structure and Diversity
To characterize the genetic background of the analyzed samples, we performed principal component analysis (PCA) and estimated genetic diversity indices based on genome-wide SNP data from all 209 individuals.
Principal component analysis revealed the genetic relationships among samples. The first five principal components explained 6.72%, 6.43%, 6.34%, 6.08%, and 5.92% of the total genetic variance, respectively, cumulatively accounting for 31.49% of the genetic variation (Table 3). The first two principal components together explained 13.15% of the variance. The relatively low proportion of variance explained by individual PCs and the lack of clear clustering along these axes suggest complex genetic relationships without strong population stratification.
Genetic diversity across all samples was moderate, with an average observed heterozygosity (Ho) of 0.395 ± 0.035 (Table 4). The inbreeding coefficient (F) averaged −0.107 ± 0.096, indicating a consistent excess of heterozygotes relative to Hardy–Weinberg expectations.
3.4. Single-Trait GWAS Results in Ayu (GCTA)
Using GCTA software, genome-wide association analyses were performed for six traits—body weight (BW), total length (TL), body length (BL), body height (BH), eye diameter (ED), and gonad weight (GW)—with sex included as a covariate. An additional analysis was conducted for sex alone, yielding seven association files. Due to the absence of chromosome-level assembly for the ayu genome, Manhattan plots and Q-Q plots generated via R 4.4.0 were scaffold-based, with scaffold positions plotted along the x-axis (Figure 3a–g).
A GWAS was performed for these seven traits independently through the method of GCTA (Figure 3 and Table 5). In total, 1047 significant loci were identified across the six growth traits (p ≤ 1 × 10^−5^): seven loci for BW, one locus associated with TL, three loci associated with BL, and three loci associated with BH and ED. Finally, GW was associated with the largest number of loci (n = 1030). However, a surprisingly high number of SNPs (1452) were associated with phenotypic sex.
After removing redundant loci, 100 kb genomic regions (50 kb upstream and downstream of each significant locus) were subjected to BLAST sequence alignment on NCBI. This process identified potential candidate genes and eliminated duplicates, yielding eight non-redundant candidate genes (Table 5), including slc48a1a, filip1L, nedd9 and Crebbpa which participate in various cellular metabolic processes. The proportion of phenotypic variance explained (PVE) by each SNP ranged from 0.09 to 0.31, indicating their substantial contribution to the measured traits.
3.5. Single-Trait GWAS Results in Ayu (GEMMA)
Using GEMMA software, genome-wide association analyses were conducted for six traits—BW, TL, BL, BH, ED, GW—with sex incorporated as a covariate. An independent analysis was performed for sex, generating seven association files. Similar to the GCTA workflow, Manhattan plots and Q-Q plots were visualized using R 4.4.0, with scaffold positions plotted along the x-axis due to the absence of chromosome-level genome assembly (Figure 4a–g).
A genome-wide association study (GWAS) was conducted for these traits using GEMMA with a suggestive threshold set at p ≥ 1 × 10^−5^. In total, 671 significant SNPs were detected across the genome. Among these loci, seven were associated with BW, two with TL, six with BL, seven with BH, three with ED, and four with GW. Similarly, when analyzing the sex trait using GCTA, the highest number of significant SNPs—642 in total—were identified.
Following removal of redundant loci, 100 kb genomic regions (50 kb upstream and downstream of each locus) were analyzed via BLAST sequence alignment on NCBI. This process identified 16 non-redundant candidate genes (Table 6), including LOC134036737, slc25a12, myo5aa, LOC136948769, nsfl1c, dok1a, abat, and LOC137018788. The PVE ranged from 0.03 to 0.31, indicating substantial genetic contributions of these loci to the traits.
3.6. Multi-Trait GWAS Results in Ayu (GEMMA)
Using GEMMA, single-trait GWAS was performed for the six growth traits and for phenotypic sex. With a significance threshold of p ≤ 1 × 10^−5^, we detected 671 significant SNPs for the growth traits (7 for BW, 2 for TL, 6 for BL, 7 for BH, 3 for ED, and 4 for GW) and 642 SNPs for sex. After redundancy removal and BLAST-based annotation of 100 kb flanking regions, 16 distinct candidate genes were identified (Table 6), including LOC134036737, slc25a12, myo5aa, abat, and dnah10. The PVE of these SNPs ranged from 0.03 to 0.31, reflecting their strong genetic effects.
According to the Manhattan plot and the set significance criteria, a total of 37 significant loci were identified in five groups (Figure 5). Among them, 8, 7, 11, 5, and 6 loci were identified in five groups in turn. After removing duplicate loci, sequences spanning 100K (50K upstream and downstream of each significant locus) were subjected to BLAST gene sequence alignment on NCBI. Through this process, potential candidate genes were identified while removing duplicates, resulting in a total of 10 candidate genes listed in Table 7.
3.7. KEGG Pathway Analysis
KEGG pathway enrichment analysis was performed for all candidate genes using KOBAS. Only four genes were mapped to any pathway (Table 8). abat was involved in six metabolic pathways, including butanoate, β-alanine, propanoate, and glutamate metabolism. maml3 participated in the Notch signaling pathway and Th1/Th2 cell differentiation, while ccn1 was associated with eight pathways such as cell cycle, AMPK signaling, and viral infection. Both maml3 and ccn1 were jointly involved in human papillomavirus infection. nsfl1c was uniquely associated with protein processing in the endoplasmic reticulum. These results highlight the pleiotropic roles of the identified genes.
4. Discussion
Phenotypic analysis of Plecoglossus altivelis experiment revealed the genetic correlations among traits. Four traits—body weight (BW), total length (TL), body length (BL), and body height (BH)—exhibited a high degree of correlation, indicating that these indices collectively reflect individual size. This formed the basis for grouping these traits in our multi-trait genome-wide association study (GWAS) analysis. While the six growth traits could be combined into numerous groups, five representative combinations were selected based on correlation coefficients to ensure comprehensiveness while avoiding redundancy in candidate gene screening.
When comparing the single-trait GWAS results from GCTA and GEMMA with the multi-trait analysis results from GEMMA, significant complementarity was observed between the methods in terms of candidate gene detection power, effect estimation accuracy, and biological interpretability. Partial gene overlaps (e.g., LOC131530706, LOC117378376) were found in GCTA and GEMMA single-trait analyses, indicating that these loci exhibit robust association signals within the mixed linear model framework. For example, LOC131530706 showed highly significant p-values in both methods (GCTA: 3.84 × 10^−29^; GEMMA: 3.84 × 10^−29^) and is involved in GTP binding, suggesting it may play a central role in transmembrane transport or cell signaling. The discovery of such overlapping genes by two different statistical methods enhances result reliability, consistent with the theoretical advantage of mixed models in controlling population structure bias [27].
The population genetic analyses provide important context for interpreting our primary findings. The relatively low proportion of variance explained by individual principal components and the absence of clear population stratification reduce concerns about false-positive associations due to population structure in subsequent analyses. The moderate level of genetic diversity (Ho = 0.395) indicates sufficient variation for association studies, while the average inbreeding coefficient (F = −0.107) suggests either historical outcrossing or potential heterozygote advantage. These genetic characteristics should be considered when evaluating the robustness of association signals and selection signatures detected in this study.
More genes were detected by only one method: the GCTA-specific gene slc48a1a participates in heme transport, while the GEMMA-specific gene myo5aa regulates actin movement. These differences likely arise from algorithmic optimizations: GCTA’s heritability estimation based on the genomic relationship matrix (GRM) emphasizes global variance decomposition, whereas GEMMA’s sparse matrix accelerates local effect detection with higher sensitivity to low-frequency variants [28].
The multi-trait analysis by GEMMA further revealed shared genetic architectures across traits. For instance, slc25a12 was identified in both single- and multi-trait analyses, functioning in amino acid-ion coupled transport and potentially integrating multiple phenotypes through metabolic pathways. The multi-trait model also detected genes not covered by GCTA, such as maml3, which participates in the Notch signaling pathway, indicating that multi-trait can capture hub genes in cross-phenotype regulatory networks [29]. Notably, LOC134022516 lacked functional annotation in both single- and multi-trait analyses, possibly representing an understudied novel regulatory element that requires validation with chromatin interaction data (e.g., Hi-C) [30]. These results highlight the necessity of multi-trait integration in GWAS: GCTA excels in heritability partitioning and candidate gene prioritization, while GEMMA expands functional association dimensions through multi-trait modeling and efficient computation.
In terms of statistical power, GEMMA showed higher resolution in estimating phenotypic variance explained (PVE). For example, LOC117378376 had highly significant p-values in both methods, but its PVE was significantly higher in GEMMA. This discrepancy may stem from GEMMA’s fine-grained modeling of random effects, where its Bayesian framework (e.g., BSLMM) more accurately decomposes additive and non-additive genetic effects [31]. Additionally, maml3 in GEMMA multi-trait analysis had a low PVE of 0.0016 but showed significant pathway enrichment, indicating that low-PVE genes can still amplify phenotypic impacts through regulatory networks. In contrast, GCTA’s PVE estimates are more conservative, potentially underestimating the contribution of pleiotropic genes—a phenomenon widely discussed in complex trait analysis [32].
Biologically, both methods converged on three functional modules: transmembrane transport, cytoskeletal dynamics, and metabolic regulation. GCTA-detected slc48a1a (heme transport) and GEMMA-identified slc25a12 (amino acid transport) both belong to the solute carrier (SLC) family, confirming that transmembrane material exchange is a core mechanism for the target traits. Furthermore, the GEMMA-specific gene dnah10, which drives microtubule movement via ATP hydrolysis, and GCTA-detected filip1L (myosin binding) jointly regulate cell morphology and migration, potentially influencing tissue development or pathogen responses [33]. Notably, cica in multi-trait analysis, as an RNA polymerase II-dependent transcription factor, may integrate multiple traits through epigenetic regulation, aligning with recent hypotheses about “super-enhancers” regulating multi-gene clusters [34]. Future studies should combine CRISPR screening or single-cell sequencing to validate upstream–downstream regulatory relationships of these candidate genes and explore their molecular mechanisms of interaction with the environment.
5. Limitations
This study acknowledges several limitations. Primarily, the Plecoglossus altivelis genome has not yet reached the chromosomal level and remains at the scaffold level. This limitation may lead to a series of challenges, such as less precise genomic positioning, scattered signals in Manhattan plots, and potential biases in statistical models, which could increase false positive rates and affect the reliability of GWAS results to some extent.
Furthermore, due to the lack of a comprehensive gene annotation file for P. altivelis in public databases, candidate gene identification relied on extracting sequences from significant loci for BLAST alignment. This traditional approach may introduce a degree of subjectivity. Nonetheless, the identification of overlapping candidate genes (e.g., five genes identified by both GEMMA single- and multi-trait analyses) through complementary GWAS methods enhances confidence in our screening results.
6. Conclusions
Analysis of correlation coefficients among six growth traits in Plecoglossus altivelis showed that body weight (BW) was significantly positively correlated with total length (TL), body length (BL), and body height (BH), indicating these indices collectively influence body size, with BW tightly linked to length and height. TL and BL showed high consistency in assessing body length, while the correlation between BL and BH highlighted their importance in evaluating individual size. Gonad weight (GW) was strongly associated with BW but weakly linked to TL, BL, BH, and eye distance (ED), suggesting GW may be more closely related to reproductive traits or physiological status. ED showed weak associations with other indices, indicating insignificant links to body shape characteristics and potential relevance to survival adaptation or predatory behavior.
GCTA genetic correlation analysis revealed strong positive genetic correlations (rg > 0.9) between BW and TL, BL, and BH, with high synergy among TL, BL, and BH, suggesting these morphological traits are regulated by shared polygenic networks—selecting for BW could synchronously improve other growth-related traits. GW was strongly correlated with BL and BW but weakly with ED, indicating gonadal development integrates growth metabolism and reproduction-specific regulatory mechanisms, requiring balanced breeding goals. ED had low genetic correlations with most traits (e.g., BW, GW) and only moderate correlation with TL, showing relatively independent genetic regulation. Treating ED as a secondary trait for separate optimization could improve breeding efficiency, consistent with phenotypic analysis results.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Nishida M. A New Subspecies of the Ayu, Plecoglossus altivelis, (Plecoglossidae) from the Ryukyu Islands Jpn. J. Ichthyol.19883523624210.1007/BF 02938423 · doi ↗
- 2Iguchi K. Nishida M. Genetic biogeography among insular populations of the amphidromous fish Plecoglossus altivelis assessed from mitochondrial DNA analysis Conserv. Genet.2000114715610.1023/A:1026582922248 · doi ↗
- 3Iguchi K. Tanimura Y. Takeshima H. Nishida M. Genetic Variation and Geographic Population Structure of Amphidromous Ayu Plecoglossus altivelis as Examined by Mitochondrial DNA Sequencing Fish. Sci.199965636710.2331/fishsci.65.63 · doi ↗
- 4Mun S.J. Ryu J.-S. Lee M.-O. Son Y.S. Oh S.J. Cho H.-S. Son M.-Y. Kim D.-S. Kim S.J. Yoo H.J. Generation of expandable human pluripotent stem cell-derived hepatocyte-like liver organoids J. Hepatol.20197197098510.1016/j.jhep.2019.06.03031299272 · doi ↗ · pubmed ↗
- 5Nishida M. Geographic Variation in the Molecular, Morphological and Reproductive Characters of the Ayu Plecoglossus altivelis (plecoglossidae) in the Japan-Ryukyu Archipelago Jpn. J. Ichthyol.19863323224810.1007/bf 02904160 · doi ↗
- 6Sugahara K. Fujiwara-Nagata E. Eguchi M. Dynamics of the Bacterial Cold-water Disease Pathogen, Flavobacterium psychrophilum, in Infected Fish Organs and Rearing Water after Warmed Water Treatment Fish Pathol.201045586510.3147/jsfp.45.58 · doi ↗
- 7Jeong B.-Y. Jeong W.-G. Moon S.-K. Ohshima T. Preferential accumulation of fatty acids in the testis and ovary of cultured and wild sweet smelt Plecoglossus altivelis Comp. Biochem. Physiol. Part B 200213125125910.1016/S 1096-4959(01)00501-211818246 · doi ↗ · pubmed ↗
- 8Nakamoto M. Sakamoto T. Improvement of the Ayu (Plecoglossus altivelis) draft genome using Hi-C sequencing BMC Res. Notes 2023169210.1186/s 13104-023-06362-737254197 PMC 10230677 · doi ↗ · pubmed ↗
