Genetic Diversity and Population Structure of Farmed Longfin Batfish (Platax teira) in the South China Sea
Yayang Gao, Baosuo Liu, Huayang Guo, Kecheng Zhu, Lin Xian, Nan Zhang, Tengfei Zhu, Dianchang Zhang

TL;DR
This study examines the genetic diversity and structure of farmed longfin batfish in China, finding low diversity and shared origins among some populations.
Contribution
This is the first analysis of genetic diversity and population structure in farmed longfin batfish (Platax teira) in China.
Findings
Low genetic diversity was observed in all four farming populations of P. teira.
The NA population showed significant inbreeding based on ROH analysis.
Three populations (NA, XC, ZP) likely share a common origin of fry stocks.
Abstract
Background: Longfin batfish (Platax teira) is an economically important species in southern China. In recent years, its wild population has declined significantly due to overfishing. Around 2015, breakthroughs in large-scale artificial seedling production for P. teira promoted the growth of its aquaculture in regions such as Hainan and Guangdong. Methods: To study the genetic diversity, inbreeding status, and population structure of the current P. teira farming populations in China, we used whole-genome resequencing and high-density SNP markers to analyze four main farming populations. A total of 109 individuals from four populations (NA, ZP, XL, and XC) were sequenced, identifying 5,384,029 high-quality SNPs. Results: The results showed that the nucleotide diversity (π) of each population ranged from 0.00155 to 0.00165 and observed…
Funding
- Hainan Province Science and Technology Special Fund
- Central Public-Interest Scientific Institution Basal Research Fund, CAFS
- Seed Industry Revitalization Project of Special Fund for Rural Revitalization Strategy in Guangdong Province
Taxonomy
Topics: Genetic diversity and population structure · Genetic and phenotypic traits in livestock · Identification and Quantification in Food
1. Introduction
Platax teira (Forsskål, 1775), commonly known as the longfin batfish, belongs to the class Osteichthyes, order Perciformes, family Ephippidae, and genus Platax. It is widely distributed across the Indo-Pacific region, including the Arabian Sea, Indonesia, and the Sea of Japan, and it is commonly found in the South China Sea and coastal waters of Taiwan [1]. In recent years, P. teira has gained increasing attention in China’s aquaculture industry, particularly in Guangdong and Hainan Provinces, owing to its rapid growth rate, high ornamental and nutritional value, and excellent environmental adaptability. The successful establishment of P. teira as a new aquaculture candidate aligns with the strategic need of the Chinese aquaculture industry to diversify farmed species, enhance economic resilience, and promote sustainable mariculture.
Despite its growing importance in aquaculture, genetic studies on P. teira remain limited. To date, research has focused mainly on resource distribution [2], embryonic and larval development [3,4,5], muscle nutritional composition [6], and aquaculture techniques [7], while no studies have yet investigated its nuclear genome-level genetic diversity or population structure [8].
Genetic variation is fundamental to biological evolution. However, the expansion of aquaculture often involves repeated use of limited broodstock and frequent seed translocation, which may lead to inbreeding and a loss of genetic variation [9,10]. Following the onset of inbreeding, a population’s genetic variation can diminish considerably across a few generations, and restoring this lost diversity often presents a significant challenge [11]; this can negatively affect disease resistance and growth performance.
Single-nucleotide polymorphisms (SNPs) have emerged as the molecular marker of choice in population genetics studies due to their high genetic stability and lower genotyping error rates [12]. They enable accurate, resource-effective, and high-resolution analyses of genetic variation, population structure, and inbreeding across the genome, which are fundamental to genome-scale studies of population evolution [13]. Numerous studies have successfully applied these markers to marine fish. For instance, in European sea bass (Dicentrarchus labrax), researchers analyzed the genetic structure of wild and cultured populations using SNP markers, revealing the genetic impact of aquaculture practices and providing valuable data for broodstock management [14]. Similarly, studies in groupers (Epinephelus coioides) identified genetic bottlenecks and declines in diversity in hatchery stocks, emphasizing the need for genetic monitoring [15]. Whole-genome resequencing (WGS) has emerged as a powerful tool to address these challenges, enabling high-resolution assessment of genetic diversity, inbreeding, and population structure in aquaculture species. In Nile tilapia (Oreochromis niloticus), WGS has revealed signatures of domestication selection and genetic erosion in farmed populations compared to their wild counterparts [16]. Similarly, in rainbow trout (Oncorhynchus mykiss), WGS provided novel insights into the genetic diversity and structure of Sweden’s three main farmed populations [17]. However, a critical knowledge gap remains regarding the genome-wide genetic diversity and population structure of farmed P. teira populations. To date, no study has applied whole-genome resequencing to assess the genetic status of its cultured stocks.
In this research, we conducted whole-genome resequencing on 109 individual P. teira collected from four separate cultured populations across southern China. Our objectives were to (i) assess the genetic diversity within each population, (ii) evaluate genomic inbreeding through runs of homozygosity (ROH) analysis, and (iii) evaluate genetic differentiation and population structure among populations. Furthermore, this study serves as a valuable reference for genome-assisted breeding strategies in P. teira and related species.
2. Materials and Methods
2.1. Sampling and Data Collection
In this study, a total of 109 longfin batfish were collected from four distinct cultured groups in Hainan and Guangdong Provinces, China (Table 1, Figure 1). Each population was named according to its collection site, with all four originating from principal breeding farms along the South China Sea coast: NA (Nanao, Shenzhen, China), ZP (Zhapo, Yangjiang, China), XL (Xilian, Zhanjiang, China), and XC (Xincun, Lingshui, China). Fins were collected from randomly sampled individuals in each population and stored in 75% alcohol at −20 °C until DNA extraction, with the samples from each group stored separately. DNA was extracted using the TIANamp Marine Animals DNA Kit (Tiangen, Beijing, China) following the manufacturer’s instructions [18]. DNA quality was determined via agarose gel electrophoresis and an Agilent 4200 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Paired-end (PE) libraries (2 × 150 bp) were sequenced on the MGI-2000/MGI-T7 platform (The Beijing Genomics Institute, Qingdao, China). All experiments in this study were approved by the Animal Care and Use Committee of the South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences (no. SCSFRI96-253), and performed according to the regulations and guidelines established by this committee.
2.2. Whole-Genome Resequencing and Variant Calling
Adaptor sequences and low-quality bases were filtered out from the raw reads using Fastp (v0.20.0, -n 10 -q 20 -u 40). The clean reads were mapped to the P. teira reference genome [19] using BWA v0.7.15 [20]. Mapping results were then converted into BAM format and sorted with SAMtools v1.3.1 [21]. The SNPs were filtered using the following criteria: (i) SNPs with QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < −12.5, or ReadPosRankSum < −8.0 were removed; (ii) SNPs with a missing rate below 0.1 were retained, and a per-genotype depth filter of >4 was applied to exclude low-confidence genotype calls; (iii) SNPs with a minor allele frequency (MAF) > 0.01 were retained; and (iv) linkage disequilibrium pruning was performed using PLINK v1.9 [22]. Using a sliding window of 100 kb with a 10 kb step, we annotated and predicted the functional effects of the filtered SNPs based on the genome annotation file with snpEff v5.0 [23].
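The site-level and genotype-level filters listed above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the text, not the pipeline used in the study. The function names, the dictionary layout, and the choice to discard a site when any single annotation fails are assumptions of this sketch.

```python
# Thresholds quoted in the text. Each entry returns True when the site FAILS.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def passes_site_filters(info, missing_rate, maf):
    """Return True if a SNP survives criteria (i)-(iii).

    `info` maps annotation names to floats. Annotations absent from the
    record are skipped, and failing any one threshold removes the site
    (both are assumptions of this sketch).
    """
    for name, fails in HARD_FILTERS.items():
        value = info.get(name)
        if value is not None and fails(value):
            return False
    return missing_rate < 0.1 and maf > 0.01


def mask_low_depth(depths, min_depth=4):
    """Per-genotype depth filter: keep calls with depth > min_depth.

    Calls at or below the threshold are set to None (treated as missing).
    """
    return [d if d is not None and d > min_depth else None for d in depths]
```

Masking low-depth genotypes before computing the missing rate means that poorly covered sites are also penalised by criterion (ii); whether the original pipeline applied the steps in that order is not stated in the text.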
2.3. Genetic Diversity and Linkage Disequilibrium
Prior to analysis, the marker genotypes of each individual were concatenated head to tail, and missing sites were replaced by “-”. Genetic diversity parameters were calculated with VCFtools v0.1.16 [24], including the expected heterozygosity (He), polymorphism information content (PIC), observed heterozygosity (Ho), nucleotide diversity (π), and Hardy–Weinberg equilibrium p-value (HW-P) (p < 0.05 indicates Hardy–Weinberg disequilibrium). Linkage disequilibrium (LD) patterns were compared among groups, and LD decay was measured by calculating the squared correlation coefficient (r²) for all pairs of SNPs within 500 kb using PopLDdecay v3.41 [25].
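For a single biallelic SNP, the heterozygosity and PIC statistics named above reduce to simple functions of the allele frequency. The sketch below uses the standard textbook definitions; it is not the code path of any tool cited in the text. The 0/1/2 genotype coding and the function name are assumptions, and no small-sample correction is applied to He.

```python
def snp_diversity(genotypes):
    """Per-SNP diversity statistics for a biallelic site.

    `genotypes` holds alternate-allele counts per individual (0, 1 or 2),
    with None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this site")
    p = sum(called) / (2 * n)   # alternate-allele frequency
    q = 1.0 - p                 # reference-allele frequency
    ho = sum(1 for g in called if g == 1) / n   # observed heterozygosity
    he = 2 * p * q                              # expected under Hardy-Weinberg
    pic = 1 - (p**2 + q**2) - 2 * p**2 * q**2   # biallelic PIC
    return {"p": p, "Ho": ho, "He": he, "PIC": pic}
```

Population-level values would then be averages of these per-site statistics over all retained SNPs. For a biallelic marker PIC cannot exceed 0.375, which is reached at p = 0.5.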
2.4. Genome-Wide Detection of ROH and Estimation of Inbreeding Coefficients
The PLINK v1.9 command --homozyg was used for the detection of ROH. The parameters for ROH identification were set as follows, based on recommended practices in the literature [26]: --homozyg-kb 100 --homozyg-window-missing 5 --homozyg-window-threshold 0.05 --homozyg-window-het 3 --homozyg-window-snp 50 --homozyg-snp 50 --homozyg-density 50 --homozyg-gap 1000. The R package CMplot [27] was employed to generate visual representations of the results. The identified ROHs were classified into five distinct size categories as follows: 0.1 to 0.3 Mb, 0.3 to 0.6 Mb, 0.6 to 1 Mb, 1 to 2 Mb, and >2 Mb [28]. The total number of ROH for each animal, the length of ROH per population per chromosome, and the number of ROH per length category were calculated.
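The five size classes can be applied to a list of ROH segment lengths as in the sketch below. It assumes the lengths have already been converted to Mb. The function names and the treatment of segments that fall exactly on a class boundary are choices made for this illustration, since the text does not specify them.

```python
from collections import Counter

# Size classes from the text, as (upper bound in Mb, label).
ROH_CLASSES = [
    (0.3, "0.1-0.3 Mb"),
    (0.6, "0.3-0.6 Mb"),
    (1.0, "0.6-1 Mb"),
    (2.0, "1-2 Mb"),
    (float("inf"), ">2 Mb"),
]


def roh_class(length_mb):
    """Return the size-class label for one ROH segment length (Mb)."""
    if length_mb < 0.1:
        raise ValueError("segments shorter than 0.1 Mb are below the ROH minimum")
    for upper, label in ROH_CLASSES:
        # Assumption: a segment exactly on a boundary goes to the lower class.
        if length_mb <= upper:
            return label


def roh_class_proportions(lengths_mb):
    """Fraction of ROH segments falling in each size class."""
    if not lengths_mb:
        raise ValueError("no ROH segments supplied")
    counts = Counter(roh_class(x) for x in lengths_mb)
    return {label: counts[label] / len(lengths_mb) for _, label in ROH_CLASSES}
```

Calling `roh_class_proportions` on the segments of one population gives the per-class shares of the kind reported per population in the Results.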
We employed the following two approaches to estimate inbreeding coefficients: F_ROH, derived from runs of homozygosity, and F_HOM, calculated from the comparison of observed versus expected homozygosity. F_ROH estimates were obtained using the R package detectRUNS v0.9.6 [29]. F_ROH was defined as the proportion of the total genome length that lies within ROH, using the total length of the P. teira reference genome (697.98 Mb) [30]. The individual inbreeding coefficient F_HOM was estimated using PLINK v1.9 with the --het option (plink --bfile dataset --het --out output_prefix).
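Both inbreeding coefficients are simple ratios, sketched below. F_ROH follows the definition given in the text. For F_HOM the sketch uses the usual method-of-moments form (observed minus expected homozygous sites, divided by total minus expected), which is how PLINK's --het report defines its F column. The function and argument names are illustrative, not taken from the study.

```python
GENOME_LENGTH_MB = 697.98  # reference genome length quoted in the text


def f_roh(roh_lengths_mb, genome_length_mb=GENOME_LENGTH_MB):
    """Proportion of the genome covered by ROH for one individual."""
    return sum(roh_lengths_mb) / genome_length_mb


def f_hom(observed_hom, expected_hom, n_sites):
    """Excess-homozygosity inbreeding coefficient for one individual.

    observed_hom: number of homozygous genotypes observed
    expected_hom: number expected under Hardy-Weinberg proportions
    n_sites:      number of non-missing genotypes
    """
    return (observed_hom - expected_hom) / (n_sites - expected_hom)
```

F_HOM can be negative when an individual carries fewer homozygous genotypes than expected, whereas F_ROH is bounded below by zero, so the two measures are best compared by rank across populations rather than by absolute value.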
2.5. Population Structure Analysis
The genetic differentiation coefficient (FST) between geographical groups was calculated using GENEPOP v4.5 [31]. Reynolds’ genetic distance (DR) between geographical groups was estimated from FST as DR = −ln(1 − FST). A neighbor-joining (NJ) tree was constructed from an identity-by-state (IBS) matrix using MEGA11, and the tree was further refined and visualized with the online tool iTOL (https://itol.embl.de/). Principal component analysis (PCA) based on genome-wide SNPs was conducted with PLINK, while population structure was further examined using ADMIXTURE v1.3.0 [32]. The number of ancestral populations (K) was tested from 2 to 6 using all SNPs. The IBS matrix was calculated with VCF2PCACluster v1.40 [33]. Ten replicate analyses with different random seeds were run, pong [34] was used to cluster the replicate results, and the optimal K value was determined from the cross-validation error.
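The conversion from FST to Reynolds’ distance is a one-line transformation. The sketch below applies the formula as written above. The function name is illustrative, and the example value is the largest pairwise FST reported in the Results.

```python
import math


def reynolds_distance(fst):
    """Reynolds' genetic distance: DR = -ln(1 - FST)."""
    if not 0 <= fst < 1:
        raise ValueError("FST must lie in [0, 1)")
    return -math.log(1.0 - fst)


print(round(reynolds_distance(0.065), 3))  # 0.067
```

For small FST the distance is close to FST itself, which is why the two columns of a pairwise table look nearly identical at the levels of differentiation seen here.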
3. Results
3.1. Overview of Whole-Genome Resequencing and Variant Detection
Whole-genome resequencing was conducted on 109 fish, generating a total of 10.68 billion reads. Of these, 99.2% successfully aligned to the reference genome using BWA, achieving an average sequencing depth of 19.5× (ranging from 13.0× to 29.3×), providing sufficient coverage for reliable downstream analyses. Following stringent quality filtering, we identified 5,384,029 SNPs across the 109 samples, with the majority residing in intronic (40.66%) and intergenic (30.74%) regions (Table 2). An examination of SNP distribution across the P. teira genome revealed that individual chromosomes contained between 144,120 SNPs (chromosome 5) and 275,445 SNPs (chromosome 11) (Figure 2).
3.2. Genetic Diversity and Linkage Disequilibrium Analysis
We calculated the genetic diversity parameters for the four P. teira populations based on the selected SNP loci. As shown in Table 3, Hardy–Weinberg equilibrium values (HW-P), observed heterozygosity (Ho), expected heterozygosity (He), polymorphism information content (PIC), and nucleotide diversity (π) were determined. Hardy–Weinberg equilibrium testing indicated that all farmed populations had HW-P values above 0.05, suggesting that genetic equilibrium was largely preserved across the P. teira populations. The expected heterozygosity (He) values varied from 0.242 in ZP to 0.257 in NA, and the observed heterozygosity (Ho) ranged from 0.253 (ZP) to 0.282 (XL). PIC, which measures the level of polymorphism at each SNP site, ranged from 0.160 (XL) to 0.172 (ZP). Nucleotide diversity (π) was consistently low across populations (0.00155–0.00165), with the highest value observed in XC (π = 0.00165) and the lowest in XL (π = 0.00155). As shown in Figure 3, all populations exhibited some level of linkage disequilibrium, with relatively similar decay rates. The average pairwise correlation coefficient (r²) was higher in XL than in the other populations.
3.3. Genomic Distribution of ROH and Inbreeding Coefficients
As shown in Table 4, NA exhibited the highest total number of ROH (3522), considerably more than ZP (1900), XC (2000), and XL (1264). In all populations, short ROH (0.1–0.3 Mb) made up the largest proportion (68.3–84.3%), with XL displaying the greatest predominance of short ROH (84.3%). Notably, NA had a higher proportion of long ROH segments (>0.6 Mb; 14.1%) than XC (6.1%) and XL (5.6%). Particularly striking was the >2 Mb ROH fraction in population NA (1.3%), while XC showed only 0.2%. These results suggest that population NA may have experienced more recent inbreeding events. The total number of ROH segments and the total length of ROH in Mb per fish are shown in Figure 4. Individuals in NA generally had the highest number and greatest total length of ROH. In contrast, the lowest number and shortest total length of ROH were observed in individuals of XL.
F_ROH values varied from 0.0172 to 0.0594 (Table 3), with the NA population showing the highest inbreeding levels and the XL population exhibiting the lowest. Consistent with these results, F_HOM values followed a similar pattern across the analyzed populations (Figure 5).
3.4. Population Structure
To evaluate the degree of genetic differentiation between populations, fixation index (FST) values and genetic distances were estimated (Table 5). The highest divergence was observed between XL and NA (FST = 0.065) and the lowest between ZP and XC (FST = 0.021). The PCA results (Figure 6a) show that ZP, XC, and NA formed three overlapping clusters, whereas XL was distributed independently of them. In the phylogenetic tree (Figure 6b), individuals from the XC, ZP, and NA populations were extensively intermingled, forming a large cluster with poor bootstrap support at internal nodes. This pattern, indicative of high genetic similarity and recent shared ancestry, aligns with our field surveys confirming that ZP and NA initially sourced their broodstock from the XC population. Conversely, the XL population formed a distinct, well-supported monophyletic clade, confirming its independent genetic lineage. This genetic divergence was corroborated by the PCA, where XL formed a separate cluster along PC1. The Delta K results indicated that the optimal number of ancestral genetic clusters was K = 4. The cross-validation error (CV error) reached its minimum at this value (Figure 6c), aligning with the phylogenetic tree analysis results. Moreover, the inferred genetic structure (Figure 6d) closely corresponded with the phylogenetic relationships and showed strong concordance with the clustering patterns observed in the PCA plot.
4. Discussion
The South China Sea represents the main aquaculture region for P. teira. To investigate the genetic diversity, population structure, and inbreeding status of the major cultured populations in Guangdong and Hainan Provinces, this study performed whole-genome resequencing on four populations (NA, ZP, XL, XC) from southern China. By utilizing high-density SNP markers, we provide a comprehensive assessment of the species’ genetic status.
All four populations exhibited low levels of genetic diversity, as indicated by the observed heterozygosity (Ho: 0.253–0.282) and nucleotide diversity (π: 0.00155–0.00165); π values below 0.005 reflect low nucleotide diversity. Similarly, the polymorphism information content (PIC) ranged from 0.160 to 0.172 across the populations. As values below 0.25 are indicative of low polymorphism, these results collectively confirm reduced genetic variation in all four cultured populations. These results indicate that the farmed populations in this study may have been degraded to varying degrees, and effective supplementation of farmed populations is urgently needed. Low genetic diversity has also been observed in other cultured species, such as Coilia nasus [35]. In all populations, Ho was higher than He. This phenomenon has also been observed in other farmed marine fish species, such as sea bream (Sparus aurata) and European seabass (D. labrax) [36]. In a finite population, random differences in allele frequencies between the sexes are expected, and these generate an excess of Ho relative to He [37]. The relatively slow decay of LD in these farmed populations was most likely caused by inbreeding within each strain, although population structure may also have contributed to this. However, the high linkage disequilibrium observed in XL and NA might also be attributable to lower coverage depth.
The number and length of runs of homozygosity (ROH) were markedly higher in NA than in the other populations, and the F_ROH values were also highest in NA (up to 0.0594), indicating the highest level of inbreeding among the four populations. ROH are reliable indicators of both historical and recent inbreeding [38]. The presence of long ROH (>2 Mb) in NA suggests recent inbreeding events, likely resulting from repeated use of close relatives or a small breeding nucleus. High levels of inbreeding and ROH accumulation have been reported in other aquaculture species, such as westslope cutthroat trout (Oncorhynchus lewisi) [39]. In contrast, the XL population exhibited the lowest F_ROH and F_HOM values, with shorter and fewer ROH segments, suggesting lower inbreeding levels and better genetic status. This population may serve as a valuable genetic resource for future selective breeding.
An analysis of the genetic differentiation index revealed a certain degree of genetic differentiation among the populations. The genetic differentiation coefficient (FST = 0.065) and genetic distance (DR = 0.067) between the NA and XL populations were the highest, indicating that these two populations were the most genetically distinct. FST values between 0.05 and 0.15 represent moderate differentiation. Apart from the NA and XL pair and the XL and ZP pair (FST = 0.051), the pairwise comparisons showed low genetic differentiation, with FST values ranging from 0.021 to 0.039. Among them, genetic differentiation among XC, ZP, and NA was particularly low (FST values: 0.021, 0.029, and 0.026, respectively). The observed admixture patterns may reflect historical exchanges of fry among farming sites in Hainan Province; however, the exact sources remain uncertain and require further validation. According to our previous investigation, the initial breeding stock of XC originated from the wild population in the South China Sea near Hainan Island, China. Similar trends have been reported in cultured populations of other fish (O. niloticus) [40]. In this study, genetic differentiation among the three populations was particularly low, probably due to their shared origin. The assignment of XC, NA, and ZP to the same cluster in both the PCA and the population structure analysis provides further support for this hypothesis. In contrast, XL was relatively independent.
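The qualitative reading of the pairwise FST values can be made explicit with a small lookup. The 0.05 to 0.15 band for moderate differentiation is the one stated in the text. The remaining cut-offs (below 0.05 low, 0.15 to 0.25 high, above 0.25 very high) follow the commonly cited convention and are an assumption of this sketch, as is the function name. Only the pairs whose values are named explicitly in the text are used.

```python
def classify_fst(fst):
    """Map a pairwise FST value to a qualitative differentiation level."""
    if fst < 0.05:
        return "low"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "high"
    return "very high"


# Pairwise values quoted in the text.
pairwise_fst = {("NA", "XL"): 0.065, ("XL", "ZP"): 0.051, ("ZP", "XC"): 0.021}

for (pop_a, pop_b), fst in pairwise_fst.items():
    print(f"{pop_a}-{pop_b}: FST = {fst:.3f} ({classify_fst(fst)})")
```

Under these cut-offs only the two comparisons involving XL reach the moderate band.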
This study revealed the urgent need for improved genetic management in P. teira aquaculture. The low genetic diversity, moderate differentiation, and uneven inbreeding levels raise concerns about long-term sustainability, particularly in populations such as NA. To enhance genetic health and support future breeding programs, we recommend the following: (i) increasing broodstock size and ensuring periodic introduction of wild individuals or genetically diverse stocks; (ii) establishing regional broodstock management protocols to avoid inbreeding and genetic drift; and (iii) validating candidate populations before use: although XL and ZP showed relatively higher genetic diversity, which makes them promising candidates for breeding programs, further validation with larger sample sizes and inclusion of wild populations is necessary before making definitive recommendations. Genomic-assisted breeding strategies, as applied in Oreochromis, Salmo, and Sparus species [41], can be adapted for P. teira to accelerate genetic improvement and maintain population resilience under intensive farming. Despite the insights provided, this study has some limitations. First, the sample size per population was limited, which may affect the accuracy of diversity and structure estimates. Second, only cultured populations were analyzed; the inclusion of wild populations would provide a more comprehensive view of domestication effects. Future research should incorporate wild populations and long-term temporal samples, and should integrate phenotypic data to link genomic patterns with economically important traits.
5. Conclusions
This study provides a genome-wide evaluation of the genetic diversity, population structure, and inbreeding levels of four P. teira farmed populations from southern China. Our findings revealed generally low genetic diversity and varying degrees of inbreeding across the groups. Notably, the XL population showed relative genetic independence, while the other three populations (XC, ZP, NA) exhibited significant genetic admixture, which may be attributed to their origin: part of their fry stocks was likely derived from Xincun Village in Lingshui City. The NA population showed signs of recent inbreeding, evidenced by a higher number and longer stretches of homozygous regions, while the XL population maintained relatively greater genetic variability and lower inbreeding risk. It should be noted that our sampling did not include wild populations, and therefore the extent to which farming practices alone contributed to the reduced diversity cannot be conclusively determined. These results highlight the need for improved genetic management in P. teira aquaculture. Strategies such as expanding broodstock sources, incorporating wild genetic material, and implementing regular genomic monitoring should be considered to maintain genetic health and breeding efficiency. By integrating genomic tools into selective breeding programs, the long-term sustainability and productivity of P. teira aquaculture can be better secured, contributing to both economic development and conservation of genetic resources in the South China Sea region.
References
- [1] Guo H., Liu M., Gao J., Zhu K., Liu B., Guo L., Zhang N., Sun J., Zeng C., Yang J. Development of vertebral column and appendicular skeleton in larvae and juveniles of (Platax teira). South China Fish. Sci. 2022, 18, 93–99. doi:10.12131/20220058
- [2] Rousset F. Genetic differentiation and estimation of gene flow from F-statistics under isolation by distance. Genetics 1997, 145, 1219–1228. doi:10.1093/genetics/145.4.1219. PMID 9093870. PMC1207888
- [3] Leu M.-Y., Tai K.-Y., Meng P.-J., Tang C.-H., Wang P.-H., Tew K.S. Embryonic, larval and juvenile development of the longfin batfish, Platax teira (Forsskål, 1775) under controlled conditions with special regard to mitigate cannibalism for larviculture. Aquaculture 2018, 493, 204–213. doi:10.1016/j.aquaculture.2018.05.006
- [4] Liu M., Guo H., Gao J., Zhu K., Liu B., Guo L., Zhang N., Yang J., Liu B., Zhang D. Embryonic development and morphological characteristics of larvae and juvenile of (Platax teira). South China Fish. Sci. 2022, 18, 103–111. doi:10.12131/20210251
- [5] Liu M.-J., Gao J., Guo H.-Y., Zhu K.-C., Liu B.-S., Zhang N., Sun J.-H., Zhang D.-C. Transcriptomics Reveal the Effects of Breeding Temperature on Growth and Metabolism in the Early Developmental Stage of Platax teira. Biology 2023, 12, 1161. doi:10.3390/biology12091161. PMID 37759561. PMC10525949
- [6] Liu B., Guo H.-Y., Zhu K.-C., Liu B.-S., Guo L., Zhang N., Jiang S.-G., Zhang D.-C. Nutritional compositions in different parts of muscle in the longfin batfish, Platax teira (Forsskål, 1775). J. Appl. Anim. Res. 2019, 47, 403–407. doi:10.1080/09712119.2019.1649680
- [7] Chiu P.-S., Chu Y.-T., Huang C.-H., Ho S.-W., Huang J.-W., Yeh S.-L. Effects of stocking density on growth performance, survival and size heterogeneity of juvenile longfin batfish Platax teira. Aquac. Res. 2020, 51, 5269–5272. doi:10.1111/are.14858
- [8] Li S., Xie Z., Chen P., Tang J., Tang L., Chen H., Wang D., Zhang Y., Lin H. The complete mitochondrial genome of the Platax teira (Osteichthyes: Ephippidae). Mitochondrial DNA Part A DNA Mapp. Seq. Anal. 2016, 27, 796–797. doi:10.3109/19401736.2014.915542. PMID 24845448
