Applicability of Non-Invasively Collected Eurasian Goshawk (Astur gentilis) Moulted Feathers for Whole Genome Sequencing Analysis
Ineta Kalnina, Ance Roga, Dita Gudra, Edgars Liepa, Otars Opermanis, Imants Jakovlevs, Janis Klovins, Davids Fridmanis

TL;DR
The study shows that moulted feathers from Eurasian goshawks can be used for whole-genome sequencing, though sample quality varies and careful selection is needed.
Contribution
Demonstrates the feasibility of using non-invasive moulted feathers for genome sequencing in raptors, highlighting factors affecting DNA quality.
Findings
Feathers with blood traces and larger size yielded better DNA quality and sequencing performance.
Approximately 83% of the genome was covered at least once, with millions of genetic variants identified.
About 22.7% of samples failed due to poor DNA quality or high missing data.
Abstract
Background/Objectives: Non-invasive samples offer an attractive alternative to logistically challenging invasive approaches in wildlife genetic studies but often contain low-quality host DNA that limits downstream analyses. Here, we assessed the applicability of moulted Eurasian goshawk feathers as a DNA source for whole-genome re-sequencing. Methods: We analysed 75 moulted feathers collected opportunistically from breeding territories. Each feather was measured from tip to tip, and its condition was visually assessed. Whole-genome re-sequencing was performed with a target coverage of 13× using 150 bp paired-end reads. Results: Feathers yielded an average of 7.19 ± 10.93 ng/μL DNA. DNA yield was positively correlated with feather size and the presence of blood traces in the calamus. On average, feather samples performed well, producing 208.7 ± 59.82 million reads, of which 82.69 ±…
Figure 1
Figure 2
Figure 3- —Latvian Council of Science
1. Introduction
The use of genetic tools to complement field studies in wildlife research can provide otherwise inaccessible information on the population structure, demographic history, long-term population fitness, migratory routes, and even various behavioural aspects, as well as insights into adaptive responses to environmental challenges [1,2,3,4,5,6,7,8,9]. However, genetic analysis requires the collection of biological material, traditionally blood and tissue samples that yield high-quality DNA, which involves capturing individual animals. Depending on the species, such procedures can cause significant stress and unacceptable levels of disturbance, potentially leading to behavioural changes and an increased risk of injury for both animals and researchers [1,6,8,10,11,12]. Therefore, growing numbers of studies focusing on avian species switch towards non-invasive sampling and use moulted feathers, faecal material or eggshells as a source of genetic material [4,6,7,8,13,14]. Non-invasive sample collection avoids direct contact with birds and can be less demanding in terms of technical training, easier to implement, less time-consuming or in some cases the only feasible option [1,7,8,11]. Moulted feathers are arguably the most widely used non-invasive DNA source, providing valuable genetic information for studies on population diversity and size, sex ratios, behavioural ecology, migratory routes, mark–recapture analyses, and other aspects vital to conservation [1,2,6,7,8,14,15,16,17,18,19]. More recently, feathers have also proven to be a promising alternative to blood or organ samples for detecting blood parasites [20].
Although DNA obtained non-invasively can yield results comparable to those from invasive methods, such samples come with specific challenges and risks [1,6,7,10,11,21,22]. Environmental conditions such as high humidity, exposure to sunlight, and elevated temperatures accelerate DNA degradation and lead to the accumulation of short fragments, thereby reducing the amount of usable DNA [1,14,16,17,18,20,21,22,23]. Non-invasive samples are often poor sources of DNA owing to low host DNA content, DNA fragmentation, chemical DNA damage, the presence of contaminant DNA, and potential PCR inhibitors that interfere with molecular analyses [1,16,17,18,21,22,24,25]. In marker-based studies, low quality and quantity of input DNA can result in poor marker amplification, erroneous genotype calling, and a high rate of allele dropout, potentially introducing bias into estimates of a population’s genetic diversity [1,3,7,11,24,25,26,27]. Suboptimal DNA quality can reduce the number of restriction sites necessary for methods such as restriction site associated DNA sequencing (RADSeq), leading to the loss of informative genomic regions and increased coverage inconsistency among individuals [3,23,24,25]. Low host-DNA content and contamination with non-host DNA can also increase the sequencing depth required to achieve even genome coverage and adequate genotype calling in whole-genome re-sequencing studies [10,21,24,28]. Thus, high variability in DNA quality among non-invasive samples can make study design less predictable, often requiring the collection of additional samples and extra analyses, thereby increasing resource demands [1,11,18,21,22].
Moulted feathers generally perform better than faecal samples, although species-specific differences may occur [6,7,13]. However, compared to less degraded sources such as buccal swabs, freshly plucked feathers, or feathers collected from dead birds, genotyping success rates from moulted feathers tend to be lower and error rates higher, although in the latter case performance is influenced by carcass freshness [13,18,20,29,30,31]. As is typical for non-invasive samples, the quality of feathers as a DNA source depends on the time spent exposed to detrimental effects of environmental factors such as humidity, temperature and sunlight [15,16,17,20,31]. In addition to the physical condition of feathers, affected by environmental exposure, the DNA yield and performance of moulted feathers also depend on the feather type and the specific parts used for extraction. For large feathers, the tip and superior umbilicus of the shaft, which often contain blood clots, are typically used for DNA extraction [14,15,17,18,31,32,33]. For smaller feathers such as plumulaceous ones, the entire shaft or whole feather may be processed [14]. Due to their low DNA content, feather barbs are rarely used. However, they have nonetheless been shown to yield usable data from ancient and forensic samples [34].
The Eurasian goshawk (Astur gentilis) is a medium-sized bird of prey. Several genetic studies have used moulted feathers to identify individuals and characterise population structure in both Eurasian and American goshawks (Astur atricapillus) using microsatellite markers [19,33,35]. Moulted feathers were also included in the sample set used to confirm the genetic distinctiveness of the Haida Gwaii population of the American goshawk at the genomic level, highlighting its conservation significance. Although the majority of moulted feather samples failed to produce sufficient data for reduced-representation genome sequencing-based analysis, they performed well in single nucleotide polymorphism (SNP) panel genotyping [4].
Here, we present an overview of the performance of DNA derived from moulted feathers in whole genome sequencing (WGS) analysis, using samples collected from adult Eurasian goshawks (A. gentilis). Because moulted goshawk feathers have rarely been explored as a DNA source for genome-wide sequencing, data on the performance of moulted feather-derived DNA in WGS may help inform future genomic studies, particularly of species that are difficult to capture for invasive sampling.
2. Materials and Methods
2.1. DNA Sample Collection and Processing
Shed feathers were collected opportunistically during two breeding seasons (2020–2021, March to August) from 60 known goshawk breeding territories. Of these, 36 territories were located within the city of Riga or in adjacent suburban areas (hereafter “urban”), while 24 territories were distributed across different regions of Latvia (hereafter “rural”) (Table S1). Rural territories were typically visited once or twice per breeding season, whereas urban territories in Riga were visited several times per season as part of long-term monitoring activities. In cases where more than one feather was collected within the same territory, a single sample was selected for whole-genome sequencing based on the overall physical condition of the feather.
All samples collected in 2020 (27 feathers in total: nine from rural and 18 from urban territories) and nine of the samples collected from rural areas in 2021 were stored at room temperature until DNA extraction. All feathers collected during the 2021 breeding season from urban territories (32 feathers) and six collected from rural areas were stored at −20 °C. A single primary feather collected in 2018 was stored in a paper envelope at room temperature. Before DNA extraction, feathers were cut above the umbilical blood clot, barbs were removed, and the calamus was wiped with 70% ethanol-soaked tissue paper to remove most of the dirt. Cleaned calami were washed by soaking in 70% ethanol for 30 min, rinsed with Milli-Q water, and then soaked in Milli-Q water for a further 30 min [33]. After washing, approximately 1 to 1.5 cm of the calamus tip was cut horizontally and chopped into small pieces with sterile scissors directly into sterile 2 mL tubes. The rest of the shaft was opened vertically to collect the remains of the inner membrane and the section containing the umbilical blood clot [15,32]. For small body feathers, most of the barbs were removed before washing, and the whole shaft was chopped into small pieces and used for DNA extraction. All procedures after the washing step were carried out in a laminar-flow cabinet. The equipment was washed with undiluted household bleach solution (sodium hypochlorite < 5% (chlorine concentration 10–20 g/L), sodium hydroxide < 5%, sodium phosphate < 5% and sodium hydroxide < 2%) and exposed to UV for 30 min. Blank controls were included in each sample batch to check for contamination. Extraction of DNA was performed with the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany), following the user-developed protocol for “Purification of total DNA from nails, hair, or feathers using the DNeasy Blood & Tissue Kit” (Qiagen), with minor adjustments suggested by Gebhardt and colleagues [31].
In brief, samples were incubated for 20–24 h at 56 °C under constant agitation in a solution containing ATL buffer, proteinase K, and DTT. Small feathers were incubated in 200 μL ATL buffer, whereas large feathers were incubated in 300 μL ATL buffer to ensure complete submersion of the samples, with 20 μL proteinase K and 20 μL 1 M DTT added to each sample. After addition of AL buffer (200 μL for small feathers, 300 μL for large feathers), samples were incubated at 70 °C for 45 min, followed by centrifugation at 11,000× g for 2 min to remove debris. Next, 96% ethanol was added (200 μL for small feathers, 300 μL for large feathers), and samples were loaded onto silica columns and washed according to the manufacturer’s instructions. DNA was eluted in 64 μL AE buffer after incubation at 70 °C for 15 min. The quality of DNA samples was assessed by 1.2% agarose gel electrophoresis and with a Qubit 2.0 fluorometer using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Eugene, OR, USA). DNA purity, including the 260/280 and 260/230 ratios, was measured using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific). DNA fragmentation was retrospectively assessed using the Genomic DNA ScreenTape assay on a TapeStation 4200 system for 30 samples with available residual DNA (Agilent Technologies, Waldbronn, Germany).
2.2. WGS Analysis
For WGS analysis, 75 moulted feathers were selected from a total of 281 collected for population studies, prioritizing samples with the highest DNA quality and quantity and in good visual condition (e.g., clean, undamaged shafts without signs of decay). Libraries were prepared from these DNA samples using the MGIEasy Universal DNA Library Prep Set, following the manufacturer’s guidelines (MGI Tech Co., Ltd., Shenzhen, China). Considering the average DNA yield of our samples, a maximum input of 250 ng per sample was used for the shearing and size-selection step. For samples with lower DNA yields, the maximum available input was used (Table S1). Samples were sheared using a Covaris S2020 sonicator under conditions recommended for a target fragment size of 400 bp (Covaris, Inc., Woburn, MA, USA). Library amplification was performed with eight PCR cycles. The quality of DNA libraries was assessed using a Qubit 2.0 fluorometer with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific) and an Agilent Bioanalyzer 2100 with the High Sensitivity DNA Kit (Agilent Technologies). Barcoded libraries were pooled to achieve approximately 13× coverage per sample with 150 bp paired-end reads (four libraries per lane). Pooled libraries were circularized using the MGIEasy Circularization Kit V2.0 and sequenced with the DNBSEQ-G400RS High-throughput Sequencing Set (flow cell PE150) on the DNBSEQ-G400RS HTS platform (MGI Tech Co., Ltd.).
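The link between a target depth and the number of reads that must be allocated to each library can be sketched with simple arithmetic. The snippet below is a minimal illustration rather than the authors' pooling calculation. The genome size is not stated in this section, so the 1.4 Gb value is a placeholder assumption, and the function name is hypothetical.

```python
import math

def read_pairs_for_coverage(target_depth, genome_size_bp, read_length_bp=150):
    """Read pairs needed so that sequenced bases / genome size equals target_depth."""
    bases_needed = target_depth * genome_size_bp
    bases_per_pair = 2 * read_length_bp  # paired-end: two reads per fragment
    return math.ceil(bases_needed / bases_per_pair)

# ASSUMPTION: illustrative genome size only, not a value taken from the text.
ASSUMED_GENOME_SIZE_BP = 1_400_000_000

pairs = read_pairs_for_coverage(13, ASSUMED_GENOME_SIZE_BP)
print(f"{pairs:,} read pairs, i.e. about {2 * pairs / 1e6:.0f} million reads")
```

This estimate describes raw depth only. Unmapped reads, duplicates, and non-host DNA all reduce the depth that is actually achieved on the host genome.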
2.3. Sequencing Data Processing and Statistical Analysis
Initial quality control of raw sequencing reads was performed with FastQC v0.11.9 [36]. Adapters were removed with Cutadapt v3.5, and reads were quality-trimmed with Trimmomatic v0.39 (quality score < 30) [37,38]. The A. gentilis genome reference (RefSeq assembly accession GCF_929443795.1, bAccGen1.1) was indexed with BWA v0.7.17 using the bwa index command, and an indexed sequence dictionary was created with Picard CreateSequenceDictionary v2.27.4 [39,40,41]. Trimmed reads were aligned to the indexed reference genome using the BWA-MEM algorithm with the -M flag. The aligned reads were converted to BAM format with Samtools v1.15.1 (samtools view -b) [42]. Further processing, including updating mate-pair information, calculating insert sizes, marking duplicates, and estimating sequencing depth, was also carried out with Samtools. Mapping statistics were obtained from the BAM files using samtools stats and Qualimap v2.2.1 bamqc [43].
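The core of the alignment steps named above can be laid out as an ordered list of command lines. The sketch below only builds and prints the commands; it does not run them. All file names are hypothetical, the use of the `-o` output option is an assumption made for readability, and the intermediate steps the text describes only in general terms (mate-pair fixing, duplicate marking, depth estimation) are deliberately left out because their exact invocations are not given.

```python
import shlex

def alignment_commands(ref_fasta, reads_1, reads_2, sample):
    """Return argv lists for indexing, alignment, BAM conversion and stats."""
    sam = f"{sample}.sam"  # hypothetical intermediate file name
    bam = f"{sample}.bam"  # hypothetical output file name
    return [
        ["bwa", "index", ref_fasta],                                   # index the reference
        ["bwa", "mem", "-M", "-o", sam, ref_fasta, reads_1, reads_2],  # align with -M
        ["samtools", "view", "-b", "-o", bam, sam],                    # SAM to BAM
        ["samtools", "stats", bam],                                    # mapping statistics
    ]

for cmd in alignment_commands("reference.fa", "sample_R1.fastq.gz",
                              "sample_R2.fastq.gz", "sample"):
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments rather than a single string avoids shell-quoting problems if the commands are later passed to `subprocess.run`.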
Variants, including SNPs and indels, were called with GATK v.4.2.6.1 HaplotypeCaller using a minimum Phred-scaled confidence threshold of 30 [44]. Sample-specific gVCFs were combined using GATK CombineGVCFs, and joint genotyping was performed with GATK GenotypeGVCFs. GPU acceleration was applied using the Clara Parabricks v4.0.0-1 computational genomics toolkit to optimise analysis time [45]. Processing of called variants was performed using bcftools v.1.21 [42]. Variants located on unmapped contigs, as well as those on the Z and W chromosomes, and any variants other than biallelic were removed, retaining only biallelic single nucleotide variants (SNVs) on autosomes. These biallelic SNVs were further subjected to hard filtering using quality parameter thresholds recommended by GATK, removing variants with Fisher Strand (FS) > 60.0, Strand Odds Ratio (SOR) > 3.0, RMS Mapping Quality (MQ) < 40.0, Mapping Quality Rank Sum Test (MQRankSum) < −12.5, Quality by Depth (QD) < 2.0, or Read Position Rank Sum Test (ReadPosRankSum) < −8.0 [44]. Low-coverage variants (depth per sample, DP < 5), those with low genotype quality (GQ < 20), and variants with a high missingness rate (>20%) were then excluded to obtain the final set of SNVs. The PLINK v2.0 KING-robust kinship estimator was used to identify potential duplicate feather samples [46]. Genotype concordance between duplicate samples was assessed using the bcftools v.1.21 gtcheck function, and error and concordance rates were calculated as mismatches or matches divided by the total number of compared sites, respectively.
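The filtering logic described above can be expressed compactly. The following sketch is an illustration under stated assumptions, not the authors' bcftools commands: it treats a site as failing the hard filter if any listed annotation crosses its threshold, ignores annotations that are absent, and assumes that genotypes with DP < 5 or GQ < 20 are set to missing before the 20% missingness cut-off is applied. The function names and the data layout are hypothetical.

```python
# Site-level thresholds from the text: a site FAILS if any condition is true.
HARD_FILTERS = {
    "FS": lambda v: v > 60.0,
    "SOR": lambda v: v > 3.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "QD": lambda v: v < 2.0,
    "ReadPosRankSum": lambda v: v < -8.0,
}

def fails_hard_filter(info):
    """info: dict of site annotations; missing annotations are not penalised."""
    return any(name in info and test(info[name])
               for name, test in HARD_FILTERS.items())

def mask_genotypes(genotypes, min_dp=5, min_gq=20):
    """genotypes: list of (gt, dp, gq); low-DP or low-GQ calls become None."""
    return [gt if gt is not None and dp >= min_dp and gq >= min_gq else None
            for gt, dp, gq in genotypes]

def keep_site(info, genotypes, max_missing=0.20):
    """ASSUMPTION: masking happens first, then the missingness cut-off."""
    if fails_hard_filter(info):
        return False
    masked = mask_genotypes(genotypes)
    missing = sum(g is None for g in masked) / len(masked)
    return missing <= max_missing

good_info = {"FS": 1.2, "SOR": 0.8, "MQ": 59.0, "QD": 18.0}
calls = [((0, 1), 12, 60), ((0, 0), 9, 45), ((1, 1), 3, 40),
         ((0, 1), 15, 70), ((0, 0), 20, 99)]
print(keep_site(good_info, calls))  # one of five genotypes is masked (20%)
```

Because low-depth genotypes are masked before missingness is evaluated, samples with poor coverage inflate site-level missingness and can remove sites for the whole cohort, which is consistent with most sites being lost at this stage.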
2.4. Statistical Analysis
Statistical analysis and data visualisation were performed in RStudio v.4.4.1 [47]. Associations between continuous variables were assessed using Spearman’s rank correlation test with the function cor.test, and mean values between different groups were compared using the Mann–Whitney U test (also known as Wilcoxon rank-sum test) with the function wilcox.test(x~y), implemented in the R stats package v.4.1.4 with default settings (paired = FALSE). Data were visualised using the packages ggplot2 v.3.5.2, ggpubr v.0.6.0, tidyr v.1.3.1, reshape2 v.1.4.4, dbplyr v.2.5.0, and cowplot v.1.2.0. Principal component analysis (PCA) was performed using the prcomp function in stats (scale parameter set as TRUE), and biplots were created with factoextra v.1.0.7, ggrepel v.0.9.6 and ggplot2 [47].
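For readers who want to see what the two rank-based tests compute, the sketch below reproduces the test statistics in plain Python. It is a stand-in for illustration only: the analysis itself was run in R with cor.test and wilcox.test, no p-values are computed here, and the feather lengths and DNA yields are made-up example values.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: its rank sum minus n_a * (n_a + 1) / 2."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2

# Hypothetical example data (not from the study).
length_mm = [45, 120, 180, 230, 300]
yield_ng_per_ul = [0.9, 2.5, 2.1, 8.4, 15.0]
print(spearman_rho(length_mm, yield_ng_per_ul))
print(mann_whitney_u([8.4, 15.0, 2.5], [0.9, 2.1]))
```

Both statistics depend only on ranks, which makes them suitable for strongly skewed measurements such as DNA yield.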
3. Results
Moulted feathers selected for whole-genome sequencing ranged from 40 to 313 mm in length, with an average length of 199 mm, reflecting an overrepresentation of larger wing and tail feathers (Figure 1, Table 1). Based on visual assessment, 60% of feathers were classified as being in good condition, 36% in moderate condition, and 4% in poor condition (Figure 1). Detailed information on feather characteristics, DNA quality, and whole-genome sequencing results is provided in Tables S1 and S3.
3.1. DNA Yield from Feather Samples
DNA yield from moulted feathers averaged approximately 7.2 ng/μL but varied substantially among samples, with larger feathers, primarily tail and primary flight feathers, yielding higher DNA amounts than smaller body feathers (Figure 1, Table 1). Of note, the DNA yield from feathers in moderate condition was higher than from those in good condition, 10.9 ng/μL compared to 5.2 ng/μL (Figure 1, Table 1 and Table S2). However, feathers in moderate condition were also, albeit non-significantly, larger than those in good condition, which may have influenced the DNA yield (Table 1 and Table S2). The three feathers in poor condition yielded between 1.58 ng/μL and 3.35 ng/μL of DNA (Table S1). The majority of samples, 55 feathers or 82.5%, had a visible blood clot retained in the superior umbilicus region of the calamus, which resulted in significantly higher DNA yields (Figure S1, Table S2). As expected, umbilical blood clots were observed more frequently in larger feathers (Table S2). Feathers in moderate condition, which were on average larger, contained blood clots more frequently than feathers in good condition (85.2% vs. 71.1%), likely contributing to their higher DNA yield (Figure 1, Table S2). No significant differences in DNA yield were detected between feathers stored at room temperature and those stored at −20 °C (Figure S2). Limiting the analysis to feathers collected from urban areas yielded the same result (Figure S2). Similarly, feather collection area (“urban” vs. “rural”) had no significant effect on DNA yield. However, samples collected from urban areas contained a proportionally higher number of feathers in good condition (72.6%) and tended to be smaller, with correspondingly lower DNA yields, compared to samples from rural areas, which were dominated by larger feathers in moderate condition (62.5%) and associated with higher DNA yields (Figure S2). These trends may reflect differences in survey effort.
DNA concentrations measured with the NanoDrop ND-1000 spectrophotometer were on average seven times higher than those measured with the Qubit 2.0 fluorometer (Table S2), although the two measurements were strongly correlated (Figure S3). Average NanoDrop-derived 260/230 and 260/280 ratios were 0.78 and 1.48, respectively, and were similar across feathers (Table S2, Figures S3 and S4). However, both ratios correlated significantly with DNA concentration, indicating a potential bias, particularly in low-concentration samples (Figure S3). The average DNA fragment size, assessed for a subset of 30 samples using the TapeStation system, was 2513 bp and ranged from 500 to 12,905 bp.
3.2. WGS Performance of Feather-Derived Samples
The amount of DNA available per sample determined the input quantity used for the size-selection step in sequencing library preparation (Table S1). The proportion of DNA lost during size-selection was similar among feathers in good, moderate, and poor condition, 85.7%, 85.5%, and 84.0%, respectively (Figure S5). DNA loss did not correlate with feather size, the presence of detectable blood clots, the amount of DNA available for library preparation, or average DNA fragment size (Table S2, Figure S5). Although the total amount of library produced was, on average, higher for samples derived from large feathers in moderate condition, reflecting greater amounts of input DNA, the yield of library per nanogram of input DNA did not differ among samples (Table S1, Figures S6 and S7).
Raw sequencing read counts from moulted feathers ranged from 8.02 to 310.7 million with an average of 213.0 million (Table 1, Figure S8). On average, 0.25 pmol of library generated 247.7 million raw reads per sample irrespective of feather characteristics (Figures S9 and S10).
The number of reads used for alignment ranged from 7.7 to 300.5 million, with an average of 208.7 million. Overall, the mapping success rate of sequencing reads to the goshawk genome was high, averaging 82.7% (Table 1). Of the feather-derived samples, 50 (66.7%) had more than 90% of reads mapped, and among these, 27 (36%) achieved highly successful alignment with 99% of reads mapped. In contrast, 11 samples (14.7%) had fewer than 50% of reads mapped, with the poorest outcome being only 4.66% of reads successfully aligned (Figure S11). Most reads were aligned with relatively high confidence, with an average mapping quality score of 29 (Table 1, Table S1). Closer quality inspection revealed that the poorest-performing sample, a small feather in poor condition, had an abnormally short insert size (20 bp instead of the expected ≈ 250 bp, Table S1) and was excluded from further analysis. Among the remaining 74 samples, the average read mapping rate was 83.75%, with an average mapping quality of 29 (Table 1).
As expected, the proportion of successfully mapped reads was one of the most significant factors determining genome coverage (Figure S12). Mean genome coverage was 17×. However, with a minimum of 0.24× and a maximum of 31.3×, there was notable variation among samples (Table S1). On average, 83.58% of the genome was covered at 1×. Over 90% of the genome was covered at 1× for 37 feathers, and over 80% for 51 samples, while four samples had less than 50% of the genome covered at 1×, with the lowest representation being 14.64% (Figure S13).
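Mean depth and the proportion of the genome covered at 1× measure different things, which is why both are reported. A minimal sketch of the two quantities, computed from a list of per-position depths, is given below; the function name and the toy depth values are hypothetical.

```python
def coverage_summary(depths, min_depth=1):
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    n = len(depths)
    mean_depth = sum(depths) / n
    breadth = sum(d >= min_depth for d in depths) / n
    return mean_depth, breadth

# Toy example: reads pile up on a few positions and leave others uncovered.
toy_depths = [0, 2, 4, 0, 4]
mean_depth, breadth = coverage_summary(toy_depths)
print(f"mean depth {mean_depth:.1f}x, breadth at 1x {breadth:.0%}")
```

A sample can therefore reach a respectable mean depth while still leaving a large part of the genome uncovered if its reads are unevenly distributed.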
Larger feathers in general performed better in terms of read mapping, had reads mapped with higher confidence, and had a higher proportion of genome covered at 1× (Table 1, Figure 2 and Figure S14). Although the proportion of genome covered at 1× among feathers in good and moderate condition was similar, feathers in good condition had approximately 14% more reads mapping to the host genome in contrast to those classified as moderate, despite comparable read mapping quality scores (Table 1, Figure 2 and Figure S14). Higher DNA yield from larger feathers may partly explain the positive correlation between feather size and sequencing performance, as both read alignment and genome representation improved with increasing DNA yield and with the presence of detectable umbilical blood clots (Figures S12, S14 and S15). Feathers with detectable blood clots showed higher values for both the proportion of mapped reads, 86.3% vs. 70.4% and mapping quality, 29.51 vs. 27.31 (Table S2, Figure S15). Although the observed association between DNA yield, the presence of blood clots, and the proportion of genome covered at 1× was mostly driven by the amount of input DNA, improved read mapping also contributed (Figures S12 and S15). Apart from these effects, both the read-mapping success rate and the proportion of genome covered at 1× were associated with lower GC content of mapped reads and with longer insert sizes (Figure 2, Figures S16 and S17), which are profiles characteristic of samples with higher DNA yield, predominantly larger feathers containing blood clots, irrespective of feather condition (Figures S18 and S19, Table S2). DNA losses during the size-selection step increased with higher GC content of mapped reads and showed a non-significant tendency to increase with decreasing insert size, potentially indicating an association with DNA quality (Figure S20). 
Consistent with this, in the subset of 30 samples with available TapeStation measurements, average fragment size tended to increase with DNA concentration and showed a positive, though weak, association with the insert size of mapped reads (Figure S21). Together, these patterns may reflect higher read quality and higher-quality input DNA in samples with greater DNA content. However, average fragment size did not correlate with the proportion of mapped reads, GC content, feather size, or feather condition. Contrary to expectations, fragment sizes tended to be larger in feathers without detectable umbilical blood clots and in samples with <90% mapped reads compared to those with blood clots and higher mapping rates, although these differences were not statistically significant (Figure S21). This pattern may indicate the presence of higher-quality non-host DNA. Nevertheless, these results should be interpreted with caution, as TapeStation measurements were available only for a subset of samples, were performed retrospectively, and some samples had suboptimal DNA concentrations for robust integrity assessment.
3.3. SNV Calling in Feather-Derived Samples
The achieved sequencing depth and genome coverage yielded a total of 130.72 million unfiltered biallelic single-nucleotide variants (SNVs) on the autosomes, with substantial variation among samples ranging from 15.58 million to 130.61 million (Figure 3 and Figure S22, Table 1 and Table S4). Consistent with this variation, the average per-sample genotype missingness ranged between 0.08% and 88.08%. Called genotypes were generally of good quality (GQ = 35.5), with a moderate median sequencing depth per sample (DP = 13.6 reads) (Figure S22, Table S4).
Called variants were filtered for quality to obtain a set potentially suitable for downstream analyses. The first step of site-level variant filtration, hard filtering according to GATK recommendations, removed, on average, 5.46% of SNVs per sample. However, the vast majority of SNVs were lost after excluding low-quality variants defined by low GQ, insufficient DP, or high genotype missingness across samples. This step accounted for the removal of 94.2% of all sites failing quality filters (Figure 3, Table S3).
The final SNV count across the sample set was 11.95 million, although poorly performing samples retained as few as 5723 SNVs (Figure 3, Tables S3 and S4). Correspondingly, genotype missingness rates ranged from 0.01% to >99% in failed samples (Table 1 and Table S4). Because GQ and DP were part of the filtering criteria, average GQ and DP for the filtered SNV set increased to 67.47 and 26.47, respectively. Variant filtering also improved the average Het/Hom ratio to 2.27 and the Ts/Tv ratio to 3.14 (Tables S3 and S4).
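The two summary ratios mentioned above can be computed directly from variant and genotype lists. The sketch below assumes the common conventions that Ts/Tv is the number of transitions (A↔G, C↔T) divided by the number of transversions, and that Het/Hom is the number of heterozygous genotypes divided by the number of non-reference homozygous genotypes. The data layout and function names are hypothetical.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ts_tv_ratio(snvs):
    """snvs: list of (ref, alt) base pairs for biallelic SNVs."""
    ts = sum((ref, alt) in TRANSITIONS for ref, alt in snvs)
    tv = len(snvs) - ts
    return ts / tv

def het_hom_ratio(genotypes):
    """genotypes: list of (allele1, allele2) with 0 = reference, 1 = alternate."""
    het = sum(a != b for a, b in genotypes)
    hom_alt = sum(a == b and a != 0 for a, b in genotypes)
    return het / hom_alt

example_snvs = [("A", "G"), ("C", "T"), ("A", "C")]
example_gts = [(0, 1), (0, 1), (1, 1), (0, 0)]
print(ts_tv_ratio(example_snvs), het_hom_ratio(example_gts))
```

Under these definitions, a sample in which heterozygous calls are lost through allele dropout shows a depressed Het/Hom ratio, which is the pattern flagged for low-quality samples.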
Larger feathers had a significantly higher number of SNVs called and retained more quality-filtered variants compared to smaller ones, irrespective of condition (Table 1 and Table S4, Figures S22 and S23). The same holds true for feathers containing detectable umbilical blood clots compared to those without blood traces in the calamus (Figures S22 and S23). The positive association between feather size, the presence of blood traces in the calamus, and the number of called SNVs was largely driven by higher DNA yield and improved genome coverage, as the proportion of SNVs lost during quality filtering did not differ among feathers (Figure 3, Figures S23 and S24; Table S4). Although GQ of unfiltered variants tended to be higher in larger feathers and in feathers containing blood clots, no differences in GQ were observed after quality filtering with respect to feather size, presence of blood clots, or DNA yield (Figures S22–S24; Table S4). Feather condition had no effect on the variant numbers (Figures S22–S24, Table S4). However, GQ was higher in feathers in good condition than in those in moderate condition for unfiltered variants (38.18 vs. 31.85), and this difference persisted after filtering (71.83 vs. 60.39, Table 1, Figures S22–S24).
Ten samples showed abnormal Het/Hom ratios < 1 in the unfiltered SNV set. Three samples retained low ratios even after filtering (0.035–0.35). In one sample, filtering disproportionately removed non-reference homozygous SNVs, resulting in an elevated Het/Hom ratio of 20.75 (Figure 3, Table S2). Except for this outlier (Het/Hom > 20), these samples tended to have relatively low GQ values and were characterised by low DNA yield, low read mapping rate, or both. Taken together, these patterns suggest that deviations in the Het/Hom ratio were likely driven by a combination of low input DNA quality and quantity.
Screening the sample set for duplicates identified six pairs of feathers and one triplicate, collectively representing seven different individuals. Feather samples assigned to the same individual by kinship analysis were collected either within the same territory (five duplicate pairs) or from neighbouring territories (two duplicate pairs), consistent with the relatedness analysis (Table S5). This allowed assessment of genotype concordance among biological replicates. The average genotyping error rate estimated across all compared sites was 0.16%, ranging from 0.018% to 0.38%. When error rates were estimated using only heterozygous sites, values were higher, with a mean of 3.27%, a minimum of 0.36%, and a maximum of 8.42%. The replicate set did not include samples showing deviations from expected Het/Hom ratios. Therefore, it was not possible to evaluate genotyping error rates for samples with genotype calling issues. Nevertheless, among the replicate set there were four samples with >20% missing genotypes, and, as expected, duplicate pairs including these samples showed the highest mismatch rates within the replicate set (Table S5).
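The replicate-based error rate can be illustrated with a short function that compares two genotype vectors from the same individual. This is a sketch of the stated definition (mismatches divided by compared sites), not a reimplementation of bcftools gtcheck. In particular, how "heterozygous sites" were selected is not specified, so the sketch assumes a site counts if either replicate carries a heterozygous call; the function name and example genotypes are hypothetical.

```python
def replicate_error_rate(gt_a, gt_b, het_only=False):
    """Return (error rate, concordance rate) over sites called in both samples.

    gt_a, gt_b: lists of unphased genotypes as (allele1, allele2), or None if missing.
    ASSUMPTION: with het_only=True, a site is used if either call is heterozygous.
    """
    compared = 0
    mismatches = 0
    for a, b in zip(gt_a, gt_b):
        if a is None or b is None:
            continue  # missing genotypes are not compared
        if het_only and a[0] == a[1] and b[0] == b[1]:
            continue  # both homozygous: skip in heterozygous-only mode
        compared += 1
        if sorted(a) != sorted(b):  # unphased comparison: (0, 1) equals (1, 0)
            mismatches += 1
    error = mismatches / compared
    return error, 1 - error

feather_1 = [(0, 0), (0, 1), (1, 1), (0, 1), None]
feather_2 = [(0, 0), (1, 0), (1, 1), (1, 1), (0, 1)]
print(replicate_error_rate(feather_1, feather_2))
print(replicate_error_rate(feather_1, feather_2, het_only=True))
```

Restricting the comparison to heterozygous sites removes the large number of trivially concordant homozygous calls from the denominator, which is one reason the heterozygous-only error rates are much higher than the all-site rates.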
Because samples with poor quality and high missingness can bias population genetic inference, such samples were excluded. After removing eleven samples with quality issues and six with >20% missing genotypes, the final dataset comprised 58 feathers (77.3% of the initial sample set). The retained feathers averaged 210 mm in length and included 36 in good condition, 20 in moderate condition, and two in poor condition, corresponding to retention rates of 80%, 74%, and 67%, respectively; however, the small number of poor-condition feathers precluded reliable performance assessment. Among feathers containing blood clots in the calamus, 14.04% (8 feathers) were excluded, compared with 57.14% (8 of 14) of feathers without blood clots, a difference likely explained by the higher proportion of low-yielding samples in the latter group.
The final 58 samples retained between 9.82 and 11.95 million quality-filtered SNVs per sample. Frequency filtering yielded 472,493 common SNVs (MAF > 0.05) in the final VCF file, with 382,854–472,228 SNVs per sample. However, allele frequencies may have been underestimated due to uneven genome coverage across moulted feather samples.
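The frequency filter can be sketched as follows. The snippet computes the minor allele frequency (MAF) of a biallelic site from non-missing diploid genotypes and applies the MAF > 0.05 threshold from the text. It is an illustration with hypothetical names and made-up genotypes, not the tool invocation used in the study.

```python
def minor_allele_frequency(genotypes):
    """genotypes: list of (allele1, allele2) with 0/1 alleles, or None if missing."""
    alleles = [allele for gt in genotypes if gt is not None for allele in gt]
    if not alleles:
        return 0.0  # no called genotypes: treat the site as uninformative
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1 - alt_freq)

def is_common(genotypes, maf_threshold=0.05):
    """True if the site passes the MAF > threshold filter."""
    return minor_allele_frequency(genotypes) > maf_threshold

site = [(0, 1), (0, 0), (0, 0), None]
print(minor_allele_frequency(site), is_common(site))
```

Because missing genotypes are dropped from the allele count, sites with uneven coverage are evaluated on fewer chromosomes, so their frequency estimates are noisier than those from fully genotyped sites.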
4. Discussion
For medium- to large-sized birds of prey like goshawks, moulted feathers are among the most accessible samples that can be safely collected from adult individuals without capture-related risks, particularly for population monitoring [8,9,17,35,48]. Despite issues with the quality of DNA common among different types of non-invasive samples, moulted feathers have been successfully used as a DNA source to gain information on population genetics for a range of different avian species [2,4,6,7,13,15]. Previous research has not only demonstrated the value of moulted feathers as a source of DNA but also highlighted shortcomings that must be taken into account to arrive at reliable conclusions [4,6,7,8,18,29]. To assess the practicality of moulted feather samples as a DNA source for WGS and the adequacy of data output, we completed the WGS analysis of 75 goshawk moulted feather samples. Whole genome sequencing data were generated for all included feathers, but individual samples varied greatly in performance throughout the workflow and in the quality of generated data.
Low DNA quantity and quality are intrinsic problems associated with biological material exposed to suboptimal conditions, also affecting moulted feathers [7,10,15,16,17,20,21,22,31]. For example, freshly shed large flight feathers from corvids yielded, on average, between 64.8 ng/µL and 519.5 ng/µL of DNA, depending on species size, while the average DNA concentration for samples extracted from goose feathers left outdoors for two months was 22.8 ng/µL [20]. As expected, Nanodrop-based measurements of DNA concentration from moulted goshawk feathers were closer to the spectrophotometric measurements of DNA yield from aged goose feathers (22.8 ng/µL) than to those from freshly shed corvid feathers. Goose feathers were collected in autumn and could have been exposed to temperatures and precipitation levels similar to those observed locally in spring [20]. Nonetheless, DNA concentrations in the present study were, on average, lower than those reported in previous studies for moulted tail, primary flight, and small covert goshawk feathers (24.6 ng/µL, 13.8 ng/µL, and 4.3 ng/µL, respectively) [33]. However, there is notable variation among studies using moulted feathers as a DNA source. Moulted feathers collected during field surveys from other large- to medium-sized species yielded, on average, 13.6 ± 2.1 ng/µL of DNA for the Chinese egret [13], 92.1 ± 76.8 ng/µL for the Spanish imperial eagle [17,32], and 144.7 ± 31 ng/µL for painted stork feathers of good quality [49]. In addition to species-specific factors, high heterogeneity among results from different studies could be attributed to differences in collection approaches, storage conditions, DNA extraction protocols, and other methodological aspects [15,17,18,20,31,33,50].
Procedures for DNA extraction followed protocols adapted for feather samples [15,31,32,33], but some of the feathers had been stored at room temperature, which may have affected DNA quality. Feathers kept dry and in the dark at room temperature have previously been shown to yield material suitable for genetic analysis [17,18,33,50]. Nonetheless, storing samples at –20 °C is generally recommended to minimise the risk of DNA degradation, as DNA quality has been shown to decline during prolonged storage at room temperature [20]. We did not observe notable differences in DNA yield between room-temperature and frozen samples, but we cannot rule out detrimental effects on DNA integrity because no control samples were available to assess degradation during storage. Degradation during storage may therefore have contributed to variation in DNA quality.
However, feather condition and sample type appear to have the greatest influence on variation in feather-sample performance in genetic analyses. Environmental conditions, particularly precipitation and UV exposure from sunlight, accelerate DNA damage [17,20,21,22]. It is usually impossible to determine how long feathers have been exposed to environmental conditions before collection in the field, and therefore to evaluate sample freshness. The visually assessed condition of feathers has often been used as a proxy for estimating their exposure history and, consequently, for predicting DNA quality [15,16,17,31]. Selecting shed feathers with no signs of damage and with an intact calamus and barbs has been shown to significantly increase genotyping success rates [15,17,20,31]. While feathers with clear signs of degradation are more likely to fail, the performance of feathers showing moderate signs of ageing, such as slight calamus discolouration, appears to be less predictable, as moderately aged feathers can yield results comparable to fresh-looking samples [16,17].
Feather performance in genetic analyses, particularly with respect to DNA yield and quality, depends on both feather type and the specific part used for extraction, regardless of methodological differences among studies [14,15,17,18,31,32,33]. Larger feathers, such as primary flight feathers and remiges, generally produce higher DNA yields and show higher marker-amplification success than smaller coverts or contour feathers [15,18,20,31,32,33]. DNA yield has also been shown to be higher when the superior umbilicus, often containing a blood clot, is included rather than only the basal tip of the calamus [15,20,32,33], and improvements have been reported even when no visible blood traces are present [32]. The better performance of larger feathers has also been indirectly linked to higher DNA quantity and quality, particularly in samples extracted from the umbilical blood clot [15,16,32]. Nonetheless, the reliability of moulted feathers in genetic analysis appears to depend on the interplay between intrinsic feather characteristics and sample condition. Some studies highlight feather condition as a superior indicator of performance in genetic analysis, emphasizing that feathers in good condition produce reliable results independently of feather type, and that even small feathers, such as plumulaceous feathers, can yield adequate results [14,31].
Feather size, defined here as length from tip to tip, appeared to be the dominant factor determining DNA yield, whereas the effects of feather condition were less clear. In line with previous reports, larger feathers yielded more DNA [15,17,18,20,31,32,33]. These were mainly primary wing feathers and remiges, most of which contained detectable blood traces within the superior umbilicus, contributing to higher DNA yield. Feathers in moderate condition produced higher DNA yields than those in good condition, which was unexpected given the anticipated DNA degradation in more aged samples [15,16,31]. This pattern was most likely confounded by the fact that feathers in moderate condition were, on average, larger and more frequently contained visible umbilical blood clots than those in good condition. Such variation, together with the subjective nature of visual quality assessments and the limited number of samples in poor condition, prevented a robust estimation of decrease in DNA yield with declining sample condition. Consequently, it remains uncertain whether these trends would hold in a more balanced dataset, particularly given inconsistencies in previous reports regarding mildly aged samples [16,17].
Feather size and condition also correlated with sample performance in the WGS analysis. Beyond quantitative limitations associated with DNA yield, both parameters were related to read-mapping rate and quality, the proportion of the recovered genome, and the quality of called SNVs. Overall, with an average of 83.8% of reads mapping to the goshawk genome and 83.6% of the genome covered at least once, the moulted feather samples performed comparatively well. The proportion of mapped reads was more than twice that obtained from moulted feathers collected from the Haida Gwaii goshawk population, which reached only 40%. The lower mapping rate in that study may have reflected the poorer condition of the available feather samples as well as the use of a bald eagle reference genome [4]. Read alignment and the proportion of the recovered genome among the better-performing moulted feather samples in the present study were similar to those reported for high-quality samples. For freshly collected feathers of a passerine species, mapping rates ranged between 85.9% and 90.0%, with 66% of the genome recovered [51]. For tissue samples, the proportion of aligned reads typically reaches at least 95%, recovering over 90% of the host genome [10,24]. Nevertheless, there was a notable contrast between the best- and worst-performing samples in our dataset: in some samples, fewer than a quarter of reads aligned and less than a third of the genome was recovered. Substantial inter-sample variability has also been reported for shed hair and faecal samples, where the proportion of aligned reads ranged from 5% to 97% and from 6% to 60%, and host genome coverage ranged from 25% to 98% and from 74% to 99%, respectively [10,21].
In the present study, several factors may have contributed to the inter-sample variability in the proportions of aligned reads and recovered genomes. The proportion of mapped reads, mapping confidence, and recovered genome increased with feather size and DNA yield, and were higher in feathers containing visible umbilical blood clots. Higher marker-amplification success rates have previously been reported for samples derived from umbilical blood clots [16,32]. Lower host-genome recovery in low-yield samples was partly attributable to the limited DNA available for library preparation, which is associated with reduced sequencing depth and possible underrepresentation of the host DNA pool [25,27,51]. However, increased DNA degradation, particularly in feathers with small calami that are potentially less resistant to environmental conditions, has also been reported as a cause of the higher failure rate of feather-derived samples in marker-based studies [16,17]. In genome-wide studies, low to moderately degraded samples can be expected to perform adequately as long as they contain a sufficient proportion of fragments at or above the target length required for the chosen approach [23,24,52]. Excessive DNA fragmentation associated with chemical damage, by contrast, leads to loss of raw reads due to poor quality scores, insufficient coverage, skewed genome representation, and, at later stages, impaired read alignment and erroneous genotypes [4,10,23,25,28]. Signs of DNA degradation, visible as smearing on agarose gels, were evident to varying degrees in all feather-derived samples included in the present study. In addition, the generally high proportion of input DNA lost during the bead-based size-selection step of library preparation may also reflect increased DNA fragmentation, although DNA losses were similar regardless of feather size or condition. 
We did not observe excessive loss of raw sequencing reads due to quality issues, as has been reported for extremely degraded samples [23], and average read quality remained acceptable. It is possible that poor-quality DNA fragments were removed during library preparation, thereby indirectly improving data quality [53]. However, the lower mapping quality, higher GC content, and shorter insert sizes observed among mapped reads from smaller feathers, if not attributable to sequencing or data-processing bias, may indicate a higher degree of host-DNA degradation [54,55]. Because these three read-quality measures showed similar associations with lower DNA yield and with the absence of a blood clot in the feather calamus, it is not possible to determine with certainty whether DNA degradation was more pronounced in smaller feathers or whether it simply reflected a general decline in DNA quality as DNA yield decreased. Notably, greater loss of input DNA during the size-selection step was observed in samples with GC-richer and shorter reads, which would be consistent with the increased proportion of highly fragmented DNA [23,52,54].
However, DNA degradation was unlikely to be the only factor compromising WGS outcomes among feather-derived samples. Variation in host-genome recovery from non-invasive samples has largely been attributed to exogenous DNA content, which can outcompete host DNA and lead to lower read-mapping rates and reduced genome recovery, thereby increasing the sequencing depth required to recover the host genome [10,21,24,25]. For example, on average, approximately 60% of reads derived from shed-hair samples were of exogenous origin [10,25]. If samples with fewer than 90% of reads aligning to the goshawk genome were defined as contaminated [24], then 32% of the feather-derived samples in the present study may have contained, on average, 53% exogenous reads.
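The threshold-based definition used above can be expressed as a short calculation. The Python sketch below is only an illustration under stated assumptions: the function name and the input (a list of per-sample mapped-read percentages) are hypothetical, and treating every unmapped read as exogenous is a simplification, since unmapped reads can also be degraded or low-quality host reads.

```python
from typing import Sequence, Tuple


def summarise_contamination(
    mapped_pct: Sequence[float],
    threshold: float = 90.0,  # mapping-rate cut-off taken from the text
) -> Tuple[float, float]:
    """Summarise samples flagged as potentially contaminated.

    Returns (percentage of samples below the threshold,
             mean percentage of unmapped reads among those samples).
    Unmapped reads are used here as a crude proxy for exogenous reads.
    """
    if not mapped_pct:
        raise ValueError("no samples provided")
    flagged = [p for p in mapped_pct if p < threshold]
    share_flagged = 100.0 * len(flagged) / len(mapped_pct)
    mean_unmapped = (
        sum(100.0 - p for p in flagged) / len(flagged) if flagged else 0.0
    )
    return share_flagged, mean_unmapped
```

Applied to per-sample mapping rates, the first returned value corresponds to the share of samples classed as contaminated and the second to their average proportion of putatively exogenous reads.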
Given the similar mapping-confidence and mapped-read-quality metrics to those of feathers in good condition, lower read-mapping rates among feathers in moderate condition may indicate that this group contained more samples with elevated exogenous-DNA proportions and a less pronounced effect of host-DNA degradation. Host DNA may have been better preserved in large feathers with blood traces in the calamus, which dominated among the feathers in moderate condition and may have included samples that experienced only comparatively mild DNA degradation despite visible signs of ageing [17,32]. However, visible signs of ageing, such as discolouration, may still be associated with the presence of microorganisms, and microbial DNA can hinder host-genome recovery in WGS [10,25,56]. Furthermore, higher DNA yield among feathers in moderate condition could also indicate a higher proportion of exogenous DNA. Contamination with non-host DNA has been suspected as the main reason for failure in faecal samples that otherwise showed high DNA yields [21]. It is tempting to speculate that, in smaller feathers, DNA degradation combined with contamination played a greater role in driving variation in read alignment and host genome recovery, whereas in feathers with larger calami containing blood clots, mild signs of ageing may have been a stronger indicator of contamination. On average, larger fragment lengths associated with higher DNA yields could indicate either reduced fragmentation of host DNA or increased contamination by higher-quality exogenous DNA. Consistent with the latter possibility, samples with <90% mapped reads tended to have longer DNA fragments, although this difference was non-significant. However, these results did not provide clear support for an association between feather characteristics and DNA degradation or contamination, as no notable differences in fragmentation levels were observed among feathers differing in condition or size. Nevertheless, DNA fragmentation was assessed retrospectively in only a subset of samples and may therefore incompletely capture existing patterns.
Overall, 22.7% of feather samples were considered to have failed due to potential genotyping errors and a high proportion of missing data. This falls within the reported range of marker genotyping failure rates for moulted feathers, which vary between 1.2% and 50% [16,17,31,35]. In genome-wide studies, failure rates for faecal samples were around 21%, whereas shed hair collected from hair traps showed failure rates as high as 75% [24,25].
Although feathers in moderate condition resembled those in good condition based on read-quality metrics, feather condition seemed to have a stronger association with genotype quality than feather size. Differences associated with feather condition persisted even after low-quality variants were removed, whereas quality filtering largely eliminated differences related to feather size. Feather condition has previously been identified as a major factor influencing success rates of marker amplification [16]. However, other studies report little impact, provided that the samples are not in heavily degraded condition [17]. An increased error rate in called genotypes linked to chemical damage associated with DNA degradation, but not to DNA fragmentation, has previously been reported for shed hair-derived DNA samples. Changes associated with chemical damage were not reflected in read numbers, read quality, or mapping rates [25]. Our study did not include matching high-quality samples to directly assess genotype error rates, although genotype concordance was high between moulted feather pairs recognised as duplicates originating from the same bird, with error rates below 0.5%. This is comparable to previously reported marker genotyping error rates ranging from 0.8% to 11.7% in moulted feather samples, as well as error rates derived from whole-genome data from mammalian scat samples [6,10,24,31]. Since the duplicate set did not include samples with noticeable genotyping issues, erroneous genotypes resulting from changes in DNA structure in moulted feather samples, subject to heterogeneous environmental exposure, cannot be ruled out.
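The duplicate-based error estimate mentioned above rests on comparing genotype calls between two feathers from the same bird. A minimal Python sketch of such a comparison follows. It is not the procedure used in this study: the function name and the genotype encoding (alternate-allele counts, `None` for missing) are assumptions, and sites missing in either sample are simply skipped.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # alt-allele count (0, 1, 2); None marks a missing call


def discordance_rate(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of sites with differing calls between two duplicate samples.

    Only sites called in both samples contribute to the estimate.
    """
    if len(a) != len(b):
        raise ValueError("samples must be genotyped at the same sites")
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no sites called in both samples")
    return sum(x != y for x, y in pairs) / len(pairs)
```

Because both members of a pair are feather-derived, a low discordance rate shows that calls are reproducible but cannot reveal errors shared by both samples, which is why matching high-quality samples would be needed for a direct error estimate.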
However, failed samples included feathers in both good and moderate condition, some of which had either low DNA yield, low percentages of mapped reads, or both. Although a few samples retained abnormal genotype distributions after variant-quality filtering, feather samples in which more than 50% of the genome was covered at least once could likely have been salvaged by increasing sequencing depth [24,51]. Arantes and colleagues effectively used shallow pre-sequencing of low-quality samples to identify the proportion of usable host DNA and to predict the sequencing depth required to recover sufficient data [24].
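The idea of predicting sequencing effort from the host-DNA proportion can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not the method of Arantes and colleagues: it assumes the simple relation mean depth = (host reads × read length) / genome size, ignores duplicates, trimming, and uneven coverage, and all parameter names and values are placeholders.

```python
import math


def reads_required(
    target_depth: float,    # desired mean depth of the host genome, e.g. 13
    genome_size_bp: int,    # host genome size in base pairs (placeholder value)
    read_length_bp: int,    # length of a single read, e.g. 150
    host_fraction: float,   # share of reads mapping to the host, from pre-sequencing
) -> int:
    """Estimate the total number of reads needed to reach a target mean depth.

    A lower host fraction inflates the total proportionally, because only
    host-derived reads contribute to genome coverage.
    """
    if not 0.0 < host_fraction <= 1.0:
        raise ValueError("host_fraction must be in (0, 1]")
    if target_depth <= 0 or genome_size_bp <= 0 or read_length_bp <= 0:
        raise ValueError("depth, genome size and read length must be positive")
    return math.ceil(target_depth * genome_size_bp / (read_length_bp * host_fraction))
```

Under these assumptions, halving the host fraction doubles the number of reads required, which is one way to judge whether re-sequencing a marginal sample is worthwhile.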
The final set of moulted feather samples with at least 80% non-missing genotypes yielded a total of 12 million quality-filtered biallelic SNVs, the vast majority of which were rare variants. Only 472,493 SNVs in the whole data set were common (MAF > 0.05), which is considerably fewer than the 5.74 million reported from fresh passerine bird feathers, or the 1.1 million derived from a combined set of tissue and feathers plucked from predated birds in WGS studies using similar sample sizes [5,51]. Moulted feather samples were characterised by high variability in genome coverage evenness and depth, which may have contributed to an artificial inflation of rare variant frequencies [21,25,27,57].
Nevertheless, despite these general tendencies, individual samples, regardless of feather type, deviated from the expected patterns, suggesting a more sample-specific interplay between DNA degradation and contamination in determining sequencing performance from moulted feathers. Interpreting the impact of DNA degradation or contamination on WGS outcomes based on feather size or condition remains highly speculative without direct assessment of host-DNA quality, which is one of the main limitations of the present study. In addition to the subjective nature of this approach, environmental conditions and stochastic weather events do not always cause feathers to age in line with DNA damage, reducing the reliability of visual predictions of sample performance [16,17].
Sequencing-quality measures, such as GC content and insert size, do not necessarily reflect DNA degradation directly and may be influenced by technical aspects of library preparation and sequencing [54,55]. Direct evaluation of DNA integrity across the entire sample set using tools such as TapeStation or Bioanalyzer would have enabled a more unbiased assessment of DNA fragmentation in relation to feather size or condition [58,59]. We also attempted to assess DNA purity with a Nanodrop; purity was low for most samples. However, mean DNA concentrations were below the optimal range for Nanodrop measurements, reducing accuracy due to weak signal strength [60,61]. Total DNA yield can also be misleading, as it may be inflated by contaminant DNA irrespective of host-DNA quality or quantity [21,24,28]. Quantifying the host-DNA fraction within a sample using quantitative polymerase chain reaction or similar approaches could have provided a more objective measure of feather performance [21,24,28]. Moreover, low DNA yield in some samples limited input normalisation, which may have either enhanced or obscured the observed correlations. Finally, the limited diversity of the sample set, dominated by large, relatively well-preserved feathers and with small or highly degraded feathers under-represented, restricted our ability to assess how feather characteristics such as size, condition, blood clot presence, and DNA yield influence host-DNA quality and, consequently, WGS success.
5. Conclusions
Among the non-invasively collected Eurasian goshawk (A. gentilis) moulted feathers, large flight and tail feathers in good condition, particularly those with intact calami containing blood clots, were a better DNA source and performed more reliably throughout whole-genome sequencing workflows. However, substantial variability was observed even among visually similar feathers, limiting the predictive value of visual quality assessment alone and indicating that pre-screening to assess host DNA content would be beneficial. The observed sample failure rate indicates that collecting surplus material would be advantageous for improving the success rate of genomic analyses. Despite some potential limitations, our results support the use of moulted feathers as a valid and accessible DNA source for genome-wide re-sequencing studies of medium- to large-sized raptor species, even when using whole-genome sequencing protocols without specific adjustments.
References
- 1. Carroll E.L., Bruford M.W., DeWoody J.A., Leroy G., Strand A., Waits L., Wang J. Genetic and Genomic Monitoring with Minimally Invasive Sampling Methods. Evol. Appl. 2018, 11, 1094–1119. doi:10.1111/eva.12600. PMID 30026800. PMC6050181.
- 2. Bounas A., Tsaparis D., Gustin M., Mikulic K., Sarà M., Kotoulas G., Sotiropoulos K. Using Genetic Markers to Unravel the Origin of Birds Converging towards Pre-Migratory Sites. Sci. Rep. 2018, 8, 8326. doi:10.1038/s41598-018-26669-x. PMID 29844462. PMC5974135.
- 3. Duntsch L., Whibley A., Brekke P., Ewen J.G., Santure A.W. Genomic Data of Different Resolutions Reveal Consistent Inbreeding Estimates but Contrasting Homozygosity Landscapes for the Threatened Aotearoa New Zealand Hihi. Mol. Ecol. 2021, 30, 6006–6020. doi:10.1111/mec.16068. PMID 34242449.
- 4. Geraldes A., Askelson K.K., Nikelski E., Doyle F.I., Harrower W.L., Winker K., Irwin D.E. Population Genomic Analyses Reveal a Highly Differentiated and Endangered Genetic Cluster of Northern Goshawks (Accipiter gentilis laingi) in Haida Gwaii. Evol. Appl. 2019, 12, 757–772. doi:10.1111/eva.12754. PMID 30976308. PMC6439496.
- 5. Kersten O., Star B., Leigh D.M., Anker-Nilssen T., Strøm H., Danielsen J., Descamps S., Erikstad K.E., Fitzsimmons M.G., Fort J. Complex Population Structure of the Atlantic Puffin Revealed by Whole Genome Analyses. Commun. Biol. 2021, 4, 922. doi:10.1038/s42003-021-02415-4. PMID 34326442. PMC8322098.
- 6. Ramón-Laca A., White D.J., Weir J.T., Robertson H.A. Extraction of DNA from Captive-Sourced Feces and Molted Feathers Provides a Novel Method for Conservation Management of New Zealand Kiwi (Apteryx spp.). Ecol. Evol. 2018, 8, 3119–3130. doi:10.1002/ece3.3795. PMID 29607011. PMC5869209.
- 7. Vallant S., Niederstätter H., Berger B., Lentner R., Parson W. Increased DNA Typing Success for Feces and Feathers of Capercaillie (Tetrao urogallus) and Black Grouse (Tetrao tetrix). Ecol. Evol. 2018, 8, 3941–3951. doi:10.1002/ece3.3951. PMID 29721270. PMC5916295.
- 8. Rudnick J.A., Katzner T.E., Bragin E.A., Dewoody J.A. A Non-Invasive Genetic Evaluation of Population Size, Natal Philopatry, and Roosting Behavior of Non-Breeding Eastern Imperial Eagles (Aquila heliaca) in Central Asia. Conserv. Genet. 2008, 9, 667–676. doi:10.1007/s10592-007-9397-9.
