Discovery and Genomic Characterisation of Novel Papillomaviruses in Australian Wild Birds
Subir Sarker, Vasilli Kasimov, Md. Mizanur Rahaman, Babu Kanti Nath, Martina Jelocnik

TL;DR
This study discovers and characterizes two new bird papillomaviruses in Australian wild birds, highlighting the need for further research on their potential disease impacts.
Contribution
The study reports the discovery of two novel avian papillomaviruses from Australian wild birds and their genomic characterization.
Findings
Two novel avian papillomaviruses, TsPV1 and CsPV1, were identified in a sacred kingfisher and a little corella in Queensland.
The TsPV1 and CsPV1 genomes are nearly identical (99.69% genome-wide nucleotide identity) and appear to represent a lineage distinct from previously described avian papillomaviruses.
The viruses share moderate nucleotide identity (>54% to >56%) with previously known avian papillomaviruses from the United States and Canada.
Abstract
Papillomaviruses are small, circular DNA viruses that infect epithelial and mucosal cells and have co-evolved with their hosts over time. While certain mammalian papillomaviruses, especially those linked to disease, are well studied, knowledge of papillomaviruses associated with avian species remains limited. In this study, we identified two avian papillomaviruses from eye/choana swabs of the sacred kingfisher (Todiramphus sanctus) and the little corella (Cacatua sanguinea), collected in Queensland, Australia. The genomes of these viruses, designated todiramphus sanctus papillomavirus 1 (TsPV1) and cacatua sanguinea papillomavirus 1 (CsPV1), are 7883 and 7825 base pairs in length, respectively. The TsPV1 and CsPV1 genomes exhibited the highest nucleotide sequence identity (>56%) with papillomavirus genomes previously sequenced from mallards or wild ducks in the…
Figure 1
Figure 2
Figure 3
1. Introduction
The Papillomaviridae family comprises a broad and diverse group of non-enveloped, double-stranded DNA viruses that infect a wide variety of vertebrate species, including mammals, birds, reptiles, and fish [1,2,3,4,5,6,7,8]. As classified by the International Committee on Taxonomy of Viruses (ICTV), this family currently consists of more than 50 recognised genera and over 340 species [9,10]. While many papillomavirus (PV) infections are asymptomatic, certain PVs can cause benign epithelial growths, such as warts, which can sometimes develop into malignant lesions, including squamous cell carcinoma and other cancers affecting the skin or mucosa [2,3,4,5,6]. Although human papillomaviruses (HPVs) have been the most thoroughly studied within the Papillomaviridae family, advances in genomic sequencing have identified various PV types affecting other animal species, including birds. Avian papillomaviruses (APVs) represent a distinct and important subgroup within the Papillomaviridae family. They are currently classified within the genera Thetapapillomavirus, Etapapillomavirus, Dyoepsilonpapillomavirus, Dyozetapapillomavirus, Treisepsilonpapillomavirus, and Treiszetapapillomavirus. Despite their taxonomic recognition, APVs remain comparatively underexplored. Infected birds generally develop benign papillomas on their skin or mucosa, with the impact on their health varying according to the size and location of the lesions.
Structurally, APVs share similarities with other PVs, featuring a double-stranded, circular DNA genome typically between 5.7 and 8.6 kilobases (kb) in length, enclosed within an icosahedral capsid [10]. The genome contains several open reading frames (ORFs) encoding structural proteins (L1 and L2) and non-structural proteins (E1–E9) involved in viral replication, cellular transformation, and capsid assembly [11,12]. It also contains a regulatory segment known as the long control region (LCR), and some viruses have an additional non-coding region. Although PVs share a conserved genomic organisation, the set of early (E) proteins varies across species; only the core ORFs E1, E2, L2, and L1 are consistently present in all PV genomes [13]. Despite these genetic similarities, APVs differ in evolutionary history and host specificity, which makes them valuable for comparative viral genomics.
Traditionally, PVs were considered highly host-specific, evolving in close association with their hosts in a pattern of host–virus codivergence. However, recent studies indicate that cross-species transmission may be more common than previously believed, especially among closely related host species [14,15,16]. For example, bovine PVs have been found infecting other herbivores and certain felids, suggesting that host specificity in PVs may be more flexible than once assumed [17,18].
While much of the research on PV diversity has focused on mammals, there remains limited understanding of APVs. The earliest confirmed APV cases date to the 1970s, with PV-like particles detected in squamous papillomas in chaffinches using electron microscopy [19]. Subsequently, APVs have been found in other bird species, including the African grey parrot and the northern fulmar, in association with cutaneous tumours or lesions [20,21,22]. While the clinical manifestations of APV infections are generally benign, certain strains have shown the capacity for more invasive disease, particularly when co-infections or environmental stressors are present [19,20,21,22]. Recent advances in next-generation sequencing and molecular diagnostics are facilitating the identification, characterisation, and study of APV diversity, shedding light on their evolutionary pathways and potential for cross-species transmission.
This study reports the complete genome sequences and molecular characterisation of two novel APVs detected in swabs collected from the sacred kingfisher (Todiramphus sanctus) and the little corella (Cacatua sanguinea). These findings expand current knowledge of APV diversity, host–virus interactions, and evolutionary dynamics, which is needed to assess their broader implications for avian health and disease ecology.
2. Materials and Methods
2.1. Sampling and DNA Extraction
A subset of DNA samples (n = 11) from four wild avian groups (parrots, pigeons, kingfishers, and raptors) was selected for further investigation (see Supplementary Table S1 for details). Notably, the majority of these DNA samples harboured multiple avian pathogens, including Chlamydiaceae, beak and feather disease virus (BFDV), avipoxviruses, columbid alphaherpesvirus 1 (CoAHV1), and psittacid alphaherpesvirus 1 (PsAHV1) (Supplementary Table S1) [23]. The sampled birds had been admitted to the Australia Zoo Wildlife Hospital (AZWH, Beerwah, QLD, Australia) for various health issues, including clinical diseases and trauma. The attending veterinarians managed the pre-sampling, admission, care, and euthanasia processes. Approval for sampling from the euthanised birds was granted by the University of the Sunshine Coast Animal Research Ethics Committee (ANE1940, ANE2057) [23]. Genomic DNA was extracted from swab samples collected from the eye, liver, or eye/choana (Supplementary Table S1) using the QIAamp DNA Mini Kit (Qiagen, Clayton, VIC, Australia), following the manufacturer’s guidelines. The eluted DNA was stored at −20 °C for subsequent analysis [23].
2.2. Next-Generation Sequencing
The quantity and quality of the extracted DNA were assessed using a Qubit dsDNA high-sensitivity assay kit on a Qubit Fluorometer v4.0 (Thermo Fisher Scientific, Waltham, MA, USA). Libraries were constructed from 200 ng of Qubit-quantified DNA using Illumina DNA Prep (Illumina, San Diego, CA, USA) according to the kit instructions. The quality and quantity of the prepared libraries were evaluated by the Australian Genome Research Facility, Melbourne, Australia. Cluster generation and sequencing (150 bp paired-end reads) were performed with Illumina® NovaSeq chemistry according to the manufacturer’s instructions.
2.3. Bioinformatic Analyses
The resulting raw sequencing reads were analysed following an established pipeline [24,25] in Geneious Prime® (version 2023.1.1, Biomatters, Auckland, New Zealand). Briefly, the quality of all raw reads was first evaluated; ambiguous base calls and poor-quality reads were then removed, and Illumina adapter sequences were trimmed. Trimmed reads were mapped against the chicken (Gallus gallus) genome (GenBank accession no. NC_006088.5) to remove likely host DNA contamination, and then against the Escherichia coli genome sequence (GenBank accession no. U00096) to remove possible bacterial contamination. The remaining unmapped reads were de novo assembled with the SPAdes assembler (version 3.10.1) [26] in Geneious. The resulting contigs were compared against the GenBank non-redundant nucleotide and protein databases using BLASTN and BLASTX (BLAST version 2.16.0) [27], respectively, with an E-value threshold of 1 × 10⁻⁵ to remove potential false positives. Contigs with significant BLAST hits to bacteria, eukaryotes, or fungi were filtered out to remove non-viral sequences. Viral contigs of interest longer than 500 nucleotides (nt) were imported into Geneious Prime® (version 2023.1.1) for further functional analysis and were annotated using published genomes from the relevant virus genera as references.
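The contig-screening step above (E-value cut-off, removal of non-viral hits, 500 nt length filter) was run inside Geneious. As a rough standalone illustration only, the sketch below applies the same three filters to BLAST+ tabular output (`-outfmt 6`, default 12 columns). The `viral_subjects` set, the contig IDs, and the choice to judge each contig by its single best hit are simplifying assumptions, not the authors' procedure.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_viral_contigs(blast_tsv, contig_lengths, viral_subjects,
                         max_evalue=1e-5, min_length=500):
    """Return sorted IDs of contigs that pass all three (assumed) filters.

    1. Only hits with e-value <= max_evalue are considered.
    2. Each contig is judged by its best (lowest e-value) remaining hit and
       kept only if that subject is in the hypothetical viral_subjects set.
    3. The contig must be longer than min_length nucleotides.
    """
    best = {}  # contig ID -> (subject ID, e-value) of its best significant hit
    rows = csv.DictReader(io.StringIO(blast_tsv),
                          fieldnames=OUTFMT6_FIELDS, delimiter="\t")
    for row in rows:
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue  # not significant at the chosen threshold
        contig = row["qseqid"]
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (row["sseqid"], evalue)
    return sorted(
        contig for contig, (subject, _) in best.items()
        if subject in viral_subjects and contig_lengths.get(contig, 0) > min_length
    )


# Hypothetical example: c1 passes; c2 hits a bacterium; c3 is not
# significant; c4 is too short.
demo_hits = "\n".join([
    "c1\tvirus_X\t70.1\t900\t10\t2\t1\t900\t5\t904\t1e-50\t800",
    "c2\tbacterium_Y\t99.0\t600\t1\t0\t1\t600\t1\t600\t1e-100\t1100",
    "c3\tvirus_X\t40.0\t120\t60\t5\t1\t120\t1\t120\t0.01\t40",
    "c4\tvirus_X\t65.0\t400\t90\t3\t1\t400\t1\t400\t1e-20\t300",
])
demo_lengths = {"c1": 1500, "c2": 2000, "c3": 800, "c4": 450}
print(filter_viral_contigs(demo_hits, demo_lengths, {"virus_X"}))  # ['c1']
```

In a real run, the viral/non-viral decision would more likely come from taxonomy fields requested in the BLAST output (for example `staxids`) or a lookup table than from a hand-written set.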
2.4. Comparative Genomics and Phylogenetic Analysis
Genomic comparisons of the newly sequenced complete papillomavirus genomes were visualised using clinker [28], Base-by-Base [29], and Geneious (version 2023.1.1). Pairwise sequence similarities between the new PV sequences and representative avian papillomavirus sequences were calculated using Base-by-Base and MAFFT software (Version 11.0.11) [29,30,31].
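Pairwise identity values such as those reported in Tables 1 and 3 depend on how alignment gaps are treated, and tools differ on this point. The sketch below assumes one common convention (identical positions divided by gap-free alignment columns). It is not necessarily the definition used by Base-by-Base or the other software cited, and the aligned fragments are made up.

```python
def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences.

    Assumed convention: columns where either sequence has a gap ('-') are
    skipped; identity = identical columns / remaining columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: excluded from the denominator
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Made-up aligned fragments: 8 gap-free columns, 7 of them identical.
print(pairwise_identity("ATGC-TAGC", "ATGCATTGC"))  # 87.5
```

Counting gap columns as mismatches instead would lower the value (7 of 9 columns here), which is why identities from different tools are not always directly comparable.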
Phylogenetic analysis of the novel PV genome sequences identified in this study was conducted alongside selected papillomavirus genome sequences available in GenBank. APV sequences were downloaded from GenBank in July 2024 (Table 1). The L1 amino acid sequences and the nucleotide sequences of the selected complete PV genomes were aligned using MAFFT (version 7.450) with the G-INS-i algorithm (gap open penalty 1.53; offset value 0.123) within Geneious. Maximum likelihood (ML) phylogenetic trees were constructed using the LG (L1) and GTR (complete genome) substitution models with 1000 bootstrap replicates in Geneious. Human papillomavirus 41 (GenBank accession no. X56147) was used as the outgroup.
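The 1000 bootstrap replicates mentioned above follow the standard nonparametric bootstrap: each pseudo-replicate alignment is built by sampling alignment columns with replacement, a tree is inferred from each replicate, and a clade's support is the percentage of replicate trees that contain it. The sketch below shows only the resampling step; tree inference is left to the phylogenetics software, and the taxon names and toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by resampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices; some columns repeat, others are omitted.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column draw to every taxon so each column stays intact.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment (not real papillomavirus sequences).
toy = {"taxon_A": "ACGTTGCA", "taxon_B": "ACGTTGCA", "taxon_C": "ACGATGGA"}
rng = random.Random(42)
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

Resampling whole columns, rather than individual characters, preserves the site-wise correspondence between taxa that the substitution model relies on.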
3. Results
3.1. Genomes of Two Novel Avian Papillomaviruses
Two complete papillomavirus (PV) genomes were detected in DNA extracted from the eye tissue of a sacred kingfisher (T. sanctus) and a little corella (C. sanguinea), corresponding to a prevalence of 18.18% (2 of 11 samples). The samples were collected from separate locations in Queensland: Peachester (sacred kingfisher) in October 2020 and Burpengary (little corella) in November 2020. The genomes were 7883 bp (average coverage 260.68×) and 7825 bp (average coverage 10.20×) long, respectively. The complete genomes of both todiramphus sanctus papillomavirus 1 (TsPV1) and cacatua sanguinea papillomavirus 1 (CsPV1) had a GC content of 57.8%. The TsPV1 and CsPV1 genomes showed the highest nucleotide sequence identity (>56%) with a PV genome sequenced from a mallard in the United States (GenBank accession no. PP057987) [7], followed by genomes from a black-legged kittiwake (>54%; GenBank accession no. MK620305) [6] and an Atlantic puffin (>54%; GenBank accession no. MK620302), both from Newfoundland, Canada (Table 1). In addition, the two PV genomes sequenced in this study were almost identical (99.69% nucleotide identity at the genomic level).
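The prevalence figure is simply the proportion of positive samples among those screened (2 of 11), and GC content is the fraction of G and C bases in a genome. The small helper functions below reproduce the prevalence calculation. The GC function is a generic sketch (ignoring ambiguous IUPAC bases is an assumption) and is shown on a made-up sequence, because the genome sequences are not reproduced here.

```python
def prevalence_percent(positives: int, tested: int) -> float:
    """Percentage of tested samples that were positive."""
    if tested <= 0 or not 0 <= positives <= tested:
        raise ValueError("need 0 <= positives <= tested and tested > 0")
    return 100.0 * positives / tested


def gc_content_percent(seq: str) -> float:
    """GC percentage over unambiguous bases only (A, C, G, T); other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


print(round(prevalence_percent(2, 11), 2))  # 18.18
print(gc_content_percent("ATGCGCNNAT"))     # 50.0 (made-up sequence; N bases ignored)
```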
3.2. Comparative Analyses
Like other papillomaviruses, the APVs identified in this study exhibited the characteristic genome organisation, including the four core ORFs encoding L1, L2, E1, and E2. ORFs for the E6 and E9 proteins, found in most papillomaviruses, were also present, as was an additional ORF encoding a hypothetical protein (Figure 1A and Table 2).
The hexameric DNA helicase E1, the only enzyme encoded by papillomaviruses, is also the most conserved protein across these viruses [32]. In APVs, E1 proteins range from 587 to 722 amino acids in length and tend to be slightly longer than those of mammalian papillomaviruses, which typically span 600–650 amino acids [6]. Consistent with other APVs, the E1 proteins of TsPV1 and CsPV1 were 702 amino acids long and shared the highest sequence identity with duck papillomavirus 3 (64.84%, GenBank accession no. QBR99468.1). Notably, the E1 proteins of TsPV1 and CsPV1 are identical (100% amino acid identity). In both viruses, E1 contains a typical bipartite nuclear localisation signal in the N-terminal region (residues 187–217: RSKNSMPKRNAAGAIQVHGHDAAAPKRVRGP), with a predicted score of 6.1 from cNLS Mapper [33]. As illustrated in Figure 1B, E1 also contains several conserved motifs of the AAA+ (ATPases associated with diverse cellular activities) protein family: Walker A (phosphate-binding loop, GXXXXGK[T/S]; here GVPDSGKS), Walker B (ATP-binding, XXDD; here AIDD), and Walker C (sensor 1, XX[T/S][T/S]N; represented as XXSSN). These motifs, whose conserved residues are shown in bold in Figure 1B, are well conserved across APV helicase domains.
In addition, the C-terminal DNA-binding domain of the E2 protein, a key regulator of viral transcription and replication [34], contains a motif (GXTXQ[L/V]KTIRXR; residues 325–336 in TsPV1 and CsPV1: GYTGQLKTIRHR) that is highly conserved across all papillomaviruses.
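The motif notation used for E1 and E2 (X for any residue, [T/S] for either residue) maps directly onto regular expressions. The sketch below is one way to translate that notation and report matches with 1-based, inclusive residue coordinates, like those quoted for the E1 nuclear localisation signal (187–217) and the E2 motif (325–336). The function names and the test fragment are hypothetical; the fragment is not the real E1 sequence, and a pattern match alone does not establish function.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate notation such as 'GXXXXGK[T/S]' into a regular expression.

    X (either case) matches any residue; [T/S] matches T or S.
    """
    motif = motif.upper()
    # '[T/S]' -> '[TS]'
    pattern = re.sub(r"\[([A-Z/]+)\]",
                     lambda m: "[" + m.group(1).replace("/", "") + "]", motif)
    return pattern.replace("X", ".")


def find_motif(protein: str, motif: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched residues) for every occurrence.

    Coordinates are 1-based and inclusive; the zero-width lookahead
    lets overlapping occurrences be reported.
    """
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = []
    for m in regex.finditer(protein.upper()):
        match = m.group(1)
        hits.append((m.start() + 1, m.start() + len(match), match))
    return hits


fragment = "MKAGVPDSGKSLLEN"  # hypothetical fragment containing a Walker A motif
print(find_motif(fragment, "GXXXXGK[T/S]"))            # [(4, 11, 'GVPDSGKS')]
print(find_motif("GYTGQLKTIRHR", "GXTXQ[L/V]KTIRXR"))  # [(1, 12, 'GYTGQLKTIRHR')]
```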
The L2 proteins of TsPV1 and CsPV1 showed the highest identity (>44%) with a duck papillomavirus (GenBank accession no. ANN29878.1) sequenced in India in 2014 [35]. Several conserved motifs were identified within the L2 structural protein across all APVs (Figure 1C). These included a furin cleavage motif (RX[K/R]R) in the N-terminal region of both TsPV1 and CsPV1, a variable number of transmembrane GXXXG motifs, and a sorting nexin 17 (SNX17)-binding site ([F/Y]XNPX[F/Y]). The furin cleavage site is crucial for viral entry, whereas the transmembrane domains and the SNX17-binding motif are likely involved in endosomal escape [12]. A syntaxin 18-binding site (D[Q/K]XL[Q/K]), which facilitates viral transport toward the nucleus [12], was also found in both TsPV1 and CsPV1.
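For the L2 motifs listed above, the same idea can be written with the regular expressions spelled out directly. GXXXG motifs can overlap (a GXXXGXXXG stretch contains two), so the sketch below uses a zero-width lookahead to count overlapping occurrences that a plain `re.finditer` search would miss. The test sequence is invented for illustration, and a motif match is only a candidate site, not evidence of a transmembrane segment or a binding function.

```python
import re

# L2 motifs from the text, written as regular expressions
# ('.' = any residue, [KR] = K or R).
L2_MOTIFS = {
    "furin": r"R.[KR]R",           # RX[K/R]R
    "GXXXG": r"G...G",             # GXXXG
    "SNX17": r"[FY].NP.[FY]",      # [F/Y]XNPX[F/Y]
    "syntaxin18": r"D[QK].L[QK]",  # D[Q/K]XL[Q/K]
}


def scan_l2(protein: str) -> dict[str, list[int]]:
    """Map each motif name to the 1-based start positions of all matches,
    including overlapping ones (found via a zero-width lookahead)."""
    protein = protein.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(f"(?={rx})", protein)]
        for name, rx in L2_MOTIFS.items()
    }


toy_l2 = "MRAKRGAIAGAVAGLFTNPEYDQALK"  # invented sequence, not a real L2
print(scan_l2(toy_l2))
# {'furin': [2], 'GXXXG': [6, 10], 'SNX17': [16], 'syntaxin18': [22]}
```

Without the lookahead, `re.finditer("G...G", toy_l2)` would report only the match starting at position 6, because the search resumes after the end of each match.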
The L1 gene, encoding the major capsid protein, was 1551 bp long in both TsPV1 and CsPV1, and the encoded protein displayed the highest amino acid similarity (>65%) with a duck papillomavirus (GenBank accession no. QBR99472) sequenced in India in 2014 [35]. The L1 proteins of TsPV1 and CsPV1 were 100% identical.
3.3. Evolutionary Relationships of APVs
Phylogenetic analyses using both the L1 sequences (Figure 2A) and the complete genome sequences of the selected PVs (Figure 2B) support the classification of the APVs newly identified in this study. The L1-based tree (Figure 2A) had a topology similar to that of the complete-genome maximum likelihood (ML) tree (Figure 2B). Both trees supported a subclade comprising papillomaviruses infecting the mallard (AplaPV1 and AplaPV3), African grey parrot (PePV1), common chaffinch (FcPV1), and Atlantic canary (ScPV1), consistent with a close evolutionary relationship among these viruses. However, the phylogenetic positions of TsPV1 and CsPV1 lacked strong bootstrap support (maximum of 63%), indicating no clear close relationship with any other known papillomavirus. This suggests that these viruses may represent an intermediate evolutionary lineage distinct from previously identified avian papillomaviruses.
3.4. Taxonomic Identification
According to the International Committee on Taxonomy of Viruses (ICTV), papillomavirus classification is based on a combination of L1 gene nucleotide identity thresholds and phylogenetic analysis. The ICTV demarcation cut-offs for L1 nucleotide identity are 60% for genera, 70% for species, and 90% for types [10]. The average pairwise identities between the L1 nucleotide sequences of each APV type were calculated and are shown in Table 3.
The L1 gene sequences of TsPV1 and CsPV1 were 100% identical to each other but highly divergent from those of other APVs (Table 3 and Figure 3). Specifically, the L1 gene of TsPV1 and CsPV1 shared approximately 64% pairwise identity with AplaPV1 and AplaPV3, and around 60% identity with members of the genus Etapapillomavirus, including ScPV1 and FcPV1. Because an identity of approximately 64% lies above the 60% genus cut-off but below the 70% species cut-off, TsPV1 and CsPV1 likely belong to the same genus as AplaPV1 and AplaPV3 while representing a distinct species. As this genus has not yet been assigned, TsPV1 and CsPV1 likely meet the criteria for classification within a novel papillomavirus genus.
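The demarcation logic in this section can be written as a small decision function. It encodes the cut-offs exactly as quoted above (60% genus, 70% species, 90% type); treating each cut-off as an inclusive lower bound is an assumption, and in practice ICTV assignments also weigh phylogenetic placement, which a single identity value cannot capture. The 64% input is the approximate value reported for TsPV1/CsPV1 versus AplaPV1 and AplaPV3.

```python
def l1_relationship(identity_percent: float) -> str:
    """Relationship implied by pairwise L1 nucleotide identity, using the
    cut-offs quoted in the text (60% genus, 70% species, 90% type).
    Cut-offs are treated as inclusive lower bounds (an assumption)."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent < 60.0:
        return "different genera"
    if identity_percent < 70.0:
        return "same genus, different species"
    if identity_percent < 90.0:
        return "same species, different types"
    return "same type"


# Values reported in this section (64% is approximate).
for pair, identity in [("TsPV1 vs CsPV1", 100.0),
                       ("TsPV1 vs AplaPV1/AplaPV3", 64.0)]:
    print(f"{pair}: {identity}% -> {l1_relationship(identity)}")
```

Values close to a boundary (such as the roughly 60% identity to Etapapillomavirus members) are exactly where the phylogenetic evidence, rather than the threshold alone, has to decide the assignment.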
4. Discussion
This study reports the characterisation of two novel avian papillomaviruses (APVs), TsPV1 and CsPV1, detected from swabs taken from the eye tissues of the sacred kingfisher and little corella, respectively. Both viruses exhibit typical papillomavirus genomic structures, comprising four core open reading frames (ORFs) for L1, L2, E1, and E2, as well as the additional E6 and E9 ORFs found in most papillomaviruses. The presence of conserved motifs and structural elements across both genomes, along with the high nucleotide identity observed between them (99.69%), suggests these viruses are closely related and may represent a unique evolutionary lineage within the avian papillomaviruses.
Although papillomaviruses have been extensively studied in humans and certain domestic animals, their prevalence and ecological dynamics in wild avian species remain relatively underexplored. The relatively high prevalence of papillomaviruses in the sampled avian population (18.18%, 2 of 11 samples) is therefore noteworthy. This prevalence aligns with earlier reports suggesting that papillomaviruses may be more widespread among avian species than previously recognised [6,13]. Several factors may contribute to such prevalence, including the ecological characteristics of the host species, environmental conditions, and the mechanisms of virus transmission within bird populations. For instance, in gregarious or communal species such as the sacred kingfisher (T. sanctus) and the little corella (C. sanguinea), close physical contact or shared habitats may facilitate viral spread [36].
In terms of genomic sequence identity, TsPV1 and CsPV1 were most similar to papillomaviruses from other avian species, with >56% nucleotide identity to a virus detected in a mallard (wild duck) in the United States [7] and approximately 54% identity to viruses from black-legged kittiwakes and Atlantic puffins in Canada [6]. Their relatively low identity to other APVs, combined with their high identity to each other, suggests that these viruses belong to a distinct, previously unrecognised genus within the Papillomaviridae family. Moreover, the L1 gene of TsPV1 and CsPV1 shared approximately 64% pairwise identity with AplaPV1 and AplaPV3. Based on these sequence similarities, TsPV1 and CsPV1 likely represent a distinct species within the same, as yet unassigned, genus as AplaPV1 and AplaPV3.
Our analysis highlights E1 as a critical component of both TsPV1 and CsPV1, consistent with its role as the most conserved enzyme in papillomaviruses [32]. The E1 proteins of both viruses are 702 amino acids long, closely matching the sizes of other avian E1 proteins [6,7,37], which are generally longer than those of mammalian papillomaviruses [38]. The presence of the conserved Walker A, B, and C motifs indicates that these viruses retain the helicase functional domains essential for efficient viral DNA replication.
The DNA-binding domain of the E2 protein, which regulates viral transcription and replication [34], is also conserved in TsPV1 and CsPV1: its motif sequence GYTGQLKTIRHR (residues 325–336) is consistent with those of known APVs [6,7,37]. This conservation highlights the evolutionary importance of E2’s regulatory function in the viral life cycle, particularly for controlling viral genome replication and transcription [34].
In the L2 protein, conserved motifs such as the furin cleavage site, transmembrane GXXXG domains, and the SNX17- and syntaxin 18-binding motifs are consistent with those found in other APVs, suggesting conserved mechanisms for host cell entry, endosomal escape, and nuclear transport [12]. These findings underscore the critical role of the L2 protein in facilitating multiple steps of the infection process, from entry into host cells to trafficking within the host cytoplasm and nucleus.
Phylogenetic analyses based on both the L1 gene and complete PV genome sequences indicate that TsPV1 and CsPV1 likely occupy an intermediate evolutionary position among APVs. The lack of strong bootstrap support for their phylogenetic placement is consistent with their divergence from established avian papillomaviruses and may indicate that these viruses represent an early branch or a novel sub-lineage. This interpretation is further supported by their high pairwise identity to each other and relatively low identity to other APVs: their highest L1 identity, approximately 64%, is below the 70% species demarcation threshold but close to the 60% genus boundary. Accordingly, under the ICTV classification criteria [10], the L1 nucleotide divergence of TsPV1 and CsPV1 supports placing them, together with AplaPV1 and AplaPV3, in a new papillomavirus genus distinct from currently known avian PV lineages. These findings provide insights into the diversity and evolutionary history of APVs and point to yet-unknown viral lineages in avian species. A limitation of this study is that virus isolation was not attempted, primarily because the initial, limited sampling strategy was designed for viral metagenomic analysis rather than culture-based virus isolation.
5. Conclusions
The identification of TsPV1 and CsPV1 enriches our understanding of APVs, particularly those infecting non-traditional avian hosts such as the sacred kingfisher and little corella. The close genetic relationship between TsPV1 and CsPV1, coupled with their unique phylogenetic placement, suggests they may represent a novel papillomavirus genus. Future studies involving a broader range of avian hosts and geographic regions are essential to clarify the diversity and evolutionary pathways of APVs and to further elucidate the functional roles of conserved motifs in viral pathogenesis and host adaptation.
References
- [1] Varsani A., Kraberger S., Jennings S., Porzig E.L., Julian L., Massaro M., Pollard A., Ballard G., Ainley D.G. A novel papillomavirus in Adélie penguin (Pygoscelis adeliae) faeces sampled at the Cape Crozier colony, Antarctica. J. Gen. Virol. 2014, 95, 1352–1365. doi:10.1099/vir.0.064436-0. PMID: 24686913.
- [2] Nicholls P.K., Stanley M.A. The immunology of animal papillomaviruses. Vet. Immunol. Immunopathol. 2000, 73, 101–127. doi:10.1016/S0165-2427(99)00165-8. PMID: 10690928.
- [3] Campo M.S. Papillomavirus and disease in humans and animals. Vet. Comp. Oncol. 2003, 1, 3–14. doi:10.1046/j.1476-5829.2003.00001.x. PMID: 19379325.
- [4] Cubie H.A. Diseases associated with human papillomavirus infection. Virology 2013, 445, 21–34. doi:10.1016/j.virol.2013.06.007. PMID: 23932731.
- [5] Mifsud J.C.O., Hall J., Van Brussel K., Rose K., Parry R.H., Holmes E.C., Harvey E. A novel papillomavirus in a New Zealand fur seal (Arctocephalus forsteri) with oral lesions. NPJ Viruses 2024, 2, 10. doi:10.1038/s44298-024-00020-w. PMID: 40295655. PMCID: PMC11721157.
- [6] Canuti M., Munro H.J., Robertson G.J., Kroyer A.N.K., Roul S., Ojkic D., Whitney H.G., Lang A.S. New Insight into Avian Papillomavirus Ecology and Evolution from Characterization of Novel Wild Bird Papillomaviruses. Front. Microbiol. 2019, 10, 701. doi:10.3389/fmicb.2019.00701. PMID: 31031718. PMCID: PMC6473165.
- [7] Olivo D., Kraberger S., Varsani A. New duck papillomavirus type identified in a mallard in Missouri, USA. Arch. Virol. 2024, 169, 77. doi:10.1007/s00705-024-06006-6. PMID: 38517556.
- [8] Sarker S., Talukder S., Athukorala A., Whiteley P.L. The Spleen Virome of Australia’s Endemic Platypus Is Dominated by Highly Diverse Papillomaviruses. Viruses 2025, 17, 176. doi:10.3390/v17020176. PMID: 40006931. PMCID: PMC11860646.
