The Crystal Structure of the Domain of Unknown Function 1480 (DUF1480) From Klebsiella pneumoniae
Dhruvin H. Patel, Nobuhiko Watanabe, Alexei Savchenko, Cameron Semper

TL;DR
This paper reports the first crystal structure of the DUF1480 protein from Klebsiella pneumoniae, offering insights into its potential role in antibiotic response.
Contribution
The paper provides the first experimental crystal structure of the DUF1480 domain, revealing its structural features and potential functional relevance.
Findings
DUF1480 has a six-stranded split barrel fold with a small alpha helix capping one end.
DUF1480 is monomeric in solution and structurally similar to response regulators.
The structure reveals conserved features that may be important for its function.
Abstract
Domains of unknown function (DUFs) continue to comprise a significant portion of bacterial proteomes, with more than 20% of bacterial proteins remaining annotated as DUFs. The characterization of their molecular structure can provide valuable insight that is not captured by primary sequence analysis, thus providing a segue into the identification of the molecular function of DUF representatives. Here, we present the crystal structure of KPN_02352 from Klebsiella pneumoniae subsp. pneumoniae, a DUF1480 domain‐containing protein, which was determined to 1.75 Å resolution. Representatives of the DUF1480 family are found broadly across Enterobacterales and have been previously shown to contribute to the antibiotic response. Our structural analysis suggests that DUF1480 comprises a six‐stranded split barrel fold featuring a small alpha helix that is positioned to cap one end of…
TABLE 1 | Crystallographic statistics for KPN_02352.

| | KPN_02352 (PDB: 8VVA) |
|---|---|
| Data collection | |
| Space group | |
| Cell dimensions | |
| a, b, c (Å) | 78.591, 78.591, 62.492 |
| α, β, γ (°) | 90.0, 90.0, 90.0 |
| Wavelength (Å) | 0.9792 |
| Resolution (Å) | 41.53–1.75 (1.78–1.75) |
| | 0.108 (2.173) |
| | 0.036 (0.667) |
| I/σ(I) | 14.1 (1.7) |
| Completeness (%) | 99.7 (96.7) |
| Redundancy | 11.1 (10.7) |
| CC1/2 | 0.971 (0.320) |
| Refinement | |
| Resolution (Å) | 33.27–1.75 |
| No. reflections | 20 271 |
| Rwork/Rfree | 0.2049/0.2391 |
| No. of atoms | |
| Protein | 1221 |
| Water | 65 |
| Average B‐factors (Å²) | 31.88 |
| Protein | 31.58 |
| Water | 37.51 |
| R.m.s. deviations | |
| Bond lengths (Å) | 0.0087 |
| Bond angles (°) | 1.18 |
| Ramachandran (%) | |
| Favored | 99.33 |
| Allowed | 0.67 |
| Disallowed | 0 |
Funding: National Institute of Allergy and Infectious Diseases (10.13039/100000060)
Taxonomy
Topics: Genomics and Phylogenetic Studies · Bacterial Genetics and Biotechnology · RNA and protein synthesis mechanisms
1 Introduction
The rise in available genomic and proteomic data has led to the identification of a vast array of conserved protein domains and proteins for which the biological function is unknown. These domains of unknown function (DUFs) can be found across all domains of life, and it is estimated that approximately 20% of all proteins are categorized as DUFs [1]. DUFs are particularly common in bacteria, in which more than 2500 DUFs are currently known. A recent analysis of 16 model bacterial species found that 355 essential gene products contained a DUF or consisted solely of a DUF, suggesting these uncharacterized proteins play important roles in critical biological processes [1]. Moreover, a number of studies have shown that DUFs or DUF‐containing proteins can play important roles in virulence [2], toxin‐antitoxin systems [3], pathogenesis [4], and antimicrobial resistance [5]. This highlights the need for continued investigation into DUFs, including strategies that can help uncover insights into their biological functions, such as structural and biochemical characterization.
DUF1480 is represented by 78–80 amino acid proteins found in 831 species across the bacterial kingdom, including pathogenic species in Enterobacterales. While the members of this family have seldom been investigated outside high‐throughput screening studies, the available data hint at a diverse and important biological function of this protein. In single‐gene knockout experiments using the Keio collection, E. coli K‐12 harboring a YebV/DUF1480 deletion was shown to display increased sensitivity to metronidazole, ampicillin, triclosan, and cefoxitin [5]. DUF1480 was shown to be tightly controlled by RpoS/σ^S^, a master regulator of the general stress response, in the human pathogen Salmonella enterica serovar Typhimurium [6]. The DUF1480‐containing protein YebV was found enriched in the secretome of polymyxin B‐resistant E. coli, further implicating this family of proteins in antimicrobial response [7]. High‐throughput protein–protein interaction studies in E. coli identified a number of transcriptional regulators, including FIS, kdgR, purR, and yciT, as interactors of YebV/DUF1480 [8]. Taken together, these data point to a role for DUF1480 in cell stress response; however, further investigation is required to elucidate the role of these conserved proteins.
Here, we present the high‐resolution crystal structure of KPN_02352, a DUF1480 domain‐containing protein from the human pathogen Klebsiella pneumoniae subsp. pneumoniae. Our structure reveals that KPN_02352 adopts a fold composed primarily of β‐strands that form a split barrel arrangement. A broader analysis of the DUF1480 family showed that the protein is highly conserved, with even the most distant orthologs sharing ~87% sequence identity. Using our crystal structure, we identified structural similarities between DUF1480 and other cellular response regulators, pointing to a potential functional role for this family of proteins.
2 Materials and Methods
2.1 Cloning, Expression, and Purification of DUF1480
The KPN_02352 ORF was PCR amplified from K. pneumoniae subsp. pneumoniae genomic DNA and cloned into the pMCSG53 [9] vector via T4 ligation‐independent cloning. The plasmid was transformed into E. coli BL21 Gold (Agilent) for protein expression. Protein purification proceeded as described in a previous study [10]. Briefly, the cells were grown at 37°C and 175 rpm until the OD_600_ reached 0.8, after which expression was induced by the addition of IPTG to a final concentration of 0.5 mM. Cells were incubated at 20°C for 16 h, then harvested via centrifugation and resuspended in binding buffer [300 mM NaCl, 50 mM HEPES pH 7.5, 5 mM imidazole, 5% glycerol]. Cells were lysed via sonication, and the lysate was clarified via centrifugation and then incubated with Ni‐NTA resin for 1 h. KPN_02352 was eluted in elution buffer [300 mM NaCl, 50 mM HEPES pH 7.5, 250 mM imidazole, 5% glycerol] and then incubated with tobacco etch virus (TEV) protease to cleave the 6×His tag while being dialyzed to remove imidazole. The protein was passed over a second Ni‐NTA column and then dialyzed into a crystallization buffer [300 mM NaCl, 10 mM HEPES pH 7.5]. For SelMet‐derivatized protein, the protein was expressed in M9 selenomethionine medium for E. coli (Shanghai Medicilon) according to the manufacturer's protocol. Purification of SelMet‐DUF1480 proceeded as described above.
2.2 Crystallization
Crystals of the native and selenomethionine‐derivatized protein were grown at 298 K in 25% PEG 3350/0.1 M citric acid pH 3.5 via the sitting‐drop vapor diffusion method. Prior to data collection, crystals were soaked in 25% PEG 3350/0.1 M citric acid pH 3.5/30% glycerol and then frozen in liquid nitrogen.
2.3 Data Collection, Structure Determination, and Refinement
The data set for the selenomethionine‐derivatized crystals of KPN_02352 was collected at 100 K on the 21‐ID‐F beamline at the Advanced Photon Source at Argonne National Laboratory. Diffraction data were processed using HKL‐3000 [11], and the structure was solved by single‐wavelength anomalous diffraction (SAD) using PHENIX.solve [12]. The model was built using PHENIX.autobuild [12] and manually refined using Coot [13]. PyMOL was used to generate the figures. Structure factors and coordinates for DUF1480 were deposited in the Protein Data Bank under the identifier 8VVA.
2.4 Structure Analysis
Searches for structural homologs of KPN_02352 were performed using the Foldseek [14] and DALI [15] servers. Electrostatic surface potential was calculated using APBS [16] in PyMOL. ConSurf [17] was used to map amino acid conservation onto the structure, which was then visualized using PyMOL.
3 Results
In an effort to structurally characterize DUF1480 representatives, we purified the DUF1480 domain‐containing proteins YebV from E. coli and KPN_02352 from Klebsiella pneumoniae. The native K. pneumoniae KPN_02352 protein was crystallized in 25% PEG 3350/0.1 M citric acid pH 3.5. Interestingly, we were unable to produce crystals of the E. coli YebV protein despite a high level (87%) of shared amino acid sequence identity between the two DUF1480 orthologs. We used a SelMet‐derivatized protein sample, which crystallized under the same conditions, to determine the structure of KPN_02352 by the SAD approach. The structure was further refined to a resolution of 1.75 Å. The crystallographic statistics are summarized in Table 1.
The overall structure of KPN_02352 features an antiparallel β‐sheet consisting of six strands, with a small α‐helix formed on the loop connecting β‐strands 3 and 4 (Figure 1A). The β‐strands form a split barrel, allowing for the formation of a hydrophobic core. The split barrel begins with two short antiparallel β‐strands (β1 and β2) separated by a short hairpin (Figure 1B). A long coil spans the split portion of the barrel and leads into β3, which establishes the other “wall” of the split barrel. Just prior to β3 lies one of two loops that protrude outward from the structure. A short coil after β3 leads to α1, the lone α‐helix in the KPN_02352 structure. The α‐helix lies adjacent to one end of the split barrel and is anchored in place by a disulfide bridge between residues C32 and C39 (Figure 1C). After α1, a coil leads to β4, which interacts with β1 and β5 to establish the contiguous “back” portion of the split barrel. A short hairpin turn separates β4 from β5, the longest strand in the structure, and another turn leads to β6, which is sandwiched between β4 and β5. The loop between β5 and β6 is the other loop that protrudes outward from the core structure of the protein. Of note is the rotated arrangement of β5, which allows it to form hydrogen bonds with both β3 and β6 and to serve as the “linchpin” of the split barrel. Analysis of the electrostatic surface potential of the KPN_02352 structure showed it to be a primarily electroneutral molecule. The one exception is a small electronegative patch found along the split portion of the barrel near the α‐helix (Figure 1D). This patch comprises the side chains of residues E12, D14, D15, S34, D35, and D37 (Figure 1D).
FIGURE 1 | The structure of KPN_02352/DUF1480. (A) Topological arrangement of KPN_02352/DUF1480. (B) Cartoon representation of KPN_02352 colored from N‐terminus (blue) to C‐terminus (red). (C) Disulfide bond found in KPN_02352. The protein is shown in the cartoon representation, while C32 and C39 are shown in the stick representation. The Fo‐Fc map corresponding to the electron density around the disulfide bond is shown in magenta. (D) Electrostatic surface potential of KPN_02352.
KPN_02352 crystallized with two molecules in the asymmetric unit, arranged in a head‐to‐head manner. To investigate the potential biological significance of this arrangement, we analyzed the oligomerization state of KPN_02352 in solution using size exclusion chromatography (SEC). Our results showed that KPN_02352 is monomeric in solution, producing a monodisperse peak corresponding to the expected monomeric size of 8.96 kDa (Figure 2A). Of note, both monomers found in the asymmetric unit contained the C32–C39 disulfide bridge. While this feature likely represents an artifact of crystallization, given that the protein is expected to localize to the reducing environment of the bacterial cytoplasm, it locks the α1‐helix in place where it occludes one end of the split barrel.
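As a rough plausibility check on a given number of molecules per asymmetric unit, the Matthews coefficient can be estimated from the cell dimensions in Table 1 and the 8.96 kDa monomer mass. The sketch below is illustrative only and is not part of the reported analysis: the function names are hypothetical, the cell is treated as orthogonal (all angles 90°), and the number of asymmetric units per unit cell (`asu_per_cell`) is an assumed input, because the space group is not given in this text. The solvent estimate uses the common approximation 1 − 1.23/V_M.

```python
def matthews_coefficient(a, b, c, mw_da, mol_per_asu, asu_per_cell):
    """Return V_M in cubic angstroms per dalton for an orthogonal unit cell.

    a, b, c       -- cell edges in angstroms (all cell angles assumed 90 degrees)
    mw_da         -- molecular weight of one protein chain in daltons
    mol_per_asu   -- protein chains per asymmetric unit
    asu_per_cell  -- asymmetric units per unit cell (depends on the space group)
    """
    cell_volume = a * b * c
    protein_mass_per_cell = mw_da * mol_per_asu * asu_per_cell
    return cell_volume / protein_mass_per_cell


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (standard 1.23 constant)."""
    return 1.0 - 1.23 / vm


# Cell edges and monomer mass are taken from the text; asu_per_cell=8 is an
# ASSUMPTION chosen for illustration, not a value stated in the source.
vm = matthews_coefficient(78.591, 78.591, 62.492,
                          mw_da=8960.0, mol_per_asu=2, asu_per_cell=8)
print(f"V_M = {vm:.2f} A^3/Da, solvent fraction = {solvent_fraction(vm):.2f}")
```

Changing `asu_per_cell` to match the actual space group is the only adjustment needed; values of V_M far outside the range typically seen for protein crystals would argue against the assumed contents of the asymmetric unit.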
FIGURE 2 | Structure analysis of KPN_02352/DUF1480. (A) Size exclusion chromatogram of KPN_02352. Fractions collected are demarcated in red on the chromatogram, and the SDS‐PAGE inset numbers correspond to these fractions. The result shows KPN_02352 is monomeric in solution. (B) The distribution of DUF1480 in Enterobacterales. (C) ConSurf analysis of KPN_02352/DUF1480 based on a multiple sequence alignment, showing the distribution of conserved residues throughout the protein. (D) Structural comparison of a PilZ domain from P. aeruginosa (PDB: 4XRN) with KPN_02352 (PDB: 8VVA).
As mentioned above, representatives of DUF1480 are broadly distributed among Enterobacterales species and are found in a number of human pathogens, including K. pneumoniae, S. enterica, Yersinia pestis, Shigella sp., and Proteus mirabilis (Figure 2B). Analysis of primary sequence across DUF1480 representatives revealed significant conservation; a pairwise alignment of two of the most phylogenetically diverse orthologs found within this protein family (K. pneumoniae and Mesorhizobium sp.) showed that these two proteins still share 87.3% amino acid sequence identity. To further analyze the conservation within this protein family, we mapped the positional conservation onto our KPN_02352 crystal structure using ConSurf [17]. The results of this analysis show high levels of conservation throughout the protein (Figure 2C). Many of the amino acids localized to key secondary structural elements are highly conserved or invariant, including 7/8 of the residues forming the α1‐helix. Both cysteine residues involved in disulfide bond formation (C32 and C39) were highly conserved, but not invariant, suggesting that the disulfide bridge used to stabilize the position of α1 relative to the split barrel may not be essential for the maturation of the protein into its tertiary structure. Our analysis identified a small tract of variability localized to the loop region immediately prior to β3. This loop protrudes outward from the structure along the same face as one of the “walls” of the barrel.
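For readers who want to reproduce an identity figure of this kind, the following is a minimal sketch of how percent identity can be computed from a pairwise alignment. It is not the authors' procedure: the function name is hypothetical, the two aligned strings are toy sequences rather than DUF1480 orthologs, and the denominator used here (columns in which neither sequence has a gap) is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must be rows of the same pairwise alignment, with '-' as the
    gap character, so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical, for illustration only).
toy_a = "MKTLYD-AGR"
toy_b = "MKSLYDEAGR"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because the choice of denominator (alignment length, shorter sequence length, or ungapped columns) changes the result, the convention should be reported alongside any identity value.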
To further examine KPN_02352, and DUF1480 more broadly, we performed searches for structural homologs of the specific fold found in our crystal structure. The majority of the hits identified were uncharacterized proteins harboring a split barrel domain. One functionally annotated family identified as structurally similar to DUF1480 was the PilZ domain family. Direct comparison between KPN_02352 and a PilZ domain from Pseudomonas aeruginosa (PDB: 4XRN) showed that, despite sharing low (~21%) amino acid sequence identity, the two proteins have a similar split barrel structure that aligned with an RMSD of 2.06 Å across 78 Cα atoms (Figure 2D). Proteins that contain PilZ domains have been shown to be response regulators, binding to c‐di‐GMP and then commonly influencing bacterial motility [18, 19]. The PilZ domain typically spans ~110 residues, allowing for additional secondary structure elements that are lacking in DUF1480. Notable features present in KPN_02352, such as the disulfide bond and the localized patch of electronegativity on the surface, are not found in the PilZ crystal structure. Most importantly, the canonical c‐di‐GMP binding motif appears to be lacking in KPN_02352 and in DUF1480 in general, indicating that the two protein families very likely diverge functionally, at least with respect to metabolite binding.
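The RMSD quoted for such a comparison is the root‐mean‐square distance between paired Cα atoms after the two structures have been superposed. The sketch below shows only that final calculation, under stated assumptions: the function name is hypothetical, the coordinates are toy values rather than atoms from 8VVA or 4XRN, and the superposition and residue pairing (performed by the structural alignment tool) are assumed to have been done already.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in angstroms between two equal-length lists of (x, y, z) tuples.

    The lists must hold paired C-alpha positions from structures that are
    ALREADY superposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom pair is required")

    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (hypothetical): the second set is the first shifted 2 A in z.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
print(f"RMSD = {ca_rmsd(toy_a, toy_b):.2f} A")
```

An RMSD value is only meaningful together with the number of atom pairs it was computed over, which is why the comparison above is reported as a value across a stated number of Cα positions.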
4 Discussion and Conclusions
Despite increasingly sophisticated tools for in silico analysis of genomic data, a significant proportion of protein families remain uncharacterized and annotated as DUFs. Here, we report the high‐resolution crystal structure of K. pneumoniae KPN_02352/DUF1480, a conserved DUF found widely within Enterobacterales. Our structure revealed that KPN_02352 comprises a compact split barrel domain capped on one end by a small α‐helix. The barrel features a hydrophobic core, and the sole α‐helix is anchored in place via a disulfide bond formed between two cysteine residues (C32 and C39). A broader analysis of annotated DUF1480 sequences showed that the members of this family retain a high level of primary sequence conservation, suggesting that the overall fold is critical for the function of this protein.
The crystal structure of DUF1480 did not provide direct hints about the molecular function of this protein. Our analysis did not reveal any obvious candidates for functionally important residues. The existing literature suggests that expression of STM14_2239 (the S. enterica DUF1480 domain‐containing protein) is controlled by RpoS/σ^S^, which is involved in stress response, and DUF1480 has been linked to the bacterial response to antibiotics and to interactions with a number of transcriptional regulators in other species [5, 6, 8]. Based on this information, we hypothesized that DUF1480 may function as a cell stress response regulator. Our search for structural homologs identified the PilZ domain as the closest functionally characterized structural homolog. PilZ domains are split barrel domains that bind c‐di‐GMP, an important second messenger in cells, resulting in changes to bacterial motility. The identification of a structural homolog playing a role as a response regulator supports our hypothesis that DUF1480 may function as a stress response regulator; however, additional experimentation is required to further characterize the specific role of DUF1480.
Author Contributions
Dhruvin H. Patel: investigation, formal analysis. Nobuhiko Watanabe: formal analysis. Alexei Savchenko: supervision, funding acquisition, writing – review and editing, conceptualization. Cameron Semper: conceptualization, investigation, writing – original draft, writing – review and editing, formal analysis, supervision.
Conflicts of Interest
The authors declare no conflicts of interest.
Peer Review
The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1002/prot.26752.
References
- [1] N. F. Goodacre, D. L. Gerloff, and P. Uetz, “Protein Domains of Unknown Function Are Essential in Bacteria,” mBio 5, no. 1 (2013): e00713–e00744. doi:10.1128/mBio.00744-13. PMID: 24381303. PMCID: PMC3884060.
- [2] G. Condemine and B. Le Derout, “Identification of New Dickeya dadantii Virulence Factors Secreted by the Type 2 Secretion System,” PLoS One 17, no. 4 (2022): e0265075. doi:10.1371/journal.pone.0265075. PMID: 35417462. PMCID: PMC9007343.
- [3] T. Kurata, C. K. Saha, J. A. Buttress, et al., “A Hyperpromiscuous Antitoxin Protein Domain for the Neutralization of Diverse Toxin Domains,” Proceedings of the National Academy of Sciences of the United States of America 119, no. 6 (2022): e2102212119. doi:10.1073/pnas.2102212119. PMID: 35121656. PMCID: PMC8832971.
- [4] B. Lobb, B. J. Tremblay, G. Moreno‐Hagelsieb, and A. C. Doxey, “PathFams: Statistical Detection of Pathogen‐Associated Protein Domains,” BMC Genomics 22, no. 1 (2021): 663. doi:10.1186/s12864-021-07982-8. PMID: 34521345. PMCID: PMC8442362.
- [5] A. Liu, L. Tran, E. Becket, et al., “Antibiotic Sensitivity Profiles Determined With an Escherichia coli Gene Knockout Collection: Generating an Antibiotic Bar Code,” Antimicrobial Agents and Chemotherapy 54, no. 4 (2010): 1393–1403. doi:10.1128/AAC.00906-09. PMID: 20065048. PMCID: PMC2849384.
- [6] M. Lago, V. Monteil, T. Douche, et al., “Proteome Remodelling by the Stress Sigma Factor RpoS/Sigma(S) in Salmonella: Identification of Small Proteins and Evidence for Post‐Transcriptional Regulation,” Scientific Reports 7, no. 1 (2017): 2127. doi:10.1038/s41598-017-02362-3. PMID: 28522802. PMCID: PMC5437024.
- [7] D. H. Yang, S. Liu, L. Cao, et al., “Quantitative Secretome Analysis of Polymyxin B Resistance in Escherichia coli,” Biochemical and Biophysical Research Communications 530, no. 1 (2020): 307–313. doi:10.1016/j.bbrc.2020.07.010. PMID: 32828304.
- [8] M. Babu, R. Arnold, C. Bundalovic‐Torma, et al., “Quantitative Genome‐Wide Genetic Interaction Screens Reveal Global Epistatic Relationships of Protein Complexes in Escherichia coli,” PLoS Genetics 10, no. 2 (2014): e1004120. doi:10.1371/journal.pgen.1004120. PMID: 24586182. PMCID: PMC3930520.
