# KANN: estimation of genetic ancestry profiles by nearest neighbor regression

**Authors:** Juha Riikonen, Sini Kerminen, Aki Havulinna, Matti Pirinen

PMC · DOI: 10.1093/nar/gkag209 · Nucleic Acids Research · 2026-03-12

## TL;DR

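KANN estimates individual-level genetic ancestry with respect to predefined source populations by k-nearest neighbor regression on principal components of genetic structure, which makes it a promising, computationally light option for large-scale genomic studies.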

## Contribution

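Unlike existing tools, which can only use reference samples with a discrete source population assignment, KANN applies k-nearest neighbor regression to reference samples with continuous ancestry profiles across multiple source populations.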

## Key findings

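- KANN's ancestry estimates agree well with the haplotype-based method SOURCEFIND across up to 10 Finnish source populations, on 18 125 Finnish samples from THL Biobank.
- On the globally diverse 1000 Genomes Project data, KANN produces results highly similar to those of the ADMIXTURE software.
- KANN needs only principal components of genetic structure, avoiding the computational cost that makes haplotype-based methods impractical for biobank-scale analyses.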

## Abstract

State-of-the-art methods for inferring individual-level genetic ancestry are based on statistical models for haplotype data. Unfortunately, these methods are computationally demanding, making them impractical for biobank-scale analyses. In this paper, we describe KANN, an efficient k-nearest neighbor regression method for individual-level ancestry estimation with respect to predefined source populations using only principal components of genetic structure. Contrary to the existing tools that can only use reference samples with discrete source population assignment, KANN enables the use of reference samples with continuous ancestry profiles across multiple source populations. We observe that KANN’s ancestry estimates agree well with the haplotype-based method SOURCEFIND when estimating ancestry profiles across up to 10 Finnish source populations on a dataset of 18 125 Finnish samples from THL Biobank. In the 1000 Genomes Project data containing globally diverse genetic backgrounds, KANN produces highly similar results to the ADMIXTURE software. Based on our results, KANN is a promising tool for ancestry estimation in large-scale genomic studies.
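To make the idea in the abstract concrete, the following is a minimal sketch of k-nearest neighbor regression on principal components, not the authors' implementation. It assumes Euclidean distance in PC space, an unweighted mean over neighbors, and a fixed `k`; the abstract specifies none of these, and the function name `knn_ancestry` and the toy data are hypothetical.

```python
import numpy as np


def knn_ancestry(ref_pcs, ref_profiles, target_pcs, k=3):
    """Estimate ancestry profiles as the mean profile of the k nearest references.

    ref_pcs:      (n_refs, n_pcs) principal components of reference samples
    ref_profiles: (n_refs, n_sources) continuous ancestry profiles, rows sum to 1
    target_pcs:   (n_targets, n_pcs) principal components of target samples
    Assumptions (not from the paper): Euclidean distance, unweighted average.
    """
    ref_pcs = np.asarray(ref_pcs, dtype=float)
    ref_profiles = np.asarray(ref_profiles, dtype=float)
    target_pcs = np.atleast_2d(np.asarray(target_pcs, dtype=float))

    # Pairwise Euclidean distances, shape (n_targets, n_refs).
    diffs = target_pcs[:, None, :] - ref_pcs[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Indices of the k closest reference samples for each target.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Average the neighbors' profiles; a mean of rows that each sum to 1
    # also sums to 1, so the output is again a valid ancestry profile.
    return ref_profiles[nearest].mean(axis=1)


# Toy example: two clusters in a 2-PC space, two source populations.
ref_pcs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
ref_profiles = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
                [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
target_pcs = [[0.05, 0.05], [5.05, 5.05]]

estimates = knn_ancestry(ref_pcs, ref_profiles, target_pcs, k=3)
print(estimates)
```

Because the reference profiles are continuous rather than one-hot labels, the same averaging step handles admixed reference samples without modification, which is the property the abstract contrasts with discrete-assignment tools.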


## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12980074/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12980074/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC12980074/full.md

---
Source: https://tomesphere.com/paper/PMC12980074