# Relativistic triangle–curvature computing for federated HIV-1 protein-sequence monitoring

**Authors:** Javier Villalba-Díez, Ana González-Marcos

PMC · DOI: 10.1038/s41598-025-32889-9 · Scientific Reports · 2026-01-03

## TL;DR

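This paper introduces a privacy-preserving federated method for unsupervised clustering of HIV-1 protein sequences held at distributed data sources. Clients share only latents of a small public reference set and one curvature scalar per batch, never their private sequences.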

## Contribution

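A federated learning framework based on relativistic triangle–curvature computing. It combines radii attenuation, triangle–curvature decoding, and Procrustes align-then-average aggregation to improve unsupervised clustering of full-length HIV-1 protein sequences while keeping private sequences local.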

## Key findings

- The curvature-aware model achieves the strongest global separation, with a silhouette score of 0.826.
- A simple radii-attenuation schedule attains the tightest clusters, with a Davies–Bouldin index of 0.373 (lower is better) and a Calinski–Harabasz score of 9.72 × 10⁵.
- Communication overhead is minimal: clients share only public-set latents and one curvature scalar K per batch.
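Sharing only public-set latents is enough for aggregation if the server can undo client-specific rotations and sign flips of the latent basis before averaging. The sketch below shows one way to do this with the classical orthogonal Procrustes solution, R = UVᵀ from the SVD of AᵀB. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical. The server-held `reference` matrix is also an assumption: it could be, for example, one client's public-set latents or a previous round's mean. Distilling the central encoder on the aligned mean is not modelled here.

```python
import numpy as np


def procrustes_rotation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Orthogonal R minimising ||a @ R - b||_F (SVD solution: R = U @ Vt of a.T @ b).

    a, b: (n_public, d) latents of the same public reference sequences.
    """
    u, _, vt = np.linalg.svd(a.T @ b)
    return u @ vt


def align_then_average(client_latents, reference: np.ndarray) -> np.ndarray:
    """Hypothetical server step: rotate each client's public-set latents onto
    the reference basis, then average the aligned matrices element-wise."""
    aligned = [z @ procrustes_rotation(z, reference) for z in client_latents]
    return np.mean(aligned, axis=0)


# Illustrative use with synthetic latents: three clients whose bases differ
# from the reference by a rotation, a sign flip, and nothing at all.
rng = np.random.default_rng(0)
reference = rng.normal(size=(32, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
sign_flip = np.diag([-1.0] + [1.0] * 7)
clients = [reference @ rotation, reference @ sign_flip, reference.copy()]
consensus = align_then_average(clients, reference)
```

Each client uploads an `n_public × d` matrix per round, so the cost is independent of the size of its private data. SciPy's `scipy.linalg.orthogonal_procrustes(A, B)` computes the same rotation and can replace `procrustes_rotation`.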

## Abstract

Sequence-only surveillance of rapidly evolving pathogens must extract clinically meaningful structure from protein sequences without labels, central data pooling, or strong assumptions about data homogeneity. Most existing sequence autoencoders either assume centralized, IID data or rely on heavy cryptographic protocols. In federated deployments they can leak geometric information through latents or gradients, suffer from client-specific rotations and sign flips of the latent basis, and ignore the curvature of the latent manifold; together these degrade clustering quality and make privacy guarantees opaque. We introduce a relativistic triangle–curvature computing framework for unsupervised embeddings of full-length HIV-1 proteins under federated training. The method combines three linear-algebraic components: (i) radii attenuation, a controlled contraction $z \leftarrow d\,z$ that lowers $\ell_2$-sensitivity and provides an explicit information-retained ledger; (ii) triangle–curvature decoding, which estimates a batch-level scalar $K$ from the (squared) Menger curvature of random latent triples and rescales $z \mapsto (1+\alpha_c K)\,z$ to preserve inter-cluster geometry in curved regions; and (iii) align-then-average aggregation via orthogonal Procrustes on a small public reference set, followed by distillation of a central encoder on the aligned latent mean so that no private sequences are shared. Applied to 173,750 Los Alamos National Laboratory HIV-1 amino-acid sequences spanning nine proteins (Env, Gag, Pol, Nef, Rev, Tat, Vif, Vpr, Vpu), our curvature-aware model achieves the strongest global separation (silhouette 0.826) with low reconstruction error, while a simple radii schedule attains the tightest clusters (Davies–Bouldin 0.373, Calinski–Harabasz $9.72\times 10^{5}$). Eight proteins form near-perfect clusters; only the short accessory pair Tat/Vpr exhibits recurring overlap, which we flag for targeted downstream classifiers. Communication overhead is minimal because only public-set latents and one scalar $K$ per batch are shared, making the approach suitable for privacy-preserving, federated sequence surveillance.
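The first two components of the abstract are simple to state in code. The Menger curvature of three points is the reciprocal of their circumradius:

$$c(x,y,w) = \frac{4\,\mathrm{Area}(x,y,w)}{\|x-y\|\,\|y-w\|\,\|w-x\|}.$$

The triangle area can be computed in any dimension from the Gram determinant. The sketch below is a minimal illustration under several assumptions that are not confirmed by the summary. It takes $K$ to be the mean of $c^2$ over randomly sampled triples. It applies attenuation before curvature estimation. The default `alpha_c=0.1` and every function name are hypothetical. The information-retained ledger is not modelled.

```python
import numpy as np


def attenuate(z: np.ndarray, d: float) -> np.ndarray:
    """Radii attenuation z <- d * z. For 0 < d <= 1 every latent norm
    (and hence the l2 sensitivity of the shared latents) shrinks by d."""
    if not 0.0 < d <= 1.0:
        raise ValueError("d must lie in (0, 1]")
    return d * z


def menger_curvature_sq(x: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Squared Menger curvature c^2 = 16 * Area^2 / (|x-y| |y-w| |w-x|)^2."""
    u, v = y - x, w - x
    a, b, c = np.linalg.norm(u), np.linalg.norm(w - y), np.linalg.norm(v)
    if a * b * c == 0.0:
        return 0.0  # degenerate triple: two points coincide
    # Gram-determinant form of the squared triangle area, valid in R^n;
    # clipped at 0 to absorb rounding for (near-)collinear points.
    area_sq = max(0.25 * (np.dot(u, u) * np.dot(v, v) - np.dot(u, v) ** 2), 0.0)
    return float(16.0 * area_sq / (a * b * c) ** 2)


def batch_curvature(z: np.ndarray, n_triples: int = 256, rng=None) -> float:
    """Hypothetical batch scalar K: mean squared Menger curvature over
    randomly sampled triples of distinct rows of z."""
    rng = np.random.default_rng(rng)
    n = z.shape[0]
    if n < 3:
        return 0.0
    vals = []
    for _ in range(n_triples):
        i, j, k = rng.choice(n, size=3, replace=False)
        vals.append(menger_curvature_sq(z[i], z[j], z[k]))
    return float(np.mean(vals))


def curvature_rescale(z: np.ndarray, k: float, alpha_c: float = 0.1) -> np.ndarray:
    """Curvature-aware rescaling z -> (1 + alpha_c * K) * z."""
    return (1.0 + alpha_c * k) * z


# Illustrative pipeline on synthetic latents (shapes are arbitrary).
rng = np.random.default_rng(0)
z = rng.normal(size=(128, 16))
z_att = attenuate(z, d=0.8)
K = batch_curvature(z_att, n_triples=256, rng=rng)
z_out = curvature_rescale(z_att, K, alpha_c=0.1)
```

Only the scalar `K` leaves the client in this step, which matches the abstract's communication budget of one scalar per batch. Points on a circle of radius $r$ give $c^2 = 1/r^2$ for every triple, so tightly curved regions of the latent manifold produce a larger $K$ and therefore a larger rescaling factor.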

## Linked entities

- **Proteins:** gag (Pr55(Gag)), vif (Vif), vpr (Vpr)

## Full-text entities

- **Genes:** vif (Vif) [NCBI Gene 155459], Vpu [NCBI Gene 155945], vpr (Vpr) [NCBI Gene 155807], Env [NCBI Gene 155971], Tat [NCBI Gene 6898;155871], Nef [NCBI Gene 156110], gag-pol (Gag-Pol) [NCBI Gene 155348], gag (Pr55(Gag)) [NCBI Gene 155030], Rev [NCBI Gene 155908]
- **Species:** Human immunodeficiency virus 1 (no rank) [taxon 11676]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12830662/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12830662/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC12830662/full.md

---
Source: https://tomesphere.com/paper/PMC12830662