# A mathematical model for universal semantics

**Authors:** Weinan E, Yajun Zhou

arXiv: 1907.12293 · 2022-02-07

## TL;DR

This paper introduces a language-independent mathematical model that characterizes word meanings through numerical fingerprints derived from recurring patterns in a text. The model supports topic extraction, synonym discovery, and cross-lingual text matching without consulting an external knowledge base or thesaurus.

## Contribution

The paper presents a Markov-process-based semantic model that represents each topical concept by a low-dimensional, interpretable vector, derived from a single document of moderate length and comparable across languages.

## Key findings

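- Topics, synonyms, and semantic fields are extracted from a single document of moderate length, without an external knowledge base or thesaurus.
- The semantic representations support automated question-answering on short texts and matching of medium-length texts across languages (automated word translation).
- Semantic fingerprints quantify the local meaning of words in 14 representative languages across 5 major language families.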

## Abstract

We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting an external knowledge base or thesaurus. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify the local meaning of words in 14 representative languages across 5 major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level. Our protocols and source codes are publicly available at https://github.com/yajun-zhou/linguae-naturalis-principia-mathematica.
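To make the idea of "approximating texts by Markov processes" concrete, here is a toy sketch, not the authors' algorithm. It assumes a text is reduced to the ordered occurrences of a few chosen words and estimates a first-order transition matrix between them. The function name `transition_matrix`, the whitespace tokenization, and the hand-picked vocabulary are all assumptions made for illustration; the paper's actual model and protocols are in the repository linked in the abstract.

```python
def transition_matrix(tokens, vocab):
    """Estimate a row-stochastic transition matrix between words in `vocab`.

    Illustrative assumption: the text is reduced to the ordered sequence of
    occurrences of the chosen words, and each hop from one occurrence to the
    next counts as one step of a first-order Markov chain.
    """
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    counts = [[0] * n for _ in range(n)]

    # Drop tokens outside the vocabulary, keeping the original order.
    sequence = [index[t] for t in tokens if t in index]

    # Count hops between consecutive occurrences of vocabulary words.
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1

    # Normalize each row to probabilities; rows with no hops stay all-zero.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    text = "the cat saw the dog and the dog saw the cat"
    P = transition_matrix(text.split(), vocab=["cat", "dog", "saw"])
    for word, row in zip(["cat", "dog", "saw"], P):
        print(word, [round(p, 3) for p in row])
```

Each row of the resulting matrix is a small numerical profile of where the text tends to go after a given word. This illustrates only the general flavor of a per-word vector derived from transition statistics, under the simplifications stated above.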

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.12293/full.md

## Figures

41 figures with captions in the complete paper: https://tomesphere.com/paper/1907.12293/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1907.12293/full.md

---
Source: https://tomesphere.com/paper/1907.12293