Thermodynamics of Information Retrieval
Kostadin Koroutchev, Jian Shen, Elka Koroutcheva, Manuel Cebrian

TL;DR
This paper models word frequencies in text using thermodynamic concepts, proposing a novel approach to information retrieval that distinguishes keywords from common words based on thermodynamic properties.
Contribution
It introduces a thermodynamic framework for information retrieval using a gamma distribution model of word frequencies, providing new insights into word usage and retrieval efficiency.
Findings
Different words exhibit distinct thermodynamic signatures.
Thermodynamic properties can differentiate keywords from common words.
The approach offers potential advantages over traditional retrieval methods.
Abstract
In this work, we suggest a parameterized statistical model (the gamma distribution) for the frequency of word occurrences in long strings of English text and use this model to build a corresponding thermodynamic picture by constructing the partition function. We then use our partition function to compute thermodynamic quantities such as the free energy and the specific heat. In this approach, the parameters of the word frequency model vary from word to word, so each word has its own corresponding thermodynamics. We suggest that differences in the specific heat reflect differences in how the words are used in language, differentiating keywords from common and function words. Finally, we apply our thermodynamic picture to the problem of retrieval of texts based on keywords and suggest some advantages over traditional information retrieval methods.
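The abstract does not spell out how the partition function is built from the gamma model, so the following is only a minimal sketch under stated assumptions, not the authors' construction. It assumes (a) the gamma-distributed quantity is a word's relative frequency in consecutive fixed-size windows of text, (b) the gamma parameters are estimated by the method of moments, and (c) the gamma density is read as a Boltzmann weight with density of states x^(k-1) and inverse temperature beta = 1/theta, so that Z(beta) = Gamma(k) / beta^k. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def window_frequencies(tokens, word, window):
    """Relative frequency of `word` in consecutive non-overlapping windows.

    Assumption: this windowed frequency is the gamma-distributed quantity.
    """
    freqs = []
    for start in range(0, len(tokens) - window + 1, window):
        chunk = tokens[start:start + window]
        freqs.append(chunk.count(word) / window)
    return freqs


def fit_gamma_moments(samples):
    """Method-of-moments gamma fit: shape k = m^2 / v, scale theta = v / m."""
    m = mean(samples)
    v = pvariance(samples)
    if m <= 0 or v <= 0:
        raise ValueError("need positive mean and variance to fit a gamma model")
    return m * m / v, v / m


def log_partition(k, beta):
    """ln Z for the assumed Z(beta) = Gamma(k) / beta^k."""
    return math.lgamma(k) - k * math.log(beta)


def free_energy(k, beta):
    """F = -(1 / beta) * ln Z, with k_B = 1."""
    return -log_partition(k, beta) / beta


def specific_heat(k, beta, rel_step=1e-3):
    """C = beta^2 * d^2(ln Z)/d(beta)^2, via a central finite difference."""
    h = rel_step * beta
    d2 = (log_partition(k, beta + h)
          - 2.0 * log_partition(k, beta)
          + log_partition(k, beta - h)) / (h * h)
    return beta * beta * d2
```

Under assumption (c) the specific heat reduces analytically to the shape parameter k, so in this simplified reading words with different fitted shapes have different specific heats. Whether this matches the quantities computed in the paper would need to be checked against the full text.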
Taxonomy
Topics: Algorithms and Data Compression · Advanced Text Analysis Techniques · Topic Modeling
