Tracing the Genealogies of Ideas with Large Language Model Embeddings

Lucian Li

arXiv:2402.01661 · cs.CL · November 20, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a computational method that uses large language model embeddings to trace intellectual influence and the evolution of ideas across large textual corpora by capturing both semantic and structural similarity between passages.

Contribution

The paper presents a novel ensemble approach combining semantic and structural embeddings to detect ideas and influence in large, diverse textual datasets, including 19th-century publications.

Findings

01. Effective detection of ideas across 400,000 texts

02. Capable of identifying Darwinian influence in texts

03. Robust to paraphrasing and structural variations

Abstract

In this paper, I present a novel method to detect intellectual influence across a large corpus. Taking advantage of the unique affordances of large language models in encoding semantic and structural meaning while remaining robust to paraphrasing, we can search for substantively similar ideas and hints of intellectual influence in a computationally efficient manner. Such a method allows us to operationalize different levels of confidence: we can allow for direct quotation, paraphrase, or speculative similarity while remaining open about the limitations of each threshold. I apply an ensemble method combining General Text Embeddings, a state-of-the-art sentence embedding method optimized to capture semantic content, and an Abstract Meaning Representation graph representation designed to capture structural similarities in argumentation style and the use of metaphor. I apply this method to…
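The ensemble and the confidence levels described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the equal weighting, the threshold values, and the function names are all hypothetical, and the structural score here is a plain Jaccard overlap of graph triples standing in for a proper Abstract Meaning Representation graph comparison. The semantic vectors are assumed to come from a sentence embedding model such as General Text Embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def triple_jaccard(graph_a, graph_b):
    """Jaccard overlap of (source, relation, target) triples.

    Assumption: a simplified stand-in for AMR graph similarity.
    """
    a, b = set(graph_a), set(graph_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def ensemble_score(sem_a, sem_b, graph_a, graph_b, weight=0.5):
    """Weighted mix of semantic and structural similarity (weight is hypothetical)."""
    return weight * cosine(sem_a, sem_b) + (1 - weight) * triple_jaccard(graph_a, graph_b)

# Hypothetical cutoffs, ordered from strictest to loosest.
THRESHOLDS = [
    ("direct quotation", 0.95),
    ("paraphrase", 0.80),
    ("speculative similarity", 0.60),
]

def confidence_tier(score):
    """Map an ensemble score to the strictest tier whose cutoff it meets."""
    for label, cutoff in THRESHOLDS:
        if score >= cutoff:
            return label
    return "no match"
```

Keeping the tiers in an explicit, ordered table makes each cutoff visible and easy to report alongside the results, in the spirit of the abstract's point about remaining open about the limitations of each threshold.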

Taxonomy

Topics: Computational and Text Analysis Methods