Synthesizing Proteins on the Graphics Card. Protein Folding and the Limits of Critical AI Studies
Fabian Offert, Paul Kim, Qiaoyu Cai

TL;DR
This paper critically examines the use of the transformer architecture in protein folding, arguing that transformers process their inputs non-linguistically, and considers what this implies for AI's role in scientific knowledge-making.
Contribution
The paper challenges the language paradigm of computational (structural) biology and shows that transformer-based protein folding models operate non-linguistically, which bears on the kind of knowledge they can produce.
Findings
Transformers operate on non-linguistic, high-dimensional vector representations.
The analogy between language and proteins is historically and conceptually limited.
Transformer architecture creates a new epistemological space in scientific knowledge-making.
Abstract
This paper investigates the application of the transformer architecture in protein folding, as exemplified by DeepMind's AlphaFold project, and its implications for the understanding of so-called large language models. The prevailing discourse often assumes a ready-made analogy between proteins, encoded as sequences of amino acids, and natural language, which we term the language paradigm of computational (structural) biology. Instead of assuming this analogy as given, we critically evaluate it to assess the kind of knowledge-making afforded by the transformer architecture. We first trace the analogy's emergence and historical development, carving out the influence of structural linguistics on structural biology beginning in the mid-20th century. We then examine three often overlooked preprocessing steps essential to the transformer architecture, including subword tokenization, word…
Taxonomy
Topics: Scientific Computing and Data Management · Cellular Automata and Applications · Modular Robots and Swarm Intelligence
