Let Models Speak Ciphers: Multiagent Debate through Embeddings

Chau Pham; Boyi Liu; Yingxiang Yang; Zhengyu Chen; Tianyi Liu; Jianbo Yuan; Bryan A. Plummer; Zhaoran Wang; Hongxia Yang

arXiv:2310.06272 · cs.CL · February 27, 2024

TL;DR

This paper introduces CIPHER, a novel communication protocol for LLMs that uses embeddings instead of natural language, enabling richer information exchange and improving reasoning performance across multiple tasks.

Contribution

CIPHER removes token sampling and uses embedding expectations for communication, outperforming natural language debate methods without modifying model weights.

Findings

01. Outperforms state-of-the-art natural language debate methods by 0.5-5.0%.

02. Enables a broader spectrum of information encoding among LLMs.

03. Works across various open-source LLM sizes and reasoning tasks.

Abstract

Discussion and debate among Large Language Models (LLMs) have gained considerable attention due to their potential to enhance the reasoning ability of LLMs. Although natural language is an obvious choice for communication due to LLMs' language understanding capability, the token sampling step needed when generating natural language poses a potential risk of information loss, as it uses only one token to represent the model's belief across the entire vocabulary. In this paper, we introduce a communication regime named CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue. Specifically, we remove the token sampling step from LLMs and let them communicate their beliefs across the vocabulary through the expectation of the raw transformer output embeddings. Remarkably, by deviating from natural language, CIPHER offers an advantage of encoding a…
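The core idea in the abstract, replacing a single sampled token with the expectation of embeddings under the model's next-token distribution, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy 3-token vocabulary, the 2-dimensional embedding values, and the use of a greedy pick as the stand-in for token sampling are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn logits into a probability distribution over the vocabulary."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_embedding(probs: Sequence[float],
                       table: Sequence[Sequence[float]]) -> List[float]:
    """Probability-weighted average of all token embeddings."""
    out = [0.0] * len(table[0])
    for p, row in zip(probs, table):
        for j, value in enumerate(row):
            out[j] += p * value
    return out


def greedy_embedding(probs: Sequence[float],
                     table: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in for token sampling: keep only the most likely token."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return list(table[best])


# Hypothetical toy setup: 3 tokens, 2-dimensional embeddings (made-up values).
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
LOGITS = [2.0, 2.0, 0.0]  # the model is torn between tokens 0 and 1

probs = softmax(LOGITS)
cipher_vec = expected_embedding(probs, EMBEDDINGS)
greedy_vec = greedy_embedding(probs, EMBEDDINGS)
```

In this toy case the greedy pick returns the embedding of token 0 alone and discards the fact that token 1 was equally likely, whereas the expected embedding weights both tokens equally. That is the kind of information loss the abstract attributes to the token sampling step.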

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

Allowing networks to communicate with each other by sharing token embeddings rather than raw tokens is an interesting idea that allows for higher-bandwidth information transmission. The method shows performance improvements on the GSM8K, MMLU, and Arithmetic benchmarks over the more direct debate method of Du et al.

Weaknesses

Although the high-level ideas of the paper are interesting and potentially performance-boosting, the lack of detailed explanations and the unusual formatting and presentation make it hard to understand exactly what the authors are doing, and whether the performance improvements are actually due to their vector-sharing approach or something else. Various technical explanations were unclear or lacking, in particular those having to do with temperature selection:
* It is unclear how the Convert-and-A

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. Directly using embedding vectors to communicate between LLMs is a good idea.
2. The paper provides a rigorous and comprehensive evaluation of CIPHER on five diverse reasoning datasets across multiple domains. The results show that CIPHER consistently outperforms natural language debate.
3. The paper also conducts ablation studies and sensitivity analysis to investigate the mechanisms and factors that contribute to the performance of CIPHER.

Weaknesses

1. Limited generalizability. As the authors describe in the limitations, this method is only applicable to LLMs that share a common vocabulary. For different types of LLMs, aligning embeddings is a difficult task.
2. From Figure 10, the language of CIPHER is still difficult to analyze.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

* The paper proposes a novel communication protocol for large language models (LLMs) that uses embeddings instead of natural language.
* The paper provides a clear and detailed description of the CIPHER method and its implementation.
* The paper also conducts extensive experiments on five reasoning tasks and compares CIPHER with the state-of-the-art natural language debate methods. The paper shows that CIPHER outperforms the baselines by a large margin on all tasks.
* The paper also performs an

Weaknesses

See Questions

Videos

Let Models Speak Ciphers: Multiagent Debate through Embeddings · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Ferroelectric and Negative Capacitance Devices