LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent Sentence Spaces

Yingji Zhang; Danilo S. Carvalho; Ian Pratt-Hartmann; André Freitas

arXiv:2312.13208 · cs.CL · December 21, 2023 · 1 cites

TL;DR

LlaMaVAE introduces a novel VAE-based framework combining large language models with continuous sentence latent spaces, enhancing controllability and semantic coherence in text generation.

Contribution

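The paper presents LlaMaVAE, which combines expressive encoder and decoder models (sentenceT5 and LlaMA) with a VAE architecture to give LLMs better text generation control. It also investigates Invertible CVAE, an approach based on flow-based invertible neural networks (INNs), to conditionally guide VAE generation.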
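
To illustrate what "invertible" means for a flow-based network, here is a minimal sketch of an affine coupling layer, a common building block of such networks. It is not the paper's Invertible CVAE: the conditioner `_scale_and_shift` is a hypothetical stand-in for a learned network, and the function names are assumptions made for this example.

```python
import math

def _scale_and_shift(x_a):
    # Hypothetical stand-in for a learned conditioner network:
    # maps the untouched half to a log-scale and a shift per dimension.
    s = [math.tanh(v) for v in x_a]
    t = [0.5 * v for v in x_a]
    return s, t

def coupling_forward(x):
    # Split the vector; the first half passes through unchanged and
    # parameterises an affine transform of the second half.
    # Assumes len(x) is even so both halves have the same size.
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    s, t = _scale_and_shift(x_a)
    y_b = [b * math.exp(si) + ti for b, si, ti in zip(x_b, s, t)]
    return x_a + y_b

def coupling_inverse(y):
    # The first half is unchanged, so s and t can be recomputed exactly
    # and the affine transform undone in closed form.
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    s, t = _scale_and_shift(y_a)
    x_b = [(b - ti) * math.exp(-si) for b, si, ti in zip(y_b, s, t)]
    return y_a + x_b

x = [0.3, -1.2, 2.0, 0.7]
y = coupling_forward(x)
x_recovered = coupling_inverse(y)
```

Because the inverse is exact up to floating-point error, a latent vector can be mapped to a transformed space and back without losing information, which is the property that makes such layers attractive for guiding generation.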

Findings

01

Outperforms the previous state-of-the-art VAE language model, Optimus, across tasks including language modelling and semantic textual similarity

02

Enhances semantic clustering and geometric consistency in generated text

03

Enables better control over language generation through latent space manipulation

Abstract

Deep generative neural networks, such as Variational AutoEncoders (VAEs), offer an opportunity to better understand and control language models from the perspective of sentence-level latent spaces. To combine the controllability of VAE latent spaces with the state-of-the-art performance of recent large language models (LLMs), we present in this work LlaMaVAE, which combines expressive encoder and decoder models (sentenceT5 and LlaMA) with a VAE architecture, aiming to provide better text generation control to LLMs. In addition, to conditionally guide the VAE generation, we investigate a new approach based on flow-based invertible neural networks (INNs) named Invertible CVAE. Experimental results reveal that LlaMaVAE can outperform the previous state-of-the-art VAE language model, Optimus, across various tasks, including language modelling, semantic textual similarity and definition…
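
The sentence-level latent space mentioned in the abstract comes from the standard VAE bottleneck. The sketch below shows the two generic ingredients of that bottleneck, the reparameterisation trick and the KL term against a standard normal prior, in plain Python. It is an illustration under assumptions, not the authors' code: the function names, the 4-dimensional latent, and the example values are all hypothetical, and in LlaMaVAE `mu` and `logvar` would come from the sentence encoder rather than being hard-coded.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    # Sample z = mu + sigma * eps with eps ~ N(0, 1) per dimension,
    # where sigma = exp(0.5 * logvar).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions.
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

rng = random.Random(0)           # fixed seed so the sample is reproducible
mu = [0.2, -0.1, 0.0, 0.5]       # hypothetical encoder output (mean)
logvar = [0.0, -0.5, 0.1, -1.0]  # hypothetical encoder output (log-variance)

z = reparameterize(mu, logvar, rng)     # latent sentence vector for the decoder
kl = kl_to_standard_normal(mu, logvar)  # regulariser added to the reconstruction loss
```

The KL term is zero exactly when the posterior equals the prior (`mu = 0`, `logvar = 0`) and grows as the encoder's distribution moves away from it.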

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis