Efficient Knowledge Feeding to Language Models: A Novel Integrated Encoder-Decoder Architecture
S Santosh Kumar, Rishi Gottimukkala, Supriya Devidutta, Karthikeyan S

TL;DR
This paper presents a new integrated encoder-decoder architecture using in-context vectors to efficiently feed knowledge into language models, overcoming token limits and improving performance across various tasks.
Contribution
The in-context vector (ICV) approach recasts in-context learning by embedding task information into latent vectors, enhancing knowledge integration without extensive prompt modifications.
Findings
ICV outperforms standard in-context learning and fine-tuning.
ICV reduces computational costs and memory usage.
ICV surpasses token limitations and is easy to control.
Abstract
This paper introduces a novel approach to efficiently feeding knowledge to large language models (LLMs) during prediction by integrating retrieval and generation processes within a unified framework. While the Retrieval-Augmented Generation (RAG) model addresses gaps in LLMs' training data and knowledge limits, it is hindered by token limit restrictions and dependency on the retrieval system's accuracy. Our proposed architecture incorporates in-context vectors (ICV) to overcome these challenges. ICV recasts in-context learning by using latent embeddings of LLMs to create a vector that captures essential task information. This vector is then used to shift the latent states of the LLM, enhancing the generation process without adding demonstration examples to the prompt. ICV directly integrates information into the model, enabling it to process this information more effectively. Our extensive…
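The two steps the abstract describes, building a task vector from latent embeddings and then shifting the model's latent states with it, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the mean-of-differences construction, the norm-restoring rescale, the `alpha` strength parameter, and the function names `build_icv` and `shift_latents` are all assumptions made for illustration, and random arrays stand in for real LLM hidden states.

```python
import numpy as np

def build_icv(demo_inputs, demo_targets):
    """Average the per-demonstration latent differences into one task vector.

    Assumption: each row is the latent state of one demonstration input
    (or its matching output); the mean difference is an illustrative choice.
    """
    diffs = np.asarray(demo_targets, dtype=float) - np.asarray(demo_inputs, dtype=float)
    return diffs.mean(axis=0)

def shift_latents(hidden, icv, alpha=0.1):
    """Add alpha * icv to every hidden state, then restore each original norm.

    Assumption: rescaling to the original norm is one simple way to keep the
    shifted states at a similar magnitude; the source does not specify this.
    """
    hidden = np.asarray(hidden, dtype=float)
    shifted = hidden + alpha * icv
    old_norm = np.linalg.norm(hidden, axis=-1, keepdims=True)
    new_norm = np.linalg.norm(shifted, axis=-1, keepdims=True)
    return shifted * old_norm / np.maximum(new_norm, 1e-12)

rng = np.random.default_rng(0)
demo_inputs = rng.normal(size=(4, 8))   # stand-in latents for 4 demonstration inputs
demo_targets = demo_inputs + 1.0        # stand-in latents for the matching outputs
icv = build_icv(demo_inputs, demo_targets)

hidden = rng.normal(size=(5, 8))        # stand-in hidden states for a 5-token query
steered = shift_latents(hidden, icv, alpha=0.5)
```

In this sketch the prompt itself never grows: the demonstrations influence generation only through `icv`, and `alpha` controls how strongly the latent states are pushed toward the task direction.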
Taxonomy
Topics: Natural Language Processing Techniques · Neural Networks and Applications
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Byte Pair Encoding · WordPiece · Layer Normalization · Residual Connection · Dense Connections
