PISCO: Pretty Simple Compression for Retrieval-Augmented Generation

Maxime Louis; Hervé Déjean; Stéphane Clinchant

arXiv:2501.16075 · cs.CL · January 28, 2025

TL;DR

PISCO is a novel document compression method for Retrieval-Augmented Generation that achieves high compression rates with minimal accuracy loss, requiring no pretraining and enabling efficient fine-tuning of large language models.

Contribution

PISCO introduces a pretraining-free, sequence-level knowledge distillation approach for document compression in RAG, significantly improving scalability and efficiency.
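
As a rough illustration of what sequence-level knowledge distillation means in general, the sketch below builds a training set whose labels are a teacher model's generated answers rather than human annotations. This is not the authors' implementation: `build_distillation_set`, `DistillationExample`, `toy_retrieve`, and `toy_teacher` are hypothetical names, and the assumption that the teacher answers from the retrieved documents is ours.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DistillationExample:
    question: str
    documents: List[str]  # retrieved documents (a student would see them compressed)
    target: str           # teacher-generated answer used as the training label


def build_distillation_set(
    questions: List[str],
    retrieve: Callable[[str], List[str]],
    teacher_generate: Callable[[str, List[str]], str],
) -> List[DistillationExample]:
    """Label each question with the teacher's output instead of a human annotation."""
    examples = []
    for question in questions:
        documents = retrieve(question)
        target = teacher_generate(question, documents)
        examples.append(DistillationExample(question, documents, target))
    return examples


# Toy stand-ins (assumptions) so the sketch runs without any model or index.
def toy_retrieve(question: str) -> List[str]:
    return [f"passage about {question.lower()}"]


def toy_teacher(question: str, documents: List[str]) -> str:
    return f"answer grounded in {len(documents)} document(s)"


dataset = build_distillation_set(["What is RAG?"], toy_retrieve, toy_teacher)
```

In a real pipeline the two callables would wrap a retriever and a teacher LLM, and a student model would then be fine-tuned to reproduce `target`.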

Findings

01 Achieves a 16x compression rate with only 0-3% accuracy loss across RAG-based QA tasks.

02 Outperforms existing compression models by 8% in accuracy.

03 Enables fine-tuning of a 7-10B LLM in 48 hours on a single A100 GPU.

Abstract

Retrieval-Augmented Generation (RAG) pipelines enhance Large Language Models (LLMs) by retrieving relevant documents, but they face scalability issues due to high inference costs and limited context size. Document compression is a practical solution, but current soft compression methods suffer from accuracy losses and require extensive pretraining. In this paper, we introduce PISCO, a novel method that achieves a 16x compression rate with minimal accuracy loss (0-3%) across diverse RAG-based question-answering (QA) tasks. Unlike existing approaches, PISCO requires no pretraining or annotated data, relying solely on sequence-level knowledge distillation from document-based questions. With the ability to fine-tune a 7-10B LLM in 48 hours on a single A100 GPU, PISCO offers a highly efficient and scalable solution. We present comprehensive experiments showing that PISCO outperforms existing…
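
To make the 16x figure concrete, the following sketch works out how much a compressed context shrinks. The document lengths (128 tokens), the number of documents (5), and the question length (32 tokens) are illustrative assumptions, not values taken from the abstract, and `compressed_length` and `context_size` are hypothetical helper names.

```python
import math
from typing import List, Optional


def compressed_length(num_tokens: int, rate: int = 16) -> int:
    """Number of embeddings left after compressing num_tokens at the given rate."""
    if num_tokens < 0 or rate <= 0:
        raise ValueError("num_tokens must be >= 0 and rate must be > 0")
    return math.ceil(num_tokens / rate)


def context_size(doc_lengths: List[int], question_tokens: int,
                 rate: Optional[int] = None) -> int:
    """Total context length; documents are compressed only when a rate is given."""
    if rate is None:
        per_doc = list(doc_lengths)
    else:
        per_doc = [compressed_length(n, rate) for n in doc_lengths]
    return question_tokens + sum(per_doc)


docs = [128] * 5                                # assumed: five 128-token documents
full = context_size(docs, 32)                   # 32 + 5 * 128 = 672
compressed = context_size(docs, 32, rate=16)    # 32 + 5 * 8   = 72
```

Under these assumptions the context shrinks from 672 to 72 positions. The overall reduction is below 16x because the question itself is not compressed.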

Code & Models

Models

naver/pisco-mistral (model, 371 downloads, 8 likes)

Datasets

maxoul/pisco_finetuning_data (dataset, 39 downloads)

Videos

PISCO: Pretty Simple Compression for Retrieval-Augmented Generation

Taxonomy

Topics: Algorithms and Data Compression

Methods: Knowledge Distillation