OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit
Arun S. Maiya

TL;DR
OnPrem.LLM is a versatile, privacy-focused toolkit enabling secure, local deployment of large language models for document processing tasks, with flexible backend support and user-friendly interfaces.
Contribution
It introduces a comprehensive, privacy-preserving LLM toolkit with multi-backend support, hybrid deployment options, and an accessible no-code web interface.
Findings
Supports multiple LLM backends including llama.cpp, Ollama, vLLM, and Hugging Face.
Enables privacy-preserving document processing in restricted environments.
Provides seamless backend switching and hybrid cloud integration.
Abstract
We present OnPrem.LLM, a Python-based toolkit for applying large language models (LLMs) to sensitive, non-public data in offline or restricted environments. The system is designed for privacy-preserving use cases and provides prebuilt pipelines for document processing and storage, retrieval-augmented generation (RAG), information extraction, summarization, classification, and prompt/output processing with minimal configuration. OnPrem.LLM supports multiple LLM backends -- including llama.cpp, Ollama, vLLM, and Hugging Face Transformers -- with quantized model support, GPU acceleration, and seamless backend switching. Although designed for fully local execution, OnPrem.LLM also supports integration with a wide range of cloud LLM providers when permitted, enabling hybrid deployments that balance performance with data control. A no-code web interface extends accessibility to…
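The idea of backend switching with an opt-in cloud path can be illustrated with a minimal sketch. This is not OnPrem.LLM's API: the names `Pipeline`, `BACKENDS`, `local_stub`, `cloud_stub`, and `allow_cloud` are hypothetical, and the backends are stubs so the example runs offline. It only shows the general pattern of keeping pipeline code unchanged while the backend is selected by configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical stand-ins for real backends: each maps a prompt to a completion.
def local_stub(prompt: str) -> str:
    return f"[local] {prompt}"


def cloud_stub(prompt: str) -> str:
    return f"[cloud] {prompt}"


# Registry of available backends, keyed by an illustrative name.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "local": local_stub,
    "cloud": cloud_stub,
}


@dataclass
class Pipeline:
    """Toy pipeline whose backend is chosen by configuration (an assumption)."""

    backend: str = "local"
    allow_cloud: bool = False  # cloud use must be enabled explicitly

    def prompt(self, text: str) -> str:
        if self.backend not in BACKENDS:
            raise ValueError(f"unknown backend: {self.backend}")
        if self.backend == "cloud" and not self.allow_cloud:
            raise PermissionError("cloud backends are disabled for this deployment")
        return BACKENDS[self.backend](text)

    def summarize(self, document: str) -> str:
        # The task-level code is identical whichever backend is configured.
        return self.prompt(f"Summarize: {document}")
```

In this sketch, moving from a local to a hybrid deployment changes only the constructor arguments, for example `Pipeline(backend="cloud", allow_cloud=True)`, while calls such as `summarize` stay the same.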
Taxonomy
Topics: Topic Modeling · Big Data and Digital Economy · Computational Physics and Python Applications
