From Understanding to Engagement: Personalized pharmacy Video Clips via Vision Language Models (VLMs)

Suyash Mishra; Qiang Li; Srikanth Patil; Anubhav Girdhar

arXiv:2601.05059·cs.CV·January 9, 2026

Open Access

TL;DR

This paper presents a personalized, efficient framework for generating pharmacy-related video clips using vision and audio language models, improving speed, cost, and clip quality for medical content processing.

Contribution

The authors introduce a novel Video to Video Clip Generation framework that combines ALMs and VLMs with personalization, smooth transition algorithms, and a cost-effective pipeline for pharmacy videos.

Findings

01. 3-4x faster clip generation

02. 4x cost reduction

03. Improved clip coherence and informativeness scores

Abstract

Vision Language Models (VLMs) are poised to revolutionize the digital transformation of the pharmaceutical industry by enabling intelligent, scalable, and automated multi-modality content processing. Traditional manual annotation of heterogeneous data modalities (text, images, video, audio, and web links) is prone to inconsistencies, quality degradation, and inefficiencies in content utilization. The sheer volume of long video and audio data (e.g., long clinical trial interviews and educational seminars) further exacerbates these challenges. Here, we introduce a domain-adapted Video to Video Clip Generation framework that integrates Audio Language Models (ALMs) and Vision Language Models (VLMs) to produce highlight clips. Our contributions are threefold: (i) a reproducible Cut & Merge algorithm with fade in/out and timestamp normalization, ensuring smooth transitions and audio/visual…
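The abstract names a Cut & Merge step with fade in/out and timestamp normalization, but the excerpt here does not give its details. The sketch below is only an illustration of what such a step might involve, not the authors' algorithm. It assumes segments arrive as (start, end) pairs in seconds; the function names `normalize_segments` and `fade_gain`, the `min_gap` merge threshold, and the linear fade shape are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds), an assumed representation


def normalize_segments(
    segments: List[Segment], duration: float, min_gap: float = 0.5
) -> List[Segment]:
    """Clamp segments to [0, duration], sort them, and merge any that
    overlap or sit within min_gap seconds of each other (hypothetical rule)."""
    clamped = []
    for start, end in segments:
        start = max(0.0, min(start, duration))
        end = max(0.0, min(end, duration))
        if end > start:  # drop empty or inverted segments
            clamped.append((start, end))
    clamped.sort()

    merged: List[Segment] = []
    for start, end in clamped:
        if merged and start - merged[-1][1] <= min_gap:
            # Close enough to the previous segment: extend it instead of cutting.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fade_gain(t: float, clip_len: float, fade: float = 0.5) -> float:
    """Linear fade-in/fade-out gain in [0, 1] at time t inside a clip
    of length clip_len (the linear shape is an assumption)."""
    if clip_len <= 0:
        return 0.0
    fade = min(fade, clip_len / 2)  # never let the two fades overlap
    if fade == 0:
        return 1.0
    return max(0.0, min(1.0, t / fade, (clip_len - t) / fade))


if __name__ == "__main__":
    raw = [(10, 20), (19.8, 30), (-5, 3), (100, 130)]
    print(normalize_segments(raw, duration=120))
    print([fade_gain(t, 4.0) for t in (0.0, 0.25, 2.0, 4.0)])
```

In a real pipeline the normalized segments would then be passed to a video tool for cutting and concatenation, with the gain curve applied to audio and video at each clip boundary.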

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Topic Modeling