Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications

Oren Sultan; Alex Khasin; Guy Shiran; Asnat Greenstein-Messica; Dafna Shahaf

arXiv:2410.02952·cs.CL·October 11, 2024

TL;DR

This paper introduces a distillation method to fine-tune smaller LLMs for real-time visual editing tasks, enabling efficient tool invocation based on natural language requests with reduced cost and latency.

Contribution

A novel distillation approach that fine-tunes smaller LLMs to match larger models' performance for real-time visual editing applications.

Findings

01. Student models match teacher model performance

02. Significant reduction in cost and latency

03. Improved fine-tuning with data augmentation

Abstract

We present a practical distillation approach to fine-tune LLMs for invoking tools in real-time applications. We focus on visual editing tasks; specifically, we modify images and videos by interpreting user stylistic requests, specified in natural language ("golden hour"), using an LLM to select the appropriate tools and their parameters to achieve the desired visual effect. We found that proprietary LLMs such as GPT-3.5-Turbo show potential in this task, but their high cost and latency make them unsuitable for real-time applications. In our approach, we fine-tune a (smaller) student LLM with guidance from a (larger) teacher LLM and behavioral signals. We introduce offline metrics to evaluate student LLMs. Both online and offline experiments show that our student models manage to match the performance of our teacher model (GPT-3.5-Turbo), significantly reducing costs and latency. Lastly,…
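To make the tool-chaining idea concrete, here is a minimal Python sketch of how an LLM's tool selection might be parsed, validated, and compared between a student and a teacher. Everything specific in it is an assumption for illustration: the tool names (`adjust`, `filter`), their parameters and ranges, the JSON output format, and the `tool_selection_f1` score are hypothetical and are not taken from the paper, which does not define its tools or offline metrics in the abstract.

```python
import json

# Hypothetical tool registry: tool names, parameters, and ranges are
# illustrative assumptions, not the tools used in the paper.
TOOLS = {
    "adjust": {"exposure": (-100.0, 100.0), "temperature": (-100.0, 100.0)},
    "filter": {"intensity": (0.0, 100.0)},
}


def parse_tool_chain(llm_output):
    """Parse an LLM response (assumed to be a JSON list of tool calls) and
    validate it against TOOLS, clamping numeric values into range."""
    try:
        chain = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(chain, list):
        raise ValueError("expected a JSON list of tool calls")

    validated = []
    for step in chain:
        tool = step.get("tool") if isinstance(step, dict) else None
        if not isinstance(tool, str) or tool not in TOOLS:
            raise ValueError(f"unknown or malformed tool call: {step!r}")
        raw_params = step.get("params", {})
        if not isinstance(raw_params, dict):
            raise ValueError(f"params must be an object: {step!r}")
        params = {}
        for name, value in raw_params.items():
            if name not in TOOLS[tool]:
                raise ValueError(f"unknown parameter {name!r} for tool {tool!r}")
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"parameter {name!r} must be numeric")
            low, high = TOOLS[tool][name]
            params[name] = min(max(float(value), low), high)
        validated.append({"tool": tool, "params": params})
    return validated


def tool_selection_f1(student, teacher):
    """Illustrative offline score: F1 over the sets of tool names chosen by
    the student and the teacher for the same request."""
    s = {step["tool"] for step in student}
    t = {step["tool"] for step in teacher}
    if not s and not t:
        return 1.0  # both chose no tools: treat as full agreement
    overlap = len(s & t)
    if overlap == 0:
        return 0.0
    precision = overlap / len(s)
    recall = overlap / len(t)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up responses to a request such as "golden hour".
    teacher_raw = (
        '[{"tool": "adjust", "params": {"temperature": 40, "exposure": 10}},'
        ' {"tool": "filter", "params": {"intensity": 150}}]'
    )
    student_raw = '[{"tool": "adjust", "params": {"temperature": 35}}]'
    teacher = parse_tool_chain(teacher_raw)
    student = parse_tool_chain(student_raw)
    print(teacher)
    print(tool_selection_f1(student, teacher))
```

Validating before applying matters in a real-time setting because a smaller student model may emit an unknown tool or an out-of-range value; this sketch rejects the former and clamps the latter, which is one possible policy among several.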

Code & Models

Repositories

orensultan/AIRecolor (official)

Videos

Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications · Underline

Taxonomy

Topics: Distributed and Parallel Computing Systems · Semantic Web and Ontologies · Web Data Mining and Analysis

Methods: Cosine Annealing · Layer Normalization · Dense Connections · Linear Warmup With Cosine Annealing · Adam · Linear Layer · Residual Connection · Weight Decay