SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning

Yuze Zhao; Jintao Huang; Jinghan Hu; Xingjun Wang; Yunlin Mao; Daoze Zhang; Hong Zhang; Zeyinzi Jiang; Zhikai Wu; Baole Ai; Ang Wang; Wenmeng Zhou; Yingda Chen

arXiv:2408.05517 · cs.CL · May 20, 2025 · 5 cites

TL;DR

SWIFT is a comprehensive, open-source infrastructure supporting the fine-tuning, evaluation, and deployment of over 350 large language and multimodal models, enabling faster adaptation and improved performance across diverse tasks.

Contribution

It introduces SWIFT, the first systematic framework supporting fine-tuning and post-training processes for both LLMs and MLLMs, with extensive utilities and benchmark capabilities.

Findings

01. Achieved a 5.2%-21.8% improvement on the ToolBench leaderboard

02. Reduced hallucination by 1.6%-14.1% in fine-tuned models

03. Improved average performance by 8%-17% across tasks

Abstract

Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leveraged attention-based Transformer architectures to achieve superior performance and generalization capabilities. These models now cover extensive areas of traditional learning tasks. For instance, text-based tasks such as text classification and sequence labeling, as well as multi-modal tasks like Visual Question Answering (VQA) and Optical Character Recognition (OCR), which were previously addressed using different models, can now be tackled with a single foundation model. Consequently, the training and lightweight fine-tuning of LLMs and MLLMs, especially those based on the Transformer architecture, have become particularly important. In recognition of these overwhelming needs, we develop SWIFT, a customizable one-stop infrastructure for large models. With support for over 300 LLMs…

Code & Models

Datasets

eve1f/ckpt · dataset · 1.1k dl

Taxonomy

Topics: Semiconductor Lasers and Optical Devices · Photonic and Optical Devices

Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections