Efficient Multi-Task Inferencing with a Shared Backbone and Lightweight Task-Specific Adapters for Automatic Scoring

Ehsan Latif; Xiaoming Zhai

arXiv:2412.21065·cs.CL·June 24, 2025

TL;DR

This paper introduces a multi-task framework for automated scoring of student responses. It pairs a single shared backbone model with lightweight, task-specific LoRA adapters, cutting GPU memory use and inference latency while keeping accuracy close to that of fully fine-tuned models.

Contribution

It proposes a shared backbone architecture with lightweight task-specific LoRA adapters for multi-task scoring, reducing GPU memory consumption by 60% and inference latency by 40% compared to fully fine-tuned models.

Findings

01

Achieves an average quadratic weighted kappa (QWK) of 0.848 across 27 scoring tasks, close to the 0.888 of fully fine-tuned models.

02

Reduces GPU memory consumption by 60%.

03

Lowers inference latency by 40%.
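
For readers unfamiliar with the metric behind the first finding, the following is a minimal sketch of quadratic weighted kappa in plain Python. It assumes integer score labels in the range 0 to num_classes - 1; the function name and the choice to return 1.0 when both raters use one identical class are illustrative assumptions, not taken from the paper.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Quadratic weighted kappa for integer labels in [0, num_classes)."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(y_true)

    # Observed confusion matrix: rows are true scores, columns are predictions.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal histograms used to build the chance-agreement matrix.
    true_hist = [sum(row) for row in observed]
    pred_hist = [
        sum(observed[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: larger score gaps cost disproportionately more.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0.0:
        # Assumption: every label from both raters falls in the same class,
        # which is treated here as perfect agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

A value of 1 means perfect agreement with the human scores, 0 means agreement no better than chance, and negative values mean systematic disagreement. For example, `quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 2)` evaluates to 0.5.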

Abstract

The integration of Artificial Intelligence (AI) in education requires scalable and efficient frameworks that balance performance, adaptability, and cost. This paper addresses these needs by proposing a shared backbone model architecture enhanced with lightweight LoRA adapters for task-specific fine-tuning, targeting the automated scoring of student responses across 27 mutually exclusive tasks. By achieving competitive performance (average QWK of 0.848 compared to 0.888 for fully fine-tuned models) while reducing GPU memory consumption by 60% and inference latency by 40%, the framework demonstrates significant efficiency gains. This approach aligns with the workshop's focus on improving language models for educational tasks, creating responsible innovations for cost-sensitive deployment, and supporting educators by streamlining assessment workflows. The findings underscore the potential…
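
The shared-backbone idea in the abstract can be illustrated with a toy sketch: one frozen weight matrix serves every task, and each task adds only a small low-rank update. This is not the authors' implementation. The class names, task names, and sizes below are hypothetical, and the code uses plain Python lists rather than a deep learning library. The only detail taken from standard LoRA practice is that the up-projection B starts at zero, so a new adapter initially leaves the backbone output unchanged.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRAAdapter:
    """Low-rank update (alpha / rank) * B @ A for one task (hypothetical helper)."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # A: small random down-projection; B: zero up-projection (no-op at start).
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def delta(self, x):
        return [self.scale * v for v in matvec(self.b, matvec(self.a, x))]


class SharedLinear:
    """One frozen backbone weight shared by all tasks, plus per-task adapters."""

    def __init__(self, weight):
        self.weight = weight  # shared and never modified per task
        self.adapters = {}

    def add_task(self, task, rank, alpha=1.0, seed=0):
        d_out, d_in = len(self.weight), len(self.weight[0])
        self.adapters[task] = LoRAAdapter(d_in, d_out, rank, alpha, seed)

    def forward(self, x, task):
        # Backbone output plus the low-rank correction for the selected task.
        base = matvec(self.weight, x)
        delta = self.adapters[task].delta(x)
        return [b + d for b, d in zip(base, delta)]


layer = SharedLinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_task("task_a", rank=1)
layer.add_task("task_b", rank=1, seed=1)

# Pretend task_a has been trained: set its low-rank factors by hand.
layer.adapters["task_a"].a = [[1.0, 1.0]]
layer.adapters["task_a"].b = [[1.0], [0.0]]

out_a = layer.forward([2.0, 3.0], "task_a")
out_b = layer.forward([2.0, 3.0], "task_b")
```

In this sketch only the small A and B matrices differ between tasks, so adding a task costs rank * (d_in + d_out) extra parameters instead of a full copy of the weight matrix. That is the general reason an adapter-based design can save memory relative to keeping one fully fine-tuned model per task.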

Taxonomy

Topics: Distributed and Parallel Computing Systems · Service-Oriented Architecture and Web Services · Constraint Satisfaction and Optimization