Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks

Ryan Campbell; Nelson Lojo; Kesava Viswanadha; Christoffer Grondal Tryggestad; Derrick Han Sun; Sriteja Vijapurapu; August Rolfsen; Anant Sahai

arXiv:2411.03945·cs.LG·November 7, 2024

TL;DR

This paper investigates how hybrid language-model architectures affect in-context learning (ICL) performance, showing that some architectural changes slow training or lead to worse predictors, and proposes a new performance metric, the ICL regression score.

Contribution

It extends the work of Garg et al. (2022) and Park et al. (2024) to hybrid GPT-2/LLaMa and LLaMa/Mamba models, analyzes how architectural choices affect in-context learning, and introduces the ICL regression score metric.

Findings

1. Certain architectural changes degrade ICL accuracy and training efficiency.
2. Some hybrid models show improved ICL performance, indicating potential for architecture optimization.
3. The ICL regression score provides a comprehensive performance measure.

Abstract

In-Context Learning (ICL) is a phenomenon where task learning occurs through a prompt sequence without the need for parameter updates. ICL in Multi-Headed Attention (MHA) with absolute positional embedding has been studied more than ICL in other sequence model varieties. We examine the implications of architectural differences between GPT-2 and LLaMa, as well as between LLaMa and Mamba. We extend work done by Garg et al. (2022) and Park et al. (2024) to GPT-2/LLaMa hybrid and LLaMa/Mamba hybrid models, examining the interplay between sequence transformation blocks and in-context regression performance. We note that certain architectural changes degrade training efficiency or ICL accuracy, either by converging to suboptimal predictors or by converging more slowly. We also find that certain hybrids show promising performance improvements, informing potential future ICL-focused architecture modifications.…
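
For readers unfamiliar with the setup this work extends, here is a minimal sketch of the noiseless linear-regression ICL task from Garg et al. (2022). Each prompt holds pairs (x_i, y_i) drawn from a freshly sampled linear function, and a model must predict y for the next x using only the prompt. The helper names (`sample_task`, `least_squares`, `averaging`, `error_curve`), the Gaussian sampling assumptions, the dimension, and the prompt length are illustrative assumptions, not code from the paper's repository. The sketch computes error curves for two reference predictors of the kind Garg et al. compare against, which is one way to make "converging to suboptimal predictors" concrete: a trained model whose curve tracks the averaging baseline rather than least squares has settled on a worse estimator.

```python
import numpy as np

# Hypothetical helpers; sampling assumptions follow the Garg et al. (2022) setup.


def sample_task(n_points, dim, rng):
    """Sample one noiseless linear-regression prompt.

    Assumed distributions: w ~ N(0, I_d), x_i ~ N(0, I_d), y_i = w . x_i.
    """
    w = rng.standard_normal(dim)
    xs = rng.standard_normal((n_points, dim))
    return xs, xs @ w


def least_squares(xs_ctx, ys_ctx, x_query):
    # Minimum-norm least-squares fit to the in-context examples.
    w_hat, *_ = np.linalg.lstsq(xs_ctx, ys_ctx, rcond=None)
    return x_query @ w_hat


def averaging(xs_ctx, ys_ctx, x_query):
    # Averaging baseline: estimate w as the mean of y_i * x_i.
    w_hat = (ys_ctx[:, None] * xs_ctx).mean(axis=0)
    return x_query @ w_hat


def error_curve(predict, n_points, dim, n_tasks, seed=0):
    """Squared error on point k given the first k points, averaged over
    tasks and normalized by dim, for k = 0 .. n_points - 1."""
    rng = np.random.default_rng(seed)
    errs = np.zeros(n_points)
    for _ in range(n_tasks):
        xs, ys = sample_task(n_points, dim, rng)
        for k in range(n_points):
            # With no context (k == 0), fall back to predicting 0.
            pred = predict(xs[:k], ys[:k], xs[k]) if k > 0 else 0.0
            errs[k] += (pred - ys[k]) ** 2
    return errs / (n_tasks * dim)
```

For example, `error_curve(least_squares, n_points=11, dim=5, n_tasks=200)` drops to roughly zero once k reaches the dimension (5 examples), while `error_curve(averaging, ...)` stays well above it. A trained model's error curve can be plotted against these two references.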

Code & Models

Repositories

in-context-learning-2024/in-context
PyTorch · Official

Taxonomy

Topics: Context-Aware Activity Recognition Systems

Methods: Linear Layer · Cosine Annealing · Dense Connections · Layer Normalization · Residual Connection · Focus · Linear Warmup With Cosine Annealing · Adam · Attention Is All You Need