Scaling Granite Code Models to 128K Context

Matt Stallone; Vaibhav Saxena; Leonid Karlinsky; Bridget McGinn; Tim Bula; Mayank Mishra; Adriana Meza Soria; Gaoyuan Zhang; Aditya Prasad; Yikang Shen; Saptha Surendran; Shanmukha Guttula; Hima Patel; Parameswaran Selvam; Xuan-Hong Dang; Yan Koyfman; Atin Sood; Rogerio Feris; Nirmit Desai; David D. Cox; Ruchir Puri; Rameswar Panda

arXiv:2407.13739·cs.AI·July 19, 2024·1 cites

TL;DR

This paper presents a method for scaling the Granite 3B/8B code models from 2K/4K to 128K-token context windows, improving long-context performance with no noticeable degradation on short-context code completion benchmarks.

Contribution

The authors introduce a lightweight continual pretraining approach (gradually increasing the RoPE base frequency, with repository-level file packing and length-upsampled long-context data) and release long-context Granite base and instruction-tuned code models.

Findings

01

The long-context models significantly outperform the original short-context Granite code models on long-context tasks.

02

No noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval).

03

Models are released under an open license.

Abstract

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of the Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining in which the RoPE base frequency is gradually increased, combined with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context…
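To make the "gradually increasing its RoPE base frequency" idea concrete, the following is a minimal sketch of how the RoPE base controls the rotation frequencies used for positional encoding. It uses the standard RoPE definition (inverse frequency `base ** (-2i / d)` for dimension pair `i`); the function names, the head dimension of 128, and the staged `(context_length, base)` schedule are illustrative assumptions, not values taken from the paper.

```python
import math


def rope_inv_freq(head_dim, base):
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim), i in [0, head_dim / 2)."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, base):
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2.0 * math.pi / rope_inv_freq(head_dim, base)[-1]


# Hypothetical staged schedule of (context length, RoPE base).
# These numbers are placeholders chosen for illustration only.
ASSUMED_SCHEDULE = [
    (4_096, 10_000.0),
    (32_768, 500_000.0),
    (131_072, 10_000_000.0),
]


def describe_schedule(schedule, head_dim=128):
    """Return (context_length, base, longest_wavelength) for each stage."""
    return [(ctx, base, longest_wavelength(head_dim, base)) for ctx, base in schedule]


if __name__ == "__main__":
    for ctx, base, wavelength in describe_schedule(ASSUMED_SCHEDULE):
        print(f"context={ctx:>7}  base={base:>12.0f}  longest wavelength={wavelength:,.0f} tokens")
```

Raising the base lowers every non-trivial rotation frequency, so the slowest dimensions take more tokens to complete a cycle; this is the general reason a larger base is paired with a longer training context. How many stages are used and which bases are chosen is a training decision the abstract does not specify.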

Code & Models

Repositories

ibm/data-prep-kit
none

Models
