# ColiFormer: A Transformer-Based Codon Optimization Model Balancing Multiple Objectives for Enhanced E. coli Gene Expression

**Authors:** Saketh Baddam, Omar Emam, Abdelrahman Elfikky, Francesco Cavarretta, George Luka, Ibrahim Farag, Yasser Sanad

PMC · DOI: 10.3390/bioengineering13010114 · Bioengineering · 2026-01-19

## TL;DR

ColiFormer is a transformer-based codon optimization tool for E. coli that balances multiple biological objectives. In silico benchmarks predict improved expression-related metrics, pending in vivo validation.

## Contribution

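ColiFormer introduces a transformer-based codon optimization framework, fine-tuned from the CodonTransformer BigBird architecture. It uses an augmented Lagrangian approach to balance several objectives at once: CAI, GC content, tAI, RNA stability, and avoidance of negative cis-regulatory elements.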
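The abstract names the augmented Lagrangian approach as the way ColiFormer balances its objectives, but this summary gives no formulation. As a toy illustration only, the sketch below maximizes CAI over every synonymous coding of a five-residue peptide, subject to a GC-content band. It uses the textbook augmented Lagrangian term for one inequality constraint and brute-force enumeration for the inner minimization. The codon weights, GC band, penalty parameter `mu`, and all function names are hypothetical. This is not the paper's training objective.

```python
import itertools
import math

# Hypothetical relative-adaptiveness weights for a few E. coli codons.
# Illustrative values only; they are not taken from the paper.
SYNONYMOUS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 1.0, "AAG": 0.3},
    "L": {"CTG": 1.0, "TTA": 0.2, "CTC": 0.2},
    "G": {"GGC": 1.0, "GGT": 0.8, "GGA": 0.2},
    "E": {"GAA": 1.0, "GAG": 0.4},
}
GC_LOW, GC_HIGH = 0.50, 0.55  # hypothetical target GC band


def candidates(protein):
    """Enumerate every synonymous coding sequence (feasible only for toy inputs)."""
    return [list(c) for c in itertools.product(*(SYNONYMOUS[aa] for aa in protein))]


def toy_cai(protein, codons):
    """Geometric mean of the per-codon weights."""
    logs = [math.log(SYNONYMOUS[aa][c]) for aa, c in zip(protein, codons)]
    return math.exp(sum(logs) / len(logs))


def gc_fraction(codons):
    """Fraction of G and C bases in the concatenated codons."""
    seq = "".join(codons)
    return sum(base in "GC" for base in seq) / len(seq)


def gc_violation(codons):
    """Constraint g(x) <= 0 holds exactly when GC content is inside the band."""
    gc = gc_fraction(codons)
    return max(GC_LOW - gc, gc - GC_HIGH)


def augmented_lagrangian(protein, mu=50.0, iters=20):
    """Maximize CAI subject to gc_violation <= 0 with a method-of-multipliers loop."""
    pool = candidates(protein)
    lam, best_feasible = 0.0, None
    for _ in range(iters):
        def objective(codons, lam=lam):
            g = gc_violation(codons)
            # Augmented Lagrangian term for a single inequality constraint g <= 0.
            penalty = (max(0.0, lam + mu * g) ** 2 - lam ** 2) / (2.0 * mu)
            return -toy_cai(protein, codons) + penalty

        x = min(pool, key=objective)  # inner step: exact minimization by enumeration
        # Keep the best feasible iterate, since discrete iterates can oscillate.
        if gc_violation(x) <= 0 and (
            best_feasible is None or toy_cai(protein, x) > toy_cai(protein, best_feasible)
        ):
            best_feasible = x
        lam = max(0.0, lam + mu * gc_violation(x))  # multiplier update
    return best_feasible, lam


seq, lam = augmented_lagrangian("MKLGE")
print("".join(seq), round(toy_cai("MKLGE", seq), 3), round(gc_fraction(seq), 3))
```

The highest-CAI coding here falls just below the GC band. The growing multiplier `lam` pushes the inner step toward the best in-band alternative. Because the inner problem is discrete, the iterates can oscillate around the constraint boundary, so the sketch returns the best feasible iterate rather than the last one.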

## Key findings

- In silico, ColiFormer improved CAI and tAI values while keeping GC content within biologically optimal ranges.
- The model reduced inhibitory cis-regulatory motifs compared with established codon optimization approaches.
- ColiFormer maintains competitive runtime performance while achieving these improvements.
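CAI is conventionally the geometric mean of each codon's relative adaptiveness w. For a given codon, w is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. The following is a minimal sketch of CAI and GC content under that standard definition. It assumes common conventions, not necessarily ColiFormer's exact implementation: a 0.5 pseudocount for unseen codons, and exclusion of stop codons and the single-codon Met/Trp. The function names are hypothetical. In practice the reference set would be something like the high-expression genes the paper fine-tunes on.

```python
import itertools
import math
from collections import Counter

# Standard genetic code, codons ordered by first, second, third base over "TCAG".
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w(codon) = count(codon) / max count among its synonymous codons."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s.upper()))
    # Unseen codons get a small pseudocount so log(w) stays finite (assumed convention).
    adjusted = {c: counts[c] if counts[c] > 0 else pseudocount for c in CODON_TABLE}
    w = {}
    for codon, aa in CODON_TABLE.items():
        best = max(adjusted[c] for c, a in CODON_TABLE.items() if a == aa)
        w[codon] = adjusted[codon] / best
    return w


def cai(seq, w):
    """Geometric mean of w over sense codons, skipping stops and single-codon Met/Trp."""
    logs = [
        math.log(w[c])
        for c in split_codons(seq.upper())
        if CODON_TABLE.get(c) not in (None, "*", "M", "W")
    ]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


w = relative_adaptiveness(["ATGAAAAAGAAATAA"])
print(cai("ATGAAAAAG", w), gc_content("ATGAAAAAG"))
```

In this one-gene reference, AAA is seen twice and AAG once, so w(AAG) = 0.5. A sequence using one of each Lys codon therefore scores the geometric mean of 1.0 and 0.5.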

## Abstract

Codon optimization is widely used to improve heterologous gene expression in Escherichia coli. However, many existing methods focus primarily on maximizing the codon adaptation index (CAI) and neglect broader aspects of biological context. In this study, we present ColiFormer, a transformer-based codon optimization framework fine-tuned on 3676 high-expression E. coli genes curated from the NCBI database. Built on the CodonTransformer BigBird architecture, ColiFormer employs self-attention mechanisms and a mathematical optimization method (the augmented Lagrangian approach) to balance multiple biological objectives simultaneously, including CAI, GC content, tRNA adaptation index (tAI), RNA stability, and minimization of negative cis-regulatory elements. Based on in silico evaluations on 37,053 native E. coli genes and 80 recombinant protein targets commonly used in industrial studies, ColiFormer demonstrated significant improvements in CAI and tAI values, maintained GC content within biologically optimal ranges, and reduced inhibitory cis-regulatory motifs compared with established codon optimization approaches, while maintaining competitive runtime performance. These results represent computational predictions derived from standard in silico metrics; future experimental work is anticipated to validate these computational predictions in vivo. ColiFormer has been released as an open-source tool alongside the benchmark datasets used in this study.
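The abstract lists minimization of negative cis-regulatory elements as one objective, but this summary does not say which motifs are scored. The sketch below shows only the generic scanning step: finding every occurrence of a set of sequence motifs, including overlapping ones. The motif list and `find_motifs` are hypothetical placeholders chosen for illustration: an internal Shine-Dalgarno-like AGGAGG, EcoRI and BamHI sites, and a poly-T run.

```python
import re

# Hypothetical motif set for illustration only; the paper's actual list of
# negative cis-regulatory elements is not given in this summary.
AVOID_MOTIFS = {
    "internal_SD_like": "AGGAGG",
    "EcoRI_site": "GAATTC",
    "BamHI_site": "GGATCC",
    "poly_T": "TTTTT",
}


def find_motifs(seq, motifs=AVOID_MOTIFS):
    """Return (name, start) for every motif hit on the given strand, overlaps included."""
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        # A zero-width lookahead lets finditer report overlapping matches.
        for m in re.finditer(f"(?={re.escape(motif)})", seq):
            hits.append((name, m.start()))
    return sorted(hits, key=lambda hit: hit[1])


print(find_motifs("ATGGAATTCTTTTTTAGGAGGTAA"))
```

An optimizer could use the number of hits as a penalty term. Scanning the reverse complement as well would be a natural extension.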

## Linked entities

- **Species:** Escherichia coli (taxon 562)

## Full-text entities

- **Species:** Escherichia coli (E. coli, species) [taxon 562]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12838208/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12838208/full.md

## References

53 references; the full list is in the complete paper: https://tomesphere.com/paper/PMC12838208/full.md

---
Source: https://tomesphere.com/paper/PMC12838208