GPT-4o System Card
OpenAI: Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol

TL;DR
GPT-4o is a multimodal autoregressive model that accepts text, audio, image, and video inputs and generates text, audio, and image outputs. It responds quickly, improves on non-English text, and is accompanied by safety and societal impact evaluations.
Contribution
Introduction of GPT-4o, a unified multimodal model that processes multiple input and output types end-to-end, with enhanced speed, multilingual capabilities, and safety assessments.
Findings
Matches GPT-4 Turbo performance on English text and code
Significantly improves non-English language understanding
Responds to audio inputs in as little as 232 milliseconds (320 milliseconds on average), comparable to human response time in conversation
Abstract
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card,…
Taxonomy
Topics: Cardiovascular Function and Risk Factors · Hyperglycemia and glycemic control in critically ill and hospitalized patients
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Softmax
