Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges
Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique

TL;DR
This survey reviews recent advancements in Large Language Models and Multimodal LLMs, highlighting their architectures, capabilities, benchmarks, and the challenges faced in their development and deployment.
Contribution
It provides a comprehensive overview and comparative analysis of recent LLM and MLLM architectures, including their technical features, strengths, and limitations.
Findings
MLLMs extend LLM capabilities to multiple data modalities.
Recent LLMs achieve state-of-the-art performance on various benchmarks.
Challenges include model complexity, data requirements, and ethical considerations.
Abstract
Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural networks, often encompassing dozens of neural network layers and containing billions to trillions of parameters. They are typically trained on vast datasets, utilizing architectures based on transformer blocks. Present-day LLMs are multi-functional, capable of performing a range of tasks from text generation and language translation to question answering, as well as code generation and analysis. An advanced subset of these models, known as Multimodal Large Language Models (MLLMs), extends LLM capabilities to process and interpret multiple data modalities, including images, audio, and video. This enhancement empowers MLLMs with capabilities like video…
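The abstract says LLMs have dozens of transformer layers and billions of parameters. The sketch below shows where those numbers come from by estimating the parameter count of a decoder-only transformer from its shape. It is a back-of-the-envelope approximation, not a method from the survey. The function name `approx_decoder_params`, the default MLP width of 4 × d_model and the tied input/output embedding are illustrative assumptions. Biases, normalization weights and positional embeddings are ignored.

```python
from typing import Optional


def approx_decoder_params(
    n_layers: int,
    d_model: int,
    vocab_size: int,
    d_ff: Optional[int] = None,
) -> int:
    """Rough parameter count for a decoder-only transformer (hypothetical helper).

    Assumptions: biases, LayerNorm weights and positional embeddings are ignored,
    and the output projection shares weights with the token embedding.
    """
    if d_ff is None:
        d_ff = 4 * d_model  # common convention: MLP hidden width = 4 x d_model
    attention = 4 * d_model * d_model  # W_Q, W_K, W_V and output projection W_O
    mlp = 2 * d_model * d_ff  # MLP up-projection and down-projection matrices
    per_layer = attention + mlp  # equals 12 * d_model**2 when d_ff = 4 * d_model
    embedding = vocab_size * d_model  # token embedding table (tied with output layer)
    return n_layers * per_layer + embedding


# Published GPT-3 175B shape: 96 layers, d_model = 12288, 50257-token BPE vocabulary.
total = approx_decoder_params(n_layers=96, d_model=12288, vocab_size=50257)
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```

For the GPT-3 shape, the estimate comes to about 174.6 billion parameters, which is close to the reported 175B. The 12 · d_model² per-layer term dominates this total. As a result, doubling the depth roughly doubles the count, and doubling the width roughly quadruples it.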
