PLDR-LLM: Large Language Model from Power Law Decoder Representations
Burc Gokden

TL;DR
PLDR-LLM introduces a novel power law graph attention mechanism for language modeling, demonstrating competitive zero-shot and few-shot performance and the potential for improved deductive reasoning through DAG-based regularization.
Contribution
The paper presents a new power law graph attention mechanism in LLMs, with detailed architecture, pretraining methods, and insights into deductive output optimization.
Findings
PLDR-LLMs achieve competitive performance in zero-shot and few-shot tasks.
Deductive outputs can be enhanced using DAG loss as a metric and regularizer.
Initial learning rate and warm-up steps significantly influence deductive reasoning capabilities.
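To make the two hyperparameters in the last finding concrete, here is a minimal sketch of a learning-rate schedule controlled by a maximum learning rate and a number of warm-up steps. The linear warm-up shape, the cosine decay after it, and the names `lr_at_step`, `total_steps`, and `min_lr` are illustrative assumptions; this summary does not state which schedule the paper used.

```python
import math


def lr_at_step(step, max_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at a 0-indexed training step (hypothetical helper).

    Assumed shape: linear warm-up to max_lr over warmup_steps,
    then cosine decay from max_lr down to min_lr at total_steps.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp: reaches max_lr on the last warm-up step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    span = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / span)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at a few steps for an example configuration.
    for s in (0, 4, 9, 10, 55, 100):
        print(s, lr_at_step(s, max_lr=1e-3, warmup_steps=10, total_steps=100))
```

Changing `max_lr` or `warmup_steps` reshapes the early part of training, which is the phase the finding says has a lasting effect on deductive outputs.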
Abstract
We present the Large Language Model from Power Law Decoder Representations (PLDR-LLM), a language model that leverages non-linear and linear transformations through Power Law Graph Attention mechanism to generate well-defined deductive and inductive outputs. We pretrain the PLDR-LLMs of varying layer sizes with a small batch size of 32 and 8B tokens from the RefinedWeb dataset, and show that they achieve competitive performance in zero-shot and few-shot settings compared to scaled dot-product LLMs of similar model size reported in the literature. We show that deductive outputs of PLDR-LLMs can be used to compare model characteristics or improve the performance by introducing the Directed Acyclic Graph (DAG) loss as a metric and regularizer. Our results indicate that the initial maximum learning rate and warm-up steps have a lasting impact on deductive outputs throughout the…
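The abstract describes a DAG loss used both as a metric and as a regularizer, but this summary does not give its formula. As an illustration only, the sketch below uses a common acyclicity penalty from the structure-learning literature, h(A) = tr(exp(A ∘ A)) − d, which is zero exactly when the weighted adjacency matrix A describes a directed acyclic graph. Treating a square deductive output as A, the function names, and the `weight` parameter are all assumptions, not the paper's stated method.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(adj, terms=20):
    """Approximate h(A) = tr(exp(A ∘ A)) - d with a truncated power series.

    The k-th series term is tr((A ∘ A)^k) / k!, which accumulates weight
    from closed walks of length k, so any cycle makes the penalty positive.
    """
    d = len(adj)
    squared = [[x * x for x in row] for row in adj]  # elementwise A ∘ A
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    total = 0.0
    for k in range(1, terms + 1):
        power = matmul(power, squared)
        total += sum(power[i][i] for i in range(d)) / math.factorial(k)
    return total


def regularized_loss(task_loss, adj, weight=0.01):
    """Add the weighted penalty to a task loss (assumed combination rule)."""
    return task_loss + weight * acyclicity_penalty(adj)


if __name__ == "__main__":
    dag = [[0.0, 1.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # no cycles
    cyclic = [[0.0, 1.0], [1.0, 0.0]]                          # 2-cycle
    print(acyclicity_penalty(dag))
    print(acyclicity_penalty(cyclic))
```

Used as a metric, the penalty alone can be compared across models; used as a regularizer, it is added to the training loss with a small weight so that optimization favors more acyclic outputs.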
Code & Models
- 🤗 fromthesky/pldrllmv5-1-104M (model, 5 downloads)
- 🤗 fromthesky/pldrllmv5-2-110M (model, 2 downloads)
- 🤗 fromthesky/pldrllmv5-3-144M (model, 9 downloads)
- 🤗 fromthesky/pldrllmv5-4-260M (model, 9 downloads)
- 🤗 fromthesky/pldrllmv9-1-114M (model, 11 downloads)
- 🤗 fromthesky/pldrllmv9-2-147M (model, 15 downloads)
- 🤗 fromthesky/pldrllmv5-DAG-1-110M (model, 1 download)
- 🤗 fromthesky/pldrllmv5-DAG-2-110M (model, 3 downloads)
- 🤗 fromthesky/pldrllmv5-DAG-3-110M (model)
- 🤗 fromthesky/pldrllmv5-DAG-4-110M (model, 1 download)
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
Methods: Softmax · Attention Is All You Need
