PLDR-LLM: Large Language Model from Power Law Decoder Representations

Burc Gokden

arXiv:2410.16703 · cs.CL · October 23, 2024

TL;DR

PLDR-LLM introduces a novel power law graph attention mechanism for language modeling, demonstrating competitive zero-shot and few-shot performance and the potential for improved deductive reasoning through DAG-based regularization.

Contribution

The paper presents a new power law graph attention mechanism for LLMs, along with a detailed architecture, pretraining methods, and insights into optimizing deductive outputs.

Findings

01. PLDR-LLMs achieve competitive performance in zero-shot and few-shot tasks.

02. Deductive outputs can be enhanced using DAG loss as a metric and regularizer.

03. Initial learning rate and warm-up steps significantly influence deductive reasoning capabilities.

Abstract

We present the Large Language Model from Power Law Decoder Representations (PLDR-LLM), a language model that leverages non-linear and linear transformations through the Power Law Graph Attention mechanism to generate well-defined deductive and inductive outputs. We pretrain PLDR-LLMs of varying layer sizes with a small batch size of 32 and $\sim$8B tokens from the RefinedWeb dataset, and show that they achieve competitive performance in zero-shot and few-shot settings compared to scaled dot-product LLMs of similar model size reported in the literature. We show that the deductive outputs of PLDR-LLMs can be used to compare model characteristics or to improve performance by introducing the Directed Acyclic Graph (DAG) loss as a metric and regularizer. Our results indicate that the initial maximum learning rate and warm-up steps have a lasting impact on deductive outputs throughout the…
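To make the idea of a DAG loss used as both a metric and a regularizer more concrete, here is a minimal sketch. It assumes a NOTEARS-style acyclicity penalty, a well-known differentiable measure of how far a weighted adjacency matrix is from a directed acyclic graph; this summary does not state the exact form used in the paper, so the penalty below is a stand-in. The names `dag_penalty`, `regularized_loss`, and the weight `lam` are hypothetical, and the matrix `adj` stands in for a square deductive output.

```python
# Hypothetical sketch: a NOTEARS-style acyclicity penalty.
# This is an assumed stand-in, not the paper's confirmed definition of DAG loss.

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dag_penalty(adj):
    """Return sum over k=1..d of tr((A*A)^k) / k!, with A*A the elementwise square.

    Each term counts weighted closed walks of length k, and any cycle in a
    d-node graph has length at most d, so the value is zero exactly when the
    weighted graph is acyclic.
    """
    d = len(adj)
    sq = [[w * w for w in row] for row in adj]  # non-negative edge weights
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    penalty, factorial = 0.0, 1.0
    for k in range(1, d + 1):
        power = matmul(power, sq)       # (A*A)^k
        factorial *= k                  # k!
        penalty += sum(power[i][i] for i in range(d)) / factorial
    return penalty

def regularized_loss(task_loss, adj, lam=0.1):
    """Add the penalty to a task loss; lam is an assumed regularization weight."""
    return task_loss + lam * dag_penalty(adj)

# A strictly upper-triangular matrix is acyclic; adding edge 2 -> 0 closes a cycle.
acyclic = [[0.0, 0.5, 0.2], [0.0, 0.0, 0.7], [0.0, 0.0, 0.0]]
cyclic = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.7], [0.3, 0.0, 0.0]]

print(dag_penalty(acyclic))  # zero for an acyclic graph
print(dag_penalty(cyclic))   # positive once a cycle exists
```

Logged on its own, the penalty serves as a metric for comparing models; added to the training objective through `regularized_loss`, it acts as a regularizer that pushes the matrix toward acyclicity.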

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques

Methods: Softmax · Attention Is All You Need