Power Law Graph Transformer for Machine Translation and Representation Learning
Burc Gokden

TL;DR
This paper introduces the Power Law Graph Transformer, a novel model that learns graph structures with power law distributions for improved machine translation, demonstrating competitive BLEU scores on Turkish-English and Portuguese-English datasets.
Contribution
The paper proposes a new transformer architecture incorporating power law graph learning for both deductive and inductive tasks, enhancing representation learning in translation tasks.
Findings
Achieved BLEU scores of 17.79 and 28.33 on Turkish-English and Portuguese-English translation, respectively.
Demonstrated the model's ability to learn dataset-level and instance-level graph structures.
Showed how a duality between quantization and manifold representations enables transformation between local and global outputs.
Abstract
We present the Power Law Graph Transformer, a transformer model with well-defined deductive and inductive tasks for prediction and representation learning. The deductive task learns the dataset-level (global) and instance-level (local) graph structures in terms of learnable power law distribution parameters. The inductive task outputs the prediction probabilities using the deductive task output, similar to a transductive model. We trained our model with Turkish-English and Portuguese-English datasets from TED talk transcripts for machine translation and compared the model performance and characteristics to a transformer model with scaled dot product attention trained on the same experimental setup. We report BLEU scores of 17.79 and 28.33 on the Turkish-English and Portuguese-English translation tasks with our model, respectively. We also show how a duality between a quantization…
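For readers unfamiliar with the baseline named in the abstract, the following is a minimal NumPy sketch of standard scaled dot product attention. It is not the Power Law Graph Transformer's attention, whose formulation is not given on this page. The function name, argument names, and the boolean mask convention (True means "may attend") are assumptions chosen for illustration.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Standard scaled dot product attention (baseline sketch, not the paper's model).

    q: (..., n_q, d_k), k: (..., n_k, d_k), v: (..., n_k, d_v)
    mask: optional boolean array broadcastable to (..., n_q, n_k);
          True marks positions that may be attended to (assumed convention).
    Returns (output, weights) with shapes (..., n_q, d_v) and (..., n_q, n_k).
    """
    d_k = q.shape[-1]
    # Similarity scores, scaled by sqrt(d_k) to keep their magnitude stable.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Blocked positions get a large negative score, so their weight is ~0.
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Usage: a batch of 2 sequences, 3 queries attending over 5 keys.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 4))
k = rng.normal(size=(2, 5, 4))
v = rng.normal(size=(2, 5, 6))
out, attn = scaled_dot_product_attention(q, k, v)
```

Each row of `attn` is a probability distribution over the keys, and `out` is the corresponding weighted average of the value vectors.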
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing
