KSAT: Knowledge-infused Self Attention Transformer -- Integrating Multiple Domain-Specific Contexts
Kaushik Roy, Yuxin Zi, Vignesh Narayanan, Manas Gaur, Amit Sheth

TL;DR
KSAT is a novel transformer architecture that integrates multiple domain-specific contexts using external knowledge sources, enhancing interpretability and outperforming traditional fine-tuning methods in domain-specific language understanding.
Contribution
Introduces KSAT, a self-attention transformer with knowledge-guided biases for integrating multiple domain contexts and controlling knowledge-data trade-offs.
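One way to picture a knowledge-guided bias is as an extra term added to the attention scores before the softmax, with a scalar that sets how much weight the knowledge gets relative to the data. The sketch below is illustrative only and is not the authors' implementation: the additive form of the bias, the scalar `lam`, the per-source mixing weights `mix`, and the function names `biased_attention` and `combine_sources` are all assumptions made for this example. It also skips learned query/key/value projections and uses the input vectors directly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(Q, K, V, bias, lam):
    """Scaled dot-product attention with an additive bias (assumed form).

    attn[i] = softmax_j(Q[i] . K[j] / sqrt(d) + lam * bias[i][j])
    lam = 0 recovers plain self-attention (data only); a larger lam
    lets the knowledge-derived bias dominate the attention pattern.
    """
    d = len(Q[0])
    out, attn = [], []
    for i, q in enumerate(Q):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + lam * bias[i][j]
            for j, k in enumerate(K)
        ]
        w = softmax(scores)
        attn.append(w)
        out.append(
            [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        )
    return out, attn


def combine_sources(X, biases, lam, mix):
    """Run one biased attention pass per knowledge source and mix the outputs.

    `biases` maps a (hypothetical) source name to its bias matrix;
    `mix` maps the same names to non-negative weights, normalized here.
    """
    total = sum(mix.values())
    combined = [[0.0] * len(X[0]) for _ in X]
    for name, bias in biases.items():
        out, _ = biased_attention(X, X, X, bias, lam)
        share = mix[name] / total
        for i, row in enumerate(out):
            for c, v in enumerate(row):
                combined[i][c] += share * v
    return combined


# Toy input: three token vectors and two hypothetical knowledge sources.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = {
    "suicide": [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    "depression": [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
}
combined = combine_sources(X, biases, lam=2.0, mix={"suicide": 1.0, "depression": 1.0})
```

In this sketch the normalized `mix` weights play the role of per-source contributions that could be inspected or plotted, and `lam` is the single knob for the knowledge-data trade-off. How KSAT actually parameterizes either is not specified in this summary.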
Findings
KSAT provides human-understandable visualization of domain contributions.
KSAT outperforms fine-tuning baselines on domain-specific tasks.
KSAT effectively combines multiple knowledge sources in language models.
Abstract
Domain-specific language understanding requires integrating multiple pieces of relevant contextual information. For example, we see both suicide and depression-related behavior (multiple contexts) in the text "I have a gun and feel pretty bad about my life, and it wouldn't be the worst thing if I didn't wake up tomorrow". Domain specificity in self-attention architectures is handled by fine-tuning on excerpts from relevant domain-specific resources (datasets and external knowledge, such as medical textbook chapters on mental health diagnosis related to suicide and depression). We propose a modified self-attention architecture, the Knowledge-infused Self Attention Transformer (KSAT), that integrates multiple domain-specific contexts through the use of external knowledge sources. KSAT introduces knowledge-guided biases in dedicated self-attention layers for each knowledge source to…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Softmax · Label Smoothing · Adam · Dense Connections
