LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention