SHARE: Social-Humanities AI for Research and Education
João Gonçalves, Sonia de Jager, Petr Knoth, David Pride, Nick Jelicic

TL;DR
This report presents the SHARE family of causal language models, pretrained for the social sciences and humanities (SSH), along with MIRROR, an interface for reviewing text inputs without generating any text. On SSH texts, the SHARE models perform close to general-purpose models (Phi-4) that use 100 times more tokens.
Contribution
Introduces the first causal language models fully pretrained by and for SSH, together with a prototype interface (MIRROR) that supports review of text inputs without generating any text.
Findings
SHARE models perform close to general-purpose models (Phi-4) that use 100 times more tokens, as measured on a custom SSH Cloze benchmark
MIRROR interface allows review of inputs while maintaining SSH norms
Models are pretrained specifically for social sciences and humanities texts
Abstract
This intermediate technical report introduces the SHARE family of base models and the MIRROR user interface. The SHARE models are the first causal language models fully pretrained by and for the social sciences and humanities (SSH). Their performance in modelling SSH texts is close to that of general-purpose models (Phi-4), which use 100 times more tokens, as shown by our custom SSH Cloze benchmark. The MIRROR user interface is designed for reviewing text inputs from the SSH disciplines while preserving critical engagement. By prototyping a generative AI interface that does not generate any text, we propose a way to harness the capabilities of the SHARE models without compromising the integrity of SSH principles and norms.
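The abstract does not say how the SSH Cloze benchmark is built or scored. As a rough illustration of what a cloze-style evaluation generally involves, the sketch below fills a blank with each candidate word, scores the completed sentence with a language model, and counts how often the highest-scoring candidate is the intended one. Everything in it is an assumption made for illustration: the toy corpus, the add-one smoothed bigram scorer standing in for a causal language model, and the names `log_prob`, `cloze_accuracy`, and `ITEMS`. It is not the authors' benchmark or scoring procedure.

```python
import math
from collections import Counter

# Toy text standing in for a pretraining corpus (hypothetical data).
CORPUS = (
    "discourse analysis examines language in social context . "
    "survey research measures attitudes in a population . "
    "ethnography studies culture through participant observation ."
).split()

VOCAB = set(CORPUS)
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def log_prob(tokens):
    """Add-one smoothed bigram log-probability of a token sequence.

    This is a stand-in for a causal language model's sequence score.
    """
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = BIGRAMS[(prev, cur)] + 1
        denominator = UNIGRAMS[prev] + len(VOCAB)
        total += math.log(numerator / denominator)
    return total


def cloze_accuracy(items, score=log_prob):
    """Fraction of items where the top-scoring candidate is the answer."""
    correct = 0
    for template, candidates, answer in items:
        def filled(candidate):
            # Substitute the candidate for the blank marker.
            return [candidate if tok == "____" else tok
                    for tok in template.split()]

        best = max(candidates, key=lambda c: score(filled(c)))
        correct += best == answer
    return correct / len(items)


# Hypothetical cloze items: (sentence with blank, candidates, answer).
ITEMS = [
    ("ethnography studies ____ through participant observation .",
     ["culture", "attitudes"], "culture"),
    ("survey research measures ____ in a population .",
     ["language", "attitudes"], "attitudes"),
]

if __name__ == "__main__":
    print(cloze_accuracy(ITEMS))  # prints 1.0
```

To compare real models under this kind of setup, one would pass a different `score` function, for example one that returns the summed token log-probabilities from a pretrained causal language model, while keeping the items fixed.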
