Position: The Pre/Post-Training Boundary Should Govern IP in Industry-Academia ML Collaborations
Dirk Bergemann, Soheil Ghili, and Nitzan Mekel-Bobrov

TL;DR
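The paper proposes a clear boundary for intellectual property in industry-academia ML collaborations: pre-training artifacts are treated as open science and post-training artifacts as business IP. The aim is to resolve the tension between academics' need to publish and companies' need to protect models trained on proprietary data.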
Contribution
It introduces PBOS (Protect-the-Business / Open-Source-the-Science), a contract template anchored to a single technically grounded boundary and intended to align the legal and scientific sides of ML collaborations.
Findings
The PBOS boundary is technically meaningful and legally auditable.
Adopting PBOS can resolve incentive misalignments in collaborations.
The boundary is grounded in the nature of ML artifacts pre- and post-training.
Abstract
Industry-academia ML collaborations routinely fail to launch -- not for scientific reasons, but because academics must publish while companies must protect models trained on proprietary data, and no standard contract framework resolves this tension. Because contracts are negotiated by legal departments alone, many apparent legal disputes are incentive misalignment problems that only scientists at the table can correctly diagnose. We propose PBOS (Protect-the-Business / Open-Source-the-Science), a community-adoptable contract template anchored to a single technically-grounded boundary: pre-training artifacts (architectures, training code, benchmarks, untrained weights) are open science; post-training artifacts (weights trained on proprietary data) are business IP. This boundary is technically meaningful, legally clean, and auditable -- and could not have been drawn correctly without…
