Doctoral Theses in France (1985-2025): A Linked Dataset of PhDs, Academic Networks, and Institutions
William Aboucaya, Dastan Jasim

TL;DR
This paper introduces a detailed dataset of French doctoral theses from 1985 to 2025, combining multiple sources to support research on academic careers, networks, and institutional collaborations.
Contribution
It provides a comprehensive, enriched, and linked dataset of French PhDs, including metadata, career trajectories, and institutional data, enabling diverse longitudinal analyses.
Findings
The dataset covers French doctoral theses defended from 1985 to 2025 with high completeness.
Enriched data includes academic careers, jury participation, and institutional affiliations.
Supports research on doctoral education, academic networks, and research community evolution.
Abstract
This paper presents a comprehensive dataset of doctoral theses defended in France between 1985 and 2025, constructed from multiple national academic metadata sources. The dataset is primarily based on data from the French national thesis platform and is enriched using additional authority and bibliographic databases to improve data quality, completeness, and interoperability. The data production pipeline includes the aggregation of heterogeneous sources, the correction of inconsistent identifiers, the enrichment of person and institution records, and the construction of derived variables describing academic careers, jury participation, institutional affiliations, and thesis characteristics. Additional identifiers from major academic repositories and library catalogues are integrated to facilitate linkage with external data sources and future dataset extensions. The resulting dataset…
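As a rough illustration of two pipeline steps named in the abstract (correcting inconsistent identifiers and deriving a jury-participation variable), here is a minimal Python sketch. The record layout, field names (`thesis_id`, `author_id`, `jury_ids`), identifier values, and the normalization rule are all assumptions made for this example. They are not taken from the paper or from any of its source databases.

```python
from collections import Counter

# Hypothetical thesis records; field names and identifier values are
# illustrative assumptions, not the dataset's actual schema.
theses = [
    {"thesis_id": "T1", "author_id": " p001 ", "jury_ids": ["P010", "p011"]},
    {"thesis_id": "T2", "author_id": "P002", "jury_ids": ["p010 ", "P012"]},
    {"thesis_id": "T3", "author_id": "P003", "jury_ids": ["P011"]},
]


def normalize_id(raw: str) -> str:
    """Trim whitespace and uppercase an identifier so variants compare equal."""
    return raw.strip().upper()


def jury_participation(records):
    """Count how many juries each person sat on, after normalizing identifiers."""
    counts = Counter()
    for record in records:
        # A set avoids double-counting a person listed twice on one jury.
        for person in {normalize_id(p) for p in record["jury_ids"]}:
            counts[person] += 1
    return counts


counts = jury_participation(theses)
for person, n in sorted(counts.items()):
    print(person, n)
```

A real pipeline would likely reconcile identifiers against authority records and not rely on string cleanup alone. The sketch only shows why normalization has to happen before any per-person variable is derived.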
