Recategorising research: Mapping from FoR 2008 to FoR 2020 in Dimensions
Simon J Porter, Lezan Hawizy, Daniel W Hook

TL;DR
This paper describes how the Dimensions team built an improved machine learning training set and mapped it from FoR 2008 codes to FoR 2020 codes, so that the Dimensions classification of research could be updated to the new scheme.
Contribution
It introduces an improved machine learning training set and a mapping approach for transitioning between FoR 2008 and FoR 2020 codes in research classification systems.
Findings
Successful creation of a machine learning training set for code mapping
Effective mapping from FoR 2008 to FoR 2020 codes
Basis for an improved and updated ANZSRC classification in the Dimensions system
Abstract
In 2020 the Australian and New Zealand Standard Research Classification Fields of Research Codes (ANZSRC FoR codes) were updated by their owners. As a result, the sector has needed to update its systems of reference, and suppliers working in the research information sphere have needed to update both systems and data. This paper describes the approach developed by Digital Science's Dimensions team to create an improved machine learning training set and to map that set from FoR 2008 codes to FoR 2020 codes, so that the Dimensions classification approach for the ANZSRC codes could be improved and updated.
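The idea of carrying a labelled training set from one code scheme to another can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not detail; it only assumes a hypothetical concordance table from old codes to new codes. The codes (`OLD-A`, `NEW-1`, and so on), the `CONCORDANCE` table, and the `remap_training_set` helper are all placeholders invented for illustration, not real ANZSRC entries. Records whose old code maps to exactly one new code are relabelled automatically; anything split across several new codes, or missing from the table, is set aside for review.

```python
from typing import Dict, List, Tuple

# Hypothetical concordance with placeholder codes (NOT real ANZSRC entries).
CONCORDANCE: Dict[str, List[str]] = {
    "OLD-A": ["NEW-1"],           # one-to-one: safe to relabel automatically
    "OLD-B": ["NEW-2", "NEW-3"],  # one old code split across two new codes
}

Record = Tuple[str, str]                 # (document text, old code)
Mapped = Tuple[str, str]                 # (document text, new code)
Review = Tuple[str, str, List[str]]      # (text, old code, candidate new codes)


def remap_training_set(
    records: List[Record], concordance: Dict[str, List[str]]
) -> Tuple[List[Mapped], List[Review]]:
    """Relabel records with an unambiguous target; queue the rest for review."""
    mapped: List[Mapped] = []
    needs_review: List[Review] = []
    for text, old_code in records:
        targets = concordance.get(old_code, [])
        if len(targets) == 1:
            mapped.append((text, targets[0]))
        else:
            # Zero targets (unknown code) or several (split code): ambiguous.
            needs_review.append((text, old_code, targets))
    return mapped, needs_review


records = [("doc one", "OLD-A"), ("doc two", "OLD-B"), ("doc three", "OLD-C")]
mapped, needs_review = remap_training_set(records, CONCORDANCE)
```

In this sketch the ambiguous records are the interesting part: they are the ones where a fixed lookup cannot decide, and where some further step (manual curation or a classifier) would be needed to choose among the candidate codes.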
Taxonomy
Topics: Data Quality and Management
