A Pipeline for Analysing Grant Applications
Shuaiqun Pan, Sergio J. Rodríguez Méndez, Kerry Taylor

TL;DR
This paper presents a data mining pipeline that analyzes grant applications to predict innovation scores and identify vocabulary associated with innovative proposals, demonstrating the effectiveness of a Random Forest classifier with unigram features.
Contribution
The paper introduces a novel pipeline for analysing grant applications that combines a modified TF-IDF encoding with a Random Forest model for predicting innovation scores.
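This summary does not describe how the TF-IDF encoding is modified. The following is therefore only a minimal Python sketch of standard unigram TF-IDF, the baseline such a modification would start from: term frequency is normalised by document length and multiplied by log(N / df). The helper names `tokenize` and `tfidf_unigrams` and the toy proposals are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic unigrams."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_unigrams(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Encode each document as a dense TF-IDF vector over a shared unigram vocabulary.

    This is plain TF-IDF, NOT the paper's modified variant (which is not specified here):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical toy proposals, for illustration only.
    proposals = [
        "a novel method for quantum sensing",
        "a survey of existing sensing methods",
        "novel quantum materials",
    ]
    vocab, X = tfidf_unigrams(proposals)
    print(vocab)
    print([round(v, 3) for v in X[0]])
```

Terms that occur in every proposal receive an IDF of zero, so the resulting vectors emphasise words that distinguish one proposal from the rest. These vectors could then be fed to any classifier, such as a Random Forest.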
Findings
The Random Forest classifier achieved the best performance among the models compared.
Unigram features encoded with the modified TF-IDF are effective.
The pipeline proves feasible for analyzing grant applications.
Abstract
Data mining techniques can transform massive amounts of unstructured data into quantitative data that quickly reveal insights, trends, and patterns behind the original data. In this paper, a data mining model is applied to analyse the 2019 grant applications submitted to an Australian Government research funding agency, to investigate whether grant schemes successfully identify innovative project proposals, as intended. The grant applications are peer-reviewed research proposals that include specific "innovation and creativity" (IC) scores assigned by reviewers. In addition to predicting the IC score for each research proposal, we are particularly interested in understanding the vocabulary of innovative proposals. To address this problem, various data mining models and feature encoding algorithms are studied and explored. As a result, we propose a model with the best…
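The abstract does not say how the vocabulary of innovative proposals is extracted. As one illustrative alternative, and explicitly not the authors' method, the Python sketch below ranks unigrams by the difference in add-one-smoothed log-odds of appearing in high-IC versus low-IC proposals. The IC threshold, the function `innovative_vocabulary` and the sample data are all hypothetical.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    """Return the set of lower-cased alphabetic unigrams in a proposal."""
    return set(re.findall(r"[a-z]+", text.lower()))


def innovative_vocabulary(proposals, ic_scores, threshold, top_k=5):
    """Rank unigrams by smoothed log-odds of appearing in high-IC vs low-IC proposals.

    Illustrative technique only, not the paper's method. `threshold` is a
    hypothetical cut-off: scores >= threshold count as "high IC".
    """
    high = [tokenize(p) for p, s in zip(proposals, ic_scores) if s >= threshold]
    low = [tokenize(p) for p, s in zip(proposals, ic_scores) if s < threshold]
    vocab = set().union(*high, *low)

    def log_odds(term, group):
        # Add-one smoothed fraction of documents in `group` that contain `term`;
        # smoothing keeps terms seen in only one group at a finite score.
        k = sum(term in doc for doc in group)
        p = (k + 1) / (len(group) + 2)
        return math.log(p / (1 - p))

    scored = {t: log_odds(t, high) - log_odds(t, low) for t in vocab}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    # Hypothetical proposals and IC scores, for illustration only.
    proposals = [
        "a novel quantum sensor design",
        "novel adaptive quantum materials",
        "a survey of sensor literature",
        "an incremental survey update",
    ]
    scores = [5, 4, 2, 1]
    for term, score in innovative_vocabulary(proposals, scores, threshold=4):
        print(f"{term}: {score:.3f}")
```

On this toy data, "novel" and "quantum" rank highest because they appear only in the high-IC group, while "survey" receives a negative score. A model-based alternative would rank terms by a trained classifier's feature importances instead.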
Taxonomy
Topics: Expert finding and Q&A systems
