Evaluating Digital Inclusiveness of Digital Agri-Food Tools Using Large Language Models: A Comparative Analysis Between Human and AI-Based Evaluations
Githma Pewinya, Carolina Martins, Garcia Mariangel

TL;DR
This paper investigates the potential of large language models to rapidly and effectively evaluate the digital inclusiveness of agricultural tools, comparing their performance to traditional human assessments.
Contribution
It introduces a comparative analysis of four LLMs against expert evaluations for assessing digital inclusiveness in agri-food tools, highlighting their potential and limitations.
Findings
LLMs can approximate expert judgments in some evaluation dimensions.
Model performance varies across different LLMs and contexts.
AI-based assessments could complement traditional evaluation workflows.
Abstract
Ensuring digital inclusiveness is a critical priority in agri-food systems, particularly in the Global South, where digital divides persist. The Multidimensional Digital Inclusiveness Index (MDII) offers a comprehensive, human-led framework to assess how inclusive digital agricultural tools (agritools) are. However, the current evaluation process is resource-intensive, often requiring months to complete. This study explores whether large language models (LLMs) can support a rapid, AI-enabled assessment of digital inclusiveness, complementing the MDII's existing workflow. Using a comparative analysis, the research benchmarks the performance of four LLMs (Grok, Gemini, GPT-4o, and GPT-5) against prior expert-led evaluations. The study investigates model alignment with human scores, sensitivity to temperature settings, and potential sources of bias. Findings suggest that LLMs can generate…
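The abstract does not say which statistics the authors use to measure alignment or temperature sensitivity. As a rough illustration only, the sketch below shows one plausible way to compare a model's per-dimension scores with expert scores (mean absolute error and Pearson correlation) and to gauge how much scores move across temperature settings (standard deviation per dimension). The function names, the choice of metrics, and every score value are hypothetical and are not taken from the paper or from the MDII.

```python
from statistics import mean, pstdev


def mean_absolute_error(expert, model):
    """Average absolute gap between expert and model scores, per dimension."""
    if len(expert) != len(model):
        raise ValueError("score lists must have the same length")
    return mean(abs(e - m) for e, m in zip(expert, model))


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def temperature_spread(scores_by_temperature):
    """Population std dev of each dimension's score across temperature runs."""
    runs = list(scores_by_temperature.values())
    return [pstdev(column) for column in zip(*runs)]


if __name__ == "__main__":
    # Hypothetical scores for four evaluation dimensions (illustrative only).
    expert_scores = [3.0, 4.0, 2.0, 5.0]
    model_scores = {
        0.0: [3.0, 4.0, 3.0, 5.0],  # run at temperature 0.0
        1.0: [2.0, 4.0, 3.0, 4.0],  # run at temperature 1.0
    }
    for temperature, scores in model_scores.items():
        print(temperature,
              mean_absolute_error(expert_scores, scores),
              pearson(expert_scores, scores))
    print(temperature_spread(model_scores))
```

A lower mean absolute error and a correlation closer to 1 would indicate closer agreement with the experts under these assumed metrics, and a larger spread for a dimension would suggest that the model's score for it is more sensitive to temperature.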
