From PDFs to Structured Data: Utilizing LLM Analysis in Sports Database Management
Juhani Merilehto

TL;DR
This paper explores using Large Language Models such as GPT-4 and Claude 3 Opus to automate the conversion of semi-structured PDF data into structured formats, and reports a 90% success rate when applied to sports database management.
Contribution
It introduces an AI-assisted method that uses LLMs to process semi-structured data and demonstrates it on sports data management, where 65 of 72 files were processed without errors.
Findings
90% success rate in automated data processing
Handled over 7,900 data rows from 72 reports
Potential to reduce future processing time by approximately 90%
Abstract
This study investigates the effectiveness of Large Language Models (LLMs) in processing semi-structured data from PDF documents into structured formats, specifically examining their application in updating the Finnish Sports Clubs Database. Through action research methodology, we developed and evaluated an AI-assisted approach utilizing OpenAI's GPT-4 and Anthropic's Claude 3 Opus models to process data from 72 sports federation membership reports. The system achieved a 90% success rate in automated processing, successfully handling 65 of 72 files without errors and converting over 7,900 rows of data. While the initial development time was comparable to traditional manual processing (three months), the implemented system shows potential for reducing future processing time by approximately 90%. Key challenges included handling multilingual content, processing multi-page datasets, and…
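The file-level success rate quoted in the abstract (65 of 72 files processed without errors) can be illustrated with a minimal sketch. It is not the paper's pipeline, which is not described in this summary. It assumes the model returns CSV text for each report, and the column names, the `parse_llm_csv` and `success_rate` helpers, and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical target schema; the real database columns are not given in the source.
EXPECTED_COLUMNS = ["club_name", "municipality", "members"]


def parse_llm_csv(text):
    """Parse CSV text returned by a model into dict rows.

    Raises ValueError if the header or any row does not match the schema.
    """
    reader = csv.reader(io.StringIO(text.strip()))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(f"line {line_no}: expected {len(EXPECTED_COLUMNS)} fields")
        name, municipality, members = (field.strip() for field in row)
        if not members.isdigit():
            raise ValueError(f"line {line_no}: member count is not a number")
        rows.append({"club_name": name, "municipality": municipality, "members": int(members)})
    return rows


def success_rate(outputs):
    """Return the fraction of files whose model output parsed without error."""
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs.values():
        try:
            parse_llm_csv(text)
            ok += 1
        except ValueError:
            pass  # count the file as needing manual review
    return ok / len(outputs)


# Made-up sample outputs: one well-formed file, one with a missing field.
good = "club_name,municipality,members\nExample Club,Helsinki,120\n"
bad = "club_name,municipality,members\nExample Club,Helsinki\n"
print(success_rate({"report_a.pdf": good, "report_b.pdf": bad}))

# The abstract's figure follows the same arithmetic: 65 / 72 is about 0.903.
print(round(65 / 72, 3))
```

Counting success per file rather than per row matches how the abstract states its result. A file that fails validation would be routed to manual review under this assumption.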
Taxonomy
Topics: Sports Analytics and Performance · Natural Language Processing Techniques
Methods: Linear Layer · Dense Connections · Multi-Head Attention · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding · Layer Normalization
