A Generative AI–Based Technical Data Extraction Tool for IoT Application Systems
Dezheng Kong, Nobuo Funabiki, Htoo Htoo Sandi Kyaw, I Nyoman Darma Kotama, Zihao Zhu, Alfiandi Aulia Rahmadani

TL;DR
A new AI tool helps extract technical data from IoT device datasheets, improving setup reliability for non-experts.
Contribution
A generative AI-based tool that extracts and structures technical data from IoT datasheets using RAG and schema-based methods.
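As a rough illustration of what schema-based structuring can look like, the sketch below maps a model's JSON answer onto a fixed set of fields and reports which ones are still missing. The `SensorSpec` fields, the helper names, and the sample values (including the model name `XYZ-100`) are hypothetical and are not taken from the paper; the tool's actual schema is not described in this summary.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class SensorSpec:
    """Hypothetical target schema for one sensor datasheet (assumed fields)."""
    model: Optional[str] = None
    supply_voltage_v: Optional[str] = None
    interface: Optional[str] = None
    operating_temp_c: Optional[str] = None


def structure_answer(raw_json: str) -> SensorSpec:
    """Keep only keys that belong to the schema; absent keys stay None."""
    data = json.loads(raw_json)
    allowed = {f.name for f in fields(SensorSpec)}
    kept = {k: str(v) for k, v in data.items() if k in allowed and v is not None}
    return SensorSpec(**kept)


def missing_fields(spec: SensorSpec) -> List[str]:
    """List schema fields the extraction did not fill, in schema order."""
    return [name for name, value in asdict(spec).items() if value is None]


# Example model output containing one field outside the schema ("color").
raw = '{"model": "XYZ-100", "interface": "I2C", "color": "blue"}'
spec = structure_answer(raw)
print(spec)
print("missing:", missing_fields(spec))
```

Fixing the field set up front means unexpected keys are dropped and unfilled fields are visible, so a follow-up query can target only the missing items.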
Findings
Compared with ChatPDF, the tool raises Recall from 0.636 to 0.926 and Accuracy from 0.595 to 0.807.
It reliably extracts key specifications from sensor and device datasheets.
A local vector database enables semantic similarity retrieval for RAG-based answering.
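The retrieval step in the last finding can be sketched as a small in-memory store ranked by cosine similarity. This is only a minimal illustration: the `embed` function here is a toy word-count stand-in, so it matches shared words rather than meaning, whereas a real setup would call an embedding model. The class name `LocalVectorStore` and the sample datasheet lines are assumptions, not the paper's implementation.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalVectorStore:
    """Hypothetical in-memory store of (chunk, vector) pairs."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> List[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = LocalVectorStore()
store.add("Supply voltage range: 3.3 V to 5.5 V")
store.add("Communication interface: I2C, address 0x40")
store.add("Package dimensions: 3 mm x 3 mm")

top = store.search("What is the supply voltage?", k=1)
print(top)
```

In a RAG pipeline, the retrieved chunks would then be placed in the prompt so the model answers from the datasheet text rather than from its own prior knowledge.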
Abstract
Internet of Things (IoT) application systems now play an essential role in smart cities, industry, healthcare, agriculture, and smart homes. For non-expert users, designing and implementing these systems remains challenging, especially when configuring sensors, edge devices, and server platforms. To support these configuration tasks, we have developed an AI-based setup assistance tool. However, AI models still fail to reliably support newly released or previously unseen devices, sometimes producing incomplete or erroneous outputs that may lead to configuration failures. Incorporating information from the devices' technical documents into Retrieval-Augmented Generation (RAG) is an effective way to supplement AI knowledge and improve reliability. In this paper, we propose a generative AI-based technical data extraction tool to address these challenges. It extracts…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Spreadsheets and End-User Computing
