A Structural Query System for Han Characters
Matthew Skala

TL;DR
This paper introduces IDSgrep, a structural query system for Han characters. It combines a data model, a query syntax, and a bit vector index to speed up queries, and it targets applications such as font development and foreign language learning.
Contribution
It presents a system that combines a data model based on Extended Ideographic Description Sequences (EIDSes), a specialized query language, and a bit vector index inspired by Bloom filters for faster searches in Han character databases.
Findings
The system enables efficient querying of Han characters based on their structure.
Experimental results are reported, with a comparison to other software used for similar applications.
The implementation supports format translation from various character databases.
Abstract
The IDSgrep structural query system for Han character dictionaries is presented. This system includes a data model and syntax for describing the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes) based on the Unicode IDS syntax; a language for querying EIDS databases, designed to suit the needs of font developers and foreign language learners; a bit vector index inspired by Bloom filters for faster query operations; a freely available implementation; and format translation from popular third-party IDS and XML character databases. Experimental results are included, with a comparison to other software used for similar applications.
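To make the idea of a Bloom-filter-inspired bit vector index concrete, here is a minimal Python sketch. It is not IDSgrep's actual index or query language. It assumes a toy database of plain Unicode IDS strings and reduces queries to "contains this component", whereas the real system matches tree structures. The names `component_bits`, `signature`, and `search`, and the parameters `BITS` and `HASHES`, are hypothetical and chosen only for illustration.

```python
import hashlib

BITS = 64    # signature width in bits (assumed for illustration)
HASHES = 2   # hash positions set per component (assumed)

def component_bits(component: str) -> int:
    """Map one component to a mask with up to HASHES bits set."""
    digest = hashlib.sha256(component.encode("utf-8")).digest()
    mask = 0
    for i in range(HASHES):
        mask |= 1 << (digest[i] % BITS)
    return mask

def signature(ids: str) -> int:
    """OR together the masks of every symbol in an IDS string."""
    mask = 0
    for symbol in ids:
        mask |= component_bits(symbol)
    return mask

# Toy database: character -> Unicode IDS (operator followed by components).
DATABASE = {
    "明": "⿰日月",
    "林": "⿰木木",
    "早": "⿱日十",
    "森": "⿱木⿰木木",
}

# The index is built once, ahead of any query.
INDEX = {char: signature(ids) for char, ids in DATABASE.items()}

def search(component: str) -> list[str]:
    """Return characters whose IDS contains the given component."""
    need = component_bits(component)
    hits = []
    for char, ids in DATABASE.items():
        # Cheap bitwise test: a missing bit proves the component is absent.
        if (INDEX[char] & need) != need:
            continue
        # The filter can give false positives, so confirm with an exact check.
        if component in ids:
            hits.append(char)
    return hits

print(search("日"))
```

The bitwise test can only rule entries out. Every entry that passes still goes through the exact check, so the filter affects speed but never the result set.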
Taxonomy
Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Handwritten Text Recognition Techniques
