Towards Khmer Scene Document Layout Detection
Marry Kong, Rina Buoy, Sovisal Chenda, Nguonly Taing, Masakazu Iwamura, Koichi Kise

TL;DR
This paper introduces a new framework for Khmer scene document layout detection, comprising a dataset, augmentation tools, and YOLO-based models. It addresses both the scarcity of resources and the challenges posed by the complexity of Khmer script.
Contribution
It provides the first comprehensive study with a dedicated dataset, augmentation tools, and baseline models for Khmer scene document layout analysis.
Findings
Created a benchmark dataset for Khmer scene layouts
Developed an augmentation tool for realistic scene document synthesis
Established YOLO-based layout detection baselines with oriented bounding boxes
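Oriented bounding boxes let a detector follow layout units that are rotated in a photographed page. The sketch below assumes one common parameterization, (cx, cy, w, h, theta) with theta in radians; the paper's exact box encoding is not stated here, and the function names are hypothetical. It converts such a box into its four corner points and checks, via the shoelace formula, that rotation preserves the box area.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def obb_to_corners(cx: float, cy: float, w: float, h: float, theta: float) -> List[Point]:
    """Corners of an oriented box centred at (cx, cy), rotated by theta radians.

    Corners start at the local (-w/2, -h/2) offset and proceed through
    (+w/2, -h/2), (+w/2, +h/2), (-w/2, +h/2).
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the local offset by theta, then translate to the box centre.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


def polygon_area(pts: List[Point]) -> float:
    """Absolute polygon area via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


if __name__ == "__main__":
    # An unrotated 40 x 10 text line centred at (50, 20).
    print(obb_to_corners(50, 20, 40, 10, 0.0))
    # Area is unchanged by rotation: 4 * 2 = 8.
    print(polygon_area(obb_to_corners(0, 0, 4, 2, 0.7)))
```

With theta = 0 the box degenerates to an ordinary axis-aligned rectangle, which is why an oriented-box detector can still represent upright layout units without a separate code path.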
Abstract
While document layout analysis for Latin scripts has advanced significantly, driven by the advent of large multimodal models (LMMs), progress for the Khmer language remains constrained because of the scarcity of annotated training data. This gap is particularly acute for scene documents, where perspective distortions and complex backgrounds challenge traditional methods. Given the structural complexities of Khmer script, such as diacritics and multi-layer character stacking, existing Latin-based layout analysis models fail to accurately delineate semantic layout units, particularly for dense text regions (e.g., list items). In this paper, we present the first comprehensive study on Khmer scene document layout detection. We contribute a novel framework comprising three key elements: (1) a robust training and benchmarking dataset specifically for Khmer scene layouts; (2) an open-source…
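Perspective distortion of a photographed page can be modelled as a planar homography. The following is a minimal sketch, not the paper's augmentation tool: it assumes a synthesis step that warps a flat page onto a quadrilateral in a scene and must carry each layout box's corners along. All names (`homography_from_points`, `apply_homography`, `solve_linear`) are hypothetical. It estimates the 3x3 homography from four point correspondences with the direct linear transform (fixing h33 = 1) and a small Gaussian-elimination solver, then maps points through it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography_from_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 homography (h33 = 1) mapping four src points onto four dst points."""
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, rearranged to be linear in h.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        # v * (h6 x + h7 y + 1) = h3 x + h4 y + h5
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(H: Matrix, pts: Sequence[Point]) -> List[Point]:
    """Map points through H, dividing by the projective coordinate."""
    out = []
    for x, y in pts:
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        out.append(((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
                    (H[1][0] * x + H[1][1] * y + H[1][2]) / w))
    return out


if __name__ == "__main__":
    # Hypothetical flat page corners and where they land in the scene photo.
    page = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
    scene = [(10.0, 5.0), (120.0, 20.0), (110.0, 90.0), (0.0, 70.0)]
    H = homography_from_points(page, scene)
    # Carry one (hypothetical) layout box on the page into scene coordinates.
    print(apply_homography(H, [(20.0, 10.0), (80.0, 10.0), (80.0, 20.0), (20.0, 20.0)]))
```

A rectangle generally becomes a non-rectangular quadrilateral under a homography, so if the labels are oriented boxes, a further fitting step (for example a minimum-area enclosing rectangle) would be needed; that step is an assumption here and is not shown.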
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
