Decoding Complexity: Intelligent Pattern Exploration with CHPDA (Context Aware Hybrid Pattern Detection Algorithm)
Lokesh Koli, Shubham Kalra, Karanpreet Singh

TL;DR
This paper evaluates regex and exact-match algorithms for sensitive data detection, proposing a hybrid AI + Regex method that improves accuracy and efficiency in data security applications.
Contribution
It introduces a hybrid AI and regex-based pattern detection algorithm that enhances detection accuracy and scalability for sensitive data identification.
Findings
Google RE2 offers optimal speed and accuracy among regex engines.
Aho-Corasick outperforms other exact match algorithms in large datasets.
Hybrid AI + Regex approach achieves a 91.6% F1 score, balancing recall and precision.
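For readers unfamiliar with the metric, the F1 score cited above is the harmonic mean of precision and recall. The sketch below shows the standard computation from raw detection counts. The function name and the counts in the usage note are illustrative assumptions, not figures from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 (harmonic mean of precision and recall).

    tp: sensitive items correctly flagged
    fp: benign items wrongly flagged
    fn: sensitive items missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = f1_score(tp=8, fp=2, fn=2)  # precision = recall = 0.8
```

With the hypothetical counts above, precision and recall are both 0.8, so F1 is also 0.8. A detector that trades many false positives for higher recall (or the reverse) scores lower on F1 than one that keeps the two balanced.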
Abstract
Detecting sensitive data such as Personally Identifiable Information (PII) and Protected Health Information (PHI) is critical for data security platforms. This study evaluates regex-based pattern matching algorithms and exact-match search techniques to optimize detection speed, accuracy, and scalability. Our benchmarking results indicate that Google RE2 provides the best balance of speed (10-15 ms/MB), memory efficiency (8-16 MB), and accuracy (99.5%) among regex engines, outperforming PCRE while maintaining broader hardware compatibility than Hyperscan. For exact matching, Aho-Corasick demonstrated superior performance (8 ms/MB) and scalability for large datasets. Performance analysis revealed that regex processing time scales linearly with dataset size and pattern complexity. A hybrid AI + Regex approach achieved the highest F1 score (91.6%) by improving recall and minimizing false…
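To make the exact-match side of the abstract concrete, here is a minimal pure-Python sketch of the Aho-Corasick technique: a trie over all search terms with failure links, so the text is scanned once regardless of how many terms are loaded. It is a textbook-style illustration, not the authors' implementation or the benchmarked library; the function names `build_automaton` and `search` and the sample terms are assumptions made for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> longest proper suffix that is also a trie prefix
    out = [[]]    # out[node] -> patterns that end at this node
    for pattern in patterns:
        if not pattern:
            continue  # skip empty terms
        node = 0
        for ch in pattern:
            nxt = goto[node].get(ch)
            if nxt is None:
                goto.append({})
                fail.append(0)
                out.append([])
                nxt = len(goto) - 1
                goto[node][ch] = nxt
            node = nxt
        out[node].append(pattern)

    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    node = 0
    hits = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pattern in out[node]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


# Classic illustrative term set; real deployments would load sensitive terms.
automaton = build_automaton(["he", "she", "his", "hers"])
matches = search("ushers", automaton)
# matches == [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The search loop visits each character of the text once and follows failure links instead of restarting, which is consistent with the scalability on large datasets that the abstract reports for this algorithm.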
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Industrial Vision Systems and Defect Detection · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus
