Using Semantic Similarity for Input Topic Identification in Crawling-based Web Application Testing
Jun-Wei Lin, Farn Wang

TL;DR
This paper introduces a semantic similarity-based method to automatically identify input field topics during web application crawling, reducing manual rule configuration and improving accuracy.
Contribution
It presents a novel natural-language approach for input topic identification that performs comparably to traditional rule-based methods and improves their accuracy when the two are combined.
Findings
Comparable performance to rule-based methods in real-world tests
Improves rule-based accuracy by up to 19% when combined
Reduces manual effort in configuring input field rules
Abstract
To automatically test web applications, crawling-based techniques are usually adopted to mine the behavior models, explore the state spaces or detect the violated invariants of the applications. However, in existing crawlers, rules for identifying the topics of input text fields, such as login ids, passwords, emails, dates and phone numbers, have to be manually configured. Moreover, the rules for one application are very often not suitable for another. In addition, when several rules conflict and match an input text field to more than one topic, it can be difficult to determine which rule suggests a better match. This paper presents a natural-language approach to automatically identify the topics of encountered input fields during crawling by semantically comparing their similarities with the input fields in a labeled corpus. In our evaluation with 100 real-world forms, the proposed…
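The core idea in the abstract, matching an encountered input field against a labeled corpus by similarity, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny `LABELED_CORPUS`, the helper names, and the use of bag-of-words cosine similarity as a stand-in for whatever semantic similarity measure the paper actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical labeled corpus: topic -> text seen around input fields of that topic.
# A real corpus would be built from many labeled forms.
LABELED_CORPUS = {
    "email": ["email address", "e-mail", "your email"],
    "password": ["password", "enter password", "confirm password"],
    "phone": ["phone number", "telephone", "mobile phone"],
    "date": ["date of birth", "birthday", "start date"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text):
    """Build a bag-of-words term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_topic(field_text, corpus=LABELED_CORPUS):
    """Return (topic, score) of the most similar labeled example.

    Returns (None, 0.0) when nothing in the corpus shares a token
    with the field text.
    """
    query = to_vector(field_text)
    best_topic, best_score = None, 0.0
    for topic, examples in corpus.items():
        for example in examples:
            score = cosine(query, to_vector(example))
            if score > best_score:
                best_topic, best_score = topic, score
    return best_topic, best_score


if __name__ == "__main__":
    # Field text could come from a label, placeholder, name, or id attribute.
    for text in ["Enter your email address", "user_phone", "xyz"]:
        print(text, "->", identify_topic(text))
```

Because every field is scored against every labeled example, a field that looks like more than one topic still gets a single best match by score, which is one way to sidestep the rule-conflict problem the abstract describes. A bag-of-words measure only captures shared tokens, so a real system would presumably need a richer representation to relate words such as "mail" and "email".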
Taxonomy
Topics: Software Testing and Debugging Techniques · Web Data Mining and Analysis · Software System Performance and Reliability
