Using Semantic Similarity for Input Topic Identification in Crawling-based Web Application Testing

Jun-Wei Lin; Farn Wang

arXiv:1608.06549·cs.SE·August 24, 2016·1 cites

TL;DR

This paper introduces a semantic similarity-based method to automatically identify input field topics during web application crawling, reducing manual rule configuration and improving accuracy.

Contribution

It presents a novel natural-language approach to input topic identification that performs comparably to traditional rule-based methods and improves their accuracy when the two are combined.

Findings

01. Comparable performance to rule-based methods in real-world tests

02. Improves rule-based accuracy by up to 19% when combined

03. Reduces manual effort in configuring input field rules

Abstract

To automatically test web applications, crawling-based techniques are usually adopted to mine the behavior models, explore the state spaces, or detect the violated invariants of the applications. However, in existing crawlers, rules for identifying the topics of input text fields, such as login ids, passwords, emails, dates and phone numbers, have to be manually configured. Moreover, the rules for one application are very often not suitable for another. In addition, when several rules conflict and match an input text field to more than one topic, it can be difficult to determine which rule suggests a better match. This paper presents a natural-language approach to automatically identify the topics of input fields encountered during crawling by semantically comparing them with the input fields in a labeled corpus. In our evaluation with 100 real-world forms, the proposed…
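The core idea in the abstract, matching an encountered input field against a labeled corpus and taking the topic of the most similar entry, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the corpus contents, the function names (`tokenize`, `cosine`, `identify_topic`), and the use of plain term-frequency cosine similarity are all assumptions made for the example. Term-frequency cosine only captures shared words, so it stands in here for whatever semantic model the paper actually uses.

```python
import math
import re
from collections import Counter

# Hypothetical labeled corpus (an assumption for this sketch):
# topic -> descriptive text of input fields already known to have that topic.
LABELED_CORPUS = {
    "email": ["email address", "your e-mail", "user email"],
    "password": ["password", "enter your password", "confirm password"],
    "phone": ["phone number", "mobile phone", "telephone"],
}

def tokenize(text):
    """Lowercase and keep runs of letters, so 'user_email' -> ['user', 'email']."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def identify_topic(field_text, corpus=LABELED_CORPUS):
    """Return (topic, score) of the most similar labeled entry, or (None, 0.0)."""
    query = Counter(tokenize(field_text))
    best_topic, best_score = None, 0.0
    for topic, examples in corpus.items():
        for example in examples:
            score = cosine(query, Counter(tokenize(example)))
            if score > best_score:
                best_topic, best_score = topic, score
    return best_topic, best_score

if __name__ == "__main__":
    # Field descriptions as a crawler might extract them from id/name/label text.
    for field in ["user_email", "Confirm password", "mobile", "zip code"]:
        print(field, "->", identify_topic(field))
```

Because every candidate topic receives a numeric score, a single best match can be chosen even when several topics partially match, which is the conflict case the abstract describes for hand-written rules. A field with no overlapping words at all, such as "zip code" against this toy corpus, yields no topic.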

Code & Models

Repositories

jwlin/arxiv-160430 (official)

Taxonomy

Topics: Software Testing and Debugging Techniques · Web Data Mining and Analysis · Software System Performance and Reliability