Web Application Testing: Using Tree Kernels to Detect Near-duplicate States in Automated Model Inference
Anna Corazza, Sergio Di Martino, Adriano Peron, Luigi Libero Lucio Starace

TL;DR
This paper introduces a novel approach using Tree Kernel functions to detect near-duplicate web pages based on their DOM trees, improving model inference for web application testing.
Contribution
The paper presents a new near-duplicate detection method for web pages using Tree Kernel functions, enhancing the quality of state models in web testing.
Findings
Outperforms existing near-duplicate detection techniques
Effective in classifying web page pairs as near-duplicates
Promising results motivate further research in this area
Abstract
In the context of End-to-End testing of web applications, automated exploration techniques (a.k.a. crawling) are widely used to infer state-based models of the site under test. These models, in which states represent features of the web application and transitions represent reachability relationships, can be used for several model-based testing tasks, such as test case generation. However, current exploration techniques often lead to models containing many near-duplicate states, i.e., states representing slightly different pages that are in fact instances of the same feature. This has a negative impact on the subsequent model-based testing tasks, adversely affecting, for example, size, running time, and achieved coverage of generated test suites. As a web page can be naturally represented by its tree-structured DOM representation, we propose a novel near-duplicate detection technique to…
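To make the idea of comparing pages through their DOM trees concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract excerpt does not say which Tree Kernel functions the paper uses, so this example assumes a simple subtree kernel (counting pairs of identical complete subtrees), normalized to the range [0, 1]. All names (`parse_dom`, `subtree_kernel`, `similarity`, `is_near_duplicate`) and the `0.8` threshold are hypothetical choices made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
from math import sqrt

# HTML elements that never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomTreeBuilder(HTMLParser):
    """Builds a nested (tag, children) tree, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def parse_dom(html):
    builder = DomTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def subtree_signatures(node, counts):
    """Records a canonical string for the complete subtree rooted at each node."""
    tag, children = node
    sig = "(" + tag + "".join(subtree_signatures(c, counts) for c in children) + ")"
    counts[sig] += 1
    return sig


def subtree_kernel(a, b):
    """Counts pairs of nodes (one per tree) that root identical complete subtrees."""
    ca, cb = Counter(), Counter()
    subtree_signatures(a, ca)
    subtree_signatures(b, cb)
    return sum(n * cb[sig] for sig, n in ca.items())


def similarity(html_a, html_b):
    """Normalized kernel: 1.0 for structurally identical pages, 0.0 for no shared subtree."""
    a, b = parse_dom(html_a), parse_dom(html_b)
    return subtree_kernel(a, b) / sqrt(subtree_kernel(a, a) * subtree_kernel(b, b))


def is_near_duplicate(html_a, html_b, threshold=0.8):
    # The threshold is an assumed value, not one taken from the paper.
    return similarity(html_a, html_b) >= threshold
```

For example, `similarity("<ul><li></li><li></li></ul>", "<ul><li></li><li></li><li></li></ul>")` yields a score strictly between 0 and 1, since the two lists share their `li` subtrees but differ in length. A crawler could use such a score to decide whether a newly reached page should become a new state in the model or be merged into an existing one.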
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
