Introduction to OXPath

Ruslan R. Fayzrakhmanov; Christopher Michels; Mandy Neumann

arXiv:1806.10899 · cs.PL · June 29, 2018

TL;DR

The paper introduces OXPath, a language and framework for extracting structured data from complex, dynamic web applications by simulating user interactions and integrating with web technologies.

Contribution

It presents OXPath, a novel extension of XPath that enables interaction with sophisticated web interfaces and efficient data extraction for web data acquisition tasks.

Findings

01

OXPath effectively interacts with dynamic web pages.

02

It supports multiple data output formats, such as XML, JSON, and CSV.

03

The paper demonstrates OXPath's efficiency through comprehensive experiments.

Abstract

Contemporary web pages with increasingly sophisticated interfaces rival traditional desktop applications in interface complexity and are often called web applications or Rich Internet Applications (RIAs). They often require the execution of JavaScript in a web browser and can issue AJAX requests to generate content dynamically in response to user interaction. From the standpoint of automatic data acquisition, it is therefore essential to render web pages correctly and mimic user actions in order to obtain relevant data from the web page content. Briefly, to obtain data through existing web interfaces and transform it into structured form, contemporary wrappers should be able to: 1) interact with sophisticated interfaces of web applications; 2) precisely acquire relevant data; 3) scale with the number of crawled web pages or states of the web application; 4) have an embeddable programming API…

Taxonomy

Topics: Web Data Mining and Analysis · Mobile and Web Applications · Service-Oriented Architecture and Web Services