Tag-Pag: A Dedicated Tool for Systematic Web Page Annotations
Anton Pogrebnjak, Julian Schelb, Andreas Spitz, Celina Kacperski, Roberto Ulloa

TL;DR
Tag-Pag is a specialized tool that streamlines the process of annotating entire web pages with predefined topics, aiding researchers in web content analysis and machine learning training.
Contribution
It introduces a dedicated system for page-level annotations, integrating content extraction and URL indicators for efficient web page categorization.
Findings
Facilitates quick and accurate web page annotations
Supports multiple topic labeling per page
Provides export options for research workflows
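To make the multi-label and export points above concrete, here is a minimal sketch of how page-level annotations might be stored and written out. The source does not describe Tag-Pag's actual export format, so the function name `export_annotations`, the CSV layout (one binary column per label), and the example URLs are all assumptions made for illustration.

```python
import csv
import io


def export_annotations(annotations, labels):
    """Serialize page-level annotations to CSV text.

    annotations: dict mapping a page URL to the set of labels assigned to it
                 (a page may carry zero, one, or several labels).
    labels:      ordered list of the predefined labels.
    The layout (one 0/1 column per label) is hypothetical, not Tag-Pag's format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url"] + list(labels))
    for url, assigned in annotations.items():
        unknown = set(assigned) - set(labels)
        if unknown:
            # Reject labels outside the predefined set instead of dropping them silently.
            raise ValueError(f"unknown labels for {url}: {sorted(unknown)}")
        writer.writerow([url] + [1 if label in assigned else 0 for label in labels])
    return buffer.getvalue()


if __name__ == "__main__":
    labels = ["politics", "sports"]
    annotations = {
        "https://a.example/x": {"politics", "sports"},  # page tagged with two topics
        "https://b.example/y": set(),                   # page tagged with none
    }
    print(export_annotations(annotations, labels), end="")
```

A wide 0/1 table like this is convenient as input to a multi-label classifier, which is one of the use cases the summary mentions; a long format (one row per URL and label pair) would work equally well.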
Abstract
Tag-Pag is an application designed to simplify the categorization of web pages, a task increasingly common for researchers who scrape web pages to analyze individuals' browsing patterns or train machine learning classifiers. Unlike existing tools that focus on annotating sections of text, Tag-Pag systematizes page-level annotations, allowing users to determine whether an entire document relates to one or multiple predefined topics. Tag-Pag offers an intuitive interface to configure the input web pages and annotation labels. It integrates libraries to extract content from the HTML and URL indicators to aid the annotation process. It provides direct access to both scraped and live versions of the web page. Our tool is designed to expedite the annotation process with features like quick navigation, label assignment, and export functionality, making it a versatile and efficient tool for…
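The abstract mentions extracting content from the HTML and deriving indicators from the URL. The sketch below shows what such a step could look like using only the Python standard library. It is not Tag-Pag's implementation: the abstract does not name the libraries it integrates, and the helper names `extract_text` and `url_indicators`, as well as the choice of domain and path tokens as indicators, are assumptions.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def extract_text(html):
    """Return the page's text content as a single whitespace-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def url_indicators(url):
    """Split a URL into hints an annotator could scan quickly (hypothetical choice)."""
    parsed = urlparse(url)
    tokens = [t for t in re.split(r"[^A-Za-z0-9]+", parsed.path) if t]
    return {"domain": parsed.netloc, "path_tokens": tokens}


if __name__ == "__main__":
    page = "<html><body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
    print(extract_text(page))
    print(url_indicators("https://example.org/politics/2024/election-results.html"))
```

Showing the domain and path tokens next to the extracted text gives an annotator two independent cues about a page's topic, which is useful when the scraped HTML is sparse or dominated by boilerplate.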
Taxonomy
Topics: Web Data Mining and Analysis · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
Methods: Umbrella Reinforcement Learning · Focus
