GROWN+UP: A Graph Representation Of a Webpage Network Utilizing Pre-training
Benedict Yeoh, Huijuan Wang

TL;DR
This paper introduces GROWN+UP, a graph neural network that ingests webpage structures, is pre-trained with self-supervision on unlabeled webpages, and is then fine-tuned for downstream tasks such as boilerplate removal and genre classification.
Contribution
The paper presents a novel pre-trained graph neural network model for webpages, filling a gap in web information retrieval with a flexible, self-supervised approach.
Findings
Achieves state-of-the-art results on webpage boilerplate removal.
Outperforms existing methods on genre classification benchmarks.
Demonstrates versatility across different webpage analysis tasks.
Abstract
Large pre-trained neural networks are ubiquitous and critical to the success of many downstream tasks in natural language processing and computer vision. However, within the field of web information retrieval, there is a stark contrast in the lack of similarly flexible and powerful pre-trained models that can properly parse webpages. Consequently, we believe that common machine learning tasks like content extraction and information mining from webpages have low-hanging gains that yet remain untapped. We aim to close the gap by introducing an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectually. Finally, we show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage…
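As a rough illustration of what "ingesting webpage structures" could look like, the sketch below turns an HTML document into a graph with one node per element and one edge per parent-child link in the DOM tree. This is an assumption made for illustration only: the abstract does not say how GROWN+UP builds its graph or which node features it uses, and the names `DomGraphBuilder` and `html_to_graph` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Collects DOM elements as nodes and parent-child links as edges.

    Hypothetical helper for illustration; not the paper's graph construction.
    """

    def __init__(self):
        super().__init__()
        self.nodes = []    # nodes[i] is the tag name of node i
        self.edges = []    # (parent_id, child_id) pairs
        self._stack = []   # ids of the currently open elements

    def handle_starttag(self, tag, attrs):
        node_id = len(self.nodes)
        self.nodes.append(tag)
        if self._stack:
            # Link the new element to its nearest open ancestor.
            self.edges.append((self._stack[-1], node_id))
        if tag not in VOID_TAGS:
            self._stack.append(node_id)

    def handle_endtag(self, tag):
        # Close the most recent open element with this tag name, along with
        # any unclosed descendants; ignore stray closing tags.
        for i in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[i]] == tag:
                del self._stack[i:]
                break


def html_to_graph(html):
    """Return (nodes, edges) for the element tree of an HTML string."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.edges


sample = "<html><body><div><p>Hi</p><br></div><p>Bye</p></body></html>"
nodes, edges = html_to_graph(sample)
print(nodes)  # ['html', 'body', 'div', 'p', 'br', 'p']
print(edges)  # [(0, 1), (1, 2), (2, 3), (2, 4), (1, 5)]
```

A node list and edge list of this kind is the usual input format for graph neural network libraries; text content and tag attributes are ignored here but could be attached as node features.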
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Web Data Mining and Analysis
Methods: Graph Neural Network
