Learning Context-Aware Representations of Subtrees
Cedric Cook

TL;DR
This thesis introduces context-aware extensions to Tree-LSTM for classifying web page elements, leveraging surrounding structural information to improve representation quality and classification accuracy.
Contribution
It proposes novel context-aware Tree-LSTM models that enhance subtree representations for web element classification tasks.
Findings
Achieved an average F1-score of 0.7973 on a multi-class web classification task.
Demonstrated improved subtree representations over existing models.
Potential applications include element classification and reinforcement learning on the Web.
Abstract
This thesis tackles the problem of learning efficient representations of complex, structured data with a natural application to web page and element classification. We hypothesise that the context around the element inside the web page is of high value to the problem and is currently underexploited. This thesis aims to solve the problem of classifying web elements as subtrees of a DOM tree by also considering their context. To achieve this, we first discuss current expert knowledge systems that work on structures, such as Tree-LSTM. Then, we propose context-aware extensions to this model. We show that the new model achieves an average F1-score of 0.7973 on a multi-class web classification task. This model generates better representations for various subtrees and may be used for applications such as element classification, state estimators in reinforcement learning over the Web, and more.
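To make the distinction between an element's subtree and its context concrete, here is a minimal sketch in Python. It is not the thesis's model: the `Node` class, the helper names, and the choice of ancestors and siblings as "context" are all assumptions made for illustration, since the abstract does not specify how context is defined or encoded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a DOM node (not a real DOM API)."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child, record its parent link, and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def subtree_tags(node: Node) -> List[str]:
    """Tags of the element's own subtree, in pre-order."""
    tags = [node.tag]
    for child in node.children:
        tags.extend(subtree_tags(child))
    return tags


def context_tags(node: Node) -> Dict[str, List[str]]:
    """Assumed notion of context: ancestors (nearest first) and siblings."""
    ancestors = []
    current = node.parent
    while current is not None:
        ancestors.append(current.tag)
        current = current.parent
    siblings = []
    if node.parent is not None:
        siblings = [c.tag for c in node.parent.children if c is not node]
    return {"ancestors": ancestors, "siblings": siblings}


# Tiny page: <html><body><nav/><form><input/><button/></form></body></html>
html = Node("html")
body = html.add(Node("body"))
nav = body.add(Node("nav"))
form = body.add(Node("form"))
form.add(Node("input"))
form.add(Node("button"))

print(subtree_tags(form))  # ['form', 'input', 'button']
print(context_tags(form))  # {'ancestors': ['body', 'html'], 'siblings': ['nav']}
```

A model that sees only the subtree would receive the first list; a context-aware model would, under the assumption above, also have access to information derived from the second.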
Taxonomy
Topics: Data Stream Mining Techniques · Topic Modeling · Web Data Mining and Analysis
