Learning to Navigate Wikipedia by Taking Random Walks
Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus

TL;DR
This paper presents a method for training navigation policies on large graph-structured data like Wikipedia by learning from random walks, enabling efficient information retrieval and supporting downstream fact verification and question answering tasks.
Contribution
It shows that simple behavioral cloning of randomly sampled trajectories is enough to learn an effective link-selection policy on massive graphs like Wikipedia.
Findings
Navigates successfully between nodes 5 steps apart 96% of the time and between nodes 20 steps apart 92% of the time.
Embeddings and policies are competitive in fact verification and question answering.
Method scales to large graphs with 38 million nodes and 387 million edges.
Abstract
A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this paper, we show that behavioral cloning of randomly sampled trajectories is sufficient to learn an effective link selection policy. We demonstrate the approach on a graph version of Wikipedia with 38M nodes and 387M edges. The model is able to efficiently navigate between nodes 5 and 20 steps apart 96% and 92% of the time, respectively. We then use the resulting embeddings and policy in downstream fact verification and question answering tasks where, in combination with basic TF-IDF search and…
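As a rough illustration of what "behavioral cloning of randomly sampled trajectories" could look like, the sketch below samples a random walk on a toy link graph and turns it into supervised (current node, target, next node) examples. This is not the authors' implementation: the toy graph, the function names `sample_random_walk` and `cloning_examples`, and the choice to treat the final node of each walk as the navigation target are all assumptions made for illustration.

```python
import random


def sample_random_walk(graph, start, length, rng):
    """Follow uniformly random outgoing links for up to `length` steps."""
    walk = [start]
    for _ in range(length):
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def cloning_examples(walk):
    """Turn one walk into (current, target, next_node) training examples.

    Assumption: the last node of the walk serves as the target, and the
    link actually followed at each step is the label to imitate.
    """
    target = walk[-1]
    return [(walk[i], target, walk[i + 1]) for i in range(len(walk) - 1)]


# Hypothetical toy graph: node -> list of outgoing links.
toy_graph = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["A"],
}

rng = random.Random(0)
walk = sample_random_walk(toy_graph, "A", 4, rng)
examples = cloning_examples(walk)
for current, target, next_node in examples:
    print(f"at {current}, heading for {target}: follow link to {next_node}")
```

A policy trained on such examples would take the content of the current node and the target as input and score the outgoing links. The model that does the scoring is outside the scope of this sketch.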
Taxonomy
Topics: Wikis in Education and Collaboration · Topic Modeling · Natural Language Processing Techniques
