Milgram’s experiment in the knowledge space: individual navigation strategies

Manran Zhu; János Kertész

PMC · DOI: 10.1140/epjds/s13688-025-00558-6 · June 5, 2025

Open Access

TL;DR

This study explores how people navigate through information spaces like Wikipedia, finding that individuals use different strategies based on age, gender, and race.

Contribution

The paper identifies two distinct navigation strategies in knowledge spaces and links them to demographic factors.

Findings

01. Older, white, and female participants tend to use a proximity-driven strategy.

02. Younger participants prefer a hub-driven strategy for navigation.

03. The study connects social navigation tendencies to knowledge space strategies.

Abstract

The data deluge characteristic of our times has led to information overload, posing a significant challenge to effectively finding our way through the digital landscape. Addressing this issue requires an in-depth understanding of how we navigate through the abundance of information. Previous research has discovered multiple patterns in how individuals navigate in the geographic, social, and information spaces, yet individual differences in strategies for navigation in the knowledge space have remained largely unexplored. To bridge the gap, we conducted an online experiment where participants played a navigation game on Wikipedia and completed questionnaires about their personal information. Utilizing the hierarchical structure of the English Wikipedia and a graph embedding trained on it, we identified two navigation strategies and found that there are significant individual differences in…

Funding

Alexander von Humboldt-Stiftung (http://dx.doi.org/10.13039/100005156)
H2020 European Research Council (http://dx.doi.org/10.13039/100010663)
European Research Executive Agency (http://dx.doi.org/10.13039/100020668)
Horizon 2020 Framework Programme (http://dx.doi.org/10.13039/100010661)
H2020 Excellent Science (http://dx.doi.org/10.13039/100010662)
Corvinus University of Budapest

Keywords

Navigation; Online experiment; Wikipedia; Graph embedding

Taxonomy

Topics: Wikis in Education and Collaboration · Misinformation and Its Impacts · Digital Games and Media

Full text

Introduction

Navigating from one place to another is a crucial ability for animals, enabling them to locate essential resources such as food, mates, and habitats [1, 2]. Seeking resources occurs not only in the physical space but also in more abstract spaces, such as when we look for the right person for assistance in the social space [3, 4], or when searching for an answer to a question online in the knowledge space [5]. With the accumulation of massive information online in the past decades, information overload has become a significant challenge for our generation, making efficient way-finding in the information space crucial [6]. To tackle this challenge, we first need to understand how we navigate the information space.

Studies of navigation behavior originated in the physical domain, where the cognitive map theory [7, 8] was developed to explain how humans and animals mentally represent spatial environments and determine routes within them. In recent years, this theory has been extended beyond physical space to encompass navigation in abstract domains, including social and informational spaces [1]. Notably, research has shown that “concept cells” in the hippocampus and entorhinal cortex—originally thought to encode spatial locations—also represent abstract concepts and social relationships, suggesting shared neural substrates across domains [9]. Social navigation (e.g., inferring social hierarchies or choosing allies) and information navigation (e.g., exploring knowledge networks or digital content) rely on similar cognitive processes, such as mapping, orientation, and decision-making under uncertainty. Understanding the connections between these forms of navigation can thus reveal fundamental principles of how the brain organizes, accesses, and utilizes complex, high-dimensional information. Such insights are crucial for designing better information systems, enhancing learning, and modeling human behavior in increasingly digital environments.

Previous research has found that our efficient navigation ability in social space is linked to the structure of the social network [10, 11]. The way we are connected socially is highly structured: we all possess different identities [12] and belong to groups characterized by specific social attributes [13, 14]. These group structures naturally form hierarchies, akin to the departmental organization in universities or companies, where individuals belong to groups, which in turn belong to larger groups. Watts et al. [10] and Kleinberg [11] proved that networks equipped with a hierarchical structure are navigable: using a greedy decentralized search algorithm, in which one always chooses as the next step the neighbor closest to the target, one can navigate to any target person in a small number of steps. This theory was later empirically confirmed by Adamic et al. [15], who demonstrated that, given email logs, a greedy decentralized search could effectively leverage the organizational hierarchy to find short paths to the target. In fact, different hierarchies can be utilized: in the context of social navigation, individuals typically rely on either geographical or occupational hierarchies to facilitate their navigation [16, 17].
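The greedy decentralized search described above can be illustrated with a minimal sketch. The network, the `ring_distance` function, and the step budget below are hypothetical stand-ins chosen for brevity; they are not the hierarchical distance used in [10, 11], only a toy setting in which "always move to the neighbor closest to the target" can be traced by hand.

```python
from typing import Callable, Dict, List, Optional


def greedy_search(
    neighbors: Dict[int, List[int]],
    source: int,
    target: int,
    distance: Callable[[int, int], float],
    max_steps: int = 20,
) -> Optional[List[int]]:
    """Follow the neighbor closest to the target until it is reached
    or the step budget runs out (returns None on failure)."""
    path = [source]
    current = source
    for _ in range(max_steps):
        if current == target:
            return path
        options = neighbors.get(current, [])
        if not options:
            return None  # dead end: no outgoing links
        # Greedy choice: only local knowledge of the neighbors is used.
        current = min(options, key=lambda node: distance(node, target))
        path.append(current)
    return path if current == target else None


# Toy network (hypothetical): 12 nodes on a ring plus one shortcut 0 <-> 6.
N = 12
ring = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}
ring[0].append(6)
ring[6].append(0)


def ring_distance(a: int, b: int) -> int:
    """Shortest distance along the ring, ignoring the shortcut."""
    d = abs(a - b)
    return min(d, N - d)


route = greedy_search(ring, source=0, target=7, distance=ring_distance)
print(route)
```

In this toy case the search takes the shortcut because node 6 is the neighbor of node 0 that lies closest to the target, so the route has two hops instead of five.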

Does social navigation theory apply to navigation in the knowledge space? Targeted navigation in the information space, similar to Milgram’s experiment, has been implemented and studied in a game setting on Wikipedia [18], which is noted for its wide topic range and significant user interaction. In popular Wikipedia navigation games such as Wikispeedia [19] and the Wiki Game [20], players are challenged to move from one Wikipedia article (the source page) to another (the target page) by following a chain of hyperlinks through the visited articles. Researchers have uncovered intriguing patterns in the navigation game. Analysing 30,000 instances of players’ navigation paths, West et al. [21] discovered that navigation typically consists of two phases: a zoom-out phase in the early game, when players tend to visit high-degree nodes (hubs), and a home-in phase, when players consistently decrease the conceptual distance to the target. Looking at the navigation decisions players made at each step, studies have found that these decisions are biased and stochastic: current decisions are biased by previous decisions at the topical level [22], and players sometimes select their next moves randomly, particularly in the early stages of the game [23], which reflects the trade-off between exploration and exploitation in search behavior [24–26].

While previous research has shed light on how we navigate in general, a comprehensive understanding of how we navigate differently remains elusive. Studies on the cognitive social structure [27–29] have shown that humans are very good at “filling in the blanks”: from our observation of social interactions, we tend to infer the interactions that are not directly observed and form an abstract representation (schema) of the social network that is highly structured, with categories and hierarchies [30]. These schemas are biased across individuals [31, 32] and can lead to different social navigation behaviors [33], and even social status differences: researchers found that people with a more accurate cognition of the advice network in the firm are rated as more powerful by their peers [34]. Aside from the schema differences, our status, power, and emotions all affect our navigation behavior: faced with a job threat, people experiencing low status and negative emotions tend to activate smaller and denser social networks to search for a job, while people experiencing the opposite tend to activate larger and sparser networks [35].

Moving from social navigation to navigation in the information space, understanding individual differences faces several challenges. Although data on information foraging on the World Wide Web is abundant, raw web request logs containing user information such as IP addresses are considered very sensitive and are therefore usually not available [36]. Online games such as Wikispeedia and the Wiki Game produce massive amounts of user navigation trails, but the absence of participants’ demographic information hampers the exploration of how individual traits impact navigation patterns. Moreover, despite extensive research on social navigation and knowledge navigation in their respective domains, an integrated understanding of the navigation strategies adopted in both processes is still lacking. To overcome these limitations, we conducted an online experiment in which we recruited 802 participants online from the United States to play nine rounds of Wikipedia navigation games and then complete a survey (for details, see Sect. 3.1) that included questions about their demographic information and other factors potentially relevant to their navigation behavior. Building on the insights gained from previous studies on social navigation, we tailored our experiment to focus on navigation between persons within the information space: the source and target pages for the navigation games in our study were selected to be well-known individuals from various professions, genders, and historical periods (see Fig. 1 for a list of the source and target pages of each game).

Figure 1. Visualization of the successful navigation paths

Following previous observations that we utilize both the distance and hierarchy structure of the knowledge space to navigate, we trained a graph embedding on the English Wikipedia network to quantify the pairwise distance among the Wikipedia pages and calculated a hierarchical score for each Wikipedia page to measure its position in the knowledge hierarchy. We found that the split between hub-driven and proximity-driven tendencies is not only present within a single navigation game characterized by the zoom-out and home-in phases [21], but also at the individual level. This individual variance is statistically significant and cannot be overlooked. Demographic factors influence not only navigation performance, as demonstrated in our previous work [37], but also navigation strategies. Our study further connects social navigation to knowledge network navigation: individuals’ differing tendencies to use geographical and occupational information about the target person can be understood as different choices between hub-driven and proximity-driven strategies.

Results

Navigation paths in the knowledge space

Previous studies have shown that we utilize the semantic and hierarchical structure of the knowledge space to navigate [23]. Here we provide a more detailed picture of the participants’ navigation patterns with regards to the two structures. To quantify the distance between two Wikipedia articles, we trained a 64-dimensional embedding using the DeepWalk algorithm [38], which maps each Wikipedia article $a_{i}$ to a vector $\vec{v}_{i}$ in the embedding space where more related articles are placed closer together (see Sect. 3.2 for more details). This method allows us to measure how “close” two articles $a_{i}$ and $a_{j}$ are by the cosine similarity of their respective vectors $\vec{v}_{i}$ and $\vec{v}_{j}$ in the embedding space (Eq. (1)). In particular, the closeness score $c(a_{i})$ of the article $a_{i}$ relative to the target Wikipedia page $a_{target}$ of the game is calculated as $c(a_{i}) = c(a_{i}, a_{target})$.

$$ c(a_{i}, a_{j}) = 1 + \cos(\vec{v}_{i}, \vec{v}_{j}) \tag{1} $$
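As a minimal sketch of Eq. (1), the closeness score can be computed from two embedding vectors as follows. The function names and the two-dimensional toy vectors are illustrative assumptions; the embedding in the study is 64-dimensional and trained with DeepWalk, which is not reproduced here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def closeness(u: Sequence[float], v: Sequence[float]) -> float:
    """Closeness as in Eq. (1): 1 + cos(u, v), so values lie in [0, 2]."""
    return 1.0 + cosine_similarity(u, v)


# Toy vectors (hypothetical, 2-dimensional for readability).
same_direction = closeness([1.0, 0.0], [2.0, 0.0])
orthogonal = closeness([1.0, 0.0], [0.0, 3.0])
opposite = closeness([1.0, 0.0], [-1.0, 0.0])
```

Shifting the cosine by one keeps the score non-negative: identical directions give the maximum of 2, orthogonal vectors give 1, and opposite directions give 0.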

Aside from distance, hierarchy can also be extracted from a network. Muchnik et al. [39] developed a local hierarchy measure, h, which represents an article’s position within Wikipedia’s knowledge hierarchy. Calculated from the in-degree $k_{in}(i)$ and out-degree $k_{out}(i)$ of the article $a_{i}$ on the Wikipedia network (Eq. (2)), this local hierarchy measure was validated as an invariant across different language editions of Wikipedia and performs comparably to more global hierarchy measures, such as hierarchical intermediacy and the attraction-basin hierarchy measure [39].

$$ h(a_{i}) = \frac{k_{in}^{3/2}(i)+k_{out}^{3/2}(i)}{k_{in}(i) + k_{out}(i)} \tag{2} $$
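Eq. (2) depends only on the two degrees of an article, so it can be sketched in a few lines. The function name is hypothetical, and returning 0.0 for an isolated article (where the formula is undefined) is an assumption of this sketch rather than something stated in the text.

```python
def local_hierarchy(k_in: int, k_out: int) -> float:
    """Local hierarchy score of Eq. (2) from in-degree and out-degree."""
    if k_in < 0 or k_out < 0:
        raise ValueError("degrees must be non-negative")
    total = k_in + k_out
    if total == 0:
        # Assumption: an article with no links gets the lowest score.
        return 0.0
    return (k_in ** 1.5 + k_out ** 1.5) / total


# Toy degrees (hypothetical): a sparsely linked page versus a hub.
leaf_score = local_hierarchy(1, 1)
mid_score = local_hierarchy(4, 4)
hub_score = local_hierarchy(100, 0)
```

Because the numerator grows faster than the denominator, the score increases with degree, which is why highly connected hub pages sit higher in the hierarchy.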

Figure 1 illustrates all successful navigation paths for the game with the source page “Barack Obama” and the target page “Vincent van Gogh” in terms of the hierarchical score h and closeness score c of the articles in the paths. As shown, some navigation paths are more “hub-driven”, ascending higher in the hierarchy to reach hubs before descending towards the target, while others are more “proximity-driven”, staying lower in the hierarchy and aiming to minimize their distance to the target. This closely parallels the two navigation phases discovered previously [21]: a zoom-out phase in the early period of the game, characterized by the players’ tendency to visit high-degree nodes (hubs), and a home-in phase, when players consistently decrease the conceptual distance to the target. In the next section, we show quantitatively that the difference in tendency observed between the two phases within each game is also present at the individual level.

Hub-driven and proximity-driven strategies

The distinction between the hub-driven and proximity-driven navigation paths can be better understood through an analogy with transportation networks. In road networks, where shortcuts between destinations are limited, travelers often rely on a proximity-driven strategy, targeting locations that are close to their final destination because they must traverse adjacent locations. Conversely, in networks rich with shortcuts, such as airline networks, a hub-driven strategy becomes more viable. Hubs, despite not necessarily being close to the final destination, offer extensive connections across numerous locations.

To quantify the extent to which a navigation path is proximity-driven or hub-driven, we calculated a hub-driven score H and a proximity-driven score C for each navigation path as the average hierarchical score h and the average closeness-to-target score c, respectively, of the articles in the path. More specifically, for a navigation path j consisting of articles $A_{j} = \{a_{k}\}$, the hub-driven score $H_{j}$ and proximity-driven score $C_{j}$ are defined as:

$$ H_{j} = \frac{1}{N_{j}} \sum_{a_{k} \in A_{j}} h(a_{k}) \tag{3} $$

$$ C_{j} = \frac{1}{N_{j}} \sum_{a_{k} \in A_{j}} c(a_{k}, a_{target}) \tag{4} $$

where $N_{j} = |A_{j}|$ is the number of articles in the path, $h(a_{k})$ is the hierarchical score (Eq. (2)) of article $a_{k}$, and $c(a_{k}, a_{target})$ is its closeness to the target page (Eq. (1)). To make the scores comparable across all nine games, we linearly scaled H and C to the range $[0, 1]$ using the min-max scaler.
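The path-level scores and the rescaling step can be sketched as below. The per-article values are made up, and applying the min-max scaling to the set of paths passed in (for instance, all paths of one game) is an assumption, since the text does not say over which set the minimum and maximum are taken.

```python
from statistics import fmean
from typing import List, Sequence


def path_score(article_scores: Sequence[float]) -> float:
    """Average of the per-article scores along one navigation path."""
    return fmean(article_scores)


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Linearly rescale values to [0, 1]; constant input maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical hierarchical scores h(a_k) for the articles of three paths.
h_per_path = [[1.0, 2.0, 3.0], [4.0, 4.0], [10.0]]
raw_H = [path_score(scores) for scores in h_per_path]
scaled_H = min_max_scale(raw_H)
```

The same two functions apply to the proximity-driven score by passing per-article closeness values instead of hierarchical scores.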

Figure 2 visualizes the distribution and the relationship of the hub-driven and proximity-driven scores for each navigation path, with successful paths marked in orange and failed paths in gray. Among successful paths, we observe a negative correlation between the two scores across all nine games (M = −0.55, SD = 0.10), suggesting a trade-off between the two navigation strategies: while both approaches can lead to success, prioritizing one typically requires compromising the other. For failed paths, we identify two distinct patterns. A substantial proportion of failures correspond to paths with both low H and low C scores, indicating that players did not find useful clues during navigation. Other failed paths exhibit score distributions similar to those of successful paths, suggesting that these attempts were close to completion but ultimately unsuccessful due to constraints on time or number of steps.

Figure 2. Distribution of the hub-driven and proximity-driven scores

Is the hub-driven approach more effective than the proximity-driven approach? To address this question, we fitted linear regression models to predict player performance, measured by the time (in seconds) saved in the Speed-race games and the steps saved in the Least-clicks games. Here, we focus solely on successful navigation paths, positing that minimizing time and steps reflects superior performance. To control for other factors affecting players’ performance, we included the following covariates: a categorical variable $Game$, indicating which of the nine games was played; $Round$, an integer (1–9) representing the game round; two variables for participants’ self-reported prior knowledge of the source and target articles; and two performance measures, $Steps$ and $Seconds$, representing the total number of steps taken and the duration of the game, respectively.

Table 1 shows that the effectiveness of navigation strategies is moderated by the game’s timing conditions. Specifically, in the Least-clicks games lacking a time restriction, both hub-driven and proximity-driven approaches can significantly improve performance. Conversely, in timed Speed-race games, while the hub-driven strategy remains beneficial, the proximity-driven strategy has the opposite effect. This difference may stem from the fact that it takes more time to identify pages closely related to the target, as opposed to directly jumping to a highly connected Wikipedia page. Given the negative correlation between the hub-driven and proximity-driven scores, we calculated the variance inflation factor (VIF) for each predictor in both models. Our analysis indicates that multicollinearity is not a significant concern with the maximum VIF value being 3.03.

Table 1. Regression results for the fitness of the hub-driven and proximity-driven navigation strategies, measured as the seconds saved in the Speed-race games or steps saved in the Least-clicks games

| Dependent variable | Seconds saved (Speed-race games) | Steps saved (Least-clicks games) |
| --- | --- | --- |
| Steps | −7.066^∗∗∗^ (0.345) | |
| Seconds | | −0.002^∗∗∗^ (0.0001) |
| Hub-driven score | 3.248^∗∗^ (1.051) | 0.405^∗∗∗^ (0.041) |
| Proximity-driven score | −3.259^∗∗^ (1.103) | 0.200^∗∗∗^ (0.039) |
| Source page knowledge | 3.268^∗∗^ (1.116) | 0.030 (0.037) |
| Target page knowledge | 1.436 (1.032) | 0.034 (0.036) |
| Game Round | 1.027^∗∗^ (0.321) | −0.012 (0.011) |
| Constant | 107.372^∗∗∗^ (4.010) | 3.209^∗∗∗^ (0.110) |
| Observations | 1174 | 1707 |
| R^2^ | 0.367 | 0.203 |
| Adjusted R^2^ | 0.359 | 0.196 |
| Residual Std. Error | 27.904 (df = 1159) | 1.182 (df = 1692) |
| F Statistic | 47.984^∗∗∗^ (df = 14; 1159) | 30.765^∗∗∗^ (df = 14; 1692) |

Note: ^∗^p<0.05; ^∗∗^p<0.01; ^∗∗∗^p<0.001
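To put the reported maximum VIF of 3.03 in perspective, recall the usual definition of the VIF of a predictor $x_{k}$ in terms of the $R_{k}^{2}$ obtained by regressing $x_{k}$ on all other predictors. The text does not spell out the formula, so this standard definition is an assumption here:

$$ \mathrm{VIF}_{k} = \frac{1}{1 - R_{k}^{2}}, \qquad \mathrm{VIF}_{k} = 3.03 \;\Rightarrow\; R_{k}^{2} = 1 - \frac{1}{3.03} \approx 0.67 $$

Under that definition, roughly two thirds of the variance of the most collinear predictor is explained by the remaining predictors, which stays below the rule-of-thumb thresholds of 5 or 10 that are often used to flag problematic multicollinearity.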

Individual differences

To understand whether participants exhibit a persistent tendency towards a hub-driven or proximity-driven strategy across the nine games, we fit a fixed effects model for the hub-driven score H and the proximity-driven score C of the navigation paths, with the participant ID ($Participant$) as the fixed effect. To account for other factors that might affect these scores, we included the covariates specified in Sect. 2.2, along with two additional variables: a binary variable $Won$, indicating whether the game was won, and $Type$, specifying whether the game was Speed-race or Least-clicks. Lastly, we included an interaction term between $Game$ and $Won$ to account for potential correlations between these variables. To determine whether individual tendencies across the nine games are statistically significant, we also fitted a linear regression model using the same set of predictors, excluding the individual effect ($Participant$).

Table 2 presents the regression results of the linear regression models (1) and (3) and the fixed effects models (2) and (4) predicting the hub-driven score and proximity-driven score, respectively. As shown in the table, for both scores, the fixed effects model fits the data better than the linear model without individual effects, resulting in a 12.4% and 11.4% increase in the adjusted $R^{2}$ values for H and C, respectively. An F-test was conducted to assess whether the inclusion of fixed effects significantly improved the model. The results indicate that the fixed effects were statistically significant, with $F(23, 5787) = 77.74$, $p < 0.001$ for the fixed effects model (2) predicting H and $F(23, 5787) = 79.29$, $p < 0.001$ for the fixed effects model (4) predicting C, suggesting that accounting for individual effects significantly improves the model fit.
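One common way to compare a model with and without a block of extra terms is the nested-model F statistic, computed from the residual sums of squares of the two fits. The exact procedure used in the study is not described, so the sketch below is an assumption about the general technique, and the numbers in the example are invented purely to make the arithmetic easy to follow.

```python
def nested_f_statistic(
    rss_restricted: float,
    rss_full: float,
    extra_params: int,
    resid_df_full: int,
) -> float:
    """F statistic for testing whether the extra parameters of the full
    model reduce the residual sum of squares more than chance would."""
    if extra_params <= 0 or resid_df_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    if rss_full <= 0 or rss_restricted < rss_full:
        raise ValueError("expected 0 < rss_full <= rss_restricted")
    explained_per_param = (rss_restricted - rss_full) / extra_params
    residual_per_df = rss_full / resid_df_full
    return explained_per_param / residual_per_df


# Invented example: 5 extra parameters, 50 residual degrees of freedom.
f_value = nested_f_statistic(
    rss_restricted=150.0, rss_full=100.0, extra_params=5, resid_df_full=50
)
```

The resulting value would then be compared against an F distribution with (extra_params, resid_df_full) degrees of freedom to obtain a p-value, a step that needs a statistics library and is left out of this sketch.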

The results highlight individual differences at multiple levels. At the game level, participants who reported greater knowledge of the target article were more likely to adopt a proximity-driven strategy. In contrast, the hub-driven tendency was significant only in Speed-race games, which involve a time constraint, and not in Least-clicks games. Additionally, as participants played more rounds of the game, they were increasingly likely to adopt the hub-driven strategy, while no such learning effect was observed for the proximity-driven strategy.

At the individual level, a further analysis of individual differences suggests that personal characteristics influence navigation tendencies as well. Table 3 presents the results of a linear regression model predicting individual tendencies to use the hub-driven and proximity-driven strategies (estimated in the fixed effects model above) using participant demographic data. The results indicate that older participants tend to adopt a proximity-driven strategy, while younger participants prefer a hub-driven strategy. Additionally, white participants and female participants show a stronger inclination towards the proximity-driven strategy but do not exhibit a significant tendency—either positive or negative—towards the hub-driven strategy. In contrast, language proficiency and political stance do not appear to have a significant impact on navigation strategy. Note that the model includes the total number of games won and the number of Speed-race games played by each participant as covariates to control for their potential correlation with navigation strategy tendencies.

Interplay of geography and occupation

Previous research suggests that individuals primarily rely on geographic and occupational information when navigating social networks [15, 17]. In this study, we examine whether similar strategies apply to navigation within knowledge networks, specifically when the target is a Wikipedia article about a person. To identify navigation paths that leverage geographic information, we mapped the visited Wikipedia articles to their corresponding Wikidata entries. Wikidata is a free, collaborative, multilingual secondary knowledge base that collects structured data to support Wikipedia and other applications [40]. We classified a Wikipedia article as geography-related if its corresponding Wikidata entry contained a coordinate location specified by the P625 property. Using this approach, we successfully mapped 11,434 (99.3%) of the 11,511 visited articles to Wikidata, identifying 1715 as geography-related. A navigation path is classified as geographical if it includes at least one geography-related article in its sequence of visited pages; otherwise, it is categorized as non-geographical.
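The classification rule described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the entity dictionaries only mimic the general shape of a Wikidata entity record (a `claims` mapping keyed by property ID), and the article titles, helper names, and toy data are hypothetical.

```python
def has_coordinate_location(entity):
    """True if a Wikidata-style entity dict carries at least one P625 claim."""
    return bool(entity.get("claims", {}).get("P625"))


def is_geographical_path(path, geo_articles):
    """A path is geographical if it visits at least one geography-related article."""
    return any(title in geo_articles for title in path)


# Hypothetical toy entities; real Wikidata records contain much more detail.
entities = {
    "Russia": {"claims": {"P625": [{"placeholder": "coordinate value"}]}},
    "Composer": {"claims": {}},
    "Ballet": {"claims": {"P31": [{"placeholder": "instance of"}]}},
}
geo_articles = {title for title, entity in entities.items()
                if has_coordinate_location(entity)}

print(is_geographical_path(["Russia", "Composer"], geo_articles))  # geographical
print(is_geographical_path(["Ballet", "Composer"], geo_articles))  # non-geographical
```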

Figure 3A shows the percentage of successful navigation paths that are geographical. The tendency to use geographical information to navigate varies across the nine games: among the successful navigation paths, the share of geographical paths reaches a maximum of 92.1% for the game from “Donald Trump” to “Pyotr Ilyich Tchaikovsky” and a minimum of 20.9% for the game from “Steve Jobs” to “Charlie Chaplin”. Among the paths that failed, we see similar patterns (Fig. 4): the ratio of geographical paths ranges from 26.2% to 59.3% across the nine games. We also observed a strong correlation across the nine games between the ratio of geographical paths among successful paths and the ratio among failed paths (Pearson r = 0.92).

Figure 3. Comparison of the geographical and non-geographical successful navigation paths

To further understand the distinction between geographical and non-geographical navigation paths, we analyzed each navigation sequence and classified a visited Wikipedia article as a hub clue or a proximity clue if it had the highest hierarchical score or the highest proximity score, respectively, among all visited articles. These clues reveal how participants leveraged Wikipedia’s hierarchical structure and semantic distances to navigate. Figure 5 shows the five most frequently visited hub and proximity clues in each game, along with the cumulative percentage of participants who used them, for both geographical and non-geographical paths. In geographical paths, more than half of the navigation sequences (M = 0.669, SD = 0.144) can be characterized by just five hub clues, which are predominantly well-known countries and cities, such as the “United States”. The proximity clues in geographical paths vary: some participants relied heavily on geographical information and never reached a proximity clue related to the target person’s occupation, while others combined geographical context with occupational information to reach the target. In non-geographical paths, where no geography-related articles were visited, hub and proximity clues were primarily related to the target person’s profession. However, there were exceptions. For example, some participants reached “Vincent van Gogh” through “Protestantism”, leveraging the artist’s religious background, while others reached “Kanye West” not only through music-related articles but also via “Twitter” and “Kim Kardashian”, reflecting his presence in public discourse.
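The clue extraction described above amounts to an argmax over the visited articles, followed by a frequency count across participants. The sketch below illustrates this under stated assumptions: the scores and paths are hypothetical, and whether the source and target articles are excluded from the visited list is not specified in the text, so the caller decides what to pass in.

```python
from collections import Counter


def find_clue(visited, score):
    """Return the visited article with the highest score (earliest visit wins ties)."""
    return max(visited, key=lambda article: score[article])


def top_clue_coverage(clues, k=5):
    """Return the k most frequent clues and the share of paths they account for."""
    most_common = Counter(clues).most_common(k)
    covered = sum(count for _, count in most_common)
    return [article for article, _ in most_common], covered / len(clues)


# Hypothetical hierarchical scores and intermediate articles of three paths.
hub_score = {"United States": 0.9, "Russia": 0.8, "Composer": 0.4, "Ballet": 0.2}
paths = [
    ["United States", "Russia", "Composer"],
    ["Composer", "Ballet"],
    ["Russia", "Ballet"],
]
hub_clues = [find_clue(path, hub_score) for path in paths]
top_articles, share = top_clue_coverage(hub_clues, k=2)
print(hub_clues, top_articles, share)
```

The same two functions apply to proximity clues by passing a proximity score mapping instead of the hierarchical one.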

How do the geographical and occupational navigation strategies observed in social navigation relate to the hub-driven and proximity-driven strategies in knowledge network navigation? Figures 3B and 3C show the proportion of navigation paths with a proximity-driven score exceeding C as a function of C (Fig. 3B) and the proportion of paths with a hub-driven score exceeding H as a function of H (Fig. 3C). These analyses include all successful navigation paths across the nine games, shown separately for geographical and non-geographical paths. Our results indicate that geographical paths are generally more hub-driven, whereas non-geographical paths tend to be proximity-driven. However, this distinction is not absolute, particularly for geographical paths. Some geographical navigation sequences also exhibit high proximity-driven scores, likely reflecting a mixed strategy that incorporates both geographical and occupational information. For failed navigation paths, Fig. 4 shows that the differences between geographical and non-geographical paths in terms of hub-driven and proximity-driven tendencies are less pronounced. This is likely because a considerable number of paths failed to reach effective hub or proximity clues, resulting in both low hub-driven and low proximity-driven scores.
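Curves of this kind are empirical exceedance proportions: for each threshold, the fraction of paths whose score lies above it. A minimal sketch follows, with hypothetical scores and thresholds; the function name is not from the paper.

```python
def exceedance_curve(scores, thresholds):
    """For each threshold, the fraction of paths whose score strictly exceeds it."""
    if not scores:
        raise ValueError("at least one score is required")
    total = len(scores)
    return [sum(score > threshold for score in scores) / total
            for threshold in thresholds]


# Hypothetical per-path scores for two groups of paths.
geographical_scores = [0.1, 0.4, 0.6, 0.9]
non_geographical_scores = [0.05, 0.1, 0.2, 0.7]
thresholds = [0.0, 0.5, 1.0]

print(exceedance_curve(geographical_scores, thresholds))
print(exceedance_curve(non_geographical_scores, thresholds))
```

Plotting one curve per group against the same thresholds makes it easy to see which group tends to have higher scores: the curve of the higher-scoring group lies above the other.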

Methods

The experiment

Our longitudinal study comprises two rounds of online experiments, the first conducted in January 2020 and the second in October 2023. Participants were sourced from Prolific [41], a well-regarded crowdsourcing platform for behavioral studies [42]. The experiments were hosted on the Qualtrics [43] platform: we embedded the Wikipedia navigation games into a Qualtrics survey using custom JavaScript, and the games were followed by a questionnaire. We utilized the 20190820 English Wiki Dump [44] for the navigation games in both experiment rounds. This Wikipedia snapshot includes 5.9 million nodes and 133.6 million edges.

In our experiment, each participant plays nine rounds of the Wikipedia navigation game, followed by a survey. The source and target Wikipedia articles for each game are listed in Fig. 1. In each round, participants can choose between two types of games: (1) the Speed-race game, where they must navigate to the target page within 150 seconds, and (2) the Least-clicks game, where they must reach the target in no more than seven clicks. Before the game starts, participants have 60 seconds to read a brief introduction to both the source and target Wikipedia articles. After this, they must decide whether to play the Speed-race or Least-clicks game. During each game, the interface displays pages visited earlier in the current game on the left margin, allowing players to backtrack to any of those pages (backclick). Following the game session, the survey sessions commence with a Big Five personality test [45], assessing participants’ five personality traits: openness to experience, conscientiousness, extroversion, agreeableness, and neuroticism. Following this, we pose six categories of questions to gather information about participants’ i) employment status, ii) educational background, iii) spatial navigation habits, and their previous experience with iv) the Wikipedia navigation game, v) the Wikipedia website, and vi) computer games. We also inquire about demographic details, including age, gender, ethnicity, political stance, and language skills. An attention check question is included at the survey’s end, requiring participants to slide a bar to the left.
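The two winning conditions can be written down compactly. The sketch below is only an illustration of the rules as stated above; the mode labels and function name are hypothetical, and whether a backclick counts toward the click limit is not stated in the text, so the click count is taken as given.

```python
SPEED_RACE_LIMIT_SECONDS = 150  # time limit stated for the Speed-race game
LEAST_CLICKS_LIMIT = 7          # click limit stated for the Least-clicks game


def game_won(mode, reached_target, elapsed_seconds, clicks):
    """Apply the stated limit of the chosen game type to a finished attempt."""
    if not reached_target:
        return False
    if mode == "speed-race":
        return elapsed_seconds <= SPEED_RACE_LIMIT_SECONDS
    if mode == "least-clicks":
        return clicks <= LEAST_CLICKS_LIMIT
    raise ValueError(f"unknown game mode: {mode!r}")


print(game_won("speed-race", True, elapsed_seconds=120, clicks=12))
print(game_won("least-clicks", True, elapsed_seconds=400, clicks=8))
```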

Embedding of the Wikipedia articles

To quantify the similarity between Wikipedia articles, we trained a 64-dimensional graph embedding for each Wikipedia page $a_{i}$ across the English Wikipedia graph G using the DeepWalk algorithm [38]. The Wikipedia network was first converted into an undirected graph, where two articles were linked if a directed link existed between them in either direction in the original directed graph. We used the default parameters of the DeepWalk algorithm for training, setting the representation size to 64, the window size to 5, the walk length to 40, and the number of walks to 10. Unless otherwise specified, all embedding vectors were subsequently normalized to have unit length for further calculations.
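The preprocessing around the embedding can be sketched as follows: symmetrizing the directed link graph, generating truncated random walks with the stated walk length and number of walks, and normalizing vectors to unit length. This is a simplified, standard-library sketch and not the DeepWalk implementation used in the study; the skip-gram training step that turns walks into 64-dimensional vectors is omitted, and all function names and the toy graph are hypothetical.

```python
import math
import random


def to_undirected(directed_edges):
    """Link two articles if a directed link exists between them in either direction."""
    adjacency = {}
    for source, target in directed_edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return {node: sorted(neighbors) for node, neighbors in adjacency.items()}


def random_walks(adjacency, num_walks=10, walk_length=40, seed=0):
    """Generate num_walks uniform random walks of walk_length nodes from every node."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit the start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks


def unit_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Hypothetical three-article graph with directed links A -> B and C -> B.
adjacency = to_undirected([("A", "B"), ("C", "B")])
walks = random_walks(adjacency)
print(adjacency, len(walks), unit_normalize([3.0, 4.0]))
```

In a full pipeline, the walks would be fed as "sentences" to a skip-gram model with window size 5 and representation size 64, and the resulting vectors passed through `unit_normalize`.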

To assess the effectiveness of our embedding, we conducted tests using the WikipediaSimilarity 353 Test [46], an adaptation of the earlier dataset, WordSimilarity 353 Test [47], designed to evaluate semantic relatedness among words. Our graph embedding achieved a Spearman rank correlation score of 0.667 with the WikipediaSimilarity 353 test, demonstrating performance on par with the current best measures of semantic relatedness for Wikipedia pages [48].
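An evaluation of this kind compares the cosine similarity of embedding pairs with human relatedness ratings through a rank correlation. The sketch below shows one way to compute it with the standard library only; the two-dimensional vectors and the ratings are hypothetical toy data, not entries from the WikipediaSimilarity 353 Test.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical unit-length embeddings and human relatedness ratings.
embedding = {"A": [1.0, 0.0], "B": [0.8, 0.6], "C": [0.0, 1.0]}
rated_pairs = [("A", "B", 7.5), ("A", "C", 1.0), ("B", "C", 6.0)]

model_scores = [cosine(embedding[a], embedding[b]) for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
rho = spearman(model_scores, human_scores)
print(rho)
```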

Discussion

Our study investigated the navigation strategies employed by participants during navigation tasks on the Wikipedia network. We extended existing research on navigation in the information space by emphasizing individual differences—an aspect largely overlooked in prior studies, which primarily focused on aggregated navigation behavior. In addition, we drew a novel connection between information space navigation and social navigation by demonstrating that the choice to navigate by the occupational or geographical information of the target person in social navigation corresponds to the proximity-driven and hub-driven strategies identified in our analysis.

Using a graph embedding trained on the English Wikipedia network capturing the semantic distances among articles and a local hierarchical score measuring the articles’ position in the Wikipedia knowledge hierarchy, we identified two navigation strategies that the participants adopted to win the game. In the hub-driven strategy, participants tend to “zoom out”, looking for articles higher in the Wikipedia hierarchy to navigate, while in the proximity-driven strategy, participants “home in” at each step on articles that are semantically closer to the target. We found that this split of strategies, previously discovered as the zoom-out and home-in phases in [21], is present not only within one navigation game but also at the individual level. Such individual variance is statistically significant and cannot be ignored. We analyzed the impact of demographic factors on individual tendencies to adopt hub-driven versus proximity-driven navigation strategies. Our findings indicate that age, gender, and ethnicity significantly influence these preferences. Our work connects the findings of social navigation to our navigation tasks on Wikipedia and shows that people’s different tendencies to use the occupational or geographical information of the target person to navigate can be understood as different choices between the proximity-driven and hub-driven strategies.

In our experiment, we implemented social navigation within the information space, where participants’ navigation trajectories reflect their thought processes rather than the people in their social networks. We observed that the division of occupational and geographical navigation paths discovered in previous work on social network navigation [16, 17, 49] also exists in information space navigation, suggesting that the representation of the geographical origin and occupation of people may be foundational to our cognitive map of the social world. Indeed, prior research has indicated that our hippocampus is capable of representing abstract quantities, such as a person’s affiliations and power within social encounters [3], facilitating the search for suitable assistance in finding accommodation or employment.

Previous research on wayfinding in the information network [21] has studied the interplay between the degree and proximity of the nodes on the network within a single navigation trajectory. Our findings extend this by showing that this interplay occurs not only in the navigation process of individual players but also at a macro level across different players. Previous models of human navigation behavior typically adopt an aggregated approach, treating individuals uniformly [23]. We hypothesize that incorporating individual variability in navigation strategies could provide a more accurate explanation of the empirical data on knowledge navigation.

Our study has several limitations that should be considered. While our experiment focused on renowned individuals as the source and target of the navigation tasks, these can be extended to lesser-known individuals or even non-human concepts, such as objects, events, or theoretical ideas. The extent to which navigation strategies differ in such contexts remains an open question. Furthermore, our navigation tasks are specific to the targeted navigation scenario, which occurs less frequently in real life compared to more general information searches, potentially limiting the generalizability of our findings. Other research has examined more realistic “navigation in the wild” scenarios, particularly within Wikipedia, by analyzing web server logs or clickstream data from online users [36, 50].

Our study advances prior research by revealing individual differences in information space navigation strategies and linking these strategies to mechanisms observed in social navigation. A logical progression would be to introduce navigation tasks where source and target pages are not limited to well-known individuals, but instead include lesser-known individuals or non-human concepts such as objects, events, or theoretical ideas. Furthermore, investigating algorithms to enhance online navigation support presents a promising research direction.

Bibliography

1. Wikipedia, the free encyclopedia (2024) https://en.wikipedia.org/. Accessed 2024
2. Wikispeedia (2024) https://dlab.epfl.ch/wikispeedia/play/. Accessed 2024
3. The Wiki Game (2024) Wikipedia Game - Explore Wikipedia! https://www.thewikigame.com/. Accessed 2024
4. Zhu M, Yasseri T, Kertész J (2023) Individual differences in knowledge network navigation. arXiv preprint. arXiv:2303.10036. doi:10.1038/s41598-024-58305-2. PMCID: PMC11379931. PMID: 38594309
5. Wikidata (2024) https://www.wikidata.org/wiki/Wikidata:Introduction. Accessed 2024
6. Prolific (2024) Quickly find research participants you can trust. https://www.prolific.com/. Accessed 2024
7. Qualtrics XM (2024) Experience Management Software. https://www.qualtrics.com/uk/. Accessed 2024
8. Wikimedia Downloads (2024) https://dumps.wikimedia.org/. Accessed 2024