HiT-JEPA: A Hierarchical Self-supervised Trajectory Embedding Framework for Similarity Computation

Lihuan Li; Hao Xue; Shuang Ao; Yang Song; Flora Salim

arXiv:2507.00028·cs.LG·July 2, 2025

Open Access 3 Reviews

TL;DR

HiT-JEPA introduces a hierarchical self-supervised framework that learns multi-scale urban trajectory representations, effectively capturing both local details and global semantics for improved similarity computation.

Contribution

It presents a novel hierarchical architecture that unifies fine-grained and high-level trajectory features in a single model, advancing trajectory representation learning.

Findings

01. Outperforms existing methods on real-world datasets

02. Provides richer, multi-scale trajectory embeddings

03. Demonstrates effectiveness in similarity computation

Abstract

The representation of urban trajectory data plays a critical role in effectively analyzing spatial movement patterns. Despite considerable progress, designing trajectory representations that capture diverse and complementary information remains an open research problem. Existing methods struggle to incorporate fine-grained trajectory details and high-level summaries in a single model, limiting their ability to attend to long-term dependencies while preserving local nuances. To address this, we propose HiT-JEPA (Hierarchical Interactions of Trajectory Semantics via a Joint Embedding Predictive Architecture), a unified framework for learning multi-scale urban trajectory representations across semantic abstraction levels. HiT-JEPA adopts a three-layer hierarchy that progressively captures point-level fine-grained details, intermediate patterns, and high-level…
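To make the idea of a three-level hierarchy concrete, the sketch below builds three progressively coarser views of one trajectory. It is only an illustration under stated assumptions: the helper names `pool_pairs` and `build_hierarchy`, the sample coordinates, and the use of plain average pooling are all hypothetical stand-ins, and the excerpt here does not show how HiT-JEPA actually forms its intermediate and high-level abstractions.

```python
from typing import List, Tuple

# Hypothetical point layout: (longitude, latitude).
Point = Tuple[float, float]


def pool_pairs(points: List[Point]) -> List[Point]:
    """Average consecutive pairs of points, roughly halving the length.

    A trailing unpaired point is kept unchanged.
    """
    pooled = []
    for i in range(0, len(points), 2):
        window = points[i:i + 2]
        x = sum(p[0] for p in window) / len(window)
        y = sum(p[1] for p in window) / len(window)
        pooled.append((x, y))
    return pooled


def build_hierarchy(points: List[Point]) -> List[List[Point]]:
    """Return three views: point level, intermediate level, high level.

    Average pooling is an assumption made for illustration; a learned
    model would replace it with trainable layers.
    """
    level1 = list(points)
    level2 = pool_pairs(level1)  # about 50% of the original length
    level3 = pool_pairs(level2)  # about 25% of the original length
    return [level1, level2, level3]


# Made-up sample trajectory with eight points.
trajectory = [(0.0, 0.0), (2.0, 0.0), (4.0, 2.0), (6.0, 2.0),
              (8.0, 4.0), (10.0, 4.0), (12.0, 6.0), (14.0, 6.0)]
levels = build_hierarchy(trajectory)
for name, level in zip(("point", "intermediate", "high"), levels):
    print(name, len(level), level)
```

Each coarser view trades local detail for a shorter sequence that summarizes longer stretches of movement, which is the trade-off a multi-scale representation aims to balance.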

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

S1. The paper is easy to read, and the argument that existing methods lack hierarchical representations is both interesting and important. S2. The methodology, although based on techniques used in previous studies, is sound and well-justified.

Weaknesses

W1. Although it is interesting to consider different levels of trajectory semantics, the authors do not provide sufficient experimental evidence to show the effectiveness of their approach. For example, it is unclear which representation layers (S1, S2, S3, or their combination) are used for the trajectory similarity search experiments. These details appear to be missing. If the proposed framework indeed works as claimed, the different representations should capture distinct semantic meanings of

Reviewer 02 · Rating 2 · Confidence 5

Strengths

S1. HiT-JEPA proposes an explicit three-layer architecture that learns and aligns representations at point, segment, and trajectory levels. S2. The method implements a novel attention propagation strategy between abstraction levels, upsampling higher-level attention maps to guide feature extraction at lower levels. S3. HiT-JEPA demonstrates exceptional zero-shot performance across heterogeneous datasets. Experimental design is rigorous and covers critical scenarios like self-similarity,

Weaknesses

W1. This manuscript presents limited novelty compared to prior work. The JEPA and multi-scale structure build heavily on established elements in self-supervised learning, attention propagation, and urban trajectory modeling. The design leap over T-JEPA is incremental rather than a decisive paradigm shift, especially given already cited work such as HIBERT and HiCLRE in NLP and CV. So, this work simply applies these models to the trajectory dataset without making more targeted desi

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- Addresses single-scale bias by introducing an explicit multi-scale hierarchy with information flow across levels.
- Proposes JEPA-style training that reduces reliance on heavy data augmentation; ablations indicate the benefit of hierarchical interactions.
- Extensive experiments across datasets and zero-shot settings support the generalization claims.

Weaknesses

- Only two contrastive learning baselines are compared in the experiments. Many other trajectory representation learning works should be compared.
- The interpretation of the HiT-JEPA results is quite confusing. It would be better to explain why the two showcases illustrate the interpretation of the proposed method.
- The results would change with a different trajectory abstraction method. From equations (3)-(5), the sampling rates are 50% and 25% of the original trajectories, which may not

Taxonomy

Topics: Data Management and Algorithms · Human Mobility and Location-Based Analysis · Automated Road and Building Extraction