LifeIR at the NTCIR-18 Lifelog-6 Task
Jiahan Chen, Da Li, Keping Bi

TL;DR
This paper presents a multi-stage pipeline for retrieving relevant images from large-scale lifelogs using textual queries, incorporating filtering, query rewriting, event-based extension, and reranking with multimodal large language models, demonstrating improved retrieval effectiveness.
Contribution
It introduces a novel multi-stage pipeline for lifelog image retrieval that combines blurred-image filtering, query rewriting, event-based candidate extension, and reranking with multimodal large language models.
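The four stages named above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LifelogImage` fields, the `run_pipeline` signature, the thresholds, and the rule that an "event" is a fixed time window around a retrieved image are all hypothetical. The rewriting, retrieval, and reranking models are passed in as plain callables because the summary does not say which models are used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LifelogImage:
    # All fields are hypothetical; the source does not describe the data schema.
    image_id: str
    timestamp: float   # capture time in seconds
    sharpness: float   # precomputed blur score, higher means sharper


Retriever = Callable[[str, List[LifelogImage]], List[LifelogImage]]


def run_pipeline(
    query: str,
    images: List[LifelogImage],
    rewrite: Callable[[str], str],
    retrieve: Retriever,
    rerank: Retriever,
    min_sharpness: float = 0.5,   # assumed threshold
    event_window: float = 60.0,   # assumed event span in seconds
    top_k: int = 10,
) -> List[LifelogImage]:
    # Stage 1: filter out blurred images.
    sharp = [img for img in images if img.sharpness >= min_sharpness]

    # Stage 2: rewrite the query to make its intent clearer.
    clear_query = rewrite(query)

    # Stage 3: first-pass retrieval, then extend the candidate set with
    # images captured close in time to any retrieved seed (assumed event rule).
    seeds = retrieve(clear_query, sharp)[:top_k]
    seed_ids = {s.image_id for s in seeds}
    candidates = list(seeds)
    for img in sharp:
        if img.image_id in seed_ids:
            continue
        if any(abs(img.timestamp - s.timestamp) <= event_window for s in seeds):
            candidates.append(img)

    # Stage 4: rerank the extended candidate set.
    return rerank(clear_query, candidates)
```

Keeping each model behind a callable makes it possible to disable or swap one stage at a time, which is one way to measure the per-stage contribution reported under Findings.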
Findings
Each stage improves retrieval accuracy
The pipeline outperforms baseline methods
Multimodal reranking enhances relevance judgment
Abstract
In recent years, sharing lifelogs recorded through wearable devices such as sports watches and GoPros has gained significant popularity. Lifelogs involve various types of information, including images, videos, and GPS data, revealing users' lifestyles, dietary patterns, and physical activities. The Lifelog Semantic Access Task (LSAT) in the NTCIR-18 Lifelog-6 Challenge focuses on retrieving relevant images from users' large-scale lifelogs based on textual queries describing an action or event. It serves users' need to find images of a particular scenario among the past moments recorded in their lifelogs. We propose a multi-stage pipeline for this text-to-image search task, addressing various challenges in lifelog retrieval. Our pipeline includes: filtering blurred images, rewriting queries to make intents clearer, extending the candidate set based on events to include images with…
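The abstract does not say how blurred images are detected. As one possible illustration, the sketch below uses the variance of the Laplacian, a common sharpness heuristic that is swapped in here as an assumption rather than taken from the paper. The function names and the threshold value are hypothetical, and the image is represented as a plain 2D list of grayscale values so the example needs only the standard library.

```python
from statistics import pvariance
from typing import List, Sequence


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, which suggests a blurred image.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses: List[float] = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def is_blurred(gray: Sequence[Sequence[float]], threshold: float = 100.0) -> bool:
    # The threshold is an assumed value; it would need tuning on real lifelog data.
    return laplacian_variance(gray) < threshold
```

For example, a uniformly gray image has zero Laplacian variance and is flagged as blurred, while a high-contrast checkerboard is not. In practice a library routine such as an OpenCV Laplacian filter would replace the explicit loops.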
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Medical Imaging Techniques and Applications
