FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline
Jingwei Xu, Junbin Kang, Mingkai Dong, Mingyu Liu, Lu Zhang, Shaohong Guo, Ziyan Qiu, Mingzhen You, Ziyi Tian, Anqi Yu, Tianhong Ding, Xinwei Hu, and Haibo Chen

TL;DR
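FalconFS is a distributed file system for deep learning pipelines. It replaces client-side metadata caching with server-side path resolution, achieves up to 5.72× throughput for small-file read/write and up to 12.81× throughput for deep learning training in evaluations against CephFS and Lustre, and has run in Huawei's autonomous driving system for one year.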
Contribution
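The paper introduces FalconFS, a stateless-client DFS optimized for deep learning. It resolves paths on the server side using hybrid metadata indexing and lazy namespace replication, raises server concurrency with concurrent request merging, and outperforms CephFS and Lustre in the paper's evaluation.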
Findings
Up to 5.72× throughput for small-file read/write, in evaluations against CephFS and Lustre
Up to 12.81× throughput for deep learning training, in evaluations against CephFS and Lustre
Deployed in Huawei's autonomous driving system for one year
Abstract
Client-side metadata caching has long been considered an effective method for accelerating metadata operations in distributed file systems (DFSs). However, we have found that client-side state (e.g., caching) is not only ineffective but also consumes valuable memory resources in the deep learning pipelines. We thus propose FalconFS, a DFS optimized for deep learning pipelines with the stateless-client architecture. Specifically, instead of performing client-side path resolution and caching, FalconFS efficiently resolves paths on the server side using hybrid metadata indexing and lazy namespace replication. FalconFS also boosts server concurrency with concurrent request merging and provides easy deployment with VFS shortcut. Evaluations against CephFS and Lustre show that FalconFS achieves up to 5.72× throughput for small file read/write and up to 12.81× throughput for deep…
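Illustrative sketch
The contrast the abstract draws between client-side and server-side path resolution can be illustrated with a toy model. The Python below is a minimal sketch under stated assumptions, not FalconFS code: the class ToyMetadataServer and the functions client_side_open and stateless_open are hypothetical names, each method call stands in for one network request, and the model does not attempt hybrid metadata indexing, lazy namespace replication, or request merging. It only counts requests when a client walks a path component by component without a warm cache, versus sending the full path for the server to resolve.

```python
# Toy model only: every name here is hypothetical and none of it is FalconFS code.

class ToyMetadataServer:
    """In-memory namespace; each public lookup call stands in for one RPC."""

    def __init__(self):
        # Maps (parent_inode, name) -> child inode. Inode 0 is the root.
        self.dentries = {}
        self.next_inode = 1
        self.requests = 0  # number of simulated RPCs served

    def add_path(self, path):
        """Create every component of `path` and return the final inode."""
        parent = 0
        for name in path.strip("/").split("/"):
            key = (parent, name)
            if key not in self.dentries:
                self.dentries[key] = self.next_inode
                self.next_inode += 1
            parent = self.dentries[key]
        return parent

    def lookup(self, parent, name):
        """Resolve a single component: one request per call."""
        self.requests += 1
        return self.dentries[(parent, name)]

    def resolve(self, path):
        """Resolve a full path on the server: one request per call."""
        self.requests += 1
        inode = 0
        for name in path.strip("/").split("/"):
            inode = self.dentries[(inode, name)]
        return inode


def client_side_open(server, path):
    """Client walks the path itself, one lookup request per component."""
    inode = 0
    for name in path.strip("/").split("/"):
        inode = server.lookup(inode, name)
    return inode


def stateless_open(server, path):
    """Client keeps no state and sends the full path in a single request."""
    return server.resolve(path)
```

In this toy, a four-component path costs four requests with client-side resolution and one with the stateless client. A warm client cache would reduce the first number, which is the case the abstract argues does not help in deep learning pipelines.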
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
