Clid: Identifying TLS Clients With Unsupervised Learning on Domain Names

Ihyun Nam; Gerry Wan

arXiv:2410.02040·cs.NI·October 4, 2024

TL;DR

Clid is an unsupervised learning tool that identifies TLS clients by clustering domain names from SNI fields, providing broad client insights without relying on outdated rule-based databases.
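As a rough illustration of the input such a tool works with, the sketch below groups SNI domain names by client and compares two clients by the overlap of their domain sets. The `(client_key, sni)` record shape, the helper names, and the choice of Jaccard distance are all assumptions made for this example; the summary above does not say how Clid represents clients or measures similarity.

```python
from collections import defaultdict


def domains_by_client(handshakes):
    """Group SNI domain names by client key.

    Each handshake is a hypothetical (client_key, sni) pair; this record
    shape is an assumption for the example, not the paper's data format.
    """
    groups = defaultdict(set)
    for client_key, sni in handshakes:
        if sni:  # a handshake without an SNI value carries no domain name
            groups[client_key].add(sni.lower().rstrip("."))
    return dict(groups)


def jaccard_distance(a, b):
    """Return 0.0 for identical domain sets and 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Made-up handshakes using reserved example domains.
handshakes = [
    ("client-a", "api.example.com"),
    ("client-a", "cdn.example.com"),
    ("client-b", "API.example.com."),
    ("client-b", "cdn.example.com"),
    ("client-c", "updates.example.net"),
    ("client-c", None),
]

groups = domains_by_client(handshakes)
same = jaccard_distance(groups["client-a"], groups["client-b"])
different = jaccard_distance(groups["client-a"], groups["client-c"])
```

Clients that contact the same domains end up at distance 0, while clients with no domains in common end up at distance 1, which is the kind of signal a clustering step could then use.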

Contribution

This paper introduces Clid, a novel unsupervised clustering approach using Bayesian optimization to identify TLS clients based on domain name associations.
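To make the idea of tuning an unsupervised clustering concrete, here is a minimal sketch under stated assumptions. The text above does not name the clustering algorithm or the objective being optimized, so the density-based clustering (a small DBSCAN-style routine), the Jaccard distance, and the `score` function are all illustrative choices. A plain grid search over `eps` stands in for Bayesian optimization to keep the example dependency-free; it is not the paper's procedure.

```python
from itertools import combinations


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster ids 0, 1, ... or -1 for noise."""
    n = len(points)
    # Each point counts as its own neighbor (distance 0).
    neighbors = [
        [j for j in range(n) if jaccard_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand only through core points
    return labels


def score(points, labels):
    """Hypothetical objective: clustered fraction minus mean in-cluster distance."""
    clustered = [i for i, label in enumerate(labels) if label != -1]
    coverage = len(clustered) / len(points)
    pairs = [(i, j) for i, j in combinations(clustered, 2) if labels[i] == labels[j]]
    if not pairs:
        return coverage
    spread = sum(jaccard_distance(points[i], points[j]) for i, j in pairs) / len(pairs)
    return coverage - spread


# Made-up per-client domain sets (letters stand in for domain names).
clients = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "d"},
    {"x", "y"}, {"x", "y", "z"},
    {"q"},
]

# Grid search as a simple stand-in for a Bayesian optimizer.
candidates = [0.1, 0.3, 0.5, 1.0]
best_eps = max(candidates, key=lambda e: score(clients, dbscan(clients, e, 2)))
best_labels = dbscan(clients, best_eps, 2)
```

On this toy input the search settles on `eps = 0.5`, which separates the two groups of similar clients and leaves the lone client as noise. A Bayesian optimizer would explore the same parameter space with fewer evaluations when each clustering run is expensive.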

Findings

01. Clid successfully identifies strongly associated domain names for at least 60% of clients.

02. Clid outperforms rule-based methods in dynamic network environments.

03. Clid can adapt to large-scale TLS datasets with millions of handshakes.

Abstract

In this paper, we introduce Clid, a Transport Layer Security (TLS) client identification tool based on unsupervised learning on domain names in the Server Name Indication (SNI) field. Clid aims to provide some information about a wide range of clients, even though it may not identify a definitive characteristic of each one. This differs from many existing rule-based client identification tools, which rely on hardcoded databases to identify granular characteristics of a few clients. These tools can often identify only a small number of clients in a real-world network as their databases grow outdated, which motivates an alternative approach like Clid. For this research, we use some 345 million anonymized TLS handshakes collected from a large university campus network. From each handshake, we create a TCP fingerprint that…

Code & Models

Repositories

ihyunnam/clid (Official)
