Foundations of Unknown-aware Machine Learning
Xuefeng Du

TL;DR
This paper develops theoretical and algorithmic foundations for unknown-aware machine learning, enabling models to recognize and handle novel, out-of-distribution inputs, especially in large language models, to improve AI safety and reliability.
Contribution
It introduces new frameworks and methods for unknown-aware learning, including outlier synthesis, unlabeled data utilization, and safety tools for foundation models, advancing AI reliability in open-world settings.
Findings
Novel outlier synthesis methods improve OOD detection.
Unlabeled data improves model reliability and supports formal guarantees.
Safety tools for large language models mitigate hallucinations and malicious prompts.
Abstract
Ensuring the reliability and safety of machine learning models in open-world deployment is a central challenge in AI safety. This thesis develops both algorithmic and theoretical foundations to address key reliability issues arising from distributional uncertainty and unknown classes, from standard neural networks to modern foundation models like large language models (LLMs). Traditional learning paradigms, such as empirical risk minimization (ERM), assume no distribution shift between training and inference, often leading to overconfident predictions on out-of-distribution (OOD) inputs. This thesis introduces novel frameworks that jointly optimize for in-distribution accuracy and reliability on unseen data. A core contribution is the development of an unknown-aware learning framework that enables models to recognize and handle novel inputs without labeled OOD data. We propose new…
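The overconfidence problem described in the abstract can be illustrated with a toy example. The sketch below is not the thesis's method. It assumes a hypothetical two-class linear classifier with made-up weights and inputs, and shows that a naive detector based on a maximum-softmax confidence threshold can fail to flag an input that lies far from typical data, because the classifier is more confident on it than on a typical input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, x):
    """Logits of a linear classifier: one dot product per class."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]


def confidence(weights, x):
    """Maximum softmax probability, a common baseline confidence score."""
    return max(softmax(linear_logits(weights, x)))


def flag_as_unknown(weights, x, threshold=0.9):
    """Naive detector: flag inputs whose confidence falls below the threshold."""
    return confidence(weights, x) < threshold


# Hypothetical toy values, chosen only for illustration.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]  # assumed weights of a 2-class linear model
X_TYPICAL = [2.0, 0.5]              # assumed to resemble training data
X_FAR = [40.0, 35.0]                # assumed to lie far from training data

conf_typical = confidence(WEIGHTS, X_TYPICAL)
conf_far = confidence(WEIGHTS, X_FAR)

print(f"typical input: confidence={conf_typical:.4f}, "
      f"flagged={flag_as_unknown(WEIGHTS, X_TYPICAL)}")
print(f"far input:     confidence={conf_far:.4f}, "
      f"flagged={flag_as_unknown(WEIGHTS, X_FAR)}")
```

In this toy setting the far input receives the higher confidence and passes the threshold unflagged, which is the kind of failure that motivates training models to be aware of unknowns rather than relying on post hoc confidence alone.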
Taxonomy
Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications
Methods: VOS
