Depth $F_1$: Improving Evaluation of Cross-Domain Text Classification by Measuring Semantic Generalizability
Parker Seegmiller, Joseph Gatto, and Sarah Masud Preum

TL;DR
Depth F1 is a new metric for cross-domain text classification that evaluates how well models perform on target samples dissimilar from the source, addressing limitations of existing evaluation methods.
Contribution
The paper introduces Depth F1, a novel metric that measures semantic generalizability by focusing on dissimilar target samples in cross-domain classification.
Findings
Depth F1 effectively highlights models' performance on dissimilar target samples.
Benchmarking shows varying model capabilities in semantic transfer.
Depth F1 complements existing metrics by providing deeper evaluation insights.
Abstract
Recent evaluations of cross-domain text classification models aim to measure the ability of a model to obtain domain-invariant performance in a target domain given labeled samples in a source domain. The primary strategy for this evaluation relies on assumed differences between source domain samples and target domain samples in benchmark datasets. This evaluation strategy fails to account for the similarity between source and target domains, and may mask cases where models fail to transfer learning to specific target samples that are highly dissimilar from the source domain. We introduce Depth $F_1$, a novel cross-domain text classification performance metric. Designed to be complementary to existing classification metrics such as $F_1$, Depth $F_1$ measures how well a model performs on target samples that are dissimilar from the source domain. We motivate this metric using standard…
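The abstract describes Depth $F_1$ only at a high level, so the following is a minimal illustrative sketch of the general idea (scoring a classifier with extra emphasis on target samples far from the source domain), not the authors' definition. Assumptions, all hypothetical: each target sample's dissimilarity from the source is (1 - cos)/2, where cos is the cosine similarity between its embedding and the source-centroid embedding (a simple stand-in for whatever depth function the paper actually uses), and the score is a sample-weighted binary $F_1$ using those dissimilarities as weights. The function names `dissimilarity_weights` and `weighted_f1` are invented for this example.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: Sequence[Vector]) -> list[float]:
    """Component-wise mean of the source-domain embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dissimilarity_weights(source_emb: Sequence[Vector],
                          target_emb: Sequence[Vector]) -> list[float]:
    """Hypothetical stand-in for a depth score: map cosine similarity to the
    source centroid (range [-1, 1]) onto a dissimilarity in [0, 1]."""
    c = centroid(source_emb)
    return [(1.0 - cosine(t, c)) / 2.0 for t in target_emb]


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int],
                weights: Sequence[float], positive: int = 1) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN), with each sample's contribution to
    TP/FP/FN scaled by its weight. Uniform weights give the standard F1."""
    rows = list(zip(y_true, y_pred, weights))
    tp = sum(w for t, p, w in rows if t == positive and p == positive)
    fp = sum(w for t, p, w in rows if t != positive and p == positive)
    fn = sum(w for t, p, w in rows if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy 2-D "embeddings": the source lies along the x-axis.
    source = [[1.0, 0.0], [2.0, 0.0]]
    target = [[1.0, 0.0], [0.0, 1.0]]  # first is source-like, second is not
    y_true, y_pred = [1, 1], [1, 0]    # model is right only on the source-like sample
    w = dissimilarity_weights(source, target)
    print("standard F1:", weighted_f1(y_true, y_pred, [1.0] * len(y_true)))
    print("dissimilarity-weighted F1:", weighted_f1(y_true, y_pred, w))
```

In this toy case the standard $F_1$ is 2/3, while the dissimilarity-weighted score drops to 0: the only correct prediction falls on a target sample pointing in the same direction as the source (weight 0), and the miss falls on the dissimilar sample. This is the kind of transfer failure the abstract says an unweighted metric can mask.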
Taxonomy
Topics: Text and Document Classification Technologies
