TL;DR
This paper introduces Generalized Funnelling (gFun), a two-tier ensemble method for cross-lingual text classification that combines heterogeneous document embeddings and several types of correlation to improve accuracy.
Contribution
gFun extends the Funnelling approach by incorporating diverse language-independent document representations and correlation embeddings, which improves classification performance.
Findings
gFun outperforms Funnelling and state-of-the-art baselines on large multilingual datasets.
Incorporating multiple correlation embeddings improves classification accuracy.
The proposed method is effective for multilingual multilabel text classification.
Abstract
Funnelling (Fun) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a metaclassifier that uses this vector as its input. The metaclassifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLTC systems in which these correlations cannot be brought to bear. In this paper we describe Generalized Funnelling (gFun), a generalization of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e.,…
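The two-tier idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the language codes, and every function name below are hypothetical, and a small hand-written logistic regression stands in for whatever calibrated classifiers are actually used. For brevity the metaclassifier is trained on posteriors computed for the same documents the 1st-tier classifiers were fitted on, which is a simplification.

```python
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit a binary logistic regression by SGD; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def posterior(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_per_class(X, Y):
    """One binary model per class (multilabel setting)."""
    return [train_logreg(X, [y[c] for y in Y]) for c in range(len(Y[0]))]

def posterior_vector(models, x):
    """Vector of posteriors with one dimension per class."""
    return [posterior(m, x) for m in models]

# Hypothetical toy data: two languages with DIFFERENT feature spaces
# (3 vs. 4 features) that share the same two classes.
TRAIN = {
    "en": ([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1]],
           [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]),
    "it": ([[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 1, 1],
            [0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1]],
           [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]),
}

# 1st tier: language-specific classifiers, each on its own feature space.
first_tier = {lang: train_per_class(X, Y) for lang, (X, Y) in TRAIN.items()}

# Map every training document to its posterior vector. These vectors live in
# one shared, language-independent space, so documents can be pooled.
Z, labels = [], []
for lang, (X, Y) in TRAIN.items():
    for x, y in zip(X, Y):
        Z.append(posterior_vector(first_tier[lang], x))
        labels.append(y)

# 2nd tier: a single metaclassifier trained on the pooled posterior vectors.
meta = train_per_class(Z, labels)

def classify(lang, x):
    """Route a document through its language's 1st tier, then the metaclassifier."""
    z = posterior_vector(first_tier[lang], x)
    return posterior_vector(meta, z)

print(classify("en", [1, 0, 0]))     # scores for (class 0, class 1)
print(classify("it", [0, 0, 0, 1]))
```

Because the metaclassifier only ever sees vectors with one dimension per class, the same `meta` models serve both languages even though their original feature spaces differ, and each of its per-class models can draw on the posteriors of all classes.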
