Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification

Alejandro Moreo; Andrea Pedrotti; Fabrizio Sebastiani

arXiv:2110.14764·cs.CL·February 9, 2022

TL;DR

This paper introduces Generalized Funnelling (gFun), a two-tier ensemble method for cross-lingual text classification that combines heterogeneous document embeddings and multiple types of correlation to improve accuracy.

Contribution

gFun extends the Funnelling approach by incorporating diverse language-independent representations and correlation embeddings, which improves classification performance.

Findings

1. gFun outperforms Funnelling and state-of-the-art baselines on large multilingual datasets.

2. Incorporating multiple correlation embeddings improves classification accuracy.

3. The proposed method is effective for multilingual multilabel text classification.

Abstract

Funnelling (Fun) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a metaclassifier that uses this vector as its input. The metaclassifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLTC systems in which these correlations cannot be brought to bear. In this paper we describe Generalized Funnelling (gFun), a generalization of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e.,…
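The two-tier idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository listed on this page for that); it is a toy under several assumptions: plain logistic regression stands in for the calibrated 1st-tier classifiers and for the metaclassifier, the data are synthetic, and every name (`train_logreg`, `posteriors`, `classify`, the `en`/`it` feature layouts) is hypothetical. For brevity the metaclassifier is trained on posteriors of the same documents the 1st tier saw; a more careful stacking setup would usually obtain them by cross-validation.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() from overflowing
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=500, lr=0.5):
    """Binary logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def train_per_class(X, Y):
    """One binary model per class (multilabel setting)."""
    return [train_logreg(X, [row[c] for row in Y]) for c in range(len(Y[0]))]

def posteriors(models, x):
    """Vector with one probability per class for document x."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in models]

rng = random.Random(0)

def noise():
    return rng.uniform(-0.2, 0.2)

# Synthetic multilabel data with 2 classes; labels cycle so classes are balanced.
labels = [[i % 2, (i // 2) % 2] for i in range(40)]
# Assumed toy setup: each language has its own feature space
# (different dimensionality and feature order).
train = {
    "en": ([[y0 + noise(), y1 + noise(), noise()] for y0, y1 in labels], labels),
    "it": ([[y1 + noise(), y0 + noise()] for y0, y1 in labels], labels),
}

# 1st tier: one set of per-class classifiers for each language.
first_tier = {lang: train_per_class(X, Y) for lang, (X, Y) in train.items()}

# Project all training documents, whatever their language, into the shared
# space of class posteriors, and pool them to train the metaclassifier.
meta_X, meta_Y = [], []
for lang, (X, Y) in train.items():
    meta_X += [posteriors(first_tier[lang], x) for x in X]
    meta_Y += Y
meta = train_per_class(meta_X, meta_Y)

def classify(lang, x):
    """Route x through its language's 1st tier, then through the shared metaclassifier."""
    return posteriors(meta, posteriors(first_tier[lang], x))

en_pred = classify("en", [1.0, 0.0, 0.0])  # English document about class 0 only
it_pred = classify("it", [0.0, 1.0])       # Italian document about class 0 only
print(en_pred, it_pred)
```

Because the metaclassifier only ever sees vectors of class posteriors, a single model serves documents from both languages, and its weights can pick up dependencies between classes that the per-class 1st-tier models treat independently.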

Code & Models

Repositories

andreapdr/gfun (PyTorch, official)
