Implementing NLPs in industrial process modeling: Addressing Categorical Variables
Eleni D. Koronaki, Geremy Loachamin Suntaxi, Paris Papavasileiou, Dimitrios G. Giovanis, Martin Kathrein, Andreas G. Boudouvis, Stéphane P. A. Bordas

TL;DR
This paper introduces an NLP-based embedding approach for categorical variables in industrial process modeling, which improves feature importance analysis compared with one-hot encoding.
Contribution
It proposes using NLP models to generate meaningful embeddings of categorical variables, combined with dimensionality reduction, for better process modeling.
Findings
Embeddings reflect actual meaning and similarities between categories.
Enhanced feature importance analysis compared to one-hot encoding.
Applicable to industrial processes with mixed input types.
Abstract
Important variables of processes are often categorical, i.e. names or labels representing, e.g. categories of inputs, or types of reactors or a sequence of steps. In this work, we use Natural Language Processing Models to derive embeddings of such inputs that represent their actual meaning, or reflect the "distances" between categories, i.e. how similar or dissimilar they are. This is a marked difference from the current standard practice of using binary, or one-hot encoding to replace categorical variables with sequences of ones and zeros. Combined with dimensionality reduction techniques, either linear such as Principal Component Analysis, or nonlinear such as Uniform Manifold Approximation and Projection, the proposed approach leads to a meaningful, low-dimensional feature space. The significance of obtaining meaningful embeddings is illustrated in the context of an industrial…
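The contrast the abstract draws between one-hot encoding and embeddings that reflect "distances" between categories can be illustrated with a minimal sketch. It is not the paper's method: the category labels are invented for illustration, and character-trigram counts stand in for the NLP model embeddings, which are not specified here. The dimensionality reduction step (PCA or UMAP) is omitted. The sketch only shows that one-hot vectors place every pair of categories at the same distance, whereas text-derived vectors do not.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical category labels, invented for illustration only.
labels = ["batch reactor", "fed-batch reactor", "plug flow reactor", "spray dryer"]


def one_hot(index, size):
    """Return a one-hot vector with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def trigram_counts(text):
    """Count character trigrams: a crude stand-in for an NLP embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def euclidean(u, v):
    """Euclidean distance between two dense vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(c1, c2):
    """Cosine distance between two sparse count vectors (Counters)."""
    dot = sum(c1[key] * c2[key] for key in c1)
    norm1 = math.sqrt(sum(x * x for x in c1.values()))
    norm2 = math.sqrt(sum(x * x for x in c2.values()))
    return 1.0 - dot / (norm1 * norm2)


pairs = list(combinations(range(len(labels)), 2))
one_hot_vectors = [one_hot(i, len(labels)) for i in range(len(labels))]
text_vectors = [trigram_counts(label) for label in labels]

# One-hot: every pair of distinct categories is equally far apart.
one_hot_dist = {(a, b): euclidean(one_hot_vectors[a], one_hot_vectors[b])
                for a, b in pairs}

# Text-derived vectors: distances vary with how similar the labels are.
text_dist = {(a, b): cosine_distance(text_vectors[a], text_vectors[b])
             for a, b in pairs}

for a, b in pairs:
    print(f"{labels[a]!r} vs {labels[b]!r}: "
          f"one-hot={one_hot_dist[(a, b)]:.3f}, text={text_dist[(a, b)]:.3f}")
```

With these labels, the two batch-type reactors end up closer to each other than either is to the spray dryer, while the one-hot distances are identical for every pair. A real language-model embedding would capture meaning rather than surface spelling, which trigram counts cannot do.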
Taxonomy
Topics: Business Process Modeling and Analysis · Statistical and Computational Modeling · Modeling, Simulation, and Optimization
