Categorical variables, like city names or policy types, are essential in insurance and banking models, but they pose structural issues.
Because most models can't process text directly, categorical data is usually transformed through One-Hot Encoding, which creates one indicator variable per category. This method can increase model complexity, inflate the number of variables, and lead to overfitting. Even worse, every category is treated as entirely distinct, so categories that behave almost identically can end up with inconsistent estimates.
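To make the variable inflation concrete, here is a minimal sketch of One-Hot Encoding in plain Python. The city names and the `one_hot` helper are hypothetical and used only for illustration; they are not taken from Earnix's tooling.

```python
def one_hot(values):
    """Return (categories, rows); each row is a 0/1 indicator list."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is switched on
        rows.append(row)
    return categories, rows


# Hypothetical data: one text column with three distinct cities.
cities = ["Leeds", "York", "Leeds", "Bath", "York"]
categories, encoded = one_hot(cities)

# One column per distinct category: a variable with thousands of
# levels would produce thousands of columns.
print(categories)
for row in encoded:
    print(row)
```

A single text column becomes as many numeric columns as there are distinct categories, which is where the extra complexity comes from.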
Classic regularisation methods such as the lasso help reduce complexity but come with major limitations. One key issue is that, under standard dummy coding, a category can only be merged with the chosen reference category (by shrinking its coefficient to zero), and that reference might not represent the average or most useful case. This limits flexibility and can lead to distorted groupings.
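The reference-category limitation can be illustrated with a simplified sketch. It assumes the idealised case of an orthonormal design, where the lasso solution reduces to soft-thresholding each unpenalised coefficient; the category names, coefficient values, and penalty `lam` are all hypothetical.

```python
def soft_threshold(beta, lam):
    """Lasso shrinkage for one coefficient (orthonormal-design case)."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


# Hypothetical effects measured relative to reference category "A".
effects_vs_reference = {"B": 0.05, "C": 0.90, "D": 0.95}
lam = 0.1  # hypothetical penalty strength

shrunk = {name: soft_threshold(beta, lam)
          for name, beta in effects_vs_reference.items()}

# A coefficient of exactly zero means "same as the reference".
merged_with_reference = [name for name, beta in shrunk.items() if beta == 0.0]
print(shrunk)
print(merged_with_reference)
```

In this toy case "B" is merged with the reference, while "C" and "D" keep separate coefficients even though their effects are nearly identical: the penalty has no way to merge two non-reference categories with each other.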
To solve this, Earnix has introduced Smart Grouping, a feature built into its Auto-GLM tool, part of the company’s Model Accelerator platform.
Smart Grouping uses a two-step process. First, it ranks categories in a regularised multivariate model; second, it applies variable fusion to merge similar categories, improving both simplicity and performance.
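The rank-then-fuse idea can be sketched roughly as follows. This is not Earnix's implementation, whose details the article does not give: the per-category effects are hypothetical stand-ins for the output of the first step, and a greedy tolerance-based merge stands in for a proper fusion penalty on adjacent categories.

```python
def fuse_ranked(effects, tol):
    """Rank categories by effect, then greedily merge neighbours.

    effects: dict mapping category -> estimated effect (assumed to come
             from a regularised multivariate model; hypothetical here).
    tol:     hypothetical tolerance; a category joins the current group
             if its effect is within tol of the group's mean effect.
    """
    ranked = sorted(effects.items(), key=lambda item: item[1])
    groups = [[ranked[0][0]]]
    group_effects = [[ranked[0][1]]]
    for name, effect in ranked[1:]:
        current_mean = sum(group_effects[-1]) / len(group_effects[-1])
        if effect - current_mean <= tol:
            groups[-1].append(name)
            group_effects[-1].append(effect)
        else:
            groups.append([name])
            group_effects.append([effect])
    # Each fused group shares a single effect: the mean of its members.
    return [(g, sum(e) / len(e)) for g, e in zip(groups, group_effects)]


effects = {"A": 0.0, "B": 0.05, "C": 0.90, "D": 0.95}  # hypothetical
fused = fuse_ranked(effects, tol=0.1)
for members, shared_effect in fused:
    print(members, shared_effect)
```

Because categories are ranked first, any two neighbours in the ordering can be merged, not only a category and the reference. In practice the tolerance (or penalty strength) would be chosen on validation data rather than fixed by hand.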
The result is a sparse, more interpretable model. Groupings reflect true relationships with the outcome variable, while accounting for interactions with other covariates. Smart Grouping also reduces overfitting by using validation to guide category merging.
Already implemented in Auto-GLM, Smart Grouping offers insurers and banks a practical way to enhance the accuracy and transparency of their predictive models.
Read the full blog from Earnix here.
Copyright © 2025 FinTech Global



