Tackling categorical complexity in GLMs with Smart Grouping

Generalised Linear Models (GLMs) remain a cornerstone of predictive modelling for insurers, offering transparency and statistical rigour. However, managing categorical features within these models remains a complex challenge. Earnix is investigating new techniques to simplify and enhance this process, aiming to deliver more accurate, interpretable, and efficient models for the financial services industry.
Categorical variables, like city names or policy types, are essential in insurance and banking models, but they pose structural issues.

Since models can’t process text directly, categorical data is usually transformed through One-Hot Encoding. This method can increase model complexity, inflate the number of variables, and lead to overfitting. Even worse, similar categories can be treated as distinct, introducing inconsistencies.
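As a rough illustration of why One-Hot Encoding inflates the number of variables, the sketch below encodes a small, made-up list of cities in plain Python. The function name `one_hot_encode` and the sample data are assumptions for illustration only, not part of any Earnix tooling.

```python
def one_hot_encode(values, reference=None):
    """One-hot encode a list of category labels.

    Returns (column_names, rows). If `reference` is given, that level is
    dropped so it acts as the baseline, as is usual in a GLM design matrix.
    """
    levels = sorted(set(values))
    if reference is not None:
        levels = [lvl for lvl in levels if lvl != reference]
    # One indicator column per remaining level, one row per observation.
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows


# Hypothetical sample data: four distinct cities, London as the baseline.
cities = ["Leeds", "London", "Bristol", "London", "York", "Leeds"]
columns, matrix = one_hot_encode(cities, reference="London")

print(columns)
for city, row in zip(cities, matrix):
    print(city, row)
```

A single text feature with k distinct levels becomes k - 1 indicator columns, so a feature such as city or postcode can add hundreds of coefficients to the model, each estimated independently of the others.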

Classic regularisation methods such as the lasso help reduce complexity but come with major limitations. One key issue is that, under One-Hot Encoding, a category can only be merged with the chosen reference category (by having its coefficient shrunk to zero), and that reference might not represent the average or most useful case. This limits flexibility and can lead to distorted groupings.
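The reference-category limitation can be seen in a small sketch. It applies soft-thresholding, which is the closed-form lasso solution only under an orthonormal design, so it is used here purely as an illustration rather than as a full lasso fit. The coefficient values, the threshold, and the function names are hypothetical.

```python
def soft_threshold(beta, lam):
    """Shrink a coefficient towards zero by lam, clipping at zero."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


# Hypothetical coefficients measured relative to the reference level "London".
coefs = {"Bristol": 0.125, "Leeds": 0.5, "York": 0.5625}
lam = 0.25  # hypothetical penalty strength

shrunk = {level: soft_threshold(beta, lam) for level, beta in coefs.items()}

# A level whose coefficient hits zero becomes indistinguishable from the reference.
merged_with_reference = [level for level, beta in shrunk.items() if beta == 0.0]
still_separate = {level: beta for level, beta in shrunk.items() if beta != 0.0}

print(merged_with_reference)
print(still_separate)
```

In this toy case Bristol collapses into London, while Leeds and York keep separate coefficients even though their effects are almost identical. The penalty only pulls each level towards the reference, never towards each other.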

To solve this, Earnix has introduced Smart Grouping, a feature built into its Auto-GLM tool, part of the company’s Model Accelerator platform.

Smart Grouping uses a two-step process. First, it ranks categories in a regularised multivariate model; second, it applies variable fusion to merge similar categories, improving both simplicity and performance.
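The general shape of a rank-then-fuse approach can be sketched as follows. This is not Earnix's algorithm: the coefficients are assumed to come from some already-fitted regularised model, and a simple greedy merge of neighbouring levels stands in for a proper fusion penalty. The function name, the tolerance `tol`, and all values are hypothetical.

```python
def fuse_ranked_categories(coefs, tol):
    """Rank levels by coefficient, then merge neighbours closer than tol.

    Returns a list of (levels, mean_coefficient) tuples, one per group.
    """
    if not coefs:
        return []
    # Step 1: rank the levels by their estimated effect.
    ranked = sorted(coefs.items(), key=lambda kv: kv[1])
    # Step 2: walk the ranking and fuse each level with its predecessor
    # when the gap between their coefficients is within the tolerance.
    groups = [[ranked[0]]]
    for level, beta in ranked[1:]:
        prev_beta = groups[-1][-1][1]
        if beta - prev_beta <= tol:
            groups[-1].append((level, beta))
        else:
            groups.append([(level, beta)])
    return [
        ([lvl for lvl, _ in g], sum(b for _, b in g) / len(g))
        for g in groups
    ]


# Hypothetical per-level coefficients from a fitted model.
coefs = {"London": 0.0, "Bristol": 0.125, "Leeds": 0.5, "York": 0.5625, "Hull": 1.0}
groups = fuse_ranked_categories(coefs, tol=0.125)

for levels, mean_beta in groups:
    print(levels, mean_beta)
```

Because levels are compared with their neighbours in the ranking rather than with a fixed baseline, Leeds and York can share a group here without either being tied to the reference level. In practice the tolerance would be tuned on held-out data rather than fixed by hand.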

The result is a sparser, more interpretable model. Groupings reflect true relationships with the outcome variable, while accounting for interactions with other covariates. Smart Grouping also reduces overfitting by using validation to guide category merging.

Already implemented in Auto-GLM, Smart Grouping offers insurers and banks a practical way to enhance the accuracy and transparency of their predictive models.

Read the full blog from Earnix here. 

Copyright © 2025 FinTech Global
