Categorical Feature Encoding in SAS (Bayesian Encoders)

What is Bayesian Encoding?

Bayesian Encoding takes the distribution of the target variable within each category, in particular the target mean, into account when encoding categorical variables. It is a form of target-based encoding that comes with several advantages; for example, Bayesian Encoding requires minimal effort compared to other encoding methods.

In this blog post, we talk about the different Bayesian encoding techniques and how they work.

1. Target/Mean Encoding

Target or Mean Encoding is one of the most commonly used encoding techniques in Kaggle competitions.

In target encoding, each level of the categorical variable is replaced by the mean of the target variable for that level, computed on the training dataset.

Hence, we have to specify the target variable in the SAS Mean Encoding Macro, as shown in the code below.

Check out this link for more information about categorical variable encoding.

SAS Macro for Target/Mean Encoding
%macro mean_encoding(dataset,var,target);
   proc sql;
     /* One row per category with the (rounded) mean of the target for that category */
     create table mean_table as
     select distinct(&var) as gr, round(mean(&target),0.1) as mean_encode
     from &dataset
     group by gr;

     /* Join the encoded value back onto every row of the input dataset */
     create table new as
     select d.*, m.mean_encode
     from &dataset as d
     left join mean_table as m
       on d.&var=m.gr;
   quit;
%mend;
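For readers who want to see the same logic outside of SAS, here is a minimal Python sketch of what the macro computes; the sample data and the function name are illustrative, not part of the macro above.

```python
# Illustrative stand-in data: (category, binary target) pairs.
data = [
    ("red", 1), ("red", 0), ("red", 1),
    ("blue", 0), ("blue", 0),
    ("green", 1),
]

def mean_encoding(rows, ndigits=1):
    """Map each category to the rounded mean of the target,
    mirroring what the macro's first PROC SQL step computes."""
    sums, counts = {}, {}
    for category, target in rows:
        sums[category] = sums.get(category, 0) + target
        counts[category] = counts.get(category, 0) + 1
    return {c: round(sums[c] / counts[c], ndigits) for c in sums}

encoding = mean_encoding(data)
# Equivalent of the left join: attach the encoded value to every row.
encoded_column = [encoding[category] for category, _ in data]
# "red" -> round(2/3, 1) = 0.7, "blue" -> 0.0, "green" -> 1.0
```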
2. Weight of Evidence Encoding

Weight of Evidence (WoE) is a measure of the “strength” of a grouping technique at separating good and bad outcomes. The method was developed primarily to build predictive models for evaluating the risk of loan default in the credit and financial industry.

WoE will be 0 if P(Goods) / P(Bads) = 1, that is, if the outcome is random for that group. If P(Bads) > P(Goods), the odds ratio will be < 1 and the WoE will be < 0. If, on the other hand, P(Goods) > P(Bads) in a group, then WoE > 0.

WoE is well suited for Logistic Regression because the logit transformation is simply the log of the odds, i.e. ln(P(Goods)/P(Bads)). Therefore, by using WoE-coded predictors in Logistic Regression, the predictors are all prepared and coded on the same scale, and the parameters in the linear logistic regression equation can be compared directly.

SAS Macro for Weight of Evidence Encoding
%macro woe_encoding(dataset,var,target);
   proc sql noprint;
     /* P(Goods): mean of the target within each category */
     create table stats as
     select distinct(&var) as gr, round(mean(&target),0.1) as mean_encode
     from &dataset
     group by gr;
   quit;

   data stats;
     set stats;
     bad_prob=1-mean_encode;
     /* Floor both probabilities to avoid division by zero and log(0) */
     if bad_prob=0 then bad_prob=0.0001;
     if mean_encode=0 then mean_encode=0.0001;
     me_by_bp=mean_encode/bad_prob;
     woe_encode=log(me_by_bp);
   run;

   proc sql noprint;
     create table new as
     select d.*, s.woe_encode
     from &dataset as d
     left join stats as s
       on d.&var=s.gr;
   quit;
%mend;
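The same arithmetic can be sketched in Python for illustration; the helper below is hypothetical and mirrors the macro's data step, including the floor that keeps the probabilities away from zero.

```python
import math

def woe_encoding(rows, ndigits=1, eps=0.0001):
    """Map each category to log(P(Goods) / P(Bads)),
    flooring both probabilities at eps to keep the log finite."""
    sums, counts = {}, {}
    for category, target in rows:
        sums[category] = sums.get(category, 0) + target
        counts[category] = counts.get(category, 0) + 1
    woe = {}
    for c in sums:
        good = round(sums[c] / counts[c], ndigits)  # P(Goods), rounded like the macro
        good = max(good, eps)
        bad = max(1 - good, eps)                    # P(Bads)
        woe[c] = math.log(good / bad)
    return woe

# A 50/50 category has WoE 0; an all-goods category has a large positive WoE.
rows = [("a", 1), ("a", 0), ("b", 1), ("b", 1)]
codes = woe_encoding(rows)
```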
3. Probability Ratio Encoding

Probability Ratio Encoding is similar to Weight of Evidence; the only difference is that the raw ratio of the good and bad probabilities is used instead of its logarithm. For each label, we calculate the mean of target=1, that is, the probability of being 1 ( P(1) ), and also the probability of the target being 0 ( P(0) ). We then calculate the ratio P(1)/P(0) and replace the labels by that ratio.

We need to add a small value to P(0) to avoid division by zero for any category in which there are no target=0 observations. Check out this link for more information.

SAS Macro for Probability Ratio Encoding
%macro probability_encoding(dataset,var,target);
   proc sql noprint;
     /* P(target=1) within each category */
     create table stats as
     select distinct(&var) as gr, round(mean(&target),0.1) as mean_encode
     from &dataset
     group by gr;
   quit;

   data stats;
     set stats;
     bad_prob=1-mean_encode;
     /* Floor P(target=0) to avoid division by zero */
     if bad_prob=0 then bad_prob=0.0001;
     prob_encode=mean_encode/bad_prob;
   run;

   proc sql noprint;
     create table new as
     select d.*, s.prob_encode
     from &dataset as d
     left join stats as s
       on d.&var=s.gr;
   quit;
%mend;
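Again purely as an illustration, probability ratio encoding is the same computation without the logarithm; the function below is a hypothetical Python equivalent of the macro's data step.

```python
def probability_ratio_encoding(rows, eps=0.0001):
    """Map each category to P(target=1) / P(target=0),
    flooring P(target=0) at eps to avoid division by zero."""
    sums, counts = {}, {}
    for category, target in rows:
        sums[category] = sums.get(category, 0) + target
        counts[category] = counts.get(category, 0) + 1
    ratios = {}
    for c in sums:
        p1 = sums[c] / counts[c]       # probability of target=1
        p0 = max(1 - p1, eps)          # probability of target=0, floored
        ratios[c] = p1 / p0
    return ratios

# One in four "a" rows has target=1, so the ratio is 0.25 / 0.75 = 1/3.
rows = [("a", 1), ("a", 0), ("a", 0), ("a", 0)]
ratios = probability_ratio_encoding(rows)
```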
Wrapping Up

Categorical Feature Encoding is an important part of cleaning up data for machine learning models. However, each method works best in different circumstances, so it is important to know about the different techniques that fall under the Bayesian category.

If you want to take a look at how the coding operates in a SAS environment, you can find all the SAS Macro Definition code on my GitHub page here.

blog-categorical-feature-encoding-bayesian by Selerity

SAS Macro examples for the Blog Post “Categorical Feature Encoding in SAS (Bayesian Encoders)”

Suraj Saini
