Techniques of Feature Scaling with SAS Custom Macro

What is Feature Scaling?

Feature scaling is the process of normalizing the range of the features in a dataset, and it is one of the most important steps in data pre-processing. It is done before feeding data into machine learning, deep learning, and statistical algorithms or models. In many cases, model performance improves when features are scaled, especially for models based on Euclidean distance, where a feature with a large numeric range would otherwise dominate the distance. Normalization and standardization are the two main techniques of feature scaling. In this article I define each technique and explain how to implement it in SAS Studio or Base SAS using the SAS macro facility.

What is Normalization?

Normalization is a feature scaling process in which data values are rescaled to lie between two bounds, most commonly [0, 1] or [–1, 1]. Min_MaxScaler and Mean_Normalization are very common examples of normalization.

1. Min_MaxScaler

It rescales the data values to the range 0 to 1. The mathematical formula is:

x_scaled = (x - min(x)) / (max(x) - min(x))

1.3 What does Min_MaxScaler SAS Macro do behind the scenes?

Min_MaxScaler takes the variable that you want to scale and creates a new variable “MMVariableName” with scaled values. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.
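
The steps described above could look roughly like the following. This is only a minimal sketch, not the actual Min_MaxScaler macro: the macro name minmax_sketch, its parameters, and the helper table work._mm_stat are assumptions made for illustration, and sashelp.class is used only because it ships with SAS.

```sas
/* Hypothetical sketch: macro name, parameters and helper table are assumptions. */
%macro minmax_sketch(data=, var=, out=);

    /* Step 1: store the minimum and maximum of the variable in a one-row table. */
    proc means data=&data noprint;
        var &var;
        output out=work._mm_stat(drop=_type_ _freq_) min=_min max=_max;
    run;

    /* Step 2: attach the two statistics to every row and rescale. */
    data &out;
        if _n_ = 1 then set work._mm_stat;
        set &data;
        /* The result stays missing for missing input or a constant column. */
        if not missing(&var) and _max > _min then
            MM&var = (&var - _min) / (_max - _min);
        drop _min _max;
    run;

    /* Step 3: univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var MM&var;
        histogram &var MM&var;
    run;

%mend minmax_sketch;

%minmax_sketch(data=sashelp.class, var=Height, out=work.class_mm);
```

Reading the statistics from a table, rather than passing them through macro variables, avoids rounding them when they are converted to text.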

2. Mean_Normalization

It rescales the data values to lie between –1 and 1, centered on the mean. The mathematical formula is:

x_scaled = (x - mean(x)) / (max(x) - min(x))

2.3 What does Mean_Normalization SAS Macro do behind the scenes?

Mean_Normalization takes the variable that you want to scale and creates a new variable “MNVariableName” with scaled values. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.
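
One compact way to sketch this behaviour is with PROC SQL, which by default remerges summary statistics such as MEAN, MIN and MAX back onto every row. This is an illustration under assumptions, not the actual Mean_Normalization macro: the macro name meannorm_sketch and its parameters are hypothetical, and a constant column (max equal to min) is not guarded against here.

```sas
/* Hypothetical sketch: macro name and parameters are assumptions. */
%macro meannorm_sketch(data=, var=, out=);

    proc sql;
        create table &out as
        select *,
               /* Summary statistics are remerged onto each row by PROC SQL. */
               (&var - mean(&var)) / (max(&var) - min(&var)) as MN&var
        from &data;
    quit;

    /* Univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var MN&var;
        histogram &var MN&var;
    run;

%mend meannorm_sketch;

%meannorm_sketch(data=sashelp.class, var=Height, out=work.class_mn);
```

The log will contain a note that the query required remerging summary statistics, which is expected for this pattern.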

What is Standardization?

Standardization is a feature scaling technique in which the data values are centered on the mean and scaled to unit standard deviation. After standardization, the data have a mean of zero and a variance of 1.

“Standardization assumes that your observations fit a Gaussian distribution (bell curve) with a well-behaved mean and standard deviation. You can still standardize your data if this expectation is not met, but you may not get reliable results.”

https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/

3. Standard_Scaler

It rescales the distribution of data values so that the mean of the observed values is 0 and the standard deviation is 1. The mathematical formula is:

z = (x - mean(x)) / std(x)

3.3 What does Standard_Scaler SAS Custom Macro do behind the scenes?

Standard_Scaler takes the variable that you want to scale and creates a new variable “SDVariableName” with scaled values. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.
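
A similar result can be sketched with the built-in PROC STANDARD, which rescales a variable to a requested mean and standard deviation. The code below is an assumption-laden illustration rather than the actual Standard_Scaler macro: the macro name stdscale_sketch, its parameters, and the helper table work._sd_copy are hypothetical. By default PROC STANDARD uses the sample standard deviation (divisor n - 1).

```sas
/* Hypothetical sketch: macro name, parameters and helper table are assumptions. */
%macro stdscale_sketch(data=, var=, out=);

    /* Copy the variable so that the original column stays untouched. */
    data work._sd_copy;
        set &data;
        SD&var = &var;
    run;

    /* Rescale the copy to mean 0 and standard deviation 1. */
    proc standard data=work._sd_copy out=&out mean=0 std=1;
        var SD&var;
    run;

    /* Univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var SD&var;
        histogram &var SD&var;
    run;

%mend stdscale_sketch;

%stdscale_sketch(data=sashelp.class, var=Height, out=work.class_sd);
```

The moments table in the univariate report for the scaled variable should show a mean of approximately 0 and a standard deviation of 1, which is a quick way to verify the result.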

4. Robust_Scaler

Robust_Scaler transforms the data values by first subtracting the median and then dividing by the IQR, the interquartile range (third quartile minus first quartile). This centers the median at zero, and because the median and the IQR are barely affected by extreme values, the method is robust to outliers. The mathematical formula is:

x_scaled = (x - median(x)) / (Q3(x) - Q1(x))

4.3 What does Robust_Scaler SAS Custom Macro do behind the scenes?

Robust_Scaler takes the variable that you want to scale and creates a new variable “RSVariableName” with scaled values. In the work library, it creates a STAT table where you can find the median, first quartile (Q1), and third quartile (Q3) values to verify your results. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.
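
The behaviour described above might be sketched as follows. This is not the actual Robust_Scaler macro: the macro name robust_sketch, its parameters, and the column names _median, _q1 and _q3 inside work.stat are assumptions for illustration. Unlike a temporary helper table, work.stat is deliberately kept so the statistics can be inspected afterwards.

```sas
/* Hypothetical sketch: macro name, parameters and column names are assumptions. */
%macro robust_sketch(data=, var=, out=);

    /* Step 1: save the median and quartiles in work.stat for later verification. */
    proc means data=&data noprint;
        var &var;
        output out=work.stat(drop=_type_ _freq_)
               median=_median q1=_q1 q3=_q3;
    run;

    /* Step 2: subtract the median and divide by the interquartile range. */
    data &out;
        if _n_ = 1 then set work.stat;
        set &data;
        /* The result stays missing for missing input or a zero IQR. */
        if not missing(&var) and _q3 > _q1 then
            RS&var = (&var - _median) / (_q3 - _q1);
        drop _median _q1 _q3;
    run;

    /* Step 3: univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var RS&var;
        histogram &var RS&var;
    run;

%mend robust_sketch;

%robust_sketch(data=sashelp.class, var=Height, out=work.class_rs);
```

After the call, the median of the scaled variable in the univariate report should be 0, and the values in work.stat can be plugged into the formula by hand to check individual rows.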