# Techniques of Feature Scaling with SAS Custom Macro

#### What is Feature Scaling?

**Feature scaling** is a process used to normalize data, and it is one of the most important steps in data pre-processing. It is done before feeding data into machine learning, deep learning and statistical algorithms/models. Model performance often improves when features are scaled, especially for models based on Euclidean distance. **Normalization** and **Standardization** are the two main techniques of **feature scaling**. I am going to define and explain how we can implement different feature scaling techniques in SAS Studio or Base SAS by using the SAS Macro facility.

#### What is Normalization?

Normalization is the process of feature scaling in which data values are rescaled so that they fall between two bounds, most commonly 0 and 1 or –1 and 1. Min_MaxScaler and Mean_Normalization are very common examples of Normalization.

##### 1. Min_MaxScaler

It rescales the data values to the range 0 to 1 by subtracting the variable's minimum from each value and dividing the result by the range (maximum minus minimum).
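Written as a formula, min-max scaling looks like this. The symbols are notation chosen here for illustration: $x$ is an original value, $x'$ the scaled value, and the minimum and maximum are taken over all observations of the variable.

$$
x' = \frac{x - \min(x)}{\max(x) - \min(x)}
$$

For example, the values 10, 20 and 30 are scaled to 0, 0.5 and 1.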

###### 1.1 How can you use Min_MaxScaler in SAS?

###### 1.2 Min_MaxScaler SAS Custom Macro Definition

###### 1.3 What does Min_MaxScaler SAS Macro do behind the scenes?

Min_MaxScaler takes the variable that you want to scale and creates a new variable “MMVariableName” with scaled values. It also creates a univariate report where you can see the histogram of both the Actual Variable and the new Scaled Variable.
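The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the original macro: the macro name `min_max_sketch`, its parameters `data=`, `var=` and `out=`, and the use of `sashelp.class` are all assumptions made for the example.

```sas
/* Hypothetical sketch of a min-max scaling macro (not the original definition). */
%macro min_max_sketch(data=, var=, out=work.scaled);

    /* PROC SQL remerges the column minimum and maximum onto every row, */
    /* so each value can be rescaled in a single query.                 */
    proc sql;
        create table &out as
        select *,
               (&var - min(&var)) / (max(&var) - min(&var)) as MM&var
        from &data;
    quit;

    /* Univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var MM&var;
        histogram &var MM&var;
    run;

%mend min_max_sketch;

/* Example call: creates MMHeight in work.scaled. */
%min_max_sketch(data=sashelp.class, var=Height);
```

If every value of the variable is identical, the denominator is zero and SAS sets the scaled variable to missing.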

##### 2. Mean_Normalization

It rescales the data values to lie between –1 and 1 by subtracting the variable's mean from each value and dividing the result by the range (maximum minus minimum).
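Written as a formula, mean normalization looks like this. The notation is chosen here for illustration: $x$ is an original value, $\bar{x}$ the mean of the variable, and $x'$ the scaled value.

$$
x' = \frac{x - \bar{x}}{\max(x) - \min(x)}
$$

For example, the values 10, 20 and 30 have a mean of 20 and a range of 20, so they are scaled to –0.5, 0 and 0.5.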

###### 2.1 How can you use Mean_Normalization in SAS?

###### 2.2 Mean_Normalization SAS Custom Macro Definition

###### 2.3 What does Mean_Normalization SAS Macro do behind the scenes?

Mean_Normalization takes the variable that you want to scale and creates a new variable “MNVariableName” with scaled values. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.
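A rough sketch of the behaviour described above is given here. It is an illustration under assumptions, not the original macro: the name `mean_norm_sketch`, the parameters `data=`, `var=` and `out=`, and the `sashelp.class` example data are hypothetical.

```sas
/* Hypothetical sketch of a mean normalization macro (not the original definition). */
%macro mean_norm_sketch(data=, var=, out=work.scaled);

    /* Remerge the mean, minimum and maximum onto every row and rescale. */
    proc sql;
        create table &out as
        select *,
               (&var - mean(&var)) / (max(&var) - min(&var)) as MN&var
        from &data;
    quit;

    /* Univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var MN&var;
        histogram &var MN&var;
    run;

%mend mean_norm_sketch;

/* Example call: creates MNHeight in work.scaled. */
%mean_norm_sketch(data=sashelp.class, var=Height);
```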

#### What is Standardization?

Standardization is a technique of feature scaling in which data values are centered on the mean and scaled to unit standard deviation, which means that after standardization the data will have a mean of zero and a variance of 1.

*“Standardization assumes that your observations fit a Gaussian distribution (bell curve) with a well-behaved mean and standard deviation. You can still standardize your data if this expectation is not met, but you may not get reliable results.”*

https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/

##### 3. Standard_Scaler

It rescales the distribution of data values so that the mean of the observed values is 0 and the standard deviation is 1. Each value has the mean subtracted from it and is then divided by the standard deviation.
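Written as a formula, standardization looks like this. The notation is chosen here for illustration: $x$ is an original value, $\bar{x}$ the mean and $s$ the standard deviation of the variable, and $z$ the scaled value. Whether $s$ is the sample or the population standard deviation depends on the implementation.

$$
z = \frac{x - \bar{x}}{s}
$$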

###### 3.1 How can you use Standard_Scaler in SAS?

###### 3.2 Standard_Scaler SAS Custom Macro Definition

###### 3.3 What does Standard_Scaler SAS Custom Macro do behind the scenes?

Standard_Scaler takes the variable that you want to scale and creates a new variable “SDVariableName” with scaled values. It also creates a univariate report where you can see the histogram of both the Actual Variable and the new Scaled Variable.
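The behaviour described above might be sketched as follows. This is an assumption-laden illustration rather than the original macro: the name `standard_sketch`, the parameters `data=`, `var=` and `out=`, and the `sashelp.class` example data are hypothetical.

```sas
/* Hypothetical sketch of a standardization macro (not the original definition). */
%macro standard_sketch(data=, var=, out=work.scaled);

    /* Remerge the mean and standard deviation onto every row.     */
    /* STD in PROC SQL is the sample standard deviation (n - 1).   */
    proc sql;
        create table &out as
        select *,
               (&var - mean(&var)) / std(&var) as SD&var
        from &data;
    quit;

    /* Univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var SD&var;
        histogram &var SD&var;
    run;

%mend standard_sketch;

/* Example call: creates SDHeight in work.scaled. */
%standard_sketch(data=sashelp.class, var=Height);
```

The univariate report offers a quick check: the scaled variable should show a mean of approximately 0 and a standard deviation of 1.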

#### 4. Robust_Scaler

Robust_Scaler transforms the data values by first subtracting the median and then dividing by the IQR, the interquartile range (third quartile minus first quartile). This centers the median at zero and makes the method very robust to outliers.
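Written as a formula, robust scaling looks like this. The notation is chosen here for illustration: $x$ is an original value, $Q_1$ and $Q_3$ are the first and third quartiles of the variable, and $x'$ is the scaled value.

$$
x' = \frac{x - \operatorname{median}(x)}{Q_3 - Q_1}
$$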

###### 4.1 How can you use Robust_Scaler in SAS?

###### 4.2 Robust_Scaler SAS Custom Macro Definition

###### 4.3 What does Robust_Scaler SAS Custom Macro do behind the scenes?

Robust_Scaler takes the variable that you want to scale and creates a new variable “RSVariableName” with scaled values. In the work library, it will create a STAT table where you can find the Median, Quantile 1 and Quantile 3 values to verify your results. It also creates a univariate report where you can see the histograms of both the Actual Variable and the new Scaled Variable.
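The steps described above can be sketched along these lines. This is a minimal illustration, not the original macro: the name `robust_sketch`, the parameters `data=`, `var=` and `out=`, the column names in the STAT table, and the `sashelp.class` example data are assumptions.

```sas
/* Hypothetical sketch of a robust scaling macro (not the original definition). */
%macro robust_sketch(data=, var=, out=work.scaled);

    /* Store the median and quartiles in WORK.STAT so they can be inspected. */
    proc means data=&data noprint;
        var &var;
        output out=work.stat median=_median q1=_q1 q3=_q3;
    run;

    /* Read the single STAT row once; its values are retained for every   */
    /* observation. Then subtract the median and divide by the IQR.       */
    data &out;
        if _n_ = 1 then set work.stat(keep=_median _q1 _q3);
        set &data;
        RS&var = (&var - _median) / (_q3 - _q1);
        drop _median _q1 _q3;
    run;

    /* Univariate report with histograms of the original and scaled variable. */
    proc univariate data=&out;
        var &var RS&var;
        histogram &var RS&var;
    run;

%mend robust_sketch;

/* Example call: creates RSHeight in work.scaled and the WORK.STAT table. */
%robust_sketch(data=sashelp.class, var=Height);
```

Keeping the statistics in a separate table rather than in macro variables avoids the rounding that can occur when numbers are converted to macro text.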