In a previous blog post, we covered feature scaling techniques and implemented them in Base SAS using SAS Macros. In this post, we look at one more scaling technique and then focus on feature transformation techniques, again using SAS Macros. I will explain what they are and the scenarios they are best suited to.

Absolute Maximum Scaler (Abs_MaxScaler) is a feature scaling technique where each data point is divided by the maximum absolute value of the variable. This rescales the values to the range -1 to 1.

While Abs_MaxScaler has its advantages, it also has drawbacks. The biggest one is its sensitivity to outliers: a single extreme value sets the divisor, so all other scaled values are compressed toward zero.
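Written out, the scaling described above takes the following form. The symbols are my own notation: $x_i$ is an original value of the variable and $x_i'$ is its scaled counterpart.

$$
x_i' = \frac{x_i}{\max_j \lvert x_j \rvert}
$$

Because the divisor is the largest absolute value, every $x_i'$ falls between -1 and 1, and the sign of each value is preserved.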

Abs_MaxScaler takes the variable that you want to scale and creates a new variable “AMVariableName” with scaled values. It also creates a univariate report where you can see the histograms of the actual variable and the new scaled variable.
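The idea behind the scaler can be sketched in plain Base SAS steps. This is only a minimal illustration and not the Abs_MaxScaler macro itself: the output dataset `work.cars_scaled`, the macro variable `abs_max`, and the choice of `MSRP` from `sashelp.cars` are assumptions made for the example.

```sas
/* Illustrative sketch only; dataset and variable choices are assumptions. */

/* Step 1: store the largest absolute value in a macro variable */
proc sql noprint;
    select max(abs(MSRP)) format=best32. into :abs_max trimmed
    from sashelp.cars;
quit;

/* Step 2: divide every value by that maximum; the "AM" prefix follows
   the naming convention described in the text */
data work.cars_scaled;
    set sashelp.cars;
    AMMSRP = MSRP / &abs_max;
run;

/* Step 3: compare histograms of the original and the scaled variable */
proc univariate data=work.cars_scaled;
    var MSRP AMMSRP;
    histogram MSRP AMMSRP;
run;
```

The two histograms should have the same shape, since dividing by a constant only changes the scale of the axis.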

Next, I will discuss feature transformation techniques that bring data closer to a normal distribution (also known as a Gaussian distribution). Along with the discussion, I will also use some advanced SAS Macro programming to implement feature transformation in Base SAS.

**What is Feature Transformation?**

It is the process of transforming data from one representation to another with the help of statistical and mathematical functions, while retaining the information in the data. Many different transformations can bring a data distribution closer to a normal distribution, but I will focus on five fundamental ones: Log, Reciprocal, Square-Root, Exponential, and Box-Cox.

Feature Transformation is important because it makes it easier for machine learning and statistical models to learn from your data and make accurate predictions. It can also reduce the training time needed to get the required results; however, this does not apply to every ML, DL, and statistical algorithm.

Some statistical algorithms and models assume that data is normally distributed. For example, linear regression assumes normally distributed errors for its inference, so heavily skewed variables are often transformed beforehand; otherwise the results can be misleading.

1. **Log Transformation:** This transformation is often the best choice when data is skewed or has outliers that distort the distribution. It converts each value using a log function, so it can only be applied to positive values.

2. **Reciprocal Transformation:** This transformation is not very effective when we compare it with the others because it has little effect on the shape of the distribution. It converts the data to the inverse of its value. For example, 3 will be transformed into 1/3. It is only used for non-zero data values.

3. **Exponential Transformation:** It converts each value x into “e to the power of x” using the exp() function available in most programming languages.

4. **Square-Root Transformation:** As its name suggests, it transforms each data value into its square root, which means it can also be applied to zero values (but not to negative ones). Note that the square-root transformation is less effective than the log transformation because it has a weaker effect on the shape of the distribution.

5. **Box-Cox Transformation:** For a Box-Cox Transformation, every data value must be positive. It works well on data with an even nature and is one of the most commonly used transformations in the statistics field. In the transformation formula, the log is used when lambda is zero, and the value of lambda is varied between -5 and 5.
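For reference, the standard one-parameter Box-Cox formula mentioned in the last item can be written as follows, where $x$ is a positive data value and $\lambda$ is the transformation parameter (the notation $x^{(\lambda)}$ for the transformed value is my choice):

$$
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln x, & \lambda = 0
\end{cases}
$$

The log case is the limit of the first expression as $\lambda$ approaches zero, which is why the two branches fit together smoothly.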

You can call this macro with **%BoxCox (dataset, variable)**.

dataset = your dataset name with library, e.g. **sashelp.cars**

variable = name of the variable

%BoxCox will only accept the variable if all of its data values are positive. Otherwise, it gives an error message: “Your data has negative values, hence you cannot apply BoxCox transformation.” If the check passes, it creates a new temporary dataset containing “Acul_Name of_your_Variable” and the transformed variable “TAcul_Name of_your_Variable”. It also generates a univariate report of the transformed variable, where you can check how close its distribution is to normal.
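To make the behaviour above more concrete, here is a rough sketch of a macro with a similar shape. It is not the %BoxCox macro from the repository: the name `%boxcox_sketch`, the output dataset `work.boxcox_out`, the `T_` prefix, and especially the third `lambda` parameter are assumptions for illustration. The sketch applies one fixed lambda supplied by the caller rather than searching the range -5 to 5.

```sas
/* Hypothetical sketch: NOT the %BoxCox macro from the GitHub repository. */
%macro boxcox_sketch(dataset, variable, lambda);
    %local min_val;

    /* Find the smallest value so positivity can be checked up front */
    proc sql noprint;
        select min(&variable) format=best32. into :min_val trimmed
        from &dataset;
    quit;

    /* Box-Cox needs strictly positive data, so stop early otherwise */
    %if %sysevalf(&min_val <= 0) %then %do;
        %put ERROR: &variable has zero or negative values, so Box-Cox cannot be applied.;
        %return;
    %end;

    /* Apply the transformation for the lambda supplied by the caller */
    data work.boxcox_out;
        set &dataset;
        if not missing(&variable) then do;
            if (&lambda) = 0 then T_&variable = log(&variable);
            else T_&variable = (&variable ** (&lambda) - 1) / (&lambda);
        end;
    run;

    /* Univariate report with a fitted normal curve on the histogram */
    proc univariate data=work.boxcox_out;
        var T_&variable;
        histogram T_&variable / normal;
    run;
%mend boxcox_sketch;

/* Example call with an arbitrarily chosen lambda */
%boxcox_sketch(sashelp.cars, MSRP, 0.5);
```

Checking the minimum before the DATA step means the macro fails fast with a readable message instead of filling the log with invalid-argument notes.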

You can call this macro with **%Transform (dataset, variable, type)**.

dataset = your dataset name with library e.g. sashelp.cars

variable = name of the variable

type = you can select only one type of transformation (**Log, Square-Root, Reciprocal, and Exponential**)

The %Transform macro takes a variable and applies the transformation you selected in the “type” argument. It then generates a univariate report showing the distribution of the transformed variable.
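A macro along these lines could be structured as in the sketch below. Again, this is an illustration under assumptions and not the %Transform macro from the repository: the name `%transform_sketch`, the output dataset `work.transformed`, the `T_` prefix, and the simplified type keywords `LOG`, `SQRT`, `RECIPROCAL`, and `EXP` are all hypothetical. Hyphen-free keywords are used because a hyphen inside a `%IF` comparison would be read as a minus sign.

```sas
/* Hypothetical sketch: NOT the %Transform macro from the GitHub repository. */
%macro transform_sketch(dataset, variable, type);
    %local guard expr;
    %let type = %upcase(&type);

    /* Pick the expression and the condition under which it is defined */
    %if &type = LOG %then %do;
        %let guard = &variable > 0;
        %let expr  = log(&variable);
    %end;
    %else %if &type = SQRT %then %do;
        %let guard = &variable >= 0;
        %let expr  = sqrt(&variable);
    %end;
    %else %if &type = RECIPROCAL %then %do;
        %let guard = &variable not in (., 0);
        %let expr  = 1 / &variable;
    %end;
    %else %if &type = EXP %then %do;
        %let guard = not missing(&variable);
        %let expr  = exp(&variable);
    %end;
    %else %do;
        %put ERROR: Unknown transformation type: &type;
        %return;
    %end;

    /* Rows that fail the guard keep a missing transformed value */
    data work.transformed;
        set &dataset;
        if &guard then T_&variable = &expr;
    run;

    /* Univariate report with a fitted normal curve on the histogram */
    proc univariate data=work.transformed;
        var T_&variable;
        histogram T_&variable / normal;
    run;
%mend transform_sketch;

/* Example call: log-transform MSRP from sashelp.cars */
%transform_sketch(sashelp.cars, MSRP, log);
```

Keeping the guard next to each expression makes the domain restriction of every transformation explicit, for example positive values for the log and non-zero values for the reciprocal.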


Code is available on GitHub here: https://github.com/Suraj-617/Blogs/blob/master/Techniques%20of%20Feature%20Scaling%20with%20SAS%20Custom%20Macro-B.sas
