Users of SAS Studio and other SAS Viya programming clients are used to having their operating system home directories available while they work, a topic Gerry Nelson covers in his article SAS Viya: making user home directories available to compute.
System Administrators have been dealing with this scenario for decades now, and established methods of making a personalised, secure home directory available to users now usually rely on NFS or CIFS/SMB.
Viya provides the ability to make home directories served by NFS available to applications that use the Programming Run-Time Servers (such as SAS Studio). You do this by specifying the NFS server details during deployment. If your NFS server and Identity Provider are already used to serve home directories to other applications then the documented defaults will work great – but what if that isn’t the case?
Under the covers, Viya is running in a Linux environment (within containers, within Kubernetes). Each user in Linux is assigned a unique User ID (uid). The uid is what allows a user to access their own personal home directory, among other things. Viya on its own has no idea what uid is assigned to what user, unless that information is provided by the Identity Provider. If you are leveraging an existing NFS Server backed by an Identity Provider that is already in use with that NFS Server, then there is a good chance (but no guarantee) that your Identity Provider already has the required POSIX attributes to provide the uid to Viya.
If you are using Active Directory as your Identity Provider then there is a good chance that you don’t have these attributes. In this case Viya will generate a uid (and Group ID, or gid) for each user and store it internally. This allows Viya to kick off compute sessions using the uid it has generated, but nothing outside Viya knows about this uid, which means that when it comes to accessing a user’s home directory on NFS, the generated uid will most likely not match the uid on the home directory.
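To make the mismatch concrete, here is a minimal Python sketch of the ownership check that decides whether a session can use a home directory. The function name and the temporary directory are hypothetical stand-ins; nothing here is part of Viya, and it assumes a POSIX system.

```python
import os
import tempfile


def home_dir_uid_matches(path: str, expected_uid: int) -> bool:
    """Return True when the directory at `path` is owned by `expected_uid`."""
    return os.stat(path).st_uid == expected_uid


if __name__ == "__main__":
    # A throwaway directory stands in for a user's NFS home directory.
    with tempfile.TemporaryDirectory() as home:
        session_uid = os.geteuid()  # uid the current process runs as (POSIX only)
        print(home_dir_uid_matches(home, session_uid))      # owner and session agree
        print(home_dir_uid_matches(home, session_uid + 1))  # a different uid does not
```

A compute session started under a uid that Viya generated is in the second situation unless the home directory was created with that same uid.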
But there is a problem even before we get to the matching uid problem: how does the system know it even needs to create a home directory on NFS for the user? In a traditional Linux environment this is taken care of by PAM, leveraging methods such as pam_mkhomedir. This in turn relies on the Identity Provider of the operating system. In fact this is what the earliest solution to this problem used. In Viya 3.4 a more integrated solution was provided directly in the deployment process, followed by further updates in Viya 3.5. Unfortunately starting with Viya 2020.x these methods no longer work.
Thankfully, in November 2021, Sample 68620: Create user home directories from the identities service in SAS® Viya® 2020.x using a script was released. It provides a bash script that extracts the uid generated internally by Viya and then creates home directories with the uid that Viya expects.
This script works great, but the “how to” of getting it running and integrated into your Viya Kubernetes environment is left to the user.
To install this solution you should be familiar with Kubernetes and Helm, as well as have the details of the NFS Server used during your Viya Deployment. Here is all that is needed to get this deployed:
helm repo add selerity https://selerity.github.io/helm-charts
helm repo update
helm upgrade -i -n[VIYA_NAMESPACE] \
  [RELEASE_NAME] selerity/viya4-home-dir-builder \
  --set viya.base_url=[VIYA_BASE_URL] \
  --set nfs.server=[NFS_SERVER_NAME]
This will create a Kubernetes Cron Job that must be triggered manually. When you do trigger it, it will only report on what it would do (it won’t create or update anything), so you can review the logs before enabling it. The parameters above are:
VIYA_NAMESPACE – the namespace you have deployed Viya to
RELEASE_NAME – any string you want to use as the name of this deployment
VIYA_BASE_URL – the URL to your Viya deployment
NFS_SERVER_NAME – the hostname/IP of the NFS Server you specified in your Viya deployment
helm upgrade -i -nviya \
  thor selerity/viya4-home-dir-builder \
  --set viya.base_url=https://viya.server.com \
  --set nfs.server=mynfs.server.com
After a successful install you will be presented with instructions on how to view/trigger/etc. the Cron Job. If you are happy that the process will work correctly in your environment (after reviewing the logs of a sample run) you can enable it to create/update home directories by adding the --set dry_run=0 option on the Helm command, and if you want to enable it to run on a schedule also add the --set suspend=false option. Further details are available in the Helm Chart.
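The options described above can be summarised in a small Python sketch that assembles the Helm argument list. The helper name build_helm_command and its parameters are hypothetical; only the chart name and the --set keys come from the commands shown above, and the sketch prints the command rather than running it.

```python
import shlex


def build_helm_command(namespace, release, base_url, nfs_server,
                       dry_run=True, scheduled=False):
    """Assemble the helm upgrade arguments for the home directory builder chart."""
    cmd = [
        "helm", "upgrade", "-i", "-n", namespace,
        release, "selerity/viya4-home-dir-builder",
        "--set", f"viya.base_url={base_url}",
        "--set", f"nfs.server={nfs_server}",
    ]
    if not dry_run:
        # Allow the job to create/update home directories.
        cmd += ["--set", "dry_run=0"]
    if scheduled:
        # Let the Cron Job run on its schedule instead of manual triggering.
        cmd += ["--set", "suspend=false"]
    return cmd


if __name__ == "__main__":
    args = build_helm_command(
        "viya", "thor", "https://viya.server.com", "mynfs.server.com",
        dry_run=False, scheduled=True,
    )
    print(shlex.join(args))  # review the command before running it yourself
```

Leaving both keyword arguments at their defaults reproduces the report-only install shown earlier.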
If you find any issues with our Charts or have ideas for improvements, please raise an Issue here.
|Auto Creation of Linux Home Directories for SAS Users||Paul Homes|
|SAS Viya 3.4 Automatic Home Directories||Stuart Rogers|
|SAS Viya 3.5 Automatic Home Directories||Stuart Rogers|
|SAS Viya: making user home directories available to compute||Gerry Nelson|
|Sample 68620: Create user home directories from the identities service in SAS® Viya® 2020.x using a script||Greg Wootton|
|SAS Viya Operations 2022.1 | Deployment | Installation | Common Customizations | Change the Location of the NFS Server||SAS|
|SAS Viya Administration 2022.1 | Security | Identity Management||SAS|
Big data analytics has become a standard in many industries and has been a game-changer for many businesses around the world.
Around 55% of companies around the world use big data analytics to improve their performance and keep an eye out for changes in the market and customer behaviour.
Over the years, big data analytics has opened new horizons for all kinds of industries. That said, big data is also being used in some very surprising ways that we would never have imagined in the past.
In this post, we’ll take a look at these non-traditional applications of big data analysis.
Smart transport solutions are quickly becoming a feature in most modern cities around the world and where there is smart transportation, smart parking technology will follow.
Today, real-time big data and information from the payment systems in parking lots are used to provide smart parking solutions to drivers.
With big data on weather patterns, daily events, the amount of time a car spends in the lot and the time of day, parking lot staff can find ways to maximise parking prices and utilise the space in the parking lots effectively.
Organisations that require large parking spaces, like hospitals, airports and community centres can optimise their revenue and staffing strategies effectively thanks to big data analytics.
According to Wen Sang, the CEO of Smarking, airports generate about 20% of their revenue through parking; this revenue can potentially increase if airports adopt big data analytics.
Demographics like age, race, social standing, gender and sexuality play a major role in determining how potential customers will react to marketing. Understanding the emotions that these marketing campaigns and advertisements instil is also important.
The emotional effect an advertisement has on people will determine how they will see the product and how they will interact with the business. Now, big data analytics can be used to measure the emotional impact these campaigns have on people.
Data collected using facial recognition software on videos or photographs of people reacting to the advertisements is analysed to gain insight into what emotions people feel when they view the advertisements.
If people display the emotions a business expected from them, the advertisement is a success and the business can predict how potential customers will approach it and its product.
Using this emotional data, businesses can further optimise their campaigns for desired reactions from their customers.
Big data analytics on emotions is also being used in the movie industry, especially to measure how people react during horror movies. With big data, movie studios can identify what kind of content brings out fear in their audience and make horror movies that scare people the way they want to be scared.
Film production companies want their movies to engage their audience and their cast to be relatable.
Nowadays, film companies use big data gathered from streaming services and social media to get an idea of the kind of stories people want to watch, with the actors that viewers feel are most suited for certain roles.
According to some filmmakers, movies have become very commercialised and tend to follow similar patterns and stories. Major film companies do this to protect their investment in these movies.
Big data helps film companies have a better understanding of the current trends and what moviegoers are interested in, enabling the creation of more unique and diverse content that can satisfy a range of audiences.
Every decade or so, a new musician walks into the scene and quickly becomes popular with the masses.
Thanks to the internet, social media and data collection technology, it’s possible to predict who will become a superstar in the music industry.
Recording companies are always keeping an eye out for new talent and today, they use big data analysis to find their next profitable music icon.
These companies use data analysis software to gather big data from a potential music icon’s social media to gauge their popularity and decide if they are worth investing in. They can also use this data to identify which social media platforms the musician has the biggest following on, and use this platform for their marketing campaigns.
The number of potential uses for big data analytics is seemingly endless.
From filmmaking to measuring emotions, big data analytics can cater to virtually every industry and contribute to its success.
The new containerised version of SAS Analytics Pro from the SAS Institute opens up a world of possibilities for leveraging third-party technologies to enhance what is already a pretty powerful Data and Analytics platform.
One of these technologies that has really taken off and helped Data Scientists take advantage of a unified programming experience regardless of the language used is Project Jupyter. A core feature of Project Jupyter is known as a Notebook, and this is explained on the Jupyter site as “…an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text“. This not only makes it an easy, pleasant experience to work in but also facilitates the ability to present complex processes in a nice visual manner to non-programmers – kind of like reading through a notebook 🙂
Jupyter can also be used for SAS programming, and as part of our SAS Analytics Pro Launcher (available at https://github.com/Selerity/sas-analytics-pro/releases) you can enable this functionality with a simple change to the settings file!
If you have followed our previous post, Cloud-native SAS Analytics Pro – for your Desktop!, you will have a functioning SAS Analytics Pro container environment that leverages our custom Launcher (available on GitHub).
To enable Jupyter in your environment, open up the apro.settings file in your $deploy directory (the location where you unzipped the Selerity Launcher code) and set the JUPYTERLAB option as follows:
# Enable Jupyter Lab?
JUPYTERLAB=true
Stop your SAS Analytics Pro container if it is currently running with the following command:
docker stop sas-analytics-pro
Now start your environment back up by running the launchapro script again. When you launch your environment with JUPYTERLAB=true the following things happen behind the scenes (all transparent to the user):
python directory in the repository)
/python and configured to use SAS in your SAS Analytics Pro container
Depending on the speed of your internet connection it could take up to 15 minutes for all this to happen, but as long as you don’t delete the python directory all subsequent startups should be just as quick as before. This is what the startup process looks like with JupyterLab enabled:
#############################################
# SAS Analytics Pro Personal Launcher #
#-------------------------------------------#
# S = SAS Studio has started #
# J = Jupyter Lab has started #
#############################################
.....S..J
Password=aS7yiXxtA0
To stop your SAS Analytics Pro instance, use "docker stop sas-analytics-pro"
Open your browser to http://localhost:8888 and enter your generated password. You will then be presented with the JupyterLab main interface:
You can click on the SAS icons in the Launcher to create a new Notebook using the SAS Kernel, and then start writing your SAS code. Click the play button to submit your code:
If you would prefer to just log in and start using Jupyter with SAS Analytics Pro, our Selerity Analytics Desktop offering provides SAS Analytics Pro as-a-service (including Jupyter), which can also be integrated into your existing IT infrastructure if required. This allows you to leverage your existing security, login credentials and code assets without needing to maintain your own SAS infrastructure. Contact us if you would like to learn more!
SAS hosting services can help you maximise ROI on your SAS platform. Managed hosting services are some of the most effective ways to ensure that your SAS platform provides the insights you need on a daily basis.
SAS platforms are deployed on cloud infrastructure. This means it is highly recommended that you invest in hosting services to optimise SAS platform operations. In this blog post, I explain what you get with SAS hosting services and how these features translate into business benefits.
Analytics hosting services come with key features to ensure that your platform is securely deployed across your environment. With a hosting service, your IT team can work with data infrastructure to deliver greater performance.
Along with the right infrastructure, your team can work alongside a team of experts that can help you optimise the SAS infrastructure. That way, if your internal IT team is new to SAS, they can work with SAS experts to familiarise themselves with the environment.
Hosting services also provide a line of communication between you and the SAS team, which can help when you need to resolve a disaster quickly.
You might be wondering, what do high-performance infrastructure and access to a team of SAS experts do for your ROI? It generates the following five key benefits.
Analytics hosting services can reduce operational costs by a significant margin. Hosting services can reduce reliance on internal resources because it puts less pressure on your staff and computing capabilities. There will also be energy savings because there is no need to install any software or hardware. Finally, SAS hosting can reduce internal cross-charging of IT costs.
SAS hosting services are deployed across an entire cloud infrastructure. There is a chance that the software can suffer from outage issues, leading to several problems in operating costs and productivity.
Hosting services can, however, reduce outage incidents. This is because hosting services on the platform are overseen by SAS experts. If there is a problem, they can resolve it quickly. This quick resolution reduces downtime, bringing several benefits in the form of lower operating costs and high-availability SLAs.
SAS hosting can improve performance. Hosting services come with several options, like a tailored alert system, dedicated infrastructure, and optimal hardware sizing. Our SAS experts can configure the optimal hardware for SAS hosted applications. For industries like banking and healthcare, this can be critical because they generate a significant volume of data, which must be assessed in real-time.
SAS hosting services allow you to save a significant amount of money in terms of capital and staff costs. When you invest in hosting services, you avoid upfront hardware purchases. Analytics hosting services allow users to stabilise technical costs right away.
There is also the issue of staffing costs. When installing new software, your internal team will be strapped for time. By investing in hosting services, you will also be able to reduce the strain put on your internal team and reduce staffing costs.
With hosting services, you can enjoy faster delivery and deployment, accelerating ROI. SAS experts can extend hardware and software capacity to other business areas and departments. Furthermore, SAS hosting gives you access to the latest software releases and can help modernise the analytics platform for the future, improving ROI in the long run.
SAS platforms can improve productivity and efficiency when it comes to data analysis. However, there is no denying that hosting services can be a huge boon. SAS hosting can help optimise the overall setup to minimise operating costs and maximise ROI in the long run.
Hosting services can stabilise the platform in the long run and prevent any potential fallout that could occur when you are trying to adopt new software into your network. If you are investing in a new platform, consider hosting services to mitigate operating costs.
Visit Selerity to know more about SAS hosting services.
Bayesian Encoding is a type of encoding that takes into account intra-category variation and the target mean when encoding categorical variables. It is a type of targeted encoding that comes with several advantages. For example, Bayesian Encoding requires minimal effort compared to other encoding methods.
In this blog post, we talk about the different Bayesian encoding techniques and how they work.
Target or Mean Encoding is one of the most commonly used encoding techniques in Kaggle competitions.
Target encoding is where each class value of the categorical variable is replaced by the mean value of the target variable, with respect to the categorical class in the training dataset.
Hence, we have to specify the target variable in the SAS Mean Encoding Macro, as shown in the code below.
Check out this link to know more information about categorical variable encoding.
%macro mean_encoding(dataset, var, target);
  proc sql;
    /* Mean of the target for each level of the categorical variable */
    create table mean_table as
      select &var as gr,
             round(mean(&target), 0.1) as mean_encode
      from &dataset
      group by &var;

    /* Attach the encoded value to every row of the input data */
    create table new as
      select d.*, m.mean_encode
      from &dataset as d
      left join mean_table as m
        on d.&var = m.gr;
  quit;
%mend;
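To see what the macro computes on a toy dataset, here is a minimal Python sketch of the same idea. The function name, column names and data are made up for illustration, and unlike the macro it does not round the means to one decimal place.

```python
from collections import defaultdict


def mean_encode(rows, var, target):
    """Add a mean_encode field holding the target mean of each row's category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[var]] += row[target]
        counts[row[var]] += 1
    means = {level: totals[level] / counts[level] for level in counts}
    return [{**row, "mean_encode": means[row[var]]} for row in rows]


if __name__ == "__main__":
    data = [
        {"city": "A", "bought": 1},
        {"city": "A", "bought": 0},
        {"city": "B", "bought": 1},
        {"city": "B", "bought": 1},
    ]
    for encoded in mean_encode(data, "city", "bought"):
        print(encoded)
```

Every row for city A receives 0.5 and every row for city B receives 1.0, which is the per-category target mean that the left join attaches in the macro.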
Weight of Evidence (WoE) is a measure of the “strength” of a grouping technique that is used to separate good and bad outcomes. This method was developed primarily to build a predictive model to evaluate the risk of loan default in the credit and financial industry.
WoE will be 0 if the P(Goods) / P(Bads) = 1. That is, if the outcome is random for that group. If P(Bads) > P(Goods), the odds ratio will be < 1, and the WoE will be < 0. If, on the other hand, P(Goods) > P(Bads) in a group, then WoE > 0.
WoE is well suited for Logistic Regression because the logit transformation is simply the log of the odds, i.e. ln(P(Goods)/P(Bads)). Therefore, by using WoE-coded predictors in Logistic Regression, the predictors are all prepared and coded to the same scale, and the parameters in the linear logistic regression equation can be directly compared.
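Written out for a single group g of the categorical variable, and using the per-group probabilities described above (the subscript g is notation introduced here for illustration), the definition and its sign behaviour are:

```latex
\mathrm{WoE}_g = \ln\!\left(\frac{P_g(\text{Goods})}{P_g(\text{Bads})}\right),
\qquad
\mathrm{WoE}_g
\begin{cases}
> 0 & \text{if } P_g(\text{Goods}) > P_g(\text{Bads}) \\
= 0 & \text{if } P_g(\text{Goods}) = P_g(\text{Bads}) \\
< 0 & \text{if } P_g(\text{Goods}) < P_g(\text{Bads})
\end{cases}
```

For example, a group with P(Goods) = 0.75 and P(Bads) = 0.25 has WoE = ln(3), which is roughly 1.10.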
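%macro woe_encoding(dataset, var, target);
  /* Mean of the target (share of goods) for each level of the variable */
  proc sql noprint;
    create table stats as
      select &var as gr,
             round(mean(&target), 0.1) as mean_encode
      from &dataset
      group by &var;
  quit;

  /* Weight of Evidence = log of the good/bad ratio for each level */
  data stats;
    set stats;
    bad_prob = 1 - mean_encode;
    if bad_prob = 0 then bad_prob = 0.0001; /* avoid division by zero */
    me_by_bp = mean_encode / bad_prob;
    woe_encode = log(me_by_bp);
  run;

  /* Attach the encoded value to every row of the input data */
  proc sql noprint;
    create table new as
      select d.*, s.woe_encode
      from &dataset as d
      left join stats as s
        on d.&var = s.gr;
  quit;
%mend;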
Probability Ratio Encoding is similar to Weight of Evidence; the only difference is that the ratio of good and bad probabilities is used directly, without taking the log. For each label, we calculate the mean of target=1, that is, the probability of being 1 ( P(1) ), and also the probability of target=0 ( P(0) ). Then, we calculate the ratio P(1)/P(0) and replace the labels with that ratio.
We need to substitute a minimal value for P(0) to avoid divide-by-zero scenarios where, for a particular category, there is no target=0. Check out this link for more information.
%macro probability_encoding(dataset, var, target);
  /* Mean of the target, P(1), for each level of the variable */
  proc sql noprint;
    create table stats as
      select &var as gr,
             round(mean(&target), 0.1) as mean_encode
      from &dataset
      group by &var;
  quit;

  /* Probability ratio = P(1) / P(0) for each level */
  data stats;
    set stats;
    bad_prob = 1 - mean_encode;
    if bad_prob = 0 then bad_prob = 0.0001; /* avoid division by zero */
    prob_encode = mean_encode / bad_prob;
  run;

  /* Attach the encoded value to every row of the input data */
  proc sql noprint;
    create table new as
      select d.*, s.prob_encode
      from &dataset as d
      left join stats as s
        on d.&var = s.gr;
  quit;
%mend;
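The relationship between the two encoders is easiest to see side by side. The following Python sketch (function and variable names are hypothetical) computes the probability ratio and its log for each category of a 0/1 target, using the same 0.0001 substitution for P(0) as the macros. It skips the macros' rounding to one decimal place, and returns None for the WoE when P(1) is zero, a case in which the log is undefined.

```python
import math
from collections import defaultdict


def ratio_and_woe(rows, var, target, floor=0.0001):
    """Return {category: (probability_ratio, woe)} for a 0/1 target."""
    ones = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        ones[row[var]] += row[target]
        counts[row[var]] += 1

    encoded = {}
    for level, n in counts.items():
        p1 = ones[level] / n      # P(1) within the category
        p0 = 1 - p1               # P(0) within the category
        if p0 == 0:
            p0 = floor            # avoid division by zero
        ratio = p1 / p0
        woe = math.log(ratio) if ratio > 0 else None
        encoded[level] = (ratio, woe)
    return encoded


if __name__ == "__main__":
    data = (
        [{"grade": "A", "good": g} for g in (1, 1, 0, 0)]
        + [{"grade": "B", "good": g} for g in (1, 1, 1, 0)]
        + [{"grade": "C", "good": g} for g in (1, 1)]
    )
    for level, (ratio, woe) in ratio_and_woe(data, "grade", "good").items():
        print(level, ratio, woe)
```

Grade A has even odds, so its ratio is 1 and its WoE is 0. Grade B has a ratio of 3 and a positive WoE. Grade C has no zeros in the target, so the substituted value keeps the ratio finite.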
Categorical Feature Encoding is an important part of cleaning up data for machine learning models. However, each method works in different circumstances so it is important to know about different techniques that fall under the Bayesian category.
If you want to take a look at how the coding operates in a SAS environment, you can find all the SAS Macro Definition code on my GitHub page here.
SAS Macro examples for the Blog Post “Categorical Feature Encoding in SAS (Bayesian Encoders)”
SAS analytics is a command-driven statistical software used to collect and analyse data. At this point, we have an idea of what SAS can do. It draws up visual depictions of large data groups for analysis. Furthermore, the analytics platform can access raw files from an external database, manage and analyse data to generate useful insights. However, we have yet to stop and consider how the various SAS platforms work the way they do.
Given the critical role of SAS analytics platforms, it is important that the architecture can meet the demands of the task at hand. With that in mind, we are going to take a deep dive into how SAS analytics works.
Every single platform has its own architectural design because of differences in function and performance. However, there are some fundamentals that remain the same across the entire suite of analytics platforms.
Before diving into the ins and outs of SAS architecture, it’s worth taking a look at some of the key features of any SAS platform. Besides accessing raw data, SAS platforms manage data, using tools for entry, editing, retrieval, conversion, and formatting.
Beyond editing data, SAS analytics analyses data using different techniques, like forecasting, multivariate, descriptive, and statistical analysis. Some SAS platforms even offer advanced analytics to help improve business practices. Finally, there is the ability to create reports using detailed graphs.
Given these roles in data collection and analysis, the architecture must be designed to meet demand.
SAS analytics architecture can access a large volume of data efficiently, while at the same time, providing real-time information to users. To meet this demand, the platform follows a three-tier architecture. It consists of a client tier, middle tier, and back tier.
The platform works in this way because the system can distribute functions and work equally, based on the resources that are suitable for the job.
The client tier is the first stage where the application is installed on the machine. The tier consists of a web browser and other components necessary to view the SAS platform and its contents, along with making the SAS application firewall friendly. This is because of the way the portal has been set up. Users can interact with SAS applications through a web browser. In some cases, they can even interact with the content using Microsoft Excel and Adobe Acrobat Reader.
The middle tier offers a central access point because it contains all enterprise data. Since the tier contains all valid enterprise information, processing components regulate operations in this tier.
This means there are centralised points of access, which generate several benefits for SAS consultants. Some of these benefits include the ability to administer portals, manage code changes, and even enforce security rules. This tier contains several hosts that are pivotal for its function, like SAS Information Delivery Portal Web Application, web servers, and the Servlet Engine.
Furthermore, since the middle tier is divided into different components, it allows for the separation of display logic from business logic.
Finally, the back tier is where the system runs data and processing servers. The back tier is an enterprise directory server designed to maintain metadata about content located throughout the organisation’s infrastructure.
Due to its functions, the back tier contains two servers—the IOM server and the Enterprise Directory Server. The Directory Server stores metadata about the data or content (metadata is the information that describes the content). The metadata contains information on the content and where it is stored. Furthermore, the back tier can run on machines like web servers, meaning it does not translate into additional hardware platforms.
SAS platforms are powerful pieces of architecture designed to optimise the collection and analysis of data. While each platform is designed to meet its own demands, all SAS products follow the same basic architecture divided into three different tiers.
SAS analytics platforms are constantly evolving as they adapt to new technologies. It is safe to say, however, that we are not going to see a significant deviation in their fundamental architecture in the near future.
Visit Selerity to know more about SAS analytics platforms and how they operate.
Regulations are like traffic lights. They are there for our protection but they can, nonetheless, slow down our momentum. If you work in the banking industry especially, you may have a better understanding of how regulation weighs heavily on financial organisations.
From edicts on corporate governance to CCAR, there are plenty of rules banks need to follow.
This presents a challenge in the form of rising costs. In fact, research shows that banks spend over $270 billion a year on compliance, which is equal to 10 per cent of their total operating costs. Moreover, the cost of regulatory compliance is set to double by 2022.
Banks need a solution that can make regulatory compliance a more cost-efficient part of their work. This is where SAS banking analytics can help finance organisations. In this post, we look at why financial analytics platforms are the solution banks are looking for.
Regulatory compliance causes data to balloon in volume.
When regulators pass a new law, it generates a new wave of corporate data. A wave that upends current data governance, data collection, and reporting mechanisms.
When this happens, banks are left in the lurch. They have to implement new procedures, policies, and teams to ensure they are complying with the law.
There is also the issue of data management.
New regulations expand a bank’s data lake. Sounds good, right? Well, not if you have the wrong tools in place. Without the right tools, banks can’t keep up with the volume of new data or make sense of it.
Banking data is voluminous and complex because of its different sources. Data sources include transactional data, operational data, reference data, and security data. There is also the fact that each team manages its own branch of data.
Analysts must work with these different types of data to meet compliance regulations, which is a slow, painful procedure. Traditional analytics platforms need several data analysis cycles to complete operations, prolonging analysis and driving up costs.
What banks really need is an analytics platform that can help them adapt to new regulations quickly and reduce compliance costs. The ideal data analytics solution will reduce compliance costs and improve core operations.
Sounds complicated, right? Well, not if you have the right data analytics solution, which is why SAS banking analytics is an invaluable investment.
SAS banking analytics can help you resolve several compliance-related issues. It can meet the needs of banks and other large corporations, making it better equipped to handle the large volume of data stored in their databases.
SAS analytics uses technology like AI, machine learning, and cloud computing to help you optimise certain data collection and analysis processes to make compliance more efficient.
Besides optimising data collection procedures, banking analytics from SAS can optimise reporting procedures. You can create an infrastructure that merges data modelling, measuring, and reporting to better manage risk and regulatory compliance.
SAS analytics platforms support compliance for most regulatory risks, including regulatory capital, and liquidity risk. They can reduce the length of analytics cycles, improving operational efficiency. By speeding up processing time, we can also reduce the cost of compliance.
Additionally, data management becomes more efficient because it’s much easier for research teams to store data and derive useful information from it.
Along with improving regulation, banks can also improve governance with analytics. SAS analytics provides a risk profile that covers the entire network of the organisation. This ensures a level of transparency, which is difficult to manage using other means.
Better transparency makes it easy to meet regulatory compliance demands and manage internal risk, which can avert potential disasters.
As the banking industry faces tighter regulations, data analytics platforms are the key to helping the industry navigate the complex regulatory environment.
That, however, just scratches the surface of what SAS analytics platforms can do.
SAS solutions can also resolve other problems the banking industry faces, like fraud. Moreover, banking analytics can help banks improve customer service by turning it into a more personalised experience. If used properly, SAS banking analytics can resolve many of the issues the banking industry faces, especially the burgeoning cost of regulation.
To learn more about SAS analytics and what it can do for different industries, visit Selerity.
In a previous blog post, we talked about feature scaling techniques and implemented this in Base SAS using SAS Macros. However, in this blog post, we are going to focus on feature scaling and transformation techniques using SAS Macros. Here I will explain what they are and the scenarios they are best applied to.
Absolute Maximum Scaler (Abs_MaxScaler) is a feature scaling technique where data points are divided by the maximum data value. It will then rescale the value between -1 and 1.
While Abs_MaxScaler has its advantages, there are some drawbacks. The biggest disadvantage of Abs_MaxScaler is that the data values are affected by outlier values. The mathematical formula is here:
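Assuming the divisor is the maximum absolute value of the variable (which is what keeps the result between -1 and 1), the scaling of the i-th value can be written as:

```latex
x_i^{\text{scaled}} = \frac{x_i}{\max_j \lvert x_j \rvert},
\qquad -1 \le x_i^{\text{scaled}} \le 1
```

A single extreme value inflates the denominator and squeezes every other value towards zero, which is the sensitivity to outliers mentioned above.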
Abs_MaxScaler takes the variable that you want to scale and creates a new variable “AMVariableName” with scaled values. It also creates a univariate report where you can see the histograms of the actual variable and the new scaled variable.
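As a rough illustration of the calculation only (not of the macro or its univariate report), here is a minimal Python sketch. The function name is hypothetical, and it assumes a non-empty list of numbers and that the divisor is the largest absolute value.

```python
def abs_max_scale(values):
    """Divide every value by the largest absolute value in the list."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        # All values are zero, so there is nothing to rescale.
        return [0.0 for _ in values]
    return [v / max_abs for v in values]


if __name__ == "__main__":
    print(abs_max_scale([2, -4, 1]))       # the value -4 sets the scale
    print(abs_max_scale([2, -4, 1, 400]))  # one outlier shrinks everything else
```

The second call shows the outlier effect: once 400 is present, the other three values all land within 0.01 of zero.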
I will be discussing feature transformation techniques that bring data closer to a normal distribution (also known as a Gaussian distribution). Along with the discussion, I will also use some advanced SAS Macro programming to implement feature transformation in Base SAS.
What is Feature Transformation?
It is the process of transforming data from one representation to another, with the help of statistical and mathematical functions, while also retaining the information from the data. There are many different transformations that convert data distribution into normal distribution, but I will mostly use the five fundamental transformations: Log, Reciprocal, Square-Root, Exponential, and Box-Cox.
Feature Transformation is important because it makes it easier for machine learning and statistical models to understand your data and make accurate predictions. Furthermore, models often need less training time to reach the required results; however, this does not apply to every ML, DL, and statistical algorithm.
There are statistical algorithms and models that assume the data is normally distributed. For example, standard inference in linear regression assumes normally distributed errors, and heavily skewed variables often violate that assumption, which can produce misleading results if the data is not transformed first.
1. Log Transformation: This transformation is the best solution when data is skewed or has outliers that impact distribution. It will convert data using a log function.
2. Reciprocal Transformation: This transformation is not very effective when we compare it with the others because it has little effect on the shape of the distribution. It converts the data to the inverse of its value. For example, 3 will be transformed into 1/3. It is only used for non-zero data values.
3. Exponential Transformation: It converts data with the exp() function, available in most programming languages, into “e to the power of x.”
4. Square-Root Transformation: As its name suggests, it transforms the data value into the square-root of its value, which means we can apply it on data with zero value. It should be noted that square-root transformation is less effective than log transformation. This is because it has less effect on distribution compared to log transformation.
5. Box-Cox Transformation: For a Box-Cox Transformation, the data values must be positive. It works well on data with an even nature and is the most commonly used transformation in the statistics field. In the transformation formula, lambda typically varies between -5 and 5, and when lambda is zero the transformation reduces to the log transformation.
You can call this macro with %BoxCox (dataset, variable).
dataset = your dataset name with library, for instance sashelp.cars
variable = name of the variable
%BoxCox will only take the variable if its data values are positive; otherwise, it will give the error message: “Your data has negative values, hence you cannot apply BoxCox transformation.” If the check passes, it will create a new temporary dataset with “Acul_Name of_your_Variable” and the transformed variable “TAcul_Name of_your_Variable”. It will also generate a univariate report of the transformed variable where you can check how close its distribution is to normal.
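To make the Box-Cox steps concrete, here is a minimal Python sketch. It is not the %BoxCox macro: the post does not show how the macro chooses lambda, so this sketch assumes a common approach, a grid search between -5 and 5 that keeps the lambda with the highest log-likelihood under a normal model. The names `box_cox`, `log_likelihood` and `best_lambda`, the step size of 0.1, and the sample data are all assumptions made for illustration.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox formula for a fixed lambda to strictly positive values."""
    if any(v <= 0 for v in values):
        # Mirrors the positivity check described for the macro.
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # Lambda of zero reduces to the log transformation.
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def log_likelihood(values, lam):
    """Log-likelihood of lambda, assuming the transformed data is normal.

    Assumes the values are not all identical (the variance must be non-zero).
    """
    transformed = box_cox(values, lam)
    n = len(transformed)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    return -n / 2 * math.log(variance) + (lam - 1) * sum(math.log(v) for v in values)


def best_lambda(values, low=-5.0, high=5.0, step=0.1):
    """Grid-search lambda between low and high and keep the most likely value."""
    steps = int(round((high - low) / step))
    candidates = [round(low + i * step, 10) for i in range(steps + 1)]
    return max(candidates, key=lambda lam: log_likelihood(values, lam))


# Hypothetical right-skewed sample; every value is positive.
sample = [1.2, 1.5, 2.1, 2.8, 4.0, 6.5, 11.0, 19.0]
lam = best_lambda(sample)
transformed = box_cox(sample, lam)
```

A production implementation would normally rely on a tested statistical routine rather than a hand-written grid search; the sketch only shows where the positivity check, the lambda range and the log special case fit together.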
You can call this macro with %Transform (dataset, variable, type).
dataset = your dataset name with library e.g. sashelp.cars
variable = name of the variable
type = the transformation to apply; you can select only one (Log, Square-Root, Reciprocal, or Exponential)
The %Transform macro will take a variable and apply the transformation you selected in the “type” argument. Then, it generates a univariate report explaining the distribution of the transformed variable.
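The idea of selecting one transformation by name can be sketched in a few lines of Python. This is an illustration of the dispatch logic only, not the %Transform macro; the `TRANSFORMS` table and the `transform` function are hypothetical names, and the univariate report is left out.

```python
import math

# Each entry maps a transformation name to the function applied to one value.
TRANSFORMS = {
    "log": math.log,                # defined for values greater than zero
    "reciprocal": lambda v: 1 / v,  # defined for non-zero values
    "square-root": math.sqrt,       # defined for values of zero or more
    "exponential": math.exp,        # e to the power of v
}


def transform(values, kind):
    """Apply the transformation named by `kind` to every value."""
    key = kind.lower()
    if key not in TRANSFORMS:
        raise ValueError(f"Unknown transformation type: {kind}")
    return [TRANSFORMS[key](v) for v in values]


print(transform([1.0, 4.0, 9.0], "Square-Root"))  # [1.0, 2.0, 3.0]
print(transform([3.0], "Reciprocal"))             # [0.3333333333333333]
```

Keeping the transformations in a lookup table means an unsupported type fails with a clear message instead of silently returning the data unchanged.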
The SAS macro definitions are available on GitHub here: https://github.com/Suraj-617/Blogs/blob/master/Techniques%20of%20Feature%20Scaling%20with%20SAS%20Custom%20Macro-B.sas
The pandemic has accelerated the rate of technological adoption. Where businesses were slowly making their way towards tax data analytics and other cloud-based systems, we are now seeing more organisations rapidly incorporate business intelligence and analytics into their infrastructure to adjust to the new status quo.
With new technology come new opportunities. Corporate tax was seen as a massive burden on organisations, not in terms of expenses, but in the time and effort required to comply with the law. Analytics systems are changing the way businesses handle tax administration and compliance.
Tax data analytics gathers data from different sources, such as presentations, reports and return filings, to answer questions about complex issues. This level of insight provides the accountants or members of the tax department with a deeper understanding of an organisation’s tax status, something they did not have before, and it is opening up a host of new opportunities.
Tax data analytics can perform several operations that make tax data easier to understand. Some analytics platforms can present data findings in a visual format. The visualisation of data is useful because it makes it so much easier to assess data findings. Analysts can reach conclusions faster, but even better, visual findings can be used to explore the connection between different variables in more detail, something that was not seen with other tax compliance technologies. This means tax data analysts can explore different scenarios. For example, analysts can change assumptions on a variable to discover how the changes affect different scenarios. Furthermore, the visualisation of data makes it easier to find gaps in the information, leading to a more comprehensive analysis.
One factor that definitely hurt the efficiency of tax analysis is the manner in which data was stored. It is quite difficult to conduct a thorough analysis when valuable data is stored in different formats. Tax data analytics has helped organisations transform their functions for the better. The additional data has transformed tax regulation into a more insights-driven function. Soon, it becomes a question of “What do I need to know” rather than “What do I need to do”, which changes the way tax functions will be handled in the future.
Tax data analytics opens up several new opportunities to organisations. For example, tax analysts can now understand the key areas that drive taxation, something which may have been hard to do with facts and figures hidden in different sources. Furthermore, the additional level of analysis reveals deeper, more insightful trends that help predict earnings, sales and tax impact.
The organisation will be better placed to predict future trends (especially with the use of predictive analytics), making it much easier to anticipate certain functions like the buying and selling of assets. It is even possible to preview tax items and identify potential errors. With tax data analytics, it becomes much easier to expand the range of data sources to include unstructured data and integrate them meaningfully into the analysis, something that would have been impossible without tax data analytics.
The technical functions of tax analytics are changing the way organisations perceive tax obligation, paving the way for a new era of analysis and planning when it comes to compliance. For example, businesses can compare taxes paid against different variables like book income over a specified time, allowing for a deeper level of analysis that was not possible before. In that regard, tax compliance is no longer an obligation to be met but an opportunity to identify growing trends within the business. With the prospect of growing opportunities, perceptions around tax compliance will change over time.
Tax data analytics platforms are changing the way organisations see tax administration and compliance. However, to be truly effective, they must be integrated at all stages of data processing. To execute such a task, organisations need to work with an analytics specialist who can incorporate the technology correctly into the network infrastructure.
Selerity is poised to help organisations install tax analytics into their systems. Our knowledge of analytics platforms can help organisations install, administer and host analytics software so that tax data analytics platforms are properly incorporated into their data infrastructure.
Is there anything data analytics cannot do?
This question occurred to me when I was having lunch with an old friend of mine. He was a high-ranking executive at a large company behind some of the biggest consumer brands in the country. He was a very busy man (as was I), so whenever there was a chance to catch up, it was an opportunity that could not be missed.
After we talked about our families, politics and cricket, the discussion inevitably turned to our businesses. He told me about some of the challenges he was facing with one of the brands his company owned.
“We are looking to expand our e-commerce branch,” he explained to me. “But right now, we are simply not where we want to be. It’s hard to find a path forward because we don’t have enough information on our current processes.”
“You don’t have any data?” I asked.
“No, we do, but what we need is something that can take that data and convert it into useful information that can inform our strategies.”
In other words, he needed digital analytics.
Digital analytics is the all-encompassing term for different analytics instruments, like web analytics, social analytics and business intelligence. Analytics will collect data from different endpoints and convert them into insights that businesses can use as feedback when making decisions. With these insights, it’s much easier to complete business objectives, like optimising the buying process.
With more and more businesses going online, marketing has become more data-driven, with consumer engagement, demand and brand interest all being measured and managed through data.
Analytics tools have been used to analyse qualitative and quantitative data to give an organisation insight into how customers are responding to its marketing strategies. The information can then be used to refine the messaging and improve the overall experience.
The different categories of analytics can collect, track and analyse data from the different funnels in digital marketing. Analytics is immeasurably useful for organisations because collecting and analysing data across different data points can be time-consuming.
Digital analytics can help organisations measure the different data points of their marketing funnel to reveal a lot of useful information. Whether it is by measuring the number of visitors, product demand or frequently visited pages, there are plenty of variables to consider.
If you have worked in digital marketing, you might have a passing familiarity with certain terms, like bounce rate and page views. If you don’t, that’s fine, just know that there are several KPIs to measure how customers interact with a brand online. Digital analytics takes all these KPIs and analyses them to reveal what gains have been made with current strategies.
With this insight, digital analytics helps organisations optimise the buying process by showing how consumers are interacting with their e-commerce channels. It allows marketers to make sense of the big picture.
Digital analytics allows organisations to better understand the current scenario and also predict how customers will respond in the future. Data-driven predictions can be used to anticipate future trends, which helps organisations meet key objectives, like improving loyalty and engagement.
It’s easy for organisations to get caught up in the details and miss the larger trends. While details are very important, digital analytics helps string all the details together to answer the bigger questions on their marketing strategy.
Without analytics, organisations will have a hard time optimising their buying process because it would be difficult to understand how customers interact with the website or even on social media channels. When there is no proper insight, it would be difficult to devise new strategies to improve the buying process.
By using digital analytics, organisations can see what’s hurting the customer experience, what’s aiding it and make data-backed decisions that will optimise the buying process.
Digital analytics is an incredibly useful tool, one that every business should consider investing in. With data analytics, organisations will have an easier time creating smarter strategies that are sure to optimise the buyer’s journey to make the experience as smooth as possible. SAS offers several digital analytics products, like SAS Customer Intelligence and SAS Digital Marketing Analytics, to help organisations optimise the buying process and refine their marketing campaigns.