Cameron (Selerity)
Author Archives: Cameron (Selerity)

SASPy Configuration with SAS Analytics Pro Cloud Native


SAS Analytics Pro Cloud Native lets you run SAS in a containerized environment, which opens up exciting possibilities for CI/CD and for integrating SAS into other applications.

SASPy has become a popular choice for data scientists and integration developers who want to bring the power of SAS procedures and the DATA step to Python software development chains.  This post outlines the steps required to configure SAS Analytics Pro Cloud Native to accept the SSH connections that SASPy requires, and augments the current documentation for using SASPy with SAS Analytics Pro.

The same steps also let you call SAS natively in STDIO mode from your host machine to the container to perform tasks.

Some Notes on Terminology

For people without much experience of Docker, SSH, Python or networking, the terminology in web articles can be confusing and overwhelming.  The table below explains the terms used in this article.  For more on how SAS Analytics Pro works in Docker, please see our previous article which outlines Docker concepts, and another article on the differences between SAS Analytics Pro and SAS9.

Term | Meaning
Apro | Refers to SAS Analytics Pro Cloud Native which is a product offering from SAS for running a SAS Programming environment within Docker.
Docker | Docker is a technology company that provides a runtime and development tools for interacting with Containers and Images.
Image | An image is a set of compressed software libraries and binaries that can be executed as a container inside the Docker runtime environment. Images operate against a common OS kernel. SAS provide an Apro image which can be run as a container.
Containers | Containers are an instance of a Docker image. Containers contain additional configuration information such as network settings, volume and port mappings.
Host | Your local machine where you are starting the Apro container from.  This may be your laptop, PC, or a server.
SSH | This is a communication method for connecting from one machine to another in a network.  In this instance we are performing SSH connections between your host and the Apro container.
Key pair | These are a set of cryptographically generated keys used to identify and authenticate you when using passwordless connections over SSH.

About SASPY

SASPY is a Python package developed by SAS and open-source contributors.  It provides an interface to the SAS language including the submission of SAS code, procedures and data interaction.  It provides a number of connection methods depending on the type of SAS platform you want to connect to.  This includes:

  • IOM based connections for SAS9 / Metadata server platforms.
  • HTTP/S for Viya
  • STDIO over SSH for Linux based servers
  • STDIO for local connections on Linux where SAS is installed on the machine you are working from.

We will be using STDIO over SSH in this scenario.  The STDIO over SSH method enforces passwordless SSH connections so we will need to set this up.

Configuring Passwordless SSH

The first thing you need for passwordless SSH is a public and private key pair.  To generate one, you need an SSH client on your host.  On Linux based operating systems OpenSSH is usually already installed.  On Windows you may need to install it or, if you use Git, the Git Bash client already includes it.

To check if you have an existing key pair, first check your %USERPROFILE%\.ssh directory on Windows or ~/.ssh directory on OSX/Linux.

cd ~
ls -al .ssh/

Or in PowerShell:

cd $env:USERPROFILE
ls .ssh/

If the listing shows a pair of files starting with id_xxxx, one with a .pub extension and one without, you already have a public / private key pair.  For example, if you have an RSA key pair you would see two files:

id_rsa
id_rsa.pub
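
If you would rather not eyeball the listing, the same check can be scripted.  The following is a minimal sketch, assuming your keys follow the usual id_* naming; the function name find_key_pairs is made up for this example.

```python
from pathlib import Path


def find_key_pairs(ssh_dir: Path) -> list[str]:
    """Return names of private keys in ssh_dir that have a matching .pub file."""
    if not ssh_dir.is_dir():
        return []
    pairs = []
    for pub in sorted(ssh_dir.glob("id_*.pub")):
        private = pub.with_suffix("")  # id_rsa.pub -> id_rsa
        if private.is_file():
            pairs.append(private.name)
    return pairs


if __name__ == "__main__":
    # Path.home() resolves to %USERPROFILE% on Windows and ~ on OSX/Linux.
    print(find_key_pairs(Path.home() / ".ssh"))
```

An empty list means no complete key pair was found and you will need to create one.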

If you have existing keys, ideally they are configured without passphrases.  Passphrases are great for interactive usage as they add an additional layer of security but they hinder things when using keys in automation scripts.  For SASPY, keys without passphrases work best.

Creating a Key Pair

If you don’t want to use an existing key pair, or do not have one, you can generate a new pair using the following command.

ssh-keygen -t rsa -b 4096 -C "sasdemo"

Let’s break this down:

  • The command ssh-keygen creates the public and private key pair.
  • The -t rsa option tells ssh-keygen what type of key to generate. In this case it is an RSA key.
  • The -b 4096 option tells ssh-keygen the key size, in bits, to use.
  • The -C "sasdemo" option is a comment to help identify what the key is for.  It is appended to the public key.

After hitting enter you will be prompted for a few values.  Just press enter at each prompt to accept the default.  The exception is if you already have a key named id_rsa and you don’t want to overwrite it.  In that case, specify a new name in the same file path it suggests (it defaults to $home/.ssh/<name>).  The illustration below shows this.  I have named my key pair id_rsa_apro.

Generating a Key Pair

Take note of the name of the key you generated as we will need this later.  Next we need to configure the Apro container to allow SSH connections.
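
For automation, the prompts can be avoided altogether.  Here is a small sketch, assuming you want the id_rsa_apro name used in this article: it adds the standard ssh-keygen options -f (output file) and -N (passphrase, empty here) to the command above.  The helper name build_keygen_command is only for illustration, and the script prints the command rather than running it.

```python
import shlex
from pathlib import Path


def build_keygen_command(key_path: Path, comment: str = "sasdemo") -> list[str]:
    """Build an ssh-keygen call that needs no interactive answers."""
    return [
        "ssh-keygen",
        "-t", "rsa",          # key type
        "-b", "4096",         # key size in bits
        "-C", comment,        # comment stored with the public key
        "-f", str(key_path),  # output file, so there is no prompt for the name
        "-N", "",             # empty passphrase, so there is no passphrase prompt
    ]


if __name__ == "__main__":
    key = Path.home() / ".ssh" / "id_rsa_apro"
    if key.exists():
        print(f"{key} already exists; choose another name to avoid overwriting it")
    else:
        # Pass the list to subprocess.run(..., check=True) to actually create the key.
        print(shlex.join(build_keygen_command(key)))
```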

Configuring SSH in APRO

Containers are generally built to be as lightweight as possible and, as such, do not usually include all the libraries and packages of a full operating system.  In fact, this is one of the 12 principles of Docker image development.

The Apro container will allow SSH without much configuration.  The SAS instructions for this are fairly clear and are transcribed below.

  • In your /sasinside directory, create a folder called sasosconfig and in that new folder place an empty file called sshd.conf
  • In your container startup definition you need to add some system capabilities with --cap-add arguments.  Capabilities are a Linux concept.  To read more about their specifics see this guide. The capabilities we are adding are:
    • --cap-add AUDIT_WRITE
    • --cap-add SYS_ADMIN
  • We also need to publish a port for SSH communication.  We will use the same port SAS uses in their example.  Add the following to your invocation command.  This tells Docker to forward port 8222 on your host to port 22 in the container.
    • --publish 8222:22
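
If you start your container from a script, the extra options above can be appended to whatever you already pass to docker run.  This is only a sketch: "your-apro-image" is a placeholder, and existing_options stands in for the volume, licence and other settings from your own startup command, which are not shown here.

```python
import shlex


def ssh_run_options(host_port: int = 8222) -> list[str]:
    """The options from the steps above: two capabilities plus the SSH port mapping."""
    return [
        "--cap-add", "AUDIT_WRITE",
        "--cap-add", "SYS_ADMIN",
        "--publish", f"{host_port}:22",  # host port -> container port 22
    ]


def build_run_command(image: str, existing_options: list[str], host_port: int = 8222) -> list[str]:
    """Combine your existing docker run options with the SSH-related ones."""
    return ["docker", "run", *existing_options, *ssh_run_options(host_port), image]


if __name__ == "__main__":
    # Placeholder image name and options; substitute the ones from your own setup.
    print(shlex.join(build_run_command("your-apro-image", ["--name", "sas-analytics-pro"])))
```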

Once you have done the above, restart your container for the changes to take effect.  If you have followed the steps correctly, you should see your container running in your Docker client.

Now we have to create a new directory and set some permissions in the apro container to let us copy your generated key from earlier.  From a command line:

  • Create the .ssh folder under the path you specified for the /data directory.
  • Run docker exec -u sasdemo sas-analytics-pro chmod -R 755 /data/.ssh to set the permission level for the folder.  SSH expects your ssh folder to have restrictive permissions. 755 is the most permissive allowed.

Next we want to copy your public key you created earlier into the newly created .ssh folder.

To do this, we can use the docker cp command. SSH looks for a file called authorized_keys which contains a list of public keys that the server will accept connections from:

docker cp %USERPROFILE%\.ssh\id_rsa_apro.pub sas-analytics-pro:/data/.ssh/authorized_keys
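
Note that docker cp replaces the target file, so copying a second public key the same way would drop the first.  If more than one key needs access, one option is to build the authorized_keys content locally and copy that instead.  A minimal sketch, with a made-up helper name:

```python
from pathlib import Path


def build_authorized_keys(public_keys: list[Path]) -> str:
    """Combine public key files into authorized_keys content, one key per line."""
    seen: list[str] = []
    for key_file in public_keys:
        for line in key_file.read_text().splitlines():
            line = line.strip()
            if line and line not in seen:  # skip blanks and duplicate keys
                seen.append(line)
    return "\n".join(seen) + "\n"
```

Write the returned text to a local file and use that file as the source in the docker cp command above.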

If all has gone well, we can now test our connection!

Testing the SSH connection

If you are on Windows, the Docker container IP address may not be usable from outside of the container. On Windows we simply need to use localhost or 127.0.0.1 for our server address.

Secondly, SSH by default enforces Strict Host Key Checking. You will receive an error in your attempted connection when the IP address of your Docker container changes, which happens whenever you restart it. To get around this, you can use one of the following techniques:

In the .ssh folder on your host (the one holding your key pair), add a file called config containing the following:

Host *
StrictHostKeyChecking no

This is quite permissive. It tells SSH to skip host key checking for every host you connect to. To be more stringent and limit this to your SASPy connection, you can place connection arguments in your SASPy sascfg_personal.py file, which we cover later in this article.
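
Another way to narrow the setting, independent of SASPy, is to scope it to a single Host entry in the config file instead of Host *.  The sketch below generates such an entry using standard ssh_config keywords; the alias "apro" and the helper name host_block are assumptions for this example.

```python
def host_block(alias: str, port: int, user: str, identity_file: str,
               hostname: str = "127.0.0.1") -> str:
    """Return an ssh config entry that relaxes host key checking for one alias only."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    Port {port}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
        "    StrictHostKeyChecking no",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paste the output into the config file in your host's .ssh folder.
    print(host_block("apro", 8222, "sasdemo", "~/.ssh/id_rsa_apro"))
```

With an entry like this in place, ssh apro picks up the port, user and key without further arguments.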

Test via Command Line

To test we have SSH configured let’s see if we can get an interactive terminal to SAS working.

ssh -t -vvv -p 8222 -i %USERPROFILE%\.ssh\id_rsa_apro sasdemo@127.0.0.1 /opt/sas/viya/home/SASFoundation/sas -fullstimer -nodms -stdio -terminal -nosyntaxcheck -pagesize MAX

While this is a long command let’s break it down to see what is happening:

  • We are using the ssh program and forcing a pseudo terminal with the -t flag.  This is useful when using interactive command line programs.
  • The -vvv flag gives us verbose logging.  It’s always a good idea to use it when testing so you can see additional information about what’s going on. Fewer v’s give less detail.
  • The -p 8222 argument is telling ssh to connect on port 8222.  This is the port you specified in your Apro setup for ssh.  Replace the number with the one you used.
  • The -i argument tells ssh to use the following identity file.  This file is the private key part of the key pair.  The argument is only needed when your key pair does not use a standard name, so you don’t need it if you accepted the defaults when generating the key pair.
  • Next we have the server address we are connecting to, in the form <username>@<server address>.  If you have started Apro with a different username, replace sasdemo with the login name you use for SAS Studio.  For the second part we specify either localhost or 127.0.0.1.  The SAS documentation for this part is incorrect: on Windows the Docker container IP address will not be usable from outside of the container, so you need to use either localhost or 127.0.0.1.
  • The next part of the command invokes SAS in stdio mode.  This is the command that SASPY generates when starting a connection.  If we successfully connect to SAS then we are almost assured that SASPY will also connect.
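
If the ssh command fails and the verbose output is hard to read, it can help to first confirm that something is actually answering on the published port.  The sketch below, using only the Python standard library, reads the first bytes the server sends.  It assumes the server's first line is its SSH identification string (starting with SSH-), which is typically what OpenSSH sends.

```python
import socket


def is_ssh_banner(data: bytes) -> bool:
    """SSH servers identify themselves with a line starting with 'SSH-'."""
    return data.startswith(b"SSH-")


def read_banner(host: str = "127.0.0.1", port: int = 8222, timeout: float = 5.0) -> bytes:
    """Connect to the published port and return the first bytes the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(256)


if __name__ == "__main__":
    try:
        banner = read_banner()
    except OSError as err:
        print(f"Nothing reachable on port 8222: {err}")
    else:
        print("SSH server found" if is_ssh_banner(banner) else f"Unexpected reply: {banner!r}")
```

A refused connection usually points to the --publish option or a stopped container, rather than to your keys.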

If all is successful, you will get an interactive window to SAS.

SAS STDIO

To exit, hit enter and type endsas;;;;  Alternatively, stop the tutorial here and join the ranks of SAS demi gods by using SAS in the original form!

SASPY Configuration

Once we have confirmed that SSH is working correctly to our container, we need to update our SASPy configuration.  This article won’t go into the details of installing SASPy as there are plenty of articles and detailed help in the SASPy user documentation at saspy.readthedocs.io.

Open your sascfg_personal.py file and add the following:

  • Add a new entry in the SAS_config_names variable list to label the new connection.  Here I will use 'apro'.
  • Next create a new dictionary variable with the same name you added to SAS_config_names.
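apro = {
    'saspath': '/opt/sas/viya/home/SASFoundation/sas_u8',  # SAS executable inside the container
    'ssh': 'ssh',                                          # ssh client on your host
    'host': 'localhost',
    'port': 8222,                                          # host port published for SSH
    'luser': 'sasdemo',                                    # your SAS Studio login
    'localhost': 'host.docker.internal',                   # how SAS reaches back to your host
    'encoding': 'utf_8',
    'dasho': 'StrictHostKeyChecking=no'                    # passed to ssh as a -o option
}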

Replace the luser value with your SAS Studio login userid, the port you opened in your docker config and any additional SAS options in the options key (not shown).
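
Putting the two bullet points together, a complete sascfg_personal.py could look like the sketch below.  The identity entry is an addition not shown above: to my understanding SASPy passes it to ssh as the -i option, so it should only be needed when your private key has a non-default name such as id_rsa_apro.  Treat it as an assumption and check the SASPy configuration documentation for your version.

```python
# sascfg_personal.py
import os

# Names of the connection definitions SASPy should offer.
SAS_config_names = ['apro']

apro = {
    'saspath': '/opt/sas/viya/home/SASFoundation/sas_u8',
    'ssh': 'ssh',
    'host': 'localhost',
    'port': 8222,
    'luser': 'sasdemo',
    # Assumption: only needed when the key pair does not use a standard name.
    'identity': os.path.expanduser('~/.ssh/id_rsa_apro'),
    'localhost': 'host.docker.internal',
    'encoding': 'utf_8',
    'dasho': 'StrictHostKeyChecking=no',
}
```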

This configuration is slightly different to a standard SSH STDIO connection with the addition of the localhost and dasho parameters. These are optional and depend largely on whether or not you are running SAS Analytics Pro via Docker Desktop on Windows or via the Docker runtime.

The upload and download methods in SASPy use the socket filename engine in SAS to transfer files between client and server. Docker adds an entry to the Windows hosts file which directs your IPv4 address to host.docker.internal. This is then used by Docker to communicate to the external host when required. For a longer discussion on this please see the following GitHub issue between myself and SAS on the topic.

As mentioned previously, you may also receive errors from SSH when your container IP address changes as a result of restarting your container. The dasho parameter is passed to the ssh command to disable this host key checking for SASPy.

Once complete you can test your SASPY connection.

try:
    import saspy
except ImportError:
    raise ImportError('saspy was not found in the specified path')

# Start a session using the 'apro' definition from sascfg_personal.py
sas = saspy.SASsession(cfgname='apro')
res = sas.submit(code='%put NOTE: Success;', printto=True)
print(res.get('LOG'))

If your sascfg_personal.py file is somewhere other than the default paths, pass cfgfile='/path/to/your/file.py' when creating the SASsession, along with cfgname='name' if you are not using the default configuration name.

Check that you see NOTE: Success in the returned log and if so, you now have SASPY connected to SAS Analytics Pro!
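
If you want to run this check from a script, for example in a CI/CD job, you can test the log instead of reading it.  The following is a sketch under a few assumptions: the function names are made up, the session is closed with endsas(), and the check looks for a line that starts with NOTE: Success so that the echoed %put statement in the log does not count as a match.

```python
def log_shows_success(log: str, marker: str = "NOTE: Success") -> bool:
    """True if any log line starts with the marker (ignores the echoed source line)."""
    return any(line.strip().startswith(marker) for line in log.splitlines())


def check_connection(cfgname: str = "apro", cfgfile=None) -> bool:
    """Start a SASPy session, submit the test statement and inspect the log."""
    import saspy  # imported here so the helper above works without saspy installed

    kwargs = {"cfgname": cfgname}
    if cfgfile:
        kwargs["cfgfile"] = cfgfile  # only needed for non-default config locations
    sas = saspy.SASsession(**kwargs)
    try:
        res = sas.submit(code="%put NOTE: Success;")
        return log_shows_success(res.get("LOG", ""))
    finally:
        sas.endsas()  # always close the SAS session
```

Calling check_connection() then returns True or False, which is easier to act on in automation than a printed log.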

Conclusion

Hopefully this tutorial has been a help to you and assists you in setting up SASPy with Analytics Pro.

We’ve also created the Selerity Desktop (Personal) tool to help make deployments easier if you are uncomfortable with the above concepts.  The Selerity Desktop configures SASPy connectivity for you. The personal edition allows you to deploy container environments with a series of additional options such as Python, Clinical Standards Toolkit and SAS OQ testing without needing to know any of the technical details. If you are interested, please see our product page for further information or reach out to us to discuss more complex deployments or licensing requirements.

Also stay tuned for future posts where we delve deeper into use cases that SAS Analytics Pro Cloud Native can support and other, more advanced, deployment options.

Automated testing for SAS data analytics software – Why it’s a win


Automated testing is a huge improvement for SAS data analytics software. SAS offers several benefits to organisations, but if there is one feature it lacks, it is automated testing. SAS algorithms are complex, and those who do not have a background in SAS have a hard time working with SAS code or testing its framework. Automated testing could be the solution the software needs. So, in this blog post, I will be discussing how test automation is a huge win for SAS software.

Benefits of test automation

Improves the speed of delivery

Testing SAS data analytics software is a challenging prospect for designers, coders and testers without SAS training. The challenge causes two problems: the teams cannot integrate the process, and this results in bespoke SAS test scripts, which leads to delays in project development and deployment. Automated testing solves this problem by streamlining the process. With this method of testing, coders don’t have to go through every line of SAS code or rely on developers to write tests. Streamlining the testing process with automation significantly improves the rate of software delivery and improves communication between teams.

Software is more reliable

Before the days of test automation, SAS developers were forced to take several shortcuts to accommodate testing frameworks. For example, instead of using these frameworks in the development process, they would re-write the same logic repeatedly on different projects even to test new features. Furthermore, not working with a testing framework allows many bugs to accumulate, which is a nightmare for software development. Automated testing addresses some of these issues, by removing the barrier that prevents testers and programmers from testing their bespoke SAS software. With test automation, programmers can create more stable software.

Higher test coverage

As mentioned before, SAS code can be quite complex, especially for testers and programmers without a SAS background. Even for those who are familiar with SAS, testing can be a long, arduous process, sometimes requiring a large team as well. By automating the testing process, SAS analytics software can be tested more frequently, without a large team of testers and programmers. Higher frequency of testing means more stable software and higher quality features.

Complements agile methodology

Unless you have been living under a rock, you will have heard that software development has been shifting from waterfall to agile methodology, where professionals with different specialities come together to deliver software as quickly as possible through successive iterations. Automated testing is the perfect complement for the agile methodology because it allows programmers to test more efficiently and with more frequency, improving the rate of delivery. The improvement in delivery makes test automation an integral part of agile software programming.

Lower business expenses

Automated testing can lower the cost of developing SAS analytics software. Analytics providers and clients have to pay a higher upfront cost than before. However, the trade-off is a much better testing process where more is done at a faster rate and with a smaller team. While testers are still important, especially for more high-end tasks, test automation tools can do much of the testing without human input. This is exciting news for SAS software because it makes bespoke SAS software more accessible to other organisations, especially when combined with other technologies, like the cloud. Automated testing provides tremendous value to both client and analytics provider because testing is more cost-efficient.

Bringing test automation to SAS analytics software

Despite the obvious benefits of test automation, there are some challenges to integrating it into the development process. One is configuring test automation, which carries a huge upfront cost. Another is quickly scaling test environments, which is incredibly challenging, especially when programmers and testers are working in the cloud. In some cases, automated testing can also lack visibility, especially when different teams are using different strategies for automated testing.

An open testing framework can negate this problem, to some degree. Finally, too many UI tests can break the automated testing. Though these challenges can prove to be a barrier to entry for some organisations, it is only a matter of time before test automation becomes the norm in SAS data analytics software.

Why you should get excited about SAS and container platforms

A container platform is often deployed within the SAS platform because it enables streamlined deployments - learn about them here!

When SAS announced that it would extend its partnership with open-source developer Red Hat, it made me think about container platforms and their benefits to SAS software. It doesn’t come as a surprise to me that SAS wants to continue this relationship, because open-source technologies and complementary software have done a fantastic job of mitigating the weaknesses of SAS platforms. So, in this blog post, I am going to discuss what a container platform is and how it benefits SAS users.

What is a container platform?

Before explaining the benefits for SAS deployment, we need to explain what exactly a container platform is. Containers are stand-alone packaged applications containing the SAS software and its complementary software bundled together. They are often compared to shipping containers: just as a shipping container carries goods on a cargo ship and lets them be moved around quickly, a software container presents a consistent interface and makes it easy to shift software to different environments.

A container platform is often deployed with the SAS platform because it allows SAS programmers to streamline deployments and tie up several internal and multiple software packages. It usually takes months to deploy SAS software, but containers allow programmers to skip this step, making SAS more affordable for organisations that would have found the cost of deployment too high.

How does it benefit SAS applications?

As mentioned before, a container platform streamlines SAS deployment, but there are other benefits as well. For example, containers can make SAS platforms more accessible to different infrastructures, thanks to the dependency software that normally comes with the container. This means that SAS can be launched, even if the code was not developed in a test environment. Containers can even be moved between different servers, and even between the private and public cloud. Hence, SAS developers do not have to worry about the type of environment they are deploying in. This brings significant benefits: upgrades become easier, scalability becomes a valid option and developers can make the most of their resources.

Containers are designed to take full advantage of the power and benefits of the cloud, which includes scalability and the ability to quickly deploy new applications. With cloud-based analytics, organisations can deploy SAS without the need for expensive data servers. This bodes well for SAS because the software is designed to process large amounts of data. So organisations will have access to the power of SAS without investing in expensive physical servers.

For SAS programmers, a container platform saves a lot of time because they do not have to wait for the application: containers can start up immediately, freeing up system resources for other containers. This works because containers are compact and lightweight. They share the operating system kernel, which allows more containers to fit onto a single host.

A container platform helps with rollback and makes for tighter tracking between changes. Put simply, a container platform helps with version control. It works because a container is a version of an application known as an image. The image is placed under source control in a private or publicly hosted container image repository. When combined, image and repository act as a version control mechanism for the SAS application.

Finally, containers can access the CPU and RAM of the infrastructure. This enables containers to do several things like requesting the resources they need at run time, enabling them to perform certain functions, like fast spin-up and spin-down of containers.

Container platforms for SAS

It should be noted that a container platform is not a perfect match for SAS platforms by default. There are several things to consider before implementing a container platform. For example, programmers need to pay attention to data Input/Output. Furthermore, factors like RAM, CPU and job execution all need to be taken into account when implementing SAS with container platforms. SAS platforms try to process as much data as possible, so hardware and software need to be well-optimised for this function.

Nonetheless, there is no denying that a container platform is one of the best assets for SAS platforms because it optimises several functions and makes the entire process easier. Container platforms make SAS easier for programmers to deploy and much more accessible to a wider range of organisations. With all these benefits in mind, it is no wonder that SAS is renewing their partnership with Red Hat.

Ansible for SAS configuration management

SAS installation is an incredibly complex process, so it's always best to get professional help in setting up your SAS configuration management.

SAS installation and environment configuration management can be an incredibly complex process that comprises multiple configuration options. These options have the ability to affect every aspect of an analytics environment and SAS configuration management, from performance to security. With the help of expert SAS installers and administrators, many organisations leveraging SAS software are often provided with the perfect environment – optimised and built to deliver an analytics platform they need.

What goes wrong? Many of the operations and procedures conducted during the installation phase are documented externally, either on spreadsheets or in separate documents. As time goes by, administrators make changes to configuration files, updates are implemented and the operating system itself changes, and these changes diverge from the initial configuration. What we see happening in this situation is the SAS environment drifting away from its original, perfect form. This is precisely where Ansible can come into play.

Before we dive into that, however, let’s first take a look at what encompasses a true SAS configuration. These insights will give you a clearer perspective as to why automated and streamlined SAS configuration management is so important.

 

Why is SAS configuration management so important within your SAS environment?

The important thing to keep in mind is that, if used correctly, your SAS environment is going to be constantly evolving. From regular updates to various internal administrative tweaking, it’s normal for any SAS environment and configuration to change over time. The problem with this is that in many instances, these changes occur in a decentralised manner, which results in a disconnect between an environment’s initial configuration versus where it is right now.

To understand why this is so critical, you need to look at the three deployment phases that a SAS environment goes through – prerequisite determination, installation and deployment, and ultimately the configuration phase.

Setting up your environment is no small task. At the prerequisite phase you need to make sure the right users are created with the right permissions on groups, and that the right disk space is available. At the deployment stage you complete the required manual tasks, install SAS and run the SAS Hotfix tool. Finally, you configure your environment.

Throughout these very complex stages, vital configurations that are critical to the sustainability and existence of a SAS environment are set up. Therefore, to answer our initial question as to why SAS configuration management is so important, poor configuration management can result in the fragmentation of this environment – resulting in inefficiencies and misconfigurations in the long-term. SAS admins need to be on top of this!

 

What is Ansible?

In short and within the context of a SAS environment, our services manager, Cameron Lawson, says it best – “Ansible is an extendable tool, written in Python, that is a scriptable way of managing configurations across multiple hosts – you can run it directly from your laptop, via dedicated hosts, on-premise, multi-clouds (Azure, AWS), or hybrids”.

From SAS installation to day-to-day management, Ansible is something your environment can benefit from significantly. Additionally, while there are many alternatives to Ansible, there are noticeable differences that set it apart.

 

Where does Ansible come in and how does it help?

The main concern about configuration management in a given SAS environment is that, over time, configurations are altered from the original configuration. If managed improperly, a SAS configuration can drift away from what it once was. By running Ansible, users and administrators can ensure they maintain the baseline of their original configuration while making changes as required.
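
To make the idea of drift concrete, here is a small Python sketch that records checksums of configuration files as a baseline and later reports which files have changed.  This is not how Ansible works internally, and the function names are made up; it only illustrates the kind of comparison against a known baseline that a configuration management tool automates for you.

```python
import hashlib
from pathlib import Path


def snapshot(paths) -> dict[str, str]:
    """Record a SHA-256 checksum for each configuration file."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def drifted(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the files whose checksum differs from, or is missing in, the current state."""
    return sorted(path for path, digest in baseline.items() if current.get(path) != digest)
```

Taking a snapshot right after installation and comparing it with a fresh snapshot later shows exactly which files have moved away from the original configuration.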

Here are three specific ways that Ansible is a major asset for SAS configuration management:

 

Any changes you need to make can be done via Ansible – Unlike initiating changes to your configuration via a random, on-the-go method, Ansible allows SAS admins to do this centrally. The changes made through Ansible are organised and propagated throughout the system, ensuring your configuration remains consistent with its original form.

Ansible is agent-free – There are many alternatives to Ansible, like Chef. However, one key distinction between Ansible and many of its counterparts is that you don’t have to install an agent on the hosts that you’re managing to keep the configuration running as required. Ansible uses SSH to communicate with your hosts, essentially leaving no footprint on them. You can just connect to a host, perform your tasks, and end the connection.

A combination of simplicity and complexity – For basic environments, you can conveniently run Ansible directly from your laptop. However, for more complex environments, you would typically use a dedicated host, called an Ansible controller, that runs Linux or the Windows Subsystem for Linux.

 

We’ve only highlighted three specific aspects that make Ansible an extremely valuable component in SAS configuration management. The good news? There’s so much more.

If you’d like to learn more about Ansible and its many benefits, specifically how it can be used for configuration management of your SAS environment, be sure to join our upcoming webinar. We’ll run you through the entire process and welcome your questions with open arms. In the meantime, if you’d like to know more – don’t hesitate to get in touch!