SAS Analytics Pro Cloud Native makes it possible to run SAS within a containerized environment, which opens up exciting possibilities for CI/CD and for integrating SAS into other applications.
SASPy has become a popular choice for data scientists and integration developers to bring the power of SAS procedures and the DATA step to Python software development chains. This post outlines the steps required to configure SAS Analytics Pro Cloud Native to accept the SSH connections required by SASPy, and augments the current documentation for using SASPy with SAS Analytics Pro.
The same steps can also be used to call SAS natively in STDIO mode from your host machine to the container to perform tasks.
For people without a lot of experience in using Docker, SSH, Python or networking, the terminology in web articles can be confusing and overwhelming. The table below outlines the meanings of terms used in this article. For further information on how SAS Analytics Pro works in Docker, please see our previous article which outlines Docker concepts and another article on the differences between SAS Analytics Pro and SAS9.
Term | Meaning |
Apro | Refers to SAS Analytics Pro Cloud Native which is a product offering from SAS for running a SAS Programming environment within Docker. |
Docker | Docker is a technology company that provides a runtime and development tools for interacting with Containers and Images. |
Image | An image is a set of compressed software libraries and binaries that can be executed as a container inside the Docker runtime environment. Images operate against a common OS kernel. SAS provides an Apro image which can be run as a container. |
Containers | A container is a running instance of a Docker image. Containers carry additional configuration information such as network settings, volume and port mappings. |
Host | Your local machine where you are starting the Apro container from. This may be your laptop, PC, or a server. |
SSH | This is a communication method for connecting from one machine to another in a network. In this instance we are performing SSH connections between your host and the Apro container. |
Key pair | These are a set of cryptographically generated keys used to identify and authenticate you when using passwordless connections over SSH. |
SASPy is a Python package developed by SAS and open-source contributors. It provides an interface to the SAS language including the submission of SAS code, procedures and data interaction. It provides a number of connection methods depending on the type of SAS platform you want to connect to.
We will be using STDIO over SSH in this scenario. The STDIO over SSH method requires passwordless SSH connections, so we will need to set this up.
The first thing you need to set up passwordless SSH is a public and private key pair. To generate one, you need an SSH client on your host. On Linux-based operating systems OpenSSH is already installed. On Windows you may need to install it, or if you use Git, the Git Bash client has it installed already.
To check if you have an existing key pair, first check your %USERPROFILE%\.ssh directory on Windows or ~/.ssh directory on OSX/Linux.
cd ~
ls -al .ssh/
Or in PowerShell:
cd $env:USERPROFILE
ls .ssh/
If you see a pair of files in the listing starting with id_xxxx, one with an extension of .pub and the other without, you already have a public/private key pair. For example, if you have an RSA key pair you would see two files:
id_rsa
id_rsa.pub
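The same check can be scripted. The following is a minimal Python sketch rather than part of the original walkthrough; the function name find_key_pairs is hypothetical, and it assumes key files follow the id_xxxx naming described above.

```python
import os
from pathlib import Path


def find_key_pairs(ssh_dir):
    """Return sorted names of private keys in ssh_dir that have a matching .pub file."""
    ssh_dir = Path(ssh_dir)
    if not ssh_dir.is_dir():
        return []
    pairs = []
    for public_key in ssh_dir.glob("id_*.pub"):
        private_key = public_key.with_suffix("")  # id_rsa.pub -> id_rsa
        if private_key.is_file():
            pairs.append(private_key.name)
    return sorted(pairs)


# expanduser("~") maps to %USERPROFILE% on Windows and to the home directory on OSX/Linux.
print(find_key_pairs(Path(os.path.expanduser("~")) / ".ssh"))
```

An empty list means no complete key pair was found and you will need to generate one.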
If you have existing keys, ideally they are configured without passphrases. Passphrases are great for interactive usage as they add an additional layer of security but they hinder things when using keys in automation scripts. For SASPY, keys without passphrases work best.
If you don’t want to use an existing key pair or do not have a set you can generate them using the following commands.
ssh-keygen -t rsa -b 4096 -C "sasdemo"
Let’s break this down:
ssh-keygen creates the public and private key pair.
-t rsa tells ssh-keygen what type of key to generate. In this case it is the RSA algorithm.
-b 4096 tells ssh-keygen the key size, in bits, to use.
-C "sasdemo" is a comment to help identify what the key is for. It is appended to the public key.
After hitting enter you will be prompted for a few values. You just need to press enter for each one without changing anything. The exception is if you already have a key named id_rsa and you don’t want to overwrite it. In that case, specify a new name in the same file path it chooses (it will default to $home/.ssh/<name>). The below illustration shows this. I have named my key pair id_rsa_apro.
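For automation, the prompts can be avoided by supplying the file name and an empty passphrase on the command line. The sketch below only builds and prints the command; the helper name build_keygen_command is hypothetical, and the -f and -N flags are standard ssh-keygen options that do not appear in the command above.

```python
import os
import subprocess  # only needed if you uncomment the run() call below


def build_keygen_command(key_path, comment="sasdemo", bits=4096):
    """Build an ssh-keygen argument list for an RSA key with an empty passphrase."""
    return [
        "ssh-keygen",
        "-t", "rsa",          # key type
        "-b", str(bits),      # key size in bits
        "-C", comment,        # comment appended to the public key
        "-f", str(key_path),  # output file, so the file name prompt is skipped
        "-N", "",             # empty passphrase, so the passphrase prompts are skipped
    ]


key_path = os.path.join(os.path.expanduser("~"), ".ssh", "id_rsa_apro")
command = build_keygen_command(key_path)
print(command)

# Uncomment to actually generate the key pair. ssh-keygen will still ask
# before overwriting an existing file of the same name.
# subprocess.run(command, check=True)
```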
Take note of the name of the key you generated as we will need this later. Next we need to configure the Apro container to allow SSH connections.
Containers are generally built to be as lightweight as possible and, as such, do not have all the same libraries and packages as a full operating system. In fact, it is one of the 12 principles of Docker image development.
The Apro container will allow SSH without much configuration. The SAS instructions for this are fairly clear and are transcribed below.
In your /sasinside directory, create a folder called sasosconfig and in that new folder place an empty file called sshd.conf.
Add the following --cap-add arguments to the command that starts your container. These capabilities are a Linux concept. To read more about their specifics see this guide. The capabilities we are adding are:
--cap-add AUDIT_WRITE
--cap-add SYS_ADMIN
Publish the container’s SSH port (22) on a host port, in this case 8222:
--publish 8222:22
Once you have done the above, restart your container for the changes to take effect. If you have followed the steps correctly, you should see your container running in your Docker client.
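The container-side preparation can also be sketched in Python. This is an illustration under assumptions: the helper names are hypothetical, the path you pass for sasinside depends on your own setup, and the returned arguments are only the SSH-related additions, not a complete docker run command.

```python
from pathlib import Path


def prepare_sshd_config(sasinside):
    """Create <sasinside>/sasosconfig/sshd.conf as an empty file and return its path."""
    conf = Path(sasinside) / "sasosconfig" / "sshd.conf"
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.touch(exist_ok=True)
    return conf


def ssh_docker_args(host_port=8222):
    """Return the extra docker run arguments needed for SSH access."""
    return [
        "--cap-add", "AUDIT_WRITE",
        "--cap-add", "SYS_ADMIN",
        "--publish", f"{host_port}:22",  # host port -> container SSH port
    ]


# Print the arguments so they can be pasted into your own launch command.
print(" ".join(ssh_docker_args()))
```

Call prepare_sshd_config with the host directory you mount as /sasinside before starting the container.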
Now we have to create a new directory and set some permissions in the Apro container to let us copy in the key you generated earlier. From a command line:
Create a .ssh directory under the /data directory.
Run docker exec -u sasdemo sas-analytics-pro chmod -R 755 /data/.ssh to set the permission level for the folder. SSH expects your .ssh folder to have restrictive permissions. 755 is the most permissive allowed.
Next we want to copy the public key you created earlier into the newly created .ssh folder. To do this, we can use the docker cp command. SSH looks for a file called authorized_keys which contains a list of public keys that the server will accept connections from:
docker cp %USERPROFILE%\.ssh\id_rsa_apro.pub sas-analytics-pro:/data/.ssh/authorized_keys
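The key installation steps can be collected into one place. The sketch below only builds the commands as argument lists and prints them. The mkdir -p step is an assumption about how the /data/.ssh directory gets created; the chmod and docker cp commands are the ones used in this article. The helper name key_install_commands is hypothetical.

```python
def key_install_commands(public_key, container="sas-analytics-pro", user="sasdemo"):
    """Return the docker commands that install a public key into the container."""
    ssh_dir = "/data/.ssh"
    return [
        # Assumption: create the directory with mkdir -p inside the container.
        ["docker", "exec", "-u", user, container, "mkdir", "-p", ssh_dir],
        # Restrict permissions on the folder as described above.
        ["docker", "exec", "-u", user, container, "chmod", "-R", "755", ssh_dir],
        # Copy the public key in as the authorized_keys file.
        ["docker", "cp", str(public_key), f"{container}:{ssh_dir}/authorized_keys"],
    ]


for command in key_install_commands("id_rsa_apro.pub"):
    print(" ".join(command))
```

Note that docker cp replaces any authorized_keys file already present at that path, so append by hand instead if the container must accept more than one key.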
If all has gone well, we can now test our connection!
If you are on Windows, the Docker container IP address may not be usable from outside of the container. On Windows we simply need to use localhost or 127.0.0.1 for our server address.
Secondly, SSH by default enforces Strict Host Key Checking. You will receive an error in your attempted connection when the IP address of your Docker container changes, which is whenever you restart it. To get around this, you can use one of the following techniques:
Under the .ssh folder on your host, add an additional file called config and add the following:
Host *
    StrictHostKeyChecking no
This is quite permissive. It tells SSH to skip host key checking for every host you connect to. To be more stringent and limit this to your SASPy connection, you can place connection arguments in your SASPy sascfg_personal.py file, which we cover later in this article.
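Another way to narrow the scope, offered here as a sketch rather than something from the SAS instructions, is to limit the setting to a single named host entry in the config file. The alias apro and the function name are hypothetical; Host, HostName, Port, User, IdentityFile and StrictHostKeyChecking are standard ssh_config keywords.

```python
def ssh_config_stanza(alias="apro", host="127.0.0.1", port=8222,
                      user="sasdemo", identity_file="~/.ssh/id_rsa_apro"):
    """Return an ssh config stanza that relaxes host key checking for one host only."""
    lines = [
        f"Host {alias}",
        f"    HostName {host}",
        f"    Port {port}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
        "    StrictHostKeyChecking no",
    ]
    return "\n".join(lines) + "\n"


# Print the stanza; paste it into the config file in your .ssh folder.
print(ssh_config_stanza())
```

With an entry like this in place, ssh apro would pick up the port, user and key without further arguments.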
To test we have SSH configured let’s see if we can get an interactive terminal to SAS working.
ssh -t -vvv -p 8222 -i %USERPROFILE%\.ssh\id_rsa_apro sasdemo@127.0.0.1 /opt/sas/viya/home/SASFoundation/sas -fullstimer -nodms -stdio -terminal -nosyntaxcheck -pagesize MAX
While this is a long command, let’s break it down to see what is happening:
The -t flag forces a pseudo-terminal to be allocated for the remote command. This is useful when using interactive command line programs.
The -vvv flag gives us verbose logging. It’s always a good idea to use it when testing so you can see additional information about what’s going on. Fewer v’s give less verbose logging detail.
The -p 8222 argument tells ssh to connect on port 8222. This is the port you specified in your Apro setup for SSH. Replace the number with the one you used.
The -i argument tells ssh to use the identity file that follows. This file is the private key part of the key pair. It is only needed when your key pair does not use the standard names, so you don’t need it if you accepted the defaults when generating the key pair.
The connection target takes the form <username>@<server address>. If you have started Apro with a different username, then replace sasdemo with the login name you use for SAS Studio. For the second part we specify either localhost or 127.0.0.1. The SAS documentation for this part is incorrect. If you are on Windows, the Docker container IP address will not be usable from outside of the container. You will need to use either localhost or 127.0.0.1.
If all is successful, you will get an interactive window to SAS.
To exit, hit enter and type endsas;;;;
Alternatively, stop the tutorial here and join the ranks of SAS demi gods by using SAS in the original form!
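To make the pieces of that command easier to vary, the sketch below assembles it from named parts. The function name build_sas_ssh_command is hypothetical; the SAS path and options are copied from the command above, and the result is only printed, not executed.

```python
def build_sas_ssh_command(identity_file, user="sasdemo", host="127.0.0.1",
                          port=8222, verbosity=3):
    """Assemble the ssh command that starts SAS in STDIO mode inside the container."""
    sas_command = [
        "/opt/sas/viya/home/SASFoundation/sas",
        "-fullstimer", "-nodms", "-stdio", "-terminal",
        "-nosyntaxcheck", "-pagesize", "MAX",
    ]
    ssh = ["ssh", "-t"]  # -t forces pseudo-terminal allocation
    if verbosity:
        ssh.append("-" + "v" * min(verbosity, 3))  # ssh honours at most three v's
    ssh += ["-p", str(port), "-i", str(identity_file), f"{user}@{host}"]
    return ssh + sas_command


print(" ".join(build_sas_ssh_command("id_rsa_apro")))
```

Passing verbosity=0 drops the logging flag once the connection is known to work.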
Once we have confirmed that SSH is working correctly to our container, we need to update our SASPy configuration. This article won’t go into the details of installing SASPy as there are plenty of articles and detailed help in the SASPy user documentation at saspy.readthedocs.io.
Open your sascfg_personal.py file and add the following:
apro = {
    'saspath': '/opt/sas/viya/home/SASFoundation/sas_u8',
    'ssh': 'ssh',
    'host': 'localhost',
    'port': 8222,
    'luser': 'sasdemo',
    'localhost': 'host.docker.internal',
    'encoding': 'utf_8',
    'dasho': 'StrictHostKeyChecking=no'
}
Replace the luser value with your SAS Studio login userid and the port value with the port you opened in your Docker config, and add any additional SAS options in the options key (not shown).
This configuration is slightly different to a standard SSH STDIO connection with the addition of the localhost and dasho parameters. These are optional and depend largely on whether or not you are running SAS Analytics Pro via Docker Desktop on Windows or via the Docker runtime.
The upload and download methods in SASPy use the socket filename engine in SAS to transfer files between client and server. Docker adds an entry to the Windows hosts file which directs your IPv4 address to host.docker.internal. This is then used by Docker to communicate to the external host when required. For a longer discussion on this please see the following GitHub issue between myself and SAS on the topic.
As mentioned previously, you may also receive errors from SSH when your container IP address changes as a result of restarting your container. The dasho parameter is passed to the ssh command to disable this host key checking for SASPy.
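Because the localhost and dasho entries are optional, one way to keep a single definition for both Docker Desktop and plain Docker runtime setups is to build the dictionary with a small helper. This is a sketch only: the function name and its flags are hypothetical, and the identity key (documented by SASPy as the file passed to ssh -i) is an assumption you should check against the SASPy configuration documentation for your version.

```python
def build_apro_config(luser="sasdemo", port=8222, docker_desktop=True, identity=None):
    """Return a SASPy STDIO over SSH configuration dict for the Apro container."""
    config = {
        "saspath": "/opt/sas/viya/home/SASFoundation/sas_u8",
        "ssh": "ssh",
        "host": "localhost",
        "port": port,
        "luser": luser,
        "encoding": "utf_8",
        "dasho": "StrictHostKeyChecking=no",
    }
    if docker_desktop:
        # Name the container uses to reach back to the host for upload/download.
        config["localhost"] = "host.docker.internal"
    if identity:
        # Assumption: SASPy passes this value to ssh -i for non-default key names.
        config["identity"] = str(identity)
    return config


# In sascfg_personal.py this would replace the literal dictionary.
apro = build_apro_config()
print(apro)
```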
Once complete you can test your SASPY connection.
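try:
    import saspy
except ImportError:
    raise ImportError('saspy was not found in the specified path')

# Start a session using the 'apro' definition from sascfg_personal.py
sas = saspy.SASsession(cfgname='apro')
# Submit a one-line program and print the SAS log that comes back
res = sas.submit(code='%put NOTE: Success;', printto=True)
print(res.get('LOG'))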
If you have your sascfg_personal.py file somewhere other than the default paths, add cfgfile='/path/to/your/file.py' to the SASsession method, and cfgname='name' if not using the default.
Check that you see NOTE: Success in the returned log and if so, you now have SASPy connected to SAS Analytics Pro!
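When scripting this check, keep in mind that the log also echoes the submitted source line, which itself contains the text NOTE: Success. The sketch below, with a hypothetical helper name and a made-up sample log, only counts lines that begin with the note.

```python
def log_has_note(log, text="Success"):
    """Return True if a log line starts with 'NOTE: <text>', ignoring echoed source lines."""
    wanted = f"NOTE: {text}"
    return any(line.strip().startswith(wanted) for line in (log or "").splitlines())


# Illustrative log text: the first line mimics the echoed source statement,
# the second is the note written by %put.
sample_log = "5    %put NOTE: Success;\nNOTE: Success\n"
print(log_has_note(sample_log))

# With a live session this would be used as:
# res = sas.submit(code='%put NOTE: Success;', printto=True)
# print(log_has_note(res.get('LOG')))
```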
Hopefully this tutorial has been a help to you and assists you in setting up SASPy with Analytics Pro.
We’ve also created the Selerity Desktop (Personal) tool to help make deployments easier if you are uncomfortable with the above concepts. The Selerity Desktop configures SASPy connectivity for you. The personal edition allows you to deploy container environments with a series of additional options such as Python, Clinical Standards Toolkit and SAS OQ testing without needing to know any of the technical details. If you are interested, please see our product page for further information or reach out to us to discuss more complex deployments or licensing requirements.
Also stay tuned for future posts where we delve deeper into use cases that SAS Analytics Pro Cloud Native can support and other, more advanced, deployment options.
Automated testing is a huge improvement for SAS data analytics software. SAS offers several benefits to organisations, but if there is one feature it lacks, it is automated testing. SAS algorithms are complex, and those who do not have a background in SAS have a hard time working with SAS code or testing its framework. Automated testing could be the solution the software needs. So, in this blog post, I will be discussing how test automation is a huge win for SAS software.
Testing SAS data analytics software is a challenging prospect for designers, coders and testers without SAS training. The challenge causes two problems: the teams cannot integrate the process, and this results in bespoke SAS test scripts, which lead to delays in project development and deployment. Automated testing solves this problem by streamlining the process. With this method of testing, coders don’t have to go through every line of SAS code or rely on developers to write tests. Streamlining the testing process with automation significantly improves the rate of software delivery and improves communication between teams.
Before the days of test automation, SAS developers were forced to take several shortcuts to accommodate testing frameworks. For example, instead of using these frameworks in the development process, they would re-write the same logic repeatedly on different projects even to test new features. Furthermore, not working with a testing framework allows many bugs to accumulate, which is a nightmare for software development. Automated testing addresses some of these issues, by removing the barrier that prevents testers and programmers from testing their bespoke SAS software. With test automation, programmers can create more stable software.
As mentioned before, SAS code can be quite complex, especially for testers and programmers without a SAS background. Even for those who are familiar with SAS, testing can be a long, arduous process, sometimes requiring a large team as well. By automating the testing process, SAS analytics software can be tested more frequently, without a large team of testers and programmers. Higher frequency of testing means more stable software and higher quality features.
Unless you have been living under a rock, you will have heard that software development has been shifting from waterfall to agile methodology, where the focus is on professionals with different specialities coming together to deliver software as quickly as possible through successive iterations of the software. Automated testing is the perfect complement to the agile methodology because it allows programmers to test more efficiently and more frequently, improving the rate of delivery. The improvement in delivery makes test automation an integral part of agile software programming.
Automated testing can lower the cost of developing SAS analytics software. Analytics providers and clients have to pay a higher upfront cost than before. However, the trade-off is a much better testing process where more is done at a faster rate and with a smaller team. While testers are still important, especially for more high-end tasks, test automation tools can do much of the testing without human input. This is exciting news for SAS software because it makes bespoke SAS software more accessible to other organisations, especially when combined with other technologies, like the cloud. Automated testing provides tremendous value to both client and analytics provider because testing is more cost-efficient.
Despite the obvious benefits of test automation, there are some challenges to integrating it into the development process. One is configuration, because setting up test automation carries a huge upfront cost. Furthermore, quickly scaling test environments is incredibly challenging, especially when programmers and testers are working in the cloud. In some cases, automated testing can lack visibility, especially when different teams are using different strategies for automated testing.
An open testing framework can negate this problem, to some degree. Finally, too many UI tests can break the automated testing. Though these challenges can prove to be a barrier to entry for some organisations, it is only a matter of time before test automation becomes the norm in SAS data analytics software.
When SAS announced that it would extend its partnership with open-source developer Red Hat, it made me think about container platforms and their benefits to SAS software. It doesn’t come as a surprise to me that SAS wants to continue this relationship, because open-source technologies and complementary software have done a fantastic job of mitigating the weaknesses of SAS platforms. So, in this blog post, I am going to discuss what a container platform is and how it benefits SAS users.
Before explaining the benefits in SAS deployment, we need to explain what exactly a container platform is. Containers are stand-alone packaged applications containing the SAS software and its complementary software bundled together. Containers are often compared to shipping containers: just as a shipping container carries goods on a cargo ship and lets them be moved around quickly, a software container presents a consistent interface and lets software be shifted easily between environments.
A container platform is often deployed with the SAS platform because it allows SAS programmers to streamline deployments and tie up several internal and multiple software packages. It usually takes months to deploy SAS software, but containers allow programmers to skip this step, making SAS more affordable for organisations that would have found the cost of deployment too high.
As mentioned before, a container platform streamlines SAS deployment, but there are other benefits as well. For example, containers can make SAS platforms more accessible to different infrastructures, thanks to the dependency software that normally comes with the container. This means that SAS can be launched, even if the code was not developed in a test environment. Containers can even be moved between different servers, and even between the private and public cloud. Hence, SAS developers do not have to worry about the type of environment they are deploying in. This brings significant benefits: upgrades become easier, scalability becomes a valid option, and developers can make the most of their resources.
Containers are designed to take full advantage of the power and benefits of the cloud, which includes scalability and the ability to quickly deploy new applications. With cloud-based analytics, organisations can deploy SAS without the need for expensive data servers. This bodes well for SAS because the software is designed to process large amounts of data. So organisations will have access to the power of SAS without investing in expensive physical servers.
For SAS programmers, a container platform saves a lot of time because they do not have to wait for the application: containers can start up immediately, freeing up system resources for other containers. It works because containers are compact and lightweight, sharing the operating system kernel, which allows more containers to fit on a single host.
A container platform helps with rollback and makes for tighter tracking between changes. Put simply, a container platform helps with version control. It works because a container is a version of an application known as an image. The image is placed under source control in a private or publicly hosted container image repository. When combined, image and repository act as a version control mechanism for the SAS application.
Finally, containers can access the CPU and RAM of the infrastructure. This enables containers to do several things like requesting the resources they need at run time, enabling them to perform certain functions, like fast spin-up and spin-down of containers.
It should be noted that a container platform is not a perfect match for SAS platforms by default. There are several things to consider before implementing a container platform. For example, programmers need to pay attention to data Input/Output. Furthermore, factors like RAM, CPU and job execution all need to be taken into account when implementing SAS with container platforms. SAS platforms try to process as much data as possible, so hardware and software need to be well-optimised for this function.
Nonetheless, there is no denying that a container platform is one of the best assets for SAS platforms because it optimises several functions and makes the entire process easier. Containers make SAS platforms easier for programmers to deploy and much more accessible to a wider range of organisations. With all these benefits in mind, it is no wonder that SAS is renewing its partnership with Red Hat.
SAS installation and environment configuration management can be an incredibly complex process that comprises multiple configuration options. These options have the ability to affect every aspect of an analytics environment and SAS configuration management, from performance to security. With the help of expert SAS installers and administrators, many organisations leveraging SAS software are often provided with the perfect environment – optimised and built to deliver an analytics platform they need.
What goes wrong? A lot of the operations and procedures conducted during the installation phase are documented externally, either on spreadsheets or in separate documents. As time goes by, administrators make changes to configuration files, updates are implemented, and the operating system itself changes, and these changes break from the initial configuration. What we see happening in this situation is the SAS environment drifting away from its original, perfect form. This is precisely where Ansible can come into play.
Before we dive into that, however, let’s first take a look at what encompasses a true SAS configuration. These insights will give you a clearer perspective as to why automated and streamlined SAS configuration management is so important.
The important thing to keep in mind is that, if used correctly, your SAS environment is going to be constantly evolving. From regular updates to various internal administrative tweaking, it’s normal for any SAS environment and configuration to change over time. The problem with this is that in many instances, these changes occur in a decentralised manner, which results in a disconnect between an environment’s initial configuration versus where it is right now.
To understand why this is so critical, you need to look at the three deployment phases that a SAS environment goes through – prerequisite determination, installation and deployment, and ultimately the configuration phase.
From making sure you have the right users created with the right permissions on groups, along with the right disk space at the prerequisite phase to actually completing the required manual tasks, installing SAS, and running the SAS Hotfix tool at the deployment stage to finally configuring your environment, setting up your environment is no small task.
Throughout these very complex stages, vital configurations that are critical to the sustainability and existence of a SAS environment are set up. Therefore, to answer our initial question as to why SAS configuration management is so important, poor configuration management can result in the fragmentation of this environment – resulting in inefficiencies and misconfigurations in the long-term. SAS admins need to be on top of this!
In short and within the context of a SAS environment, our services manager, Cameron Lawson, says it best – “Ansible is an extendable tool, written in Python, that is a scriptable way of managing configurations across multiple hosts – you can run it directly from your laptop, via dedicated hosts, on-premise, multi-clouds (Azure, AWS), or hybrids”.
From SAS installation to day-to-day management, Ansible is something your environment can benefit from significantly. Additionally, while there are many alternatives to Ansible, there are noticeable differences that set it apart.
The main concern about configuration management in a given SAS environment is that, over time, configurations are altered from the original configuration. If managed improperly, a SAS configuration can drift away from what it once was. By running Ansible, users and administrators can ensure they maintain the baseline of their original configuration while making changes as required.
Here are three specific ways that Ansible is a major asset for SAS configuration management:
Any changes you need to make can be done via Ansible – Unlike initiating changes to your configuration via a random, on-the-go method, Ansible allows SAS admins to do this centrally. The changes made through Ansible are organised and permeated throughout the system – ensuring your configuration remains consistent with its original form.
Ansible is agent-free – There are many alternatives to Ansible, like Chef. However, one key distinction between Ansible and many of its counterparts is that you wouldn’t have to install an agent on the host that you’re managing to keep the configuration running as required. Ansible uses SSH to communicate with your hosts – essentially leaving no footprint on it. You can just connect to it, perform your tasks, and end the connection.
A combination of simplicity and complexity – For basic environments, you can conveniently run Ansible directly from your laptop. However, for more complex environments, you can typically use a dedicated host, called an Ansible controller, that runs Linux or the Windows Subsystem for Linux.
We’ve only highlighted three specific aspects that make Ansible an extremely valuable component in SAS configuration management. The good news? There’s so much more.
If you’d like to learn more about Ansible and its many benefits, specifically how it can be used for configuration management of your SAS environment, be sure to join our upcoming webinar. We’ll run you through the entire process and welcome your questions with open arms. In the meantime, if you’d like to know more – don’t hesitate to get in touch!