SAS Analytics Pro Cloud Native makes it possible to run SAS within a containerized environment, which opens up exciting possibilities for CI/CD and for integrating SAS into other applications.
SASPy has become a popular choice for Data Scientists and integration developers to bring the power of SAS procedures and the DATA step to Python software development chains. This post outlines the steps required to configure SAS Analytics Pro Cloud Native to accept SSH connections, which SASPy requires, and augments the current documentation for using SASPy with SAS Analytics Pro.
The same steps can also be used to call SAS natively in STDIO mode from your host machine to the container to perform tasks.
Some Notes on Terminology
For people without much experience with Docker, SSH, Python or networking, the terminology in web articles can be confusing and overwhelming. The table below outlines the meanings of terms used in this article. For further information on how SAS Analytics Pro works in Docker, please see our previous article which outlines Docker concepts, and another article on the differences between SAS Analytics Pro and SAS9.
Apro: Refers to SAS Analytics Pro Cloud Native which is a product offering from SAS for running a SAS Programming environment within Docker.
Docker: Docker is a technology company that provides a runtime and development tools for interacting with Containers and Images.
Image: An image is a set of compressed software libraries and binaries that can be executed as a container inside the Docker runtime environment. Images operate against a common OS kernel. SAS provide an Apro image which can be run as a container.
Container: Containers are an instance of a Docker image. Containers contain additional configuration information such as network settings, volume and port mappings.
Host: Your local machine where you are starting the Apro container from. This may be your laptop, PC, or a server.
SSH: This is a communication method for connecting from one machine to another in a network. In this instance we are performing SSH connections between your host and the Apro container.
SSH keys: These are a set of cryptographically generated keys used to identify and authenticate you when using passwordless connections over SSH.
SASPy: SASPy is a Python package developed by SAS and open-source contributors. It provides an interface to the SAS language including the submission of SAS code, procedures and data interaction. It provides a number of connection methods depending on the type of SAS platform you want to connect to. These include:
IOM based connections for SAS9 / Metadata server platforms.
HTTP/S for Viya
STDIO over SSH for Linux based servers
STDIO for local connections on Linux where SAS is installed on the machine you are working from.
We will be using STDIO over SSH in this scenario. The STDIO over SSH method enforces passwordless SSH connections so we will need to set this up.
Configuring Passwordless SSH
The first thing you need to set up passwordless SSH is a public and private key pair. To generate one, you need an SSH client on your host. On Linux-based operating systems OpenSSH is usually already installed. On Windows you may need to install it or, if you use Git, the Git Bash client already includes it.
To check if you have an existing key pair, first check your %USERPROFILE%\.ssh directory on Windows or ~/.ssh directory on OSX/Linux.
ls -al ~/.ssh/
Or in Powershell.
If you see a pair of files in the listing starting with id_xxxx, one with an extension of .pub and the other without, you already have a public / private key pair. For example, if you have an RSA key pair you would see two files: id_rsa (the private key) and id_rsa.pub (the public key).
If you have existing keys, ideally they are configured without passphrases. Passphrases are great for interactive usage as they add an additional layer of security but they hinder things when using keys in automation scripts. For SASPY, keys without passphrases work best.
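The same check can be scripted so that it behaves identically on Windows, macOS and Linux. The following is a minimal sketch; find_key_pairs is a hypothetical helper name, and it only looks for files following the id_xxxx naming convention described above.

```python
from pathlib import Path


def find_key_pairs(ssh_dir):
    """Return the names of private keys in ssh_dir that have a matching .pub file."""
    ssh_dir = Path(ssh_dir)
    if not ssh_dir.is_dir():
        return []
    pairs = []
    for public_key in sorted(ssh_dir.glob("id_*.pub")):
        private_key = public_key.with_suffix("")  # id_rsa.pub -> id_rsa
        if private_key.is_file():
            pairs.append(private_key.name)
    return pairs
```

Calling find_key_pairs(Path.home() / ".ssh") lists the key pairs in your own profile, since Path.home() resolves to %USERPROFILE% on Windows and to ~ on macOS and Linux.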
Creating a Key Pair
If you don’t want to use an existing key pair, or do not have one, you can generate a new pair using the following command.
ssh-keygen -t rsa -b 4096 -C "sasdemo"
Let’s break this down:
The command ssh-keygen creates the public and private key pair.
The -t rsa is telling ssh-keygen what type of key to generate. In this case it is the RSA encryption algorithm.
The -b 4096 is telling ssh-keygen the key size, in bits, to use in the algorithm.
The -C "sasdemo" is a comment to help identify what the key is for. It is appended to the key.
After hitting enter you will be prompted for a few values. Press enter at each prompt to accept the default. The exception is if you already have a key named id_rsa and you don’t want to overwrite it. In that case, specify a new name in the same file path it suggests (the default is $HOME/.ssh/<name>). The illustration below shows this. I have named my key pair id_rsa_apro.
Take note of the name of the key you generated as we will need this later. Next we need to configure the Apro container to allow SSH connections.
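If you would rather script the key generation than answer the prompts, the command can be assembled in Python. This is a sketch under assumptions: keygen_command is a hypothetical helper, and the -f (output file) and -N (new passphrase) options of ssh-keygen are used here to avoid the interactive prompts, which the article itself does not do.

```python
def keygen_command(key_path, comment="sasdemo", bits=4096):
    """Build the ssh-keygen argument list for a passphrase-less RSA key pair."""
    return [
        "ssh-keygen",
        "-t", "rsa",          # key type
        "-b", str(bits),      # key size in bits
        "-C", comment,        # comment appended to the public key
        "-f", str(key_path),  # output file, so there is no prompt for the location
        "-N", "",             # empty passphrase, so there is no prompt for one
    ]
```

Passing the returned list to subprocess.run(..., check=True) runs the command. Building a list rather than a single string avoids quoting problems with paths that contain spaces.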
Configuring SSH in APRO
Containers are generally built to be as lightweight as possible and, as such, do not have all the same libraries and packages as a full operating system. In fact, this is one of the 12 principles of Docker image development.
The Apro container will allow SSH without much configuration. The SAS instructions for this are fairly clear and are transcribed below.
In your /sasinside directory, create a folder called sasosconfig and in that new folder place an empty file called sshd.conf
In your container startup definition you need to add some system capabilities with --cap-add arguments. These capabilities are a Linux concept. To read more about their specifics see this guide. The capabilities we are adding are:
We also need to expose a port for ssh communication. We will use the same port used by SAS in their example. Add the following to your invocation command. This is telling docker to expose port 22 in the container and forward that through to port 8222 on your host.
Once you have done the above, restart your container for the changes to take effect. If you have followed the steps correctly, you should see your container running in your Docker client.
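A quick way to confirm that the port mapping took effect is to check whether anything is listening on the host port. The sketch below assumes the host port is 8222, as used above; port_is_open is a hypothetical helper. It only proves that a TCP connection can be opened, not that SSH authentication will succeed.

```python
import socket


def port_is_open(host="localhost", port=8222, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

If port_is_open() returns False after the container has started, recheck the port mapping in your invocation command before moving on to the key setup.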
Now we have to create a new directory and set some permissions in the Apro container so that we can copy in the key you generated earlier. From a command line:
Create the .ssh folder under the path you specified for the /data directory.
Run docker exec -u sasdemo sas-analytics-pro chmod -R 755 /data/.ssh to set the permission level for the folder. SSH expects your ssh folder to have restrictive permissions. 755 is the most permissive allowed.
Next we want to copy your public key you created earlier into the newly created .ssh folder.
To do this, we can use the docker cp command. SSH looks for a file called authorized_keys which contains a list of public keys that the server will accept connections from:
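The folder, permission and copy steps above could be scripted along these lines. This is a sketch rather than the exact commands from the SAS instructions: the container name sas-analytics-pro and the user sasdemo are taken from the chmod command above, creating the folder with mkdir inside the container is an assumption (the article creates it on the host under the /data mapping), and install_key_commands is a hypothetical helper.

```python
def install_key_commands(public_key, container="sas-analytics-pro", user="sasdemo"):
    """Return docker commands that install public_key as /data/.ssh/authorized_keys."""
    ssh_dir = "/data/.ssh"
    return [
        # Create the .ssh folder inside the container (no error if it already exists).
        ["docker", "exec", "-u", user, container, "mkdir", "-p", ssh_dir],
        # Set the folder permissions, as in the chmod step above.
        ["docker", "exec", "-u", user, container, "chmod", "-R", "755", ssh_dir],
        # Copy the public key from the host into the container.
        ["docker", "cp", public_key, f"{container}:{ssh_dir}/authorized_keys"],
    ]
```

Each entry can be passed to subprocess.run(command, check=True) in order. Note that this overwrites any existing authorized_keys file, so append by hand instead if the container already accepts other keys.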
If all has gone successfully, we can now test our connection!
Testing the SSH connection
If you are on Windows, the Docker container IP address may not be usable from outside of the container. On Windows we simply need to use localhost or 127.0.0.1 for our server address.
Secondly, SSH by default enforces Strict Host Key Checking. You will receive an error in your attempted connection when the IP address of your Docker container changes, which happens whenever you restart it. To get around this, you can use one of the following techniques:
Under your .ssh folder you created earlier, add an additional file called config and add the following:
Host *
    StrictHostKeyChecking no
This is quite permissive. It is telling SSH to skip host key checking for every host you connect to. To be more stringent and limit this to your SASPy connection, you can place connection arguments in your SASPy sascfg_personal.py file, which we cover later in this article.
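Another way to narrow the scope, not covered in the original steps, is to relax host key checking for a single named host entry instead of Host *. The sketch below renders such an entry. The alias apro and the helper ssh_config_block are hypothetical, and the port, user and key name are the ones used elsewhere in this article.

```python
def ssh_config_block(alias, port, user, identity_file, hostname="localhost"):
    """Render an ssh config entry that relaxes host key checking for one host only."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    Port {port}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
        "    StrictHostKeyChecking no",
    ]
    return "\n".join(lines) + "\n"


print(ssh_config_block("apro", 8222, "sasdemo", "~/.ssh/id_rsa_apro"))
```

The printed text can be pasted into the config file in place of the Host * entry, so that other hosts keep the default checking behaviour.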
Test via Command Line
To test that SSH is configured, let’s see if we can get an interactive terminal to SAS working.
While this is a long command let’s break it down to see what is happening:
We are using the ssh program and forcing a pseudo terminal with the -t flag. This is useful when using interactive command line programs.
The -v flag is giving us verbose logging. It’s always a good idea to use -v when testing so you can see additional information about what’s going on. Adding more v’s (up to -vvv) gives more detail, and fewer give less.
The -p 8222 argument is telling ssh to connect on port 8222. This is the port you specified in your Apro setup for ssh. Replace the number with the one you used.
The -i argument is telling ssh to use the following identity file. This file is the private key part of the key pair. It is only needed when your key pair does not use the standard names, so you don’t need it if you accepted the defaults when generating the key pair.
Next we have the server address we are communicating to. This is in the form of <username>@<server address>. If you have started Apro with a different username, then replace sasdemo with the login name you use for SAS Studio. For the second part we specify either localhost or 127.0.0.1. The SAS documentation for this part is incorrect. If you are on Windows, the Docker container IP address will not be usable from outside of the container, so use localhost or 127.0.0.1.
The next part of the command is invoking SAS in stdio mode. This is the command that SASPy generates when starting a connection. If we successfully connect to SAS then we are almost assured that SASPy will also connect.
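The pieces described above can be assembled programmatically, which makes it easier to swap in your own port, user or key. This is a minimal sketch: ssh_test_command is a hypothetical helper, and /path/to/sas is a placeholder rather than the real location of the SAS executable in the container, so substitute the invocation you actually use.

```python
import os


def ssh_test_command(remote_command, user="sasdemo", host="localhost",
                     port=8222, identity_file=None, verbose=True):
    """Assemble the ssh argument list from the parts described above."""
    args = ["ssh", "-t"]                 # force a pseudo terminal
    if verbose:
        args.append("-v")                # verbose logging
    args += ["-p", str(port)]            # host port mapped to the container's port 22
    if identity_file:
        args += ["-i", identity_file]    # private key, only for non-default names
    args.append(f"{user}@{host}")        # <username>@<server address>
    args += list(remote_command)         # command to run inside the container
    return args


# "/path/to/sas" is a placeholder; -nodms and -stdio request SAS in stdio mode.
command = ssh_test_command(
    ["/path/to/sas", "-nodms", "-stdio"],
    identity_file=os.path.expanduser("~/.ssh/id_rsa_apro"),
)
print(" ".join(command))
```

Printing the command first lets you inspect it before running it, for example with subprocess.run(command).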
If all is successful, you will get an interactive window to SAS.
To exit, hit enter and type endsas;;;; Alternatively, stop the tutorial here and join the ranks of SAS demigods by using SAS in its original form!
Once we have confirmed that SSH is working correctly to our container, we next need to update our SASPy configuration. This article won’t go into the details of installing SASPy as there are plenty of articles and detailed help in the SASPy user documentation at saspy.readthedocs.io.
Open your sascfg_personal.py file and add the following:
Add a new entry in the SAS_config_names variable list to label the new connection. Here I will use 'apro'.
Next create a new dictionary variable with the same name you added to SAS_config_names.
Replace the luser value with your SAS Studio login userid, the port you opened in your docker config and any additional SAS options in the options key (not shown).
This configuration is slightly different to a standard SSH STDIO connection with the addition of the localhost and dasho parameters. These are optional and depend largely on whether or not you are running SAS Analytics Pro via Docker Desktop on Windows or via the Docker runtime.
The upload and download methods in SASPy use the socket filename engine in SAS to transfer files between client and server. Docker adds an entry to the Windows hosts file which directs your IPv4 address to host.docker.internal. This is then used by Docker to communicate with the external host when required. For a longer discussion on this, please see the following GitHub issue between myself and SAS on the topic. As mentioned previously, you may also receive errors from SSH when your container IP address changes as a result of restarting your container. The dasho parameter is passed to the ssh command to disable this host key checking for SASPy.
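Putting the steps above together, a sascfg_personal.py entry might look like the sketch below. Treat every value as an assumption to be checked against your own setup and the SASPy documentation: the saspath is a guess at where SAS lives in the container, the identity path is a placeholder, the localhost value follows the host.docker.internal discussion above, and dasho is shown as a list of ssh -o options.

```python
# sascfg_personal.py (sketch; all values are assumptions for illustration)
SAS_config_names = ["apro"]

apro = {
    # Location of the SAS executable inside the container (assumed).
    "saspath": "/opt/sas/viya/home/SASFoundation/bin/sas_u8",
    # Path to the ssh client on your host; differs on Windows.
    "ssh": "/usr/bin/ssh",
    # On Windows use localhost or 127.0.0.1, not the container IP address.
    "host": "localhost",
    # Your SAS Studio login user id.
    "luser": "sasdemo",
    # Host port mapped to port 22 in the container.
    "port": 8222,
    # Private key, only needed for a non-default key name (placeholder path).
    "identity": "/home/you/.ssh/id_rsa_apro",
    # Address SAS uses to reach back to the client for upload and download (assumed).
    "localhost": "host.docker.internal",
    # Passed to ssh as -o options to disable host key checking for SASPy only.
    "dasho": ["StrictHostKeyChecking=no"],
}
```

Keeping the host key option here, rather than in a Host * entry in your ssh config, limits the relaxed checking to this one SASPy connection.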
Once complete you can test your SASPY connection.
try:
    import saspy
except ImportError:
    raise ImportError('saspy was not found in the specified path')

sas = saspy.SASsession(cfgname='apro')
res = sas.submit(code='%put NOTE: Success;', printto=True)
print(res.get('LOG'))
If you have your sascfg_personal.py file somewhere other than the default paths, add cfgfile='/path/to/your/file.py' to the SASsession call, along with cfgname='name' if you are not using the default configuration.
Check that you see NOTE: Success in the returned log. If so, you now have SASPy connected to SAS Analytics Pro!
Hopefully this tutorial has been a help to you and assists you in setting up SASPy with Analytics Pro.
We’ve also created the Selerity Desktop (Personal) tool to help make deployments easier if you are uncomfortable with the above concepts. The Selerity Desktop configures SASPy connectivity for you. The personal edition allows you to deploy container environments with a series of additional options such as Python, Clinical Standards Toolkit and SAS OQ testing without needing to know any of the technical details. If you are interested, please see our product page for further information or reach out to us to discuss more complex deployments or licensing requirements.
Also stay tuned for future posts where we delve deeper into use cases that SAS Analytics Pro Cloud Native can support and other, more advanced, deployment options.
Customers who are opting for the new SAS® Analytics Pro offering may experience some confusion when they receive the license for the product. When viewing the order information, customers will see that the product is licensed to run on the Linux operating system, even though they may be using Microsoft Windows on their desktop. Further to this, when running the Analytics Pro container, some customers may wonder why their folders and paths to files are different. This article explains why this is and helps you understand these differences so you can fully utilise the numerous advantages of running SAS inside containers.
The SAS Analytics Pro Cloud Native product runs as one or more Docker Containers either via the Docker runtime or packaged as a Kubernetes deployment or service in a k8s cluster. This is a smart approach:
Containers are highly portable and remove many of the quirks of building software for multiple operating systems.
They are fast and scalable and more lightweight than servers.
They are disposable. You can say goodbye to the traditional dev / production environment approach.
The differences you see come down to how Docker works. While your desktop or server host may be running Windows, the container image is running Linux. Programs and files inside the Analytics Pro container act and look like they would on Linux.
What is a Container?
Containers were launched in 2013 as part of the open-source Docker Engine. Containers were built on existing concepts used in operating systems like Solaris and other unix based operating systems. Over the years, containers have become a reference standard. They are not tied to the Docker Inc. company and numerous vendors implement container technology.
Containers abstract application and deployment dependencies and run against a common OS kernel but run within an isolated user space.
This typically consumes fewer resources than using Virtual Machines. A Virtual Machine packages an entire operating system by abstracting the physical hardware, and relies on a Hypervisor to manage running multiple virtual machines within a single physical host. Virtual machines consume more space and resources as they normally require a full copy of the target operating system and binaries. This also results in slower boot times compared to containers.
Docker formed a partnership with Microsoft to bring containers to the Windows operating system and allow Docker to run natively within Windows. On modern Windows hosts, the common kernel is provided by the Windows Subsystem for Linux (WSL). When Docker was first released in 2013, WSL did not exist. As a result, older versions of Docker Desktop ran within a Virtual Machine on Windows. This Virtual Machine housed a common Linux kernel and the required Docker binaries.
Already it is easy to see the advantages of a technology like Docker for application development.
Standardise the delivery of applications to run on either Windows or Linux with a single code base.
Test applications on a variety of operating systems and their variants easily and at low cost.
Throw everything away when you don’t need it.
Containers, Images, Networks, Volumes, Ports
Containers are an instance of an Image which is a compressed file of binaries and libraries containing all the required components to execute within the Docker runtime. Images are created using a Dockerfile which is a file format developed by Docker.
As mentioned previously, containers are isolated. That means they typically need to be given references to files and folders to use from your operating system and also be told what ports they can communicate on.
A container will typically contain one or more volume mappings, port mappings and instructions for networking.
Volume mappings are given to a container to expose files and folders on your host to the container so they can be used. This might include things like your source code, configuration files, data etc.
As well as this, an application hosted in a container may need to be told what port to let your host communicate on. Let’s take a web application as an example that runs on port 8080 within docker. On your host you want to access it on the standard web address port of 80. You will tell your Docker container to expose port 8080 in the container to port 80 on your host.
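To make the volume and port mapping idea concrete, the sketch below turns a set of mappings into a docker run command. The helper docker_run_args, the image name example/webapp and the host folder are hypothetical; the port numbers are the ones from the web application example above.

```python
def docker_run_args(image, name, volumes=None, ports=None):
    """Build a docker run argument list from volume and port mappings.

    volumes maps host paths to container paths; ports maps host ports to
    container ports.
    """
    args = ["docker", "run", "--detach", "--name", name]
    for host_path, container_path in (volumes or {}).items():
        args += ["--volume", f"{host_path}:{container_path}"]
    for host_port, container_port in (ports or {}).items():
        args += ["--publish", f"{host_port}:{container_port}"]
    args.append(image)
    return args


# Hypothetical web application listening on 8080 inside the container,
# reachable on port 80 of the host.
print(docker_run_args("example/webapp", "webapp",
                      volumes={"/home/me/src": "/app/src"},
                      ports={80: 8080}))
```

In both kinds of mapping the host side comes first and the container side second, which is the detail that most often gets reversed.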
Docker containers can also run within completely self-contained networks and have multiple containers running. A simple version of this is using Docker Compose which comes with most Docker installations. This allows you to define container configurations using a yaml based file format and each container is managed from a created network object and can talk to each other easily.
SAS in the New World
Using containers means that you can also create multiple container instances easily, with specific configuration tailored to the purpose you are trying to achieve. In many cases, this means you no longer need to maintain separate non-production / production environments. Your users can have development environments locally, push their code to version control including the container configuration required, and then have your cloud or hosted infrastructure run it as isolated production containers. SAS can now sit easily alongside modern CI/CD approaches.
How the Analytics Pro Container uses Linux on Windows
The SAS Analytics Pro for Cloud Native container contains all the required binaries and libraries to execute Base SAS, SAS/STAT, SAS/GRAPH and over 20 SAS/ACCESS engines as well as the Basic deployment of SAS Studio. For further information on the inclusions of SAS Analytics Pro Cloud Native, please see our previous post. If your current SAS usage is focused on the SAS programming interface as opposed to the newer visual interfaces, then it’s a smart modernization choice to make.
As a user, you need to supply the image the following information at a minimum:
A SAS License (as a .jwt file) and deployment certificates (.zip file).
Configuration for mapping files and folders
Environment variables to configure port mappings
Driver and configuration files for whichever access engines you want to use.
The SAS Analytics Pro Container
The current Analytics Pro container comes in at just under 10GB in size. By comparison, the older SAS9 Depot could be anywhere up to 30GB in size.
The image is downloadable from SAS using the mirror manager command line tool. You can then also use another command line tool, the SAS Order manager (or manually at my.sas.com) to download licenses and certificate files required. There is also an API available to automate the download of certificates and licenses.
The container runs using the Red Hat Linux base container image. This means that from the application’s perspective, SAS is running with a Linux operating system even though the machine you are running the container from may be Microsoft Windows. This is why when you receive your SAS licensing paperwork, it will say that it is licensed for the Linux operating system.
Analytics Pro can be configured for personal or shared use. The full range of configuration options is too vast to cover completely here. Watch this space as we will be providing deep dives into the various configuration approaches that can be taken. What we will cover here are the most visible differences you will see when using SAS through a container. The most visible are mount volumes and mappings.
Making directories/files available in the Container
The Analytics Pro container needs at a minimum two Volume mappings:
A Data folder mapped to /data in the container. This acts like your Linux home directory. It is typically used to store your SAS programs and any permanent data, etc.
A folder mapped to /sasinside. This stores your SAS licence, deployment certificates and any configuration overrides such as a custom autoexec_usermods.sas and sasv9_usermods.cfg files.
You can also mount any other additional folders from your host machine to the container. Some typical scenarios include:
Creating a mapping for /saswork and /sasutil for SAS temporary datasets and util files.
Creating a mapping for /var/log for system logs
One or more volumes for permanent SAS Datasets.
A python runtime to support the usage of PROC PYTHON
Volumes inside the Analytics Pro container act like Linux mount points. For those who come from a Windows background, this may be confusing at first.
Linux volumes don’t have the same concept of Drive names like C:\, D:\, etc.
Linux volumes do not use back slashes. They use forward slashes.
Linux mounts can be either a directory or a single file.
Say for example, you decide you want to have the following mappings in your Analytics Pro container.
When you use Analytics Pro, your programs will need to reference files and folders by their container path. So a libname statement like libname myenv "/sasdata"; would point to the directory C:\shared\data on your host.
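If you regularly need to work out which container path corresponds to a file on your Windows host, the translation can be sketched in a few lines. Only the C:\shared\data to /sasdata mapping comes from the example above; the helper to_container_path and the sample file name are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Host folder -> container mount point. Only the /sasdata entry comes from the
# libname example; add your own mappings here.
VOLUME_MAPPINGS = {
    r"C:\shared\data": "/sasdata",
}


def to_container_path(host_path, mappings=VOLUME_MAPPINGS):
    """Translate a Windows host path into the path SAS sees in the container."""
    host = PureWindowsPath(host_path)
    for host_root, mount_point in mappings.items():
        try:
            relative = host.relative_to(PureWindowsPath(host_root))
        except ValueError:
            continue  # host_path is not under this mapping
        # Rebuild the remainder with forward slashes under the mount point.
        return str(PurePosixPath(mount_point).joinpath(*relative.parts))
    raise ValueError(f"{host_path} is not inside a mapped folder")


print(to_container_path(r"C:\shared\data\sales\orders.sas7bdat"))
```

The pure path classes are used so the sketch behaves the same on any operating system, since it never touches the file system.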
It is important to note that the SAS Analytics Pro container will also apply permissions to some configuration files. Typically you won’t notice this except in cases such as mapping public/private keys for SSH, which require restrictive permissions.
The second most visible difference is that by default, your host username and password won’t be used. In SAS9, SAS will use the authentication provided by your local machine or the server you are connecting to.
The Docker container does not have access to this by default due to the isolation principles mentioned previously.
By default, Analytics Pro will create a user account within the container which it uses to allow logging into SAS Studio. It will look for a .authinfo.txt file within your /data mount point. This is a simple text file which has the following contents:
default user myusername password mypassword!
If this file does not exist, SAS will create a default account called sasdemo and generate a random password for you and create this authinfo file in the /data folder.
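If you want to choose the account yourself rather than rely on the generated default, the file can be written before the container starts. This is a sketch under assumptions: write_authinfo is a hypothetical helper, data_dir is the host folder you map to /data, and the file name and one-line layout are the ones described above.

```python
import secrets
from pathlib import Path


def authinfo_line(username, password):
    """Format one line in the 'default user ... password ...' layout shown above."""
    return f"default user {username} password {password}\n"


def write_authinfo(data_dir, username, password=None):
    """Write .authinfo.txt into data_dir, generating a password if none is given."""
    password = password or secrets.token_urlsafe(12)
    target = Path(data_dir) / ".authinfo.txt"
    target.write_text(authinfo_line(username, password))
    return target
```

Because the password is stored in plain text, it is worth restricting who can read the folder you map to /data.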
If you choose to run Analytics Pro in a multi user scenario, then you may need to configure the container to utilise PAM authentication. This is done the same as you would normally within a Linux server.
There are also methods for tying user accounts inside a Docker container to the host’s authentication provider. We’ll cover these approaches in separate articles to give them justice.
Selerity Desktop to Make This Easier
We’ve created the Selerity Desktop (Personal) tool to help make deployments easier if you are uncomfortable with the above concepts. This personal edition allows you to deploy container environments with a series of additional options such as Python, Clinical Standards Toolkit and SAS OQ testing without needing to know any of the technical details. If you are interested, please see our product page for further information or reach out to us to discuss more complex deployments or licensing requirements.
SAS Analytics Pro has been used by analysts, data scientists and programmers for decades, making enterprise grade analytics available on a personal desktop. This solution was first released on the SAS 9 platform and was typically installed on Windows PCs for use by a single user. With the latest release of SAS Analytics Pro now on the SAS Viya platform, we have prepared a quick comparison of what is included as standard compared to the previous version on SAS 9.x.
The new containerised version of SAS Analytics Pro from the SAS Institute opens up a world of possibilities for leveraging third-party technologies to enhance what is already a pretty powerful Data and Analytics platform.
One of these technologies that has really taken off and helped Data Scientists take advantage of a unified programming experience regardless of the language used is Project Jupyter. A core feature of Project Jupyter is known as a Notebook, and this is explained on the Jupyter site as “…an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text”. This not only makes it an easy, pleasant experience to work in but also facilitates the ability to present complex processes in a nice visual manner to non-programmers – kind of like reading through a notebook 🙂
To enable Jupyter in your environment, open up the apro.settings file in your $deploy directory (the location where you unzipped the Selerity Launcher code) and set JUPYTERLAB to true.
# Enable Jupyter Lab?
JUPYTERLAB=true
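If you manage several environments, the same change can be made with a small script instead of a text editor. The sketch below assumes apro.settings uses plain KEY=value lines with # comments, which is inferred from the setting shown here rather than from a documented format; set_setting is a hypothetical helper.

```python
import re


def set_setting(text, key, value):
    """Return settings text with KEY=value set, appending the line if KEY is absent."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash handling in the value.
        return pattern.sub(lambda _match: new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


sample = "# Enable Jupyter Lab?\nJUPYTERLAB=false\n"
print(set_setting(sample, "JUPYTERLAB", "true"))
```

To apply it, read apro.settings from your $deploy directory, pass its contents through set_setting and write the result back.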
Stop your SAS Analytics Pro container if it is currently running with the following command:
docker stop sas-analytics-pro
Now start your environment back up by running the launchapro script again. When you launch your environment with JUPYTERLAB=true the following things happen behind the scenes (all transparent to the user):
A virtual Python 3.9 environment is created in /python (the python directory in the repository)
The bits-and-pieces required to run JupyterLab and Jupyter Notebooks are installed to /python
The SAS Kernel for Jupyter is installed to /python and configured to use SAS in your SAS Analytics Pro container
Jupyter Lab is started up after SAS Studio is started
Depending on the speed of your internet connection it could take up to 15 minutes for all this to happen, but as long as you don’t delete the python directory all subsequent startups should be just as quick as before. This is what the startup process looks like with JupyterLab enabled:
# SAS Analytics Pro Personal Launcher #
# S = SAS Studio has started #
# J = Jupyter Lab has started #
To stop your SAS Analytics Pro instance, use "docker stop sas-analytics-pro"
Open your browser to http://localhost:8888 and enter your generated password. You will then be presented with the JupyterLab main interface:
You can click on the SAS icons in the Launcher to create a new Notebook using the SAS Kernel, and then start writing your SAS code. Click the play button to submit your code:
If you would prefer to just login-and-start-using Jupyter with SAS Analytics Pro, our Selerity Analytics Desktop offering provides SAS Analytics Pro as-a-service (including Jupyter), which can also be integrated into your existing IT infrastructure if required. This allows you to leverage your existing security, login credentials and code assets without needing to maintain your own SAS infrastructure. Contact us if you would like to learn more!
In August 2021 SAS released a cloud-native version of SAS Analytics Pro. This release is based on the SAS Viya Platform and provides the full features of Base SAS, SAS/STAT and SAS/GRAPH via SAS Studio – a browser-based interface that users of recent SAS 9.4 and SAS Viya server environments will be familiar with.
The cloud-native version also adds the full SAS Viya set of SAS/ACCESS products, giving you access to many data sources!
This new release of SAS Analytics Pro leverages container technology, which means that the concept of installing your software is no longer just a matter of running “setup.exe”. The benefits of containerisation are many and include:
Ability to run on Windows, Mac and Linux
A consistent environment – e.g. your install is not going to be different from your colleague’s
Ease of updates – SAS regularly update and patch their software. Previously this meant finding, downloading and installing those updates. Now you just have to “point” to the release of SAS (including updates) you want to use
Portability – if you get a new PC (or move from PC to Mac) you just need to copy your config across
Getting started with the official SAS process involves:
Installing Docker (the software used to run containers)
Logging into my.sas.com to get your license and certificate files
Creating a script to startup your SAS Analytics Pro environment
On the surface, this seems pretty straightforward, but in case users feel a bit hesitant or unsure (especially when it comes to creating a launch script) we have created a “launcher” process to help you out. Our process mirrors the official SAS process but we provide the script, along with a simple way to tweak the features you want to add.
Extract the ZIP file to a location on your machine, e.g. C:\SAS. The final directory (referred to as the $deploy directory) will be a subdirectory of this location created as a result of the unzipping, e.g. C:\SAS\sas-analytics-pro-2021.1.4.
Now log into my.sas.com and go to the “My Orders” section. In here, expand your SAS Analytics Pro Order and then click both the Download Certificates and Download License Only links. Save the file that each link provides to the directory created when you extracted the ZIP file, e.g. C:\SAS\sas-analytics-pro-2021.1.4
Finally, open a PowerShell (or Terminal on Linux/Mac) prompt, “cd” into the $deploy directory (created when you extracted the ZIP file) and then run launchapro.ps1 (or launchapro.sh on Linux/Mac)
After a minute or so you will see a message letting you know that SAS Analytics Pro is up and running!
You can now open your browser and log into SAS Studio at http://localhost:81 using the same username you logged into Windows with along with the generated password that is displayed.
There are many ways that this SAS Analytics Pro environment can be tailored/customised and we have provided some of the key options available within the apro.settings file that comes with our launcher. This file contains comments explaining each option, which we also document in the CONFIGURATION.md file.
If you would prefer to just login-and-start-using SAS Analytics Pro, our Selerity Analytics Desktop offering provides SAS Analytics Pro as-a-service, which can also be integrated into your existing IT infrastructure if required. This allows you to leverage your existing security, login credentials and code assets without needing to maintain your own SAS infrastructure. Contact us if you would like to learn more!