Big Data Analytics with SAS

What does SAS do?

The co-founder and CEO of SAS, Dr. James H. Goodnight, sums up what SAS does with this quote:

"SAS is the first company to call when you need to solve complex business problems."

-Dr. James H. Goodnight, CEO and co-founder, SAS Institute Inc.

SAS helps solve business problems by being the best at applying advanced analytics, whether predictive analytics (otherwise known as data mining), forecasting, optimization, or a combination of some or all of them. The aim is to improve business processes and deliver more valuable, data-driven information to decision-makers so they can make the best possible decisions to help grow their organizations. The fundamental value of analytics lies in increasing revenue and/or cutting costs, and ultimately, that is what SAS provides to its clients and their organizations.

What is your perception of SAS?

Some readers will come to this book with their own perception of what SAS is, which is fine. I hope this book will open their minds to a broader understanding of what SAS is, beyond what they may have thought before reading it. People's perceptions of SAS are typically based either on their own experience of working with SAS as a programming language or on conversations with someone who is or was a SAS programmer within their organization. These perceptions are often partially accurate, but most of the time they are based on outdated information.

For example, many people will tell you that you have to buy a SAS license in order to learn how to use it. This was true in the past, but it is no longer accurate. Later in this chapter, you will learn how to download, install, and use a free version of SAS so that you can get hands-on experience by working through the examples provided in this book. Another perception some people have of SAS is that you must always write code, which again is based on somewhat outdated information:

Figure 1.1: Perceptions of SAS

While it is true that you can write SAS code if you wish, there are several ways to use SAS solutions through GUIs that offer easy-to-understand, drag-and-drop capabilities. Some of these GUIs generate SAS code for you. Several of SAS's newer solutions are driven primarily through modern web-based interfaces. These interfaces let you interact or integrate with other technologies through standard application programming interfaces (APIs), such as REST, and from languages such as Java, Python, and even R. This book's primary focus is on teaching you some of the programming languages built into SAS. However, it also provides overviews of, and references to, some of the optional GUIs available within the SAS ecosystem.

Let's get started with your free version of SAS

The free version of SAS that you will use while reading this book is known as SAS® University Edition, and it is available for download from the main SAS website: https://www.sas.com/en_us/software/university-edition.html.

You can download and install the software yourself, or launch it in the cloud via Amazon Web Services (AWS). There's no need to go through convoluted channels for software distribution.

This free version is available for direct download for Windows, OS X, and Linux, as well as through AWS. Whichever version you choose, please verify that your system meets the requirements listed here: https://www.sas.com/en_us/software/university-edition.html#m=system-requirements

  1. When you select Get free software, you will be taken to https://www.sas.com/en_us/software/university-edition.html#m=get-free-software and presented with the following window:

Figure 1.2: SAS® University Edition selection window

Note

Choose whichever option works best for you. For this book, we will walk through and use the Direct Download option. On the next page, choose the operating system that you want to use: Windows, OS X, or Linux. For this book, we will choose Windows. We recommend downloading the quick start guide and/or watching the video available at the link provided.

  1. Now you can move on to the next step. Because SAS® University Edition is a virtual application (vApp), you need virtualization software to run it. You can download Oracle VirtualBox for Windows, a free virtualization software package, from the following link: https://www.virtualbox.org/wiki/Downloads

Note

In addition to Oracle VirtualBox, SAS® University Edition works with VMware Workstation Player virtualization software. If you prefer to use VMware Workstation Player, charges may apply. For this book, we chose Oracle VirtualBox for Windows.

  1. After you install Oracle VirtualBox, the following screen should appear when the VirtualBox application starts:

Figure 1.3: Oracle VirtualBox application

  1. Switch from Oracle VirtualBox back to the SAS® University Edition download page and perform the next step: downloading the SAS® University Edition vApp.

Note

If you don't already have a profile on www.sas.com, you will need to create one in order to download the SAS® University Edition vApp. The vApp is 2.0 GB in size, so plan to use the fastest connection you have available for this step.
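The following Python sketch puts the 2.0 GB download size in perspective by estimating the download time at a few connection speeds. It makes two assumptions. First, it uses decimal units (1 GB = 1,000,000,000 bytes). Second, it assumes a sustained throughput equal to the nominal link speed. Real downloads are usually slower because of protocol overhead and congestion. The function name `estimate_download_minutes` is hypothetical and used only for illustration.

```python
def estimate_download_minutes(size_gb: float, speed_mbps: float) -> float:
    """Estimate download time in minutes.

    Assumes decimal units (1 GB = 1000 MB) and a sustained throughput
    equal to the nominal speed in megabits per second.
    """
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = size_megabits / speed_mbps
    return seconds / 60


if __name__ == "__main__":
    vapp_size_gb = 2.0  # approximate size of the SAS University Edition vApp
    for speed in (10, 50, 100):
        minutes = estimate_download_minutes(vapp_size_gb, speed)
        print(f"{speed:>4} Mbps: about {minutes:.1f} minutes")
```

Under these assumptions, 2.0 GB is 16,000 megabits. That works out to roughly 27 minutes at 10 Mbps and under 3 minutes at 100 Mbps.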

  1. After the SAS vApp downloads, you need to import it into Oracle VirtualBox. Once you select SAS® University Edition from the list that pops up and select Import, you should see something similar to this window:

Figure 1.4: Importing the SAS® University Edition vApp into Oracle VirtualBox

  1. Once you have successfully imported the SAS vApp, the Oracle VirtualBox application should look like this:

Figure 1.5: Completed import of SAS® University Edition vApp into Oracle VirtualBox
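If you prefer scripting to the VirtualBox GUI, VirtualBox includes the `VBoxManage` command-line tool, and its `import` subcommand imports an appliance file. The Python sketch below wraps that command with `subprocess`. It assumes `VBoxManage` is on your `PATH`; on Windows it may not be by default. The file name `sas_university_edition.ova` is a hypothetical placeholder for the file you actually downloaded, and the helper names are illustrative only. Passing `dry_run=True` appends VBoxManage's `--dry-run` option, which reports what would be imported without importing anything.

```python
import shutil
import subprocess
from pathlib import Path


def build_import_command(appliance: Path, dry_run: bool = False) -> list[str]:
    """Build the VBoxManage command that imports an appliance file (.ova/.ovf)."""
    cmd = ["VBoxManage", "import", str(appliance)]
    if dry_run:
        # --dry-run describes the import without actually performing it.
        cmd.append("--dry-run")
    return cmd


def import_appliance(appliance: Path, dry_run: bool = False) -> int:
    """Run the import and return VBoxManage's exit code."""
    if shutil.which("VBoxManage") is None:
        raise FileNotFoundError("VBoxManage was not found on PATH")
    if not appliance.is_file():
        raise FileNotFoundError(f"Appliance file not found: {appliance}")
    result = subprocess.run(build_import_command(appliance, dry_run), check=False)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical file name: replace it with the path of the vApp you downloaded.
    vapp = Path("sas_university_edition.ova")
    print(" ".join(build_import_command(vapp, dry_run=True)))
```

Keeping command construction separate from execution lets you inspect the exact command before running `import_appliance` for real.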

Note

Next, follow steps 3, 4, and 5 in the SAS® University Edition quick start guide to make use of your SAS® University Edition. Make sure you use the exact folder names and capitalization stated in the guide. For this book, we used C:\SASUniversityEdition\myfolders.
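You can also create the shared folder from a script so that its name and capitalization match exactly. The Python sketch below creates `SASUniversityEdition/myfolders` under a base directory that you choose. Only the folder path comes from this chapter. The helper name `create_shared_folder` is hypothetical. The remaining setup steps still follow the quick start guide.

```python
from pathlib import Path


def create_shared_folder(base: Path) -> Path:
    """Create <base>/SASUniversityEdition/myfolders, keeping the exact case."""
    target = base / "SASUniversityEdition" / "myfolders"
    # parents=True also creates SASUniversityEdition if it is missing;
    # exist_ok=True makes the call safe to repeat.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

On Windows, calling `create_shared_folder(Path("C:/"))` yields `C:\SASUniversityEdition\myfolders`, the path used in this book.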

  1. Once you have completed step 4 in the SAS® University Edition quick start guide, you should see a window similar to this:

Figure 1.6: Successful start of the SAS® University Edition vApp

Note

You can minimize this window, but don't close it until you are done with your current SAS session. To start your SAS environment, open http://localhost:10080 in one of the supported web browsers.

Your web browser should look similar to this:

Figure 1.7: Successful start of your SAS environment from your web browser
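If the page does not load, a short script can tell you whether anything is listening at http://localhost:10080. The Python sketch below uses only the standard library's `urllib.request`. Any HTTP response, including an error status, counts as "reachable"; a refused or timed-out connection counts as "not reachable". The check only confirms that a web server answers at that address, not that SAS Studio itself is working correctly. The function name `is_reachable` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request

SAS_STUDIO_URL = "http://localhost:10080"  # address given in this chapter


def is_reachable(url: str = SAS_STUDIO_URL, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at url within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even though it returned an error status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timed out, or an unusable response.
        return False


if __name__ == "__main__":
    if is_reachable():
        print(f"A web server is responding at {SAS_STUDIO_URL}")
    else:
        print("No response: check that the vApp is running in VirtualBox")
```

`HTTPError` is caught before `URLError` because it is a subclass of `URLError`. A server that answers with an error page is therefore still reported as reachable.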

Congratulations, you have successfully installed your free SAS® software and are now ready to take your first steps in learning how to do big data analytics with SAS.

Note

SAS Studio is the newest GUI for writing SAS code. It is a web-based thin client that, in this case, communicates with SAS running within your SAS® University Edition vApp. This is just one example of how SAS has made complex work, such as client-server setup and installation, easy for its users.

History of SAS interfaces

Before we start getting familiar with the SAS Studio GUI, it would be best to provide some historical background on the interfaces to the SAS system. The SAS system was initially written for and run on mainframes back in the 1970s, and as such it worked with what is known as a command-line interface. There was no application window. Instead, you wrote one line of code and submitted it, followed by your next line of code.

Interestingly enough, this command-line interface persists today across all the operating systems that SAS runs on, including mainframes, Windows, and Unix/Linux. When SAS was rewritten in C in the 1980s, it came with an interface called the SAS Display Manager System (DMS), which still exists and is used today. Today, DMS is referred to as the SAS windowing environment. It consists of three primary windows:

- a program editor for writing and submitting code
- a log for debugging the code
- an output window for displaying results

You can still run SAS with DMS, or with the NODMS option, on several operating systems. Another popular and commonly used GUI for SAS is SAS Enterprise Guide. It is a Windows-only client written in .NET that lets you do a great deal of SAS work with drag-and-drop functionality, and it generates SAS code for everything you do in the interface.

A more analytically advanced, data-scientist-focused interface for data mining within the SAS environment is SAS Enterprise Miner™. Like SAS Enterprise Guide, it gives the user a lot of power within a drag-and-drop environment. It also documents the process automatically, which saves a data scientist the time of hand-coding the work and then documenting it manually.

Why the history of SAS interfaces? First, you will want to be seen as an experienced SAS programmer, and you will not be viewed as one if you don't know about SAS DMS, SAS Enterprise Guide, and SAS Enterprise Miner™. Second, as already stated, SAS is an analytic processing environment. A variety of SAS solutions therefore provide their own business-purpose-focused GUIs for interacting with this single backend environment. These GUIs make it easier to perform specific tasks across the entire analytics lifecycle, including data management and preparation, data mining, forecasting, and data visualization. The advantage is that you always use the same set of tested and proven algorithms, regardless of how you interact with SAS: by writing programs, through a GUI, or through an API or web service. From a governance and audit standpoint, this gives you consistent and repeatable results.