Thursday, December 4, 2014

My favorite SAS proc: proc freq

No doubt, my favorite proc step is proc freq. It is extremely simple, but therein lies its beauty: it's fast to run and easy to understand. With just two lines of code (almost zero effort), it delivers a set of frequency output for all variables in one printout that anyone can understand, regardless of their involvement with the actual data set. I've found raw proc freq output to be a terrific starting point for conversations on data cleaning or data exploration, no matter who is involved in the conversation:

PROC freq data=NAME;
run;

Tuesday, June 17, 2014

Free SAS Date Calculator

SAS Date Calculator*
Date:

Days since 1/1/1960:


*For valid results, SAS date value must be between 1582 CE and 20,000 CE on the Gregorian Calendar. Note that American Colonies and Great Britain did not adopt the Gregorian Calendar until 1752.

To convert a date in Excel, subtract 21,916:

Free SAS Datetime Calculator

SAS Datetime Calculator*
Datetime:

Seconds since midnight 1/1/1960:


*For valid results, SAS date value must be between 1582 CE and 20,000 CE on the Gregorian Calendar. Note that American Colonies and Great Britain did not adopt the Gregorian Calendar until 1752.

To convert a datetime in Excel, subtract 21,916 and multiply by 86,400:

Friday, July 6, 2012

Sample "Where" Clauses for Finding Dates in Text Fields

The following where clauses use SAS PRXMatch functions and simple Perl regular expressions to help find dates within SAS dataset text fields.

Identify records that contain number/number, e.g. 1/2, 12/25, etc.:
where prxmatch('*\d\/\d*', fieldname) >= 1;

Identify records that contain number-number, e.g. 1-2, 12-25, etc.:
where prxmatch('*\d-\d*', fieldname) >= 1;

Identify records that contain four-digit numbers, e.g. 2007:
where prxmatch('*\d\d\d\d*', fieldname) >= 1;

Identify records that contain the word "July":
where prxmatch('*July*', fieldname) >= 1;

Or identify records that contain any of the above, any month name, or any month three-letter abbreviation:
where prxmatch('*\d\/\d*', fieldname) >= 1
OR prxmatch('*\d-\d*', fieldname) >= 1
OR prxmatch('*\d\d\d\d*', fieldname) >= 1
OR prxmatch('*January*', fieldname) >= 1
OR prxmatch('*Feburary*', fieldname) >= 1
OR prxmatch('*March*', fieldname) >= 1
OR prxmatch('*April*', fieldname) >= 1
OR prxmatch('*May*', fieldname) >= 1
OR prxmatch('*June*', fieldname) >= 1
OR prxmatch('*July*', fieldname) >= 1
OR prxmatch('*August*', fieldname) >= 1
OR prxmatch('*September*', fieldname) >= 1
OR prxmatch('*October*', fieldname) >= 1
OR prxmatch('*November*', fieldname) >= 1
OR prxmatch('*December*', fieldname) >= 1
OR prxmatch('*Jan*', fieldname) >= 1
OR prxmatch('*Feb*', fieldname) >= 1
OR prxmatch('*Mar*', fieldname) >= 1
OR prxmatch('*Apr*', fieldname) >= 1
OR prxmatch('*May*', fieldname) >= 1
OR prxmatch('*Jun*', fieldname) >= 1
OR prxmatch('*Jul*', fieldname) >= 1
OR prxmatch('*Aug*', fieldname) >= 1
OR prxmatch('*Sep*', fieldname) >= 1
OR prxmatch('*Oct*', fieldname) >= 1
OR prxmatch('*Nov*', fieldname) >= 1
OR prxmatch('*Dec*', fieldname) >= 1;


Or, a macro that will print any records that match the above:
%macro dsearch(tablename, fieldname);
    proc print data = &tablename;
        var &fieldname;
        where prxmatch('*\d\/\d*', &fieldname) >= 1
            OR prxmatch('*\d-\d*', &fieldname) >= 1
            OR prxmatch('*\d\d\d\d*', &fieldname) >= 1
            OR prxmatch('*January*', &fieldname) >= 1
            OR prxmatch('*Feburary*', &fieldname) >= 1
            OR prxmatch('*March*', &fieldname) >= 1
            OR prxmatch('*April*', &fieldname) >= 1
            OR prxmatch('*May*', &fieldname) >= 1
            OR prxmatch('*June*', &fieldname) >= 1
            OR prxmatch('*July*', &fieldname) >= 1
            OR prxmatch('*August*', &fieldname) >= 1
            OR prxmatch('*September*', &fieldname) >= 1
            OR prxmatch('*October*', &fieldname) >= 1
            OR prxmatch('*November*', &fieldname) >= 1
            OR prxmatch('*December*', &fieldname) >= 1
            OR prxmatch('*Jan*', &fieldname) >= 1
            OR prxmatch('*Feb*', &fieldname) >= 1
            OR prxmatch('*Mar*', &fieldname) >= 1
            OR prxmatch('*Apr*', &fieldname) >= 1
            OR prxmatch('*May*', &fieldname) >= 1
            OR prxmatch('*Jun*', &fieldname) >= 1
            OR prxmatch('*Jul*', &fieldname) >= 1
            OR prxmatch('*Aug*', &fieldname) >= 1
            OR prxmatch('*Sep*', &fieldname) >= 1
            OR prxmatch('*Oct*', &fieldname) >= 1
            OR prxmatch('*Nov*', &fieldname) >= 1
            OR prxmatch('*Dec*', &fieldname) >= 1
    ;
    run;
%mend dsearch;

Wednesday, June 13, 2012

SAS Macro to Import all Worksheets of an XLSX file

/* SAS Macro to import all worksheets of an XLSX file */

/* Extra required software includes SAS/ACCESS, SAS/MACRO */
/* Assumes first row contains field names */
/* Assumes worksheets have valid SAS dataset names */ 
/* This macro provided as-is, use at your own risk */

/* Macro parameters:
    libin = libname to be temporarily assigned to XSLX file
    libinpath = path to XLSX file
    libout = libname where SAS datasets should be exported
*/

%macro XLSXimport(libin, libinpath, libout);
    /* Open link to xlsx file */
    libname &libin. "&libinpath";

    proc sql noprint;
        /* Create array with name of each dataset */
        select memname into :mem1 - :mem&sysmaxlong
        from dictionary.tables
        where libname=upcase("&libin"
        and memtype = upcase('data');

        /* Loop to import each dataset to libout */
        %do i=1 %to &sqlobs;
            create table &libout..%substr(&&mem&i,1,
                %EVAL(%LENGTH(&&mem&i)-1)) as
            select * from &libin.."&&mem&i"n ;
        %end;
    quit;

    /* Close link to xlsx file */
    libname &libin. CLEAR;
%mend XLSXimport;

%XLSXimport(mylib, c:\users\public\EXCEL FILE.xlsx, work);
run;