History
A brief history of the SodaPop system
In the beginning...
The SodaPop System had its beginnings at the Population Research Institute Computer Core (which included the Data Archive) in 1995. The resources of the Computer Core were considered top-notch for the time. We ran a single platform (Unix), used interactive SAS for statistical analysis, and had moved probably about 1 GB of data from tape to disk. We also had a Gopher server.
When it came to using the vast amounts of data on the network, there were still many problems. Users had to navigate the Unix network in command-line mode to locate the files. The files came in a multitude of formats, ranging from text files (ASCII or EBCDIC) to a variety of proprietary formats, all compressed to save disk space. Users first had to determine what type of file they had and how to read it. Each user then had to build a SAS dataset from the raw data file every time they wanted to analyze it. Sometimes code was provided with the data, sometimes not, but it was up to the user to determine whether the data had been read in correctly. Many hours of user training focused solely on this process.
The central idea behind SodaPop was to have the Computer Core programmers create a SAS dataset view from the compressed raw data and store that view on the network. This eliminated the first step in the process and ensured that the data was properly loaded. Users could concentrate on learning SAS commands for computation and analysis, rather than input statements.
Two new developments occurred around that time: 1) we began experimenting with HTML to document the data holdings and 2) we began experimenting with using and programming graphical user interfaces.
With the data in SAS dataset views, HTML documentation of the holdings, and GUIs, we were ready to launch the precursor to SodaPop: "PRI PopUps". The system was a GUI written in SAS/AF screen control language that was accessible from within an interactive SAS session. The interface provided users with a list of available data, issued the LIBNAME statement to access the dataset view, opened a browser window containing the HTML documentation, and sent the user to SAS/ASSIST, the SAS GUI to statistical procedures. For details, refer to the paper presented at the SAS Users Group International Conference in 1996, "Quick and Easy AF Applications Using SAS/ASSIST Software, SAS Data Views, Block Menus and HTML" by Jeanne Spicer.
Evolution
By 1999, use of the web was commonplace and we had begun experimenting with SAS/IntrNet software. A PC network had been added, which expanded the number of software packages available to the user, and the cost of disk space had dropped. In light of those developments, we decided to totally re-engineer the PRI PopUp system.
The first major change involved taking advantage of the metadata stored in a SAS dataset. SAS/IntrNet programs were written to access output from PROC CONTENTS about variable names and variable labels in the datasets or dataset views. This enabled users to search that output for variables by name or keyword via a web form. We also began to generate standardized HTML documentation from that information for each data collection. With help from the Information Core, bibliographic citations were provided for the datasets along with "Programmers Notes" and links to related resources. Sample programs for accessing the SAS data directly were also generated for users to modify on their own.
Since disk space was now cheap, we began to replace dataset views with actual SAS datasets. For more information, refer to the paper presented by the programming team at the NorthEast SAS Users Group Conference in 2000, "SAS Online Data Archive for Population Studies" by David Barro, Leslie Benson, Steve Maczuga, Cindy Mitchell and Jeanne Spicer.
Post-Y2K
The data archive, now in the Information Core, began to standardize the HTML documentation of its holdings. SAS/IntrNet programs were added to allow the user to construct extracts of selected variables in the SAS datasets via a web form. There was an individual SAS/IntrNet program tailored to each data collection. More data was being added, and we were receiving requests for access from users outside of PSU. Programs were developed to produce crosstabs describing gender and minority inclusion in the datasets, a requirement for NIH grant proposals.
In order to manage this, we needed more metadata and more generic programs to handle the data. Metadata tables were developed to store information about the data collections in SodaPop, including the LIBREF, the path to the data, the location of the documentation, and the location of the generic program to run to access the data for extracts, searches and crosstabs. The metadata drives the online system.
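As a rough illustration of what "the metadata drives the online system" can mean, the sketch below looks up a collection's metadata row and uses it to pick a generic program for the requested task. It is written in Python purely for illustration; the actual system used SAS/IntrNet programs, and every name here (CollectionMeta, GENERIC_PROGRAMS, METADATA, handle_request, the "demo" collection and its paths) is a hypothetical stand-in, not the real SodaPop schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CollectionMeta:
    """One metadata row per data collection (field names are hypothetical)."""
    libref: str               # SAS library reference for the collection
    data_path: str            # where the datasets live
    doc_location: str         # where the HTML documentation lives
    programs: Dict[str, str]  # task name -> name of the generic program to run


def run_extract(meta: CollectionMeta, variables: List[str]) -> str:
    # Stand-in for a generic extract program; it only describes the work.
    return f"extract {', '.join(variables)} from {meta.libref} ({meta.data_path})"


def run_search(meta: CollectionMeta, variables: List[str]) -> str:
    # Stand-in for a generic variable-search program.
    return f"search {meta.libref} for {', '.join(variables)}"


# Registry of generic programs, shared by all collections (assumed names).
GENERIC_PROGRAMS: Dict[str, Callable[[CollectionMeta, List[str]], str]] = {
    "generic_extract": run_extract,
    "generic_search": run_search,
}

# The metadata table: adding a collection means adding a row, not a program.
METADATA: Dict[str, CollectionMeta] = {
    "demo": CollectionMeta(
        libref="DEMO",
        data_path="/data/demo",
        doc_location="/docs/demo.html",
        programs={"extract": "generic_extract", "search": "generic_search"},
    ),
}


def handle_request(collection: str, task: str, variables: List[str]) -> str:
    """Resolve the collection's metadata, then dispatch to the generic program."""
    meta = METADATA[collection]
    program = GENERIC_PROGRAMS[meta.programs[task]]
    return program(meta, variables)
```

Under these assumptions, a call such as handle_request("demo", "extract", ["AGE", "SEX"]) is resolved entirely from the metadata row, which is the contrast with the earlier design of one tailored program per data collection.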
Data-use restrictions became a problem. The Information Core had to receive permission from all issuing agencies to redistribute the data. Password access was needed for users not on the Penn State backbone. A system for storing that information and a check for a valid password needed to be added to the programs.
Users show a clear preference for the PC network over the Unix network. SodaPop data has been copied to the PC network for use with PC SAS directly. This eliminates the need to download and copy data. The PC data also serves as a backup to the Unix files. We can add more sensitive data to the system and use the network login and access control lists (ACLs) to control access, which was too problematic on the web. Users can take advantage of PC platform data conversion tools to use the data with their favorite software package.
In 2004, we were approached by the new Population Center at the University of Maryland with an offer to help prepare data for SodaPop in exchange for use of the system by their researchers. Also in 2004, the Social Sciences division of Pattee Library approached us to discuss the potential for support and collaboration.
SodaPop Revamp
In 2006, the entire site underwent a complete revamp. An updated version of SAS/IntrNet software was installed and the web pages were reorganized using the Plone content management system. The SodaPop site now stands independent of the PRI webspace. Both web-accessible and non-web-accessible data collections were combined into one data archive system, giving researchers one place to look for data. The switch to the new system took place on June 29, 2006.
