FINAL REPORT

“Expanding a Web-Accessible Women’s Health Data Archive”

      
           
a. Specific Aims
   The stated specific aim, to merge two parts of the TREMIN Research Program on Women’s Health and create a single, web-accessible electronic data archive,
remains unchanged and has now been achieved.  Prior to the inception of this project, TREMIN had collected prospectively recorded menstrual diaries and annual 
health report surveys from 1934 to present, but because of difficulties gaining access to and linking together much of the TREMIN data, this rich resource had been 
under-utilized. To make the data accessible to as many researchers as possible, the data had to be cleaned, recoded and relabeled, and the variables linked across 
the years of study. In addition, it was necessary to provide documentation about the data, and the process for gaining access to the data had to be simplified. At the 
completion of this project, the data collected between 1934 and 1980 have now been incorporated with the post-1980 data to create an archive that can track the 
menstrual cycle patterns and reproductive health of more than 5,000 women who have been TREMIN participants, many of them from their late teens through their 
post-menopausal years.  By accomplishing our aim, we have created one of the largest, if not the largest, datasets of its kind in the world and have made it 
accessible to qualified researchers.
      
      
b. Results
      
   Overview 

      The TREMIN data comprise more than 3.5 million records collected over many decades.  The data consist of reports of menstrual bleeding recorded 
daily by participants on a menstrual calendar card (MCC) and responses to questions on an annual health report form (HRF) primarily regarding reproductive health 
status (e.g., births, birth control use, exogenous hormone use) and general health status (e.g., occurrence of diseases). Some of these records are single events, such 
as a bleed event on a single day for a participant.  Other records document the detailed response of a participant to a particular question on the HRF. While 
connecting these disparate sources of data has huge promise for a number of research questions, until this project, bringing together these separate data sources was 
too arduous for anything but small, manually created datasets. Indeed, the work to transform the TREMIN data into a coherent, easy-to-analyze dataset had to 
overcome a large set of obstacles.
      The long history of the TREMIN project includes periods when data collection and archiving were quite primitive by today’s standards.  Through the 
decades, as researchers learned more, the focus of data collection shifted, with the most important questions of each era receiving the most attention in the data.  
Furthermore, the data have passed among three institutions, a handful of directors, and a great many data handlers.  The format of data collection 
has also changed considerably over time, so one of the major goals of this effort has been to transform these different formats into a single comprehensive 
format without losing any of the original content.  Most of the researchers involved during these periods are no longer available, so questions that arose about the 
data had to be resolved by indirect means.  
      Despite these obstacles, and although the full dataset contains a heterogeneous mix of data types and, for some variables, data integrity problems, the TREMIN dataset is 
an unprecedented and unique resource for researchers of women’s menstrual health and promises to be a goldmine of valuable information for decades to come.
      To minimize uncertainty and errors in the dataset, researchers on this project used several techniques.  The primary work was done with a combination of 
manual inspection of records and a suite of internal data checking scripts.  Errors were corrected, often by returning to original data sheets (sometimes dating to the 
1930s), and the data were made as internally consistent as possible.  Furthermore, because 1) these data are available on a public web site (although behind 
password protection) and 2) some variables could be used by an analyst to identify a participant (variables such as date of birth), we decided 
to convert certain variables (for instance, birth date to age) and to withhold others, which are available upon special request. 
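      The conversion of birth date to age mentioned above can be illustrated with a minimal sketch.  This is not the project's actual script; the field names 
birth_date and record_date, and both function names, are hypothetical and chosen only for illustration.  It assumes each record carries the participant's birth date 
and the date the record refers to, and it releases age in completed years in place of the birth date.

```python
from datetime import date


def age_in_years(birth_date: date, event_date: date) -> int:
    """Return the number of completed years between birth_date and event_date."""
    if event_date < birth_date:
        raise ValueError("event date precedes birth date")
    years = event_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the event year.
    if (event_date.month, event_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def deidentify(record: dict) -> dict:
    """Return a copy of record with the birth date replaced by age.

    The keys "birth_date" and "record_date" are hypothetical names,
    not the actual TREMIN variable names.
    """
    public = {k: v for k, v in record.items() if k != "birth_date"}
    public["age"] = age_in_years(record["birth_date"], record["record_date"])
    return public
```

      Reporting age in whole years rather than the exact birth date keeps the variable useful for analysis while making it harder to identify a participant.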
      However, we recognized that error reduction and internal consistency were not the only requirements for a usable database.  Our goal was to make the 
TREMIN database a vital resource to a broad range of researchers and analysts.  Therefore, in addition to this primary database creation, an analyst performed a 
number of data manipulations and checks on the dataset, much as our target audience will do.  This work was external to the primary database cleanup efforts, 
with the aim of uncovering inconsistencies in data format, further errors in the data, and even inconvenient database structures.  A further goal of this external 
work was to document the many variables in the dataset in the manner most helpful to analysts.
      
   Creating the data archive

      Data sources
      The data for building the database come from four distinct physical sources.  As a result, each file had unique issues to be dealt with.
      The sources are 



      
      Data Files
      The preliminary data organization consisted of: