New Data in the Federal Statistical Research Data Centers
Melissa Ruby Banzhaf, PhD
Administrator, ARDC
Center for Economic Studies
U.S. Census Bureau
October 9, 2015
Overview
Background on Federal Statistical RDCs
Types of Data Available in the RDC (Emphasis on New Data)
How to Obtain Access to this New Data (and other data) in the RDCs
What are Federal Statistical Research Data Centers (RDCs)?
Secure computing labs where qualified researchers conduct approved statistical analysis on non-public data.
These data are collected by various government agencies (Census Bureau, NCHS, AHRQ, SSA, and more to come).
Established through an agreement between federal statistical agencies and a local research community.
Managed by the Census Bureau.
Federal Statistical Research Data Center Locations
The Atlanta Research Data Center
Located in the Federal Reserve Bank of Atlanta, at the corner of 10th & Peachtree
Consortium Members:
• Emory University
• University of Georgia
• Georgia State University
• Clemson University
• Federal Reserve Bank of Atlanta
• University of Alabama at Birmingham
• University of Tennessee – Knoxville
• Florida State University
• Georgia Institute of Technology
Types of Restricted Data Available
Economic Data
• Microdata on firms and establishments
• Business Register data
Demographic Data
• Survey data on individuals and households
• Administrative data on individuals
• Linked survey and administrative datasets
Employer-Employee Jobs Data (LEHD)
• Data on employees linked with data on employers
Health Data
• National Center for Health Statistics
• Agency for Healthcare Research & Quality
Advantages of Restricted Data
• Vast number of business datasets that are not publicly available at the micro level
• Census datasets can be linked together
• Census datasets can be linked to external data
• More detailed level of geographic identifiers
• Very little top- or bottom-coding
Economic Datasets
• Annual Survey of Manufactures
• Census of Construction
• Census of Finance and Insurance
• Census of Manufactures
• Census of Mining
• Census of Real Estate
• Census of Retail
• Census of Services
• Census of Transportation
• Census of Wholesale
• Survey of Business Owners
• Commodity Flow Survey
• Import and Export Transactions
• Annual Capital Expenditures Survey
• Business Register (SSEL)
• Longitudinal Business Database
• Manufacturing Energy Consumption Survey
• Medical Expenditure Panel Survey, Insurance Component
• National Employer Survey
• Pollution Abatement Costs and Expenditures
• Quarterly Financial Reports
• Research and Development Survey
• Survey of Manufacturing Technology
• Annual Retail/Wholesale Trade Surveys
• Kauffman Firm Survey
New Data – Management and Organizational Practices Survey
Supplement to the 2010 Annual Survey of Manufactures
Goal: Collect information on an establishment's use of structured management practices
36 questions:
• 16 Management (monitoring, targets, and incentives)
• 13 Organization (who makes decisions, data in decision-making)
• 7 Background (number of managers/non-managers, union status)
Permits analysis of relationship between management practices and key economic outcomes (e.g., productivity)
Demographic Datasets - Survey
• Decennial Surveys (1950-2010)
• American Community Survey
• Current Population Survey
• Survey of Income and Program Participation
• American Housing Survey
• National Survey of College Graduates
• National Crime Victimization Survey
New Data - Decennial
1950 – 1% PUMS sample
• Geography: Census tract, but the lowest level is the enumeration district (roughly 600 people)
1960 – 25% sample (densest ever)
• Geography: Census tract and other sub-county geographies (Census place), but the lowest level is the enumeration district (roughly 600 people)
Harmonized coding across 1950 and 1960
New Data – Current Population Survey
• CPS Basic Monthly Data (2000-2014)
• CPS Food Security Supplement (2001-2012)
• CPS Voting and Registration Supplement (2006, 2008, 2010, 2012)
• CPS Fertility Supplement (1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012)
New Data – Current Population Survey
Characteristics of Internal Files:
• Geography: Census tract
• The March CPS is the only file that has PIKs (Protected Identification Keys)
• Has a CPS identification key, so it may be possible to link across CPS surveys
• Some limitations on the types of analysis permitted by BLS
New Data – National Crime Victimization Survey
National survey of households (2006-2012)
• Collects information on the frequency, characteristics, and consequences of criminal victimization (sexual assault, robbery, burglary, motor vehicle theft, etc.)
New: Police-Public Contact Survey (2011) – Collects information on perceptions of police behavior and response during encounters.
New Data – National Survey of College Graduates
Biennial survey collects information (such as occupation, work activities, salary, relationship between degree field and occupation) on college-educated individuals with particular emphasis on those in science and engineering fields.
• 2010 currently available
• Geography at state level
• Currently no PIKs
Demographic Datasets - Administrative
Census Numident File (SSA)
Housing Datasets (HUD):
• Public and Indian Housing Information Center Dataset
• Tenant Rental Assistance Certification Systems dataset
• Computerized Homes Underwriting Management System
Demographic - Administrative Continued
Medicare/Medicaid Datasets (CMS):
• Medicare Enrollment Database
• Medicaid Statistical Information System
Administrative – Census Numident
Data derived from applications for Social Security Numbers
Contains data on:
• Birthdate
• Town or county of birth
• Gender
• Race
• Citizenship
• Date of death
• PIKs
Administrative - Housing
Public and Indian Housing Information Center Dataset
Contains information on all members of a household (HH) with a participant in a covered program:
• Housing Choice Voucher
• Public Housing
• Indian Housing
Includes age, race, sex, rent, household income, PIK
Geography: block level
Administrative - Housing
Tenant Rental Assistance Certification Systems (TRACS) dataset
Contains information on all members of a household with a participant in a covered program. These programs provide rental assistance to participants living in privately owned, subsidized housing.
Includes age, race, sex, rent, household income, PIK
Geography: block level
Administrative - Housing
Computerized Homes Underwriting Management System (CHUMS)
Contains records on approved mortgage applications insured by the Federal Housing Administration (FHA)
Contains information on borrowers and co-borrowers, including income, housing value, mortgage, demographic characteristics, PIKs
Geography: block level
Administrative - CMS
Medicare Enrollment Database (1999-2014)
• Information on all Medicare beneficiaries
• Limited to information on people, not claims: eligibility dates and statuses, residence change dates, basic demographic information, PIKs
Geography: block level
Administrative - CMS
Medicaid Statistical Information System (2000-2013)
• Information on all Medicaid and CHIP enrollees in each month
• Limited to information on people, not claims: eligibility dates and statuses, basic demographic information, PIKs
Geography: zip code level
Demographic Datasets: Linked Survey-Administrative
• Current Population Survey – SSA Earnings Files
• Survey of Income and Program Participation – SSA Earnings Files
• National Longitudinal Mortality Study
Linked: SSA Files with CPS and SIPP
CPS and SIPP Survey Data matched to SSA earnings files by PIK
SSA records include:
• Detailed Earnings Record – FICA, non-FICA, and self-employment earnings (1978+) from the Master File
• Summary Earnings Record – all earnings for each year from 1951 to present
• Master Beneficiary Record – entitlement and payment data on Social Security recipients (including Disability)
• 831 Disability File – records of medical eligibility determinations for Disability Insurance and SSI benefits
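To make "matched by PIK" concrete, here is a minimal sketch of a left join of survey respondents to person-year earnings records on a shared person key. It is only an illustration under stated assumptions: the field names (`pik`, `year`, `earnings`), the toy records, and the function `link_by_pik` are hypothetical, not the actual CPS, SIPP, or SSA file layouts, and PIK assignment itself (done inside the Census Bureau) is not shown.

```python
from collections import defaultdict

# Hypothetical survey records, one row per respondent.
# Field names and values are illustrative only, not real CPS/SIPP layouts.
survey = [
    {"pik": "A1", "age": 34, "educ": "BA"},
    {"pik": "B2", "age": 51, "educ": "HS"},
    {"pik": None, "age": 28, "educ": "MA"},  # respondent with no assigned PIK
]

# Hypothetical administrative earnings records, one row per person-year.
earnings = [
    {"pik": "A1", "year": 2010, "earnings": 42000},
    {"pik": "A1", "year": 2011, "earnings": 45000},
    {"pik": "B2", "year": 2011, "earnings": 30000},
    {"pik": "C3", "year": 2011, "earnings": 51000},  # person not in the survey
]


def link_by_pik(survey_rows, admin_rows):
    """Left-join respondents to their (year, earnings) history by PIK.

    Respondents without a PIK, or whose PIK has no administrative records,
    keep an empty history. Administrative rows with no matching respondent
    are dropped.
    """
    history = defaultdict(list)
    for row in admin_rows:
        history[row["pik"]].append((row["year"], row["earnings"]))

    linked = []
    for person in survey_rows:
        pik = person["pik"]
        records = sorted(history.get(pik, [])) if pik is not None else []
        linked.append({**person, "earnings_history": records})
    return linked


linked = link_by_pik(survey, earnings)
# Share of respondents with at least one matched administrative record.
match_rate = sum(1 for p in linked if p["earnings_history"]) / len(linked)
```

Here the respondent with PIK `A1` picks up both years of earnings, while the respondent without a PIK stays in the file with an empty history, so two of three respondents are matched. Keeping unmatched respondents (a left join rather than an inner join) makes it easy to check match rates before analysis.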
Linked: National Longitudinal Mortality Study
Purpose of database: to study the effects of demographic and socio-economic characteristics on mortality
Survey data: March CPS, 1980 Decennial Census (sample)
Administrative data: Death Certificate information from National Death Index (through 2011)
Geography: county level
LEHD (Longitudinal Employer-Household Dynamics)
“Tracks” a person based on their place of employment; essentially links employees with employers
• Based on unemployment insurance administrative records
• Available on a state-by-state basis
• Quarterly data starting in 1990, currently through 2011
• Can link employers to employer data in other Census datasets
• Can link employees to data on individuals in other Census datasets
• New variables: firm age and size, firm ID that matches the Business Register
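As a rough illustration of how quarterly, employer-linked records let a person be "tracked" across jobs, the sketch below collapses hypothetical (person, employer, year, quarter) rows into spells of consecutive quarters with the same employer. The record layout, the helper names (`quarter_index`, `from_index`, `job_spells`), and the spell definition are assumptions made for illustration; they are not the LEHD file layout or its official job definitions.

```python
from itertools import groupby

# Hypothetical quarterly records: (person id, employer id, year, quarter).
# Illustrative only; not the actual LEHD file layout.
records = [
    ("P1", "FIRM_A", 2009, 3),
    ("P1", "FIRM_A", 2009, 4),
    ("P1", "FIRM_B", 2010, 1),
    ("P1", "FIRM_B", 2010, 2),
    ("P1", "FIRM_A", 2010, 4),
    ("P2", "FIRM_B", 2010, 1),
]


def quarter_index(year, quarter):
    """Map (year, quarter) to one integer so consecutive quarters differ by 1."""
    return year * 4 + (quarter - 1)


def from_index(idx):
    """Inverse of quarter_index: return (year, quarter)."""
    year, q0 = divmod(idx, 4)
    return (year, q0 + 1)


def job_spells(rows):
    """Collapse quarterly rows into (person, employer, first, last) spells.

    Rows are grouped by person and employer, so concurrent jobs held in the
    same quarter do not interrupt each other. Within a person-employer pair,
    a gap of one or more missing quarters starts a new spell.
    """
    spells = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], quarter_index(r[2], r[3])))
    for (person, employer), job_rows in groupby(ordered, key=lambda r: (r[0], r[1])):
        start = prev = None
        for _, _, year, quarter in job_rows:
            q = quarter_index(year, quarter)
            if prev is not None and q <= prev + 1:
                prev = q  # next quarter (or a duplicate row): extend the spell
                continue
            if prev is not None:
                spells.append((person, employer, from_index(start), from_index(prev)))
            start = prev = q  # first row for this pair, or first row after a gap
        spells.append((person, employer, from_index(start), from_index(prev)))
    return spells


spells = job_spells(records)
```

In this toy input, `P1` ends up with three spells: two separate spells at `FIRM_A` (split by the gap between 2009Q4 and 2010Q4) and one at `FIRM_B`. Grouping by person and employer first, rather than ordering by quarter alone, keeps people who hold several jobs in the same quarter from having their spells fragmented.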
New Data – Innovation Measurement Initiative
Goal: Improve measurement of innovation resulting from research grants, a small but important sector of the economy.
How: Integrate university data on federally funded research grants with Census Bureau data on people and businesses.
Specifically, link employee, vendor, and sub-award transactions to the Census Business Register and LEHD (employee-employer database).
Innovation outcomes: Job placements, start-up activity and business dynamics, vendor characteristics
New Data – Innovation Measurement Initiative
Partnership between Census and the Institute for Research on Innovation & Science (IRIS) at the University of Michigan
Member institutions of IRIS provide data to Census and in turn receive:
• Individual and collective reports
• Underlying tables and graphics for the institution's use
• Access to aggregate data for researchers
• Input on new product design
New Data – IMI Opportunity
Census is asking for nominations of teams of 2-5 researchers (at least one member with Special Sworn Status, SSS) to assist in enhancing and documenting data for the IMI project.
What is in it for you?
• Opportunity to do research on new data
• $25K in funding support for 1 graduate student
Initial deadline for nominations: October 16
Health Data in the ARDC
These data are collected by:
• National Center for Health Statistics (NCHS)
• Agency for Healthcare Research and Quality (AHRQ)
What types of NCHS data?
National Health Status Surveys
• National Health and Nutrition Examination Survey (NHANES) I, II, and III
• National Health Interview Survey (NHIS)
• Longitudinal Study on Aging I and II (LSOA)
• National Survey of Family Growth
• National Survey of Children's Health
• National Survey of Early Childhood Health
• National Survey of Children with Special Health Care Needs
• National Asthma Survey
National Health Care Surveys
• National Ambulatory Medical Care Survey
• National Hospital Ambulatory Medical Care Survey
• National Survey of Ambulatory Surgery
• National Hospital Discharge Survey
• National Nursing Home Survey (NNHS)
• National Home and Hospice Care Survey
• National Employer Health Insurance Survey
• National Health Provider Inventory
• National Immunization Survey
Vital Statistics
• Mortality and Multiple Mortality
• Birth
• Fetal Death
• National Death Index
• Marriage and Divorce
What types of NCHS data?
Linked Data Sets
• Linked mortality data: NHIS, NHANES, LSOA II, NNHS
• Linked Medicare Enrollment and Claims data: NHIS, NHANES, LSOA II
• Linked Social Security Administration data: NHIS, NHANES, LSOA II, NNHS
• Linked EPA data
What types of AHRQ Data?
Medical Expenditure Panel Survey (MEPS) files include:
• Household Component
• Provider Component
• Insurance/Employer Component
• Nursing Home Component (1996 only)
• Area Resource File
• Two-year, two-panel file
• MEPS-NHIS linked data
Only the Household Component and portions of the Provider Component are publicly available
How to Access the RDC
Develop proposal
• Different guidelines for Census data vs. NCHS/AHRQ data
Submit proposal for agency review
• Census (and agency sponsors)
• NCHS/AHRQ
Obtain Special Sworn Status (SSS)
Pay one-time fee for NCHS/AHRQ data
Timeframe – “Patience is a Virtue”
Census Data
• Plan on 6 to 9 months before working in the lab
• Census approval / other agency approval
NCHS/AHRQ Data
• Timeframe dependent on agency approval process
• Census approval NOT required
Special Sworn Status
• 3 to 4 months for your security clearance
Working in the ARDC lab
• All analysis conducted in the ARDC lab
• Data located on a server in Maryland
• Data accessed via thin-client terminals
• No internet access or personal computers allowed in the lab
• Statistical software available: SAS, Stata, R, MATLAB, GIS, SUDAAN, etc.
• Agency reviews output before release
• Penalty for disclosure (inadvertent or otherwise) is $250,000 and/or 5 years in prison
Upcoming RDC-Related Events
Cornell University Course – INFO 7470 – Understanding Social and Economic Data
• Available via distance learning (for course credit)
• Intended for Ph.D. students and faculty who use large-scale restricted-access data from government suppliers
• Emphasis on data accessible through the RDC network
Interested? Contact us for more information.
Contact Information
People: Melissa Ruby Banzhaf, ARDC Administrator
404-498-7538
Julie L. Hotchkiss, ARDC Executive Director
404-498-8198
Resources:
• ARDC website: atlantardc.org
• Quarterly ARDC Newsletter (email us to get on the list)