Basic purpose: access to the LODES data

OnTheMap (http://lehd.did.census.gov/led/datatools/onthemap.html) is a web-based, interactive mapping application released by the LEHD program at the U.S. Census Bureau. Its objective is to show on maps where people work and where workers live, with companion reports on their age, earnings, and industry distributions. The underlying data (LEHD Origin-Destination Employment Statistics, LODES) are public-use data available for access and download on the Cornell VirtualRDC (http://www.vrdc.cornell.edu/news/?page_id=4), an internet-accessible computing environment dedicated to the exploration and development of synthetic data.

What can be downloaded and accessed

Since 2005, the U.S. Census Bureau has released multiple versions of the LODES data underlying the OnTheMap (http://lehd.did.census.gov/led/datatools/onthemap.html) application. This site holds

Version 5.0 LODES files are available for local computation only; the link above directs download users to the Census Bureau's website. Version 1 files are no longer accessible on the VirtualRDC.

LODES data can be downloaded from the OnTheMap download area on the VirtualRDC. We previously required a login and password, but we have stopped collecting that information. Data are stored in compressed CSV format. For your convenience, read-in programs for SAS, Stata, and MySQL can be found on the web at http://www.vrdc.cornell.edu/onthemap/programs/. Note that there is no guarantee that these programs read in the data correctly, although we have used them ourselves in the past. The data package consists of Origin-Destination data (OD), Residence Area Characteristics data (RAC), Workplace Area Characteristics data (WAC), and block-group level Quarterly Workforce Indicator data (QWI).
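For users without SAS, Stata, or MySQL, the following is a minimal Python sketch of reading a compressed CSV file using only the standard library. It assumes gzip compression, and the column names (w_geocode, h_geocode, total_jobs) are hypothetical stand-ins; check the data documentation for the actual compression and file layout. The sample data is built in memory so the sketch runs without a download.

```python
import csv
import gzip
import io


def read_gzip_csv(source):
    """Return the rows of a gzip-compressed CSV as a list of dicts.

    `source` may be a file path or a binary file-like object.
    """
    with gzip.open(source, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical stand-in for a downloaded file; the column names are assumptions.
sample_text = "w_geocode,h_geocode,total_jobs\n0001,0002,3\n0001,0003,5\n"
sample_bytes = gzip.compress(sample_text.encode("utf-8"))

rows = read_gzip_csv(io.BytesIO(sample_bytes))
total_jobs = sum(int(row["total_jobs"]) for row in rows)
print(len(rows), total_jobs)
```

To read a real file, pass its path instead of the in-memory buffer. Geocodes are kept as strings here so that leading zeros are not lost.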

What users should know about the data

The place of residence counts are generated from a synthetic data model that conditions on disclosure-proofed place of work counts and other observable characteristics. Each of the implicate files available for OD and the RAC represents an independent draw from the synthetic data model. Detailed information on the full OTM data and the synthetic data model can be found in the data documentation (also see updates for OTM v3).

The U.S. Census Bureau encourages the use of the multiple implicates of the OTM data. LEHD Program research has found that three (3) implicates are usually sufficient to determine the extent to which the confidentiality protections affect the statistical results. Users who wish to explore the LODES data with additional implicates should contact LEHD directly.
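One simple way to see how much the confidentiality protections affect a result is to compute the same statistic on each implicate and compare the values. The sketch below does this in Python for three made-up estimates; the function name and the numbers are illustrative assumptions, not LEHD output. It reports only the mean and the between-implicate spread. Formal combining rules for multiply synthesized data also use the within-implicate variance of each estimate, which is not computed here.

```python
from statistics import mean, variance


def summarize_across_implicates(estimates):
    """Summarize one statistic computed separately on each implicate."""
    if len(estimates) < 2:
        raise ValueError("need at least two implicates to measure spread")
    return {
        "implicates": len(estimates),
        "mean": mean(estimates),
        # Sample variance (divisor m - 1) of the estimates across implicates.
        "between_variance": variance(estimates),
        "range": max(estimates) - min(estimates),
    }


# Hypothetical values of the same statistic from three implicates.
estimates = [120.0, 126.0, 123.0]
summary = summarize_across_implicates(estimates)
print(summary)
```

A between-implicate variance that is small relative to the estimate suggests the result is not very sensitive to the synthetic draw.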

The base geography for version 3.0 of OnTheMap is TIGER 2006 Second Edition. An archival copy can be found here. Version 4.0 uses TIGER (TBD).

For further information on how to properly analyze multiply synthesized or imputed datasets, consult Sessions 8a and 8b of the online INFO 747 class at Cornell University's CISER at http://www.vrdc.cornell.edu/info747/2005/course_outline.html. For a more complete bibliography, consult the bibliography of the OTM Public data documentation.

We ask that you provide comments, analysis, feedback and/or published papers. That information can be provided through the following list servers: The research and evaluation results will be used to enhance understanding and developmental efforts for future versions of OnTheMap and synthetic data in general. Specific questions regarding

Technical requirements

Downloading data and analyzing it on your own computer

To analyze the data on their own computers, users need their own statistical software and, depending on the analysis, significant memory. Access is through a regular Web browser in the OnTheMap Download Area (http://www.vrdc.cornell.edu/onthemap/data/). The read-in programs are available on the VirtualRDC OTM website (http://www.vrdc.cornell.edu/onthemap/).
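When memory is the constraint, one option is to stream a file row by row and keep only running totals instead of loading every record. The Python sketch below illustrates the idea with the standard library. It assumes gzip compression and uses hypothetical column names (w_geocode, total_jobs), with a small in-memory sample standing in for a downloaded file.

```python
import csv
import gzip
import io
from collections import Counter


def sum_by_key(source, key_column, count_column):
    """Stream a gzip-compressed CSV and total one column per key.

    Only the running totals are held in memory, not the rows themselves.
    """
    totals = Counter()
    with gzip.open(source, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            totals[row[key_column]] += int(row[count_column])
    return totals


# Hypothetical stand-in data; the column names are assumptions.
sample_text = (
    "w_geocode,h_geocode,total_jobs\n"
    "0001,0002,3\n"
    "0001,0003,5\n"
    "0004,0002,1\n"
)
sample_bytes = gzip.compress(sample_text.encode("utf-8"))

jobs_by_workplace = sum_by_key(io.BytesIO(sample_bytes), "w_geocode", "total_jobs")
print(dict(jobs_by_workplace))
```

Memory use then grows with the number of distinct keys rather than with the number of rows in the file.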

Where to get help

For further information and assistance, contact the VirtualRDC administrators (mailto:virtualrdc@cornell.edu).

Funding and disclaimers

The VirtualRDC is not affiliated with the US Census Bureau. All data made available at this facility are public-use data. The VirtualRDC is partially funded by NSF Grants #0427889 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0427889), #0339191 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0339191) and #9978093 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=9978093) and donations by Novell (http://www.novell.com/linux/) and Intel (http://www.intel.com).