Generation of Prediction Intervals to Assess Data Quality in the Distribute System Using Quantile Regression

Publication Date


Document Type

Conference Proceeding


Distribute is a national influenza-like-illness (ILI) surveillance project that integrates data from multiple jurisdictions. Distribute works solely with summarized (aggregated) data. Timeliness of the data varies considerably between sites; for many sites, data for each encounter date arrives piecemeal, spread over several days. This spread adds noise to the data received by the Distribute system. Systematic differences in timeliness between data sources can introduce bias into the indicator of interest, the ILI ratio. Quantile regression on the observed relationship between incomplete and complete data is used to calculate prediction intervals for the complete data. Some sites have very narrow prediction intervals, indicating that the ILI ratio calculated from incomplete data approximates the complete-data ratio very accurately. Other sites show considerable asymmetry in their prediction intervals.
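The idea of deriving a prediction interval for the complete-data ILI ratio from an incomplete one can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single-predictor quantile regression through the origin (complete ≈ b × incomplete), for which the pinball-loss minimizer is a weighted quantile of the observed complete/incomplete ratios. The function names, the 5% and 95% quantile levels, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import random


def weighted_quantile(values, weights, tau):
    """Return the smallest value whose cumulative weight share reaches tau."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= tau * total:
            return value
    return pairs[-1][0]


def fit_slope(incomplete, complete, tau):
    """Quantile regression through the origin: complete ~ b * incomplete.

    For positive x, sum(rho_tau(y - b*x)) equals sum(x * rho_tau(y/x - b)),
    so a minimizer is the x-weighted tau-quantile of the ratios y/x.
    """
    ratios = [y / x for x, y in zip(incomplete, complete)]
    return weighted_quantile(ratios, incomplete, tau)


def prediction_interval(incomplete, complete, new_value, lower=0.05, upper=0.95):
    """Interval for the complete-data ratio given a new incomplete-data ratio."""
    low = fit_slope(incomplete, complete, lower) * new_value
    high = fit_slope(incomplete, complete, upper) * new_value
    return low, high


# Synthetic history for one hypothetical site (not data from the paper):
# the complete ratio tends to exceed the incomplete one, and unevenly so.
random.seed(0)
incomplete = [random.uniform(0.01, 0.05) for _ in range(200)]
complete = [x * random.uniform(0.9, 1.3) for x in incomplete]

low, high = prediction_interval(incomplete, complete, new_value=0.03)
print(f"incomplete ILI ratio 0.03 -> complete ratio in [{low:.4f}, {high:.4f}]")
```

In this synthetic case the interval extends further above the incomplete ratio than below it, which is the kind of asymmetry the abstract mentions. A site whose historical ratios cluster tightly around 1 would instead yield a narrow interval. A fuller treatment would likely include an intercept and further covariates, fitted with an established quantile regression routine.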

Publication Title

2011 Joint Statistical Meetings Proceedings

First Page


Last Page


