Generation of Prediction Intervals to Assess Data Quality in the Distribute System Using Quantile Regression
Distribute is a national influenza-like illness (ILI) surveillance project that integrates data from multiple jurisdictions. Distribute works solely with summarized (aggregated) data. Timeliness of the data varies considerably between sites; for many sites, data for each encounter date arrive piecemeal over several days. This spread adds noise to the data received by the Distribute system, and systematic differences in timeliness between data sources can introduce bias into the indicator of interest, the ILI ratio. Quantile regression on the observed relationship between incomplete and complete data is used to calculate prediction intervals for the complete data. Some sites have very narrow prediction intervals, indicating that the ILI ratio calculated from incomplete data approximates the complete-data ratio very accurately. At other sites, the prediction intervals show considerable asymmetry.
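The idea of turning quantile regression into a prediction interval can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: it assumes a simple linear relationship between the ILI ratio computed from incomplete data (`x_incomplete`) and the later, complete ILI ratio (`y_complete`), and it takes the 5th and 95th percentile lines as the bounds of a 90% interval. The helper names (`check_loss`, `fit_quantile_line`, `prediction_interval`) and the toy numbers are hypothetical. The fit uses an exhaustive search over pairs of observations, which is exact for a two-parameter linear quantile regression (an optimal line can be chosen to pass through two data points) but scales as O(n^3). For real data, a linear-programming solver such as `QuantReg` in statsmodels is the usual choice.

```python
from itertools import combinations


def check_loss(residual, tau):
    """Pinball (check) loss: tau*r for r >= 0, (tau - 1)*r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(x, y, tau):
    """Exact linear quantile regression y ~ a + b*x by exhaustive search.

    An optimal line can always be chosen to pass through two observations
    (given at least two distinct x values), so every such pair is tried.
    Cost is O(n^3): suitable for toy data only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid fit
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - (a + b * xk), tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def prediction_interval(x_incomplete, y_complete, x_new, lower=0.05, upper=0.95):
    """Hypothetical helper: interval for the complete ILI ratio given an incomplete one."""
    a_lo, b_lo = fit_quantile_line(x_incomplete, y_complete, lower)
    a_hi, b_hi = fit_quantile_line(x_incomplete, y_complete, upper)
    # Separately fitted quantile lines can cross; no correction is applied here.
    return a_lo + b_lo * x_new, a_hi + b_hi * x_new


if __name__ == "__main__":
    # Toy ILI ratios (percent), invented for illustration only.
    incomplete = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    complete = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
    print(prediction_interval(incomplete, complete, 10))
```

In this toy run the interval at an incomplete ratio of 10 is roughly (10, 11), lying entirely on one side of the point estimate. That is the kind of asymmetry that can appear when late-arriving data tend to push the ratio in one direction. A site whose incomplete and complete ratios agree closely would instead yield two nearly coincident quantile lines and a very narrow interval.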
2011 Joint Statistical Meetings Proceedings
Painter, Ian; Eaton, Julie; Revere, Debra; Lober, Bill; and Olson, Donald R., "Generation of Prediction Intervals to Assess Data Quality in the Distribute System Using Quantile Regression" (2011). SIAS Faculty Publications. 246.