Presenter Information

Stacey C. Newman

Degree Name

Master of Computer Science and Systems (MCSS)

Department

Institute of Technology

Location

UW Y Center

Start Date

21-5-2015 5:15 PM

End Date

21-5-2015 5:20 PM

Abstract

The predictive potential of the many large datasets held by separate entities in healthcare, financial markets, social media, and other sectors is locked behind privacy constraints. These entities either cannot share their data with one another or find that doing so is against their interests. As a result, powerful predictive models that draw on knowledge from these different data sources cannot be built unless there is a way to do so without revealing the data.

In my talk, I will outline our proposed protocol, in which two different entities can jointly build a linear regression model, one of the most popular machine learning models and a technique used throughout both industry and research communities. The resulting model leverages knowledge from both datasets without revealing either party's confidential data.

I will demonstrate how we plan to protect both parties' data throughout our protocols while building a more powerful model than either party could compute in isolation; more data generally leads to better models, which produces benefits in many areas. A relevant example application is in healthcare: many hospitals hold a wide variety of data on patients, but privacy restrictions limit these institutions’ ability to use this data to build accurate predictive models that could support patients' financial and medical well-being.

We’re training models on a variety of public and private healthcare datasets to simulate this exact scenario and move toward unlocking the power behind these large datasets without compromising the privacy of individuals or the institutions that hold the data.


Private Predictive Modeling Power
