Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data
Inferring gene networks from large-scale human genomic data is challenging because it is difficult to identify the correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach that integrates external data sources with knockdown data from human cell lines to infer gene regulatory networks. Specifically, we assemble multiple data sources, including gene expression data, genome-wide binding data, Gene Ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of the inferred gene networks and extends some previous Bayesian frameworks in both theory and application. We apply our method to two human cell lines, the skin melanoma cell line A375 and the lung cancer cell line A549, to illustrate its capabilities. Our results show that the improvement in performance can vary between cell lines, and that different external data sources may need to be chosen as prior knowledge to obtain better accuracy for each cell line.
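The abstract does not specify which supervised learner turns the assembled data sources into prior probabilities. A minimal sketch of the general idea, assuming logistic regression over hypothetical per-pair evidence features (expression correlation, binding score, shared Gene Ontology terms, shared pathway), is shown below. The function names, features, and toy labels are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical evidence features for one candidate (regulator, target) pair,
# e.g. [expression correlation, binding score, shared GO terms, same pathway].
Features = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: List[Features], y: List[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient step over the whole training set.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def prior_probability(w: List[float], b: float, x: Features) -> float:
    """Prior probability that a candidate pair is a true regulatory relationship."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy training set (hypothetical): 1 = known regulatory relationship, 0 = not.
    X_train = [
        [0.9, 1.0, 1.0, 1.0],
        [0.8, 1.0, 0.0, 1.0],
        [0.7, 0.0, 1.0, 1.0],
        [0.1, 0.0, 0.0, 0.0],
        [0.2, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    y_train = [1, 1, 1, 0, 0, 0]
    w, b = fit_logistic(X_train, y_train)
    strong = prior_probability(w, b, [0.85, 1.0, 1.0, 1.0])
    weak = prior_probability(w, b, [0.05, 0.0, 0.0, 0.0])
    print(f"strong evidence prior: {strong:.3f}")
    print(f"weak evidence prior:   {weak:.3f}")
```

Logistic regression is chosen here only because its output is already a probability in (0, 1) that can be used directly as a prior on an edge in a Bayesian network search; the learner and features used in the paper may differ.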
Journal of Computational Biology
Hung, Ling-Hong; Yeung, Ka Yee; Liang, Xiao; Young, William Chad; and Raftery, Adrian E., "Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data" (2019). School of Engineering and Technology Publications. 372.