Keywords
learning causal structure, belief networks, genetic epidemiology, bioinformatics
Start Date
1-7-2012 12:00 AM
Abstract
Studies of the relation between genetic traits and cancer susceptibility are often inconclusive or conflicting. This is likely due to the challenges of accommodating multiple genetic and environmental risk factors using traditional analytic models. Each risk factor is likely to contribute to susceptibility through a combination of additive and non-additive interactions with other risk factors, and conventional methods rarely address such interactions. Additionally, data from a single study rarely allow conclusive identification of causal relationships in such complex systems. Yet there is often a wealth of knowledge from previous studies that could be brought to bear on the task of model building. In this paper, we review the potential applicability of Bayesian networks for learning causal relations from gene-environment-cancer data. We first describe the Bayesian network approach, including a variety of algorithms for learning the structure of the causal network from observational data. We then demonstrate the approach using a subset of data from a population-based study of bladder cancer in New Hampshire, USA. We find only minor differences among the algorithms in performance and results. However, we expect larger differences when these algorithms are applied to the much larger number of genes in the full data set. Incorporating prior knowledge will therefore be a priority.
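To make the idea of learning network structure from observational data more concrete, the block below is a minimal sketch of one common score-based approach: greedy hill climbing over directed acyclic graphs, scored with the Bayesian information criterion (BIC) on discrete data. It is an illustration under stated assumptions, not necessarily one of the algorithms compared in this study. The variables E (exposure), G (genotype) and D (disease), the risk table, and the function names `local_bic`, `is_acyclic`, `total_score`, `hill_climb` and `simulate` are all hypothetical, and the data are simulated rather than drawn from the New Hampshire bladder cancer study.

```python
import math
import random
from collections import Counter
from itertools import product


def local_bic(data, child, parents, card):
    """BIC contribution of one discrete node given an ordered list of parents."""
    n = len(data)
    # N_jk: counts of (parent configuration j, child value k); N_j: counts of j.
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marg[j]) for (j, _k), c in joint.items())
    q = math.prod(card[p] for p in parents)  # number of parent configurations
    n_params = q * (card[child] - 1)
    return loglik - 0.5 * math.log(n) * n_params


def is_acyclic(parents):
    """True if the graph {node: set of parents} contains no directed cycle."""
    state = {}  # 0 = on the current DFS path, 1 = finished

    def visit(v):
        if state.get(v) == 1:
            return True
        if state.get(v) == 0:
            return False  # back edge: cycle found
        state[v] = 0
        ok = all(visit(p) for p in parents[v])
        state[v] = 1
        return ok

    return all(visit(v) for v in parents)


def total_score(data, parents, card):
    """BIC of a whole DAG: the sum of the per-node local scores."""
    return sum(local_bic(data, v, sorted(parents[v]), card) for v in parents)


def hill_climb(data, variables, card, max_iter=100):
    """Greedy DAG search using single-edge additions and deletions only."""
    parents = {v: set() for v in variables}
    score = {v: local_bic(data, v, [], card) for v in variables}
    for _ in range(max_iter):
        best_gain, best_move = 0.0, None
        for x, y in product(variables, repeat=2):
            if x == y:
                continue
            candidate = set(parents[y])
            if x in candidate:
                candidate.remove(x)  # try deleting edge x -> y
            else:
                candidate.add(x)  # try adding edge x -> y
                trial = dict(parents)
                trial[y] = candidate
                if not is_acyclic(trial):
                    continue
            # BIC decomposes, so only the child's local score changes.
            gain = local_bic(data, y, sorted(candidate), card) - score[y]
            if gain > best_gain:
                best_gain, best_move = gain, (y, candidate)
        if best_move is None:
            break  # local optimum: no single-edge change improves the score
        y, candidate = best_move
        parents[y] = candidate
        score[y] = local_bic(data, y, sorted(candidate), card)
    return parents


def simulate(n, seed=0):
    """Hypothetical synthetic gene-environment-disease data, for illustration only."""
    rng = random.Random(seed)
    # Hypothetical disease risk by (exposure, genotype), including an interaction.
    risk = {(0, 0): 0.05, (1, 0): 0.20, (0, 1): 0.15, (1, 1): 0.60}
    rows = []
    for _ in range(n):
        e = int(rng.random() < 0.4)  # hypothetical exposure indicator
        g = int(rng.random() < 0.3)  # hypothetical risk-genotype indicator
        d = int(rng.random() < risk[(e, g)])  # disease status
        rows.append({"E": e, "G": g, "D": d})
    return rows


if __name__ == "__main__":
    data = simulate(2000)
    card = {"E": 2, "G": 2, "D": 2}
    dag = hill_climb(data, ["E", "G", "D"], card)
    for child, pa in sorted(dag.items()):
        print(child, "<-", sorted(pa))
```

Because the BIC score decomposes over nodes, each candidate edge change requires rescoring only the child whose parent set changes. Practical implementations usually also consider edge reversals, and a score-based search of this kind identifies structure only up to Markov equivalence, so edge directions learned from observational data alone should not be read as causal without further assumptions or prior knowledge.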
Overview of Bayesian network approaches to model gene-environment interactions and cancer susceptibility