
Dr. Margaret J. Eppstein
Department of Computer Science
Date: Monday, September 25, 2006
Time: 12:20 p.m. - 1:10 p.m.
Location: 367 Votey
Abstract
There are estimated to be on the order of 106
single nucleotide polymorphisms (SNPs) existing as
standing variation in the human genome.
Certain combinations of these SNPs can
interact in complex ways to predispose individuals for a variety of common
diseases, even though individual SNPs may have no ill
effects. Detecting these epistatic combinations is a computationally daunting
task. Trying to use individual or
growing subsets of SNPs as building blocks for
detection of larger combinations of purely epistatic SNPs (e.g., via genetic algorithms or genetic programming)
is no better than random search, since there is no predictive power in subsets
of the correct set of epistatically interacting SNPs. Here, we
explore the potential for hill-climbing from the other direction; that is, from
large sets of candidate SNPs to smaller ones. This approach was inspired by Kauffman’s
“random chemistry” approach to detecting small auto-catalytic sets of molecules
from within large sets. Although the
algorithm is conceptually straightforward, its success hinges upon the creation
of a fitness function able to discriminate large sets that contain subsets of
interacting SNPs from those that don’t. Here, we employ an approximate and noisy
fitness function based on the ReliefF data mining
algorithm. Preliminary results from synthetic data sets show that the resulting
algorithm can detect epistatic pairs from up to 1000
candidate SNPs in O(log N) fitness
evaluations, although success rate degrades as heritability declines. The
results presented herein are offered as proof of concept for the random
chemistry approach, and suggestions are proposed for future improvements that
will extend the approach to larger data sets and to lower heritabilities.