Machine Learning

Large-scale data sets arising in modern biology are often so complex that manual analysis is infeasible. Computers are indispensable in this situation, and automated machine learning tools can be key to extracting information from high-dimensional data. Frequent problems in biological data analysis include recognizing and identifying (noisy) patterns in large, high-dimensional data sets, correlating such patterns with biological or clinical phenotypes, and predicting phenotypes from new data.

We develop and apply machine learning algorithms to analyze and classify large-scale data sets, and we build predictive models of biological processes from high-dimensional experimental data. Our work encompasses supervised and unsupervised machine learning tools for this purpose, which we employ in collaborative research projects to elucidate biological function.

Personalized Medicine: Predictive Patterns in Diagnosis and Treatment

Individual genes are known to correlate with certain phenotypic traits, for example, increased risk for specific diseases. One example out of many is sickle cell disease, a severe condition that is due to a single mutation in a hemoglobin gene.

For complex diseases, no single gene is responsible; instead, a combination of several to many genes ultimately causes the disease. An example of a complex disease is cancer: for most cancers, a combination of environmental factors and genetic predisposition underlies the development and progression of the tumor. If these patterns of contributing genes were known, they could be used not only for diagnosis or staging, but also to understand a particular patient's disease in more detail, up to the point where treatment can be tailored to the individual patient's genomic profile.

We work on the development of supervised and unsupervised methods to identify such predictive patterns in large scale data, pursuing the following aims:

  1. Identify genes and pathways that underlie the disease, to better understand the etiology of the trait under consideration.
  2. Identify predictive patterns that can be used for diagnosis, and also to predict how a patient will respond to a particular treatment or how the disease will develop in the future.
  3. Make personalized treatment recommendations, choosing from a set of available drugs the one that will be most effective in a particular patient.
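
As a rough illustration of the second aim, the sketch below trains a nearest-centroid classifier on a handful of expression profiles and predicts the label of a new patient. It is only a minimal example of a supervised method, not the approach used in our projects: the gene values, the "responder" and "non-responder" labels, and the function names fit_centroids and predict are all made up for illustration.

```python
from math import dist


def fit_centroids(samples, labels):
    """Average the expression profiles of each class into one centroid."""
    sums, counts = {}, {}
    for profile, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, profile):
    """Assign a new profile to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Toy, made-up expression values for three hypothetical genes (not real data).
train_profiles = [
    [2.0, 0.1, 1.0],
    [1.8, 0.3, 1.2],
    [0.2, 1.9, 1.1],
    [0.4, 2.1, 0.9],
]
train_labels = ["responder", "responder", "non-responder", "non-responder"]

centroids = fit_centroids(train_profiles, train_labels)
prediction = predict(centroids, [1.7, 0.4, 1.0])
print(prediction)
```

In real data sets the number of genes far exceeds the number of patients, so a practical method would also need feature selection or regularization and an honest validation scheme, both of which this sketch omits.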

Network Inference

Network inference deals with the problem of reconstructing a gene regulatory or signal transduction network from observations of the network's behavior. Based purely on such observations, for example measurements taken over time or after specific interventions, the goal is to infer how the system functions internally. This is a complicated inverse problem that has attracted much attention in engineering ("system identification") and has tremendous potential for applications in biology. We work at the forefront of the development of new methods for network reconstruction in biological applications, using, for example, Bayesian models or nonlinear dynamic systems.
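
To make the inverse problem concrete, the sketch below shows one of the simplest possible baselines: a correlation network, in which two genes are linked whenever their time courses are strongly correlated. This is deliberately much simpler than the Bayesian or dynamic-system models mentioned above and is not our method; the gene names, the measurements, the threshold of 0.9, and the function names pearson and infer_edges are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def infer_edges(series, threshold=0.9):
    """Return (gene_a, gene_b, r) for every pair with |r| >= threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(series), 2):
        r = pearson(series[gene_a], series[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Made-up time courses for three hypothetical genes (not real measurements).
series = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0],
}

edges = infer_edges(series)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b} (r = {r:.3f})")
```

The resulting edges are undirected associations: correlation alone cannot tell which gene regulates which, nor separate direct from indirect effects. That limitation is one reason why intervention data and explicit probabilistic or dynamic models are needed.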