Bioinformatics and Statistical Data Analysis

Modern high-throughput experimental platforms generate large datasets that require sophisticated analysis tools. We develop computational and statistical methods to normalize and analyze complex data, from basic statistical processing to downstream bioinformatics analysis. We work with a wide range of data types, stemming, for example, from Next Generation Sequencing, genome-wide image-based RNAi Screening, Gene Expression Microarrays, aCGH, Mass Spectrometry, Flow Cytometry, Live Cell Imaging, Yeast-2-Hybrid screening, Western and Northern Blotting, and other platforms.

Projects are typically carried out in close collaboration with experimental groups, and the methods we develop are applied directly to relevant biological problems. We also offer statistical consulting as a service to members of the Medical Faculty of Greifswald University and to collaboration partners.

High Throughput Sequencing

Next generation sequencing (NGS) platforms enable the deciphering of full genomes and of genome-wide expression, genomic aberration, or methylation patterns in a cost-efficient, high-throughput manner. Data from NGS platforms require specialized algorithms for aligning or mapping the short sequence reads, and sophisticated tools are needed for the downstream bioinformatics analysis of the vast amounts of data generated in large-scale sequencing projects.

In close collaboration with biological groups and sequencing facilities, we develop analysis methods for sequencing projects and apply them to data from collaboration partners. Questions addressed in our group include, for example, the identification of genomic aberrations in cancer and of point mutations underlying developmental disorders, the identification of species from mitochondrial DNA, and the identification of ageing-associated methylation changes.

RNAi Screening

Genome-wide RNAi knockouts permit the functional characterization of individual genes by studying their effect on a phenotypic trait of interest. Knockouts are carried out on a high-throughput experimental platform, and read-outs are gathered automatically by high-throughput (HT) microscopy. The microscope images are then fed into a bioinformatics pipeline involving image recognition, quality control, statistical data analysis, and automated mapping to pathways and gene ontologies.

In close collaboration with our experimental and image recognition partners, we develop tools for the statistical and bioinformatics analysis of these data, providing quantitative assessments of the effect of a particular gene knockout. These assessments then provide the basis for further modeling.
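
As an illustration of what a quantitative assessment of a knockout effect can look like, the following minimal sketch scores each well of a plate with a robust z-score (median and median absolute deviation). This is a widely used screening statistic and is shown only as an example; it is not necessarily the method used in our pipeline. The function name, the plate values, and the hit threshold of 3 are assumptions made for illustration.

```python
from statistics import median


def robust_z_scores(readouts):
    """Score each well against the plate median, scaled by the MAD."""
    center = median(readouts)
    # Median absolute deviation; the factor 1.4826 makes it comparable to a
    # standard deviation when the data are normally distributed.
    mad = 1.4826 * median(abs(x - center) for x in readouts)
    if mad == 0:
        raise ValueError("MAD is zero; readouts are too uniform to score")
    return [(x - center) / mad for x in readouts]


# Hypothetical per-well readouts (e.g. cell counts) from a single plate.
plate = [100.0, 102.0, 98.0, 101.0, 99.0, 60.0]
scores = robust_z_scores(plate)

# Wells whose score exceeds an (assumed) threshold are flagged as candidate hits.
hits = [well for well, z in enumerate(scores) if abs(z) > 3]
print(hits)
```

Using the median and MAD rather than the mean and standard deviation keeps a few strong phenotypes from distorting the scale against which all other wells are judged.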

Gene Expression Data

Gene Expression Microarrays make it possible to measure large-scale mRNA profiles. The processing of DNA Chip data requires sophisticated normalization and analysis tools for the robust identification of differentially expressed genes and pathways.

Gene expression data have been shown to correlate with therapy response and patient survival in many cancers. Such data can assist a clinician in evaluating treatment options, and genes found to correlate with survival may hint at novel targets for drug design.

However, because the experimental data are high-dimensional, computational tools are needed to analyze them. We develop and apply normalization procedures and regularized prediction methods using Gene Expression Array data.
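
To give a concrete impression of a normalization procedure, the sketch below implements quantile normalization, a common approach for making expression arrays comparable. It is an illustrative example under simplifying assumptions (ties are broken by position, no missing values) and is not meant to describe the specific procedures developed in our group. The function name and the toy data are hypothetical.

```python
def quantile_normalize(samples):
    """Map every sample onto the same reference distribution.

    `samples` is a list of equal-length lists, one list per array.
    Ties are broken by position, which is a simplification.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # For each sample, the gene indices ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=s.__getitem__) for s in samples]

    # Reference distribution: the mean across samples at each rank.
    reference = [
        sum(s[order[rank]] for s, order in zip(samples, orders)) / len(samples)
        for rank in range(n_genes)
    ]

    # Replace each value by the reference value of its rank.
    normalized = [[0.0] * n_genes for _ in samples]
    for out, order in zip(normalized, orders):
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
    return normalized


# Two hypothetical arrays measuring the same three genes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization, every array contains the same set of values, so remaining differences between arrays reflect only the ranking of genes within each sample.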

Multi-OMICs data integration

Integration of data stemming from different experimental platforms is becoming increasingly difficult. Vast amounts of data are being generated, but interpreting and analyzing these data and integrating them with data from other studies and other platforms is becoming more and more problematic.

We develop and apply bioinformatics tools for multi-OMICs data integration, for example by integrating OMICs data with protein interaction and pathway information, thus mapping available experimental information to the underlying biological processes. This task requires sophisticated computational algorithms that extract as much information as possible from the experiments.
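
One simple way to map experimental results to pathway information is an over-representation test: given a list of selected genes, how surprising is its overlap with a known pathway? The sketch below computes the corresponding hypergeometric tail probability. It is a basic textbook example rather than a description of our integration tools, and the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Probability of an overlap of at least `n_overlap` genes by chance.

    n_universe: number of genes measured in the experiment
    n_pathway:  number of those genes annotated to the pathway
    n_selected: number of genes selected (e.g. differentially expressed)
    n_overlap:  number of selected genes that belong to the pathway
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the hypergeometric probabilities for all overlaps >= n_overlap.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 measured genes, 5 in the pathway,
# 4 selected genes, 3 of which belong to the pathway.
p = enrichment_p_value(20, 5, 4, 3)
print(p)
```

In practice such a test is run for many pathways at once, so the resulting p-values need to be corrected for multiple testing before they are interpreted.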