A selection of research projects in which BIO3 plays a key role, representing the three aforementioned themes, is given below.
|Robust Machine Learning Forests in Network Construction for Integrative Omics Analyses (Kristel Van Steen)||Regional – FNRS||1/07/14-30/06/18||The primary goal of this project is to develop an analysis workflow for integrated (>2) omics data analysis, while working out and implementing novel network-based representations of the multi-layered data system using Conditional Inference Forests or related machine learning approaches. The methods will be developed using available in-house data on healthy controls, in particular genome-wide SNP, transcriptome and methylome data, and validated in comparative methodological studies using simulated data. Practical analyses, including applications within particular disease contexts (bladder cancer and cardiovascular disease), will make it possible to address questions about the relative importance of the aforementioned data sources in characterizing control populations or explaining disease trait variations, and about the robustness of the biological networks when additional data (either individuals or variables) are accumulated.|
|Molecular drivers and markers of pancreatic cancer initiation and progression: a translational and multidisciplinary approach (Ingrid Struman)||Regional – TELEVIE||1/09/16-31/08/18||Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death worldwide, and recent epidemiological data predict that it will become the second leading cause in the next twenty years. The lack of diagnostic tools for early detection and the absence of clearly defined populations at risk limit the ability to find efficient biomarkers. Even when diagnosed early, the probability of metastatic spread throughout the body is high since cell dissemination precedes invasiveness in PDAC. Therefore, there is an urgent need for early diagnosis based on an understanding of the mechanisms driving tumor initiation and progression. The aim of the project is to identify molecular drivers and markers of initiation, progression and dissemination of PDAC. To achieve this goal, several teams will combine their expertise in the development and phenotyping of animal models, in cancer cell biology, in bioinformatic analyses, and in the follow-up of human patients. More specifically, exosomal functions and signaling pathways that drive PDAC initiation and progression will be identified and validated. Results obtained with animal models will be exploited and translated in the search for markers of early onset of primary PDAC tumors in humans and for the discrimination between pre- and post-metastatic tumors.|
|SysMedPC: Systems Medicine of Pancreas Cancer (Kristel Van Steen)||Regional – Research Credits FNRS||1/01/17-31/12/18||Pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic cancer, could be considered an orphan disease because, to date, it has not been adopted by the pharmaceutical industry. Despite its relatively low population incidence, it is the deadliest cancer worldwide, with a 5-year survival rate below 6%. Furthermore, it is the single cancer for which there has been little improvement in its fatal prognosis over the last decades, despite the efforts made in tertiary prevention (treatment). One of the promising technologies in pancreatic cancer research is Next Generation Sequencing (NGS). The goal of this project is to sequence DNA (whole-exome sequencing, WES) and RNA for PDAC patients from the CHU-Liège (Belgium). These data will be used to provide molecular characterizations for PDAC, while adopting a systems view of the disease.|
|PDAC-xome: Exome Sequencing in Pancreatic Ductal AdenoCarcinoma (Kristel Van Steen)||Regional – TELEVIE||1/09/15-31/08/17||Pancreatic ductal adenocarcinoma (PDAC), the most common type of pancreatic cancer, could be considered an orphan disease because, to date, it has not been adopted by the pharmaceutical industry. Despite its relatively low population incidence, it is the deadliest cancer worldwide, with a 5-year survival rate below 6%. Furthermore, it is the single cancer for which there has been little improvement in its fatal prognosis over the last decades, despite the efforts made in tertiary prevention (treatment). One of the promising technologies in pancreatic cancer research is Next Generation Sequencing (NGS). In this project, we aim to perform whole-exome sequencing (WES) of PDAC patients who were treated between 2007 and 2014 at the CHU-Liège, Belgium. We will use state-of-the-art analytic tools and novel methodologies developed in-house to molecularly characterize these patients and to compare them to a matched control group based on WES similarity. To overcome issues of power and interpretability, we will emphasize region-based (gene-centric) rather than single-marker driven approaches.|
|DESTinCT: Detecting statistical interactions in complex traits (Kristel Van Steen)||Regional – WELBIO||1/10/15-30/09/17||Model organisms indicate that heritability attributed to gene-gene interactions may be as high as 80% for certain traits. There is no reason to assume that this would not be the case for humans. The increased complexity of human biology compared to the biology of model organisms requires investing in sophisticated epistasis detection methods, creating consensus criteria for their evaluation, and raising awareness of the pros and cons of each method. Large-scale epistasis studies can give new clues to systems-level genetic mechanisms and a better understanding of the underlying biology of human complex disease traits. Though many novel methods have been proposed to carry out such studies, so far only a few of them have demonstrated replicable results. Recently, we published a minimal protocol for large-scale epistasis screening. This protocol is based on our knowledge to date about epistasis mapping. However, despite efforts to improve the detection rate of genetic interactions and to integrate (prior) omics-based information into the analysis protocol, and despite attempts to reconcile statistical epistasis with biological epistasis, several problems have not yet been resolved satisfactorily. This project aims to tackle some of these problems: the problem of confounding factors such as those arising from shared genetic ancestry, and the problem of model-dependent meta-analysis strategies used in epistasis research. In addition, this project aims to develop a gene-centric approach to epistasis analysis, which will increase interpretability and replicability. Only when epistasis detection becomes routine practice will we be able to show the impact of epistasis on personalized medicine, disease risk prediction, and evolutionary genetics.|
|EUPancreas – BM1204 (Nuria Malats)||EU – COST||14/12/12-13/12/16||Integration of omics data will consider the following aspects: (i) optimization and standardization of methods for the omics analysis of pancreatic tumor and normal tissue samples; (ii) establishment of standardized approaches for omics data deposition, for which the Web-Based Platform for Mining Pancreatic Expression Datasets will be used as a model and its potential extension to other omics data will be assessed; and (iii) identification and documentation of the available algorithms for omics data integration.
Issues to be considered include: the high dimensionality of the data combined with small sample sizes, the inherently noisy nature of the data, the stability and reproducibility of the models, and the incorporation of domain knowledge into the knowledge discovery process using innovative statistical and bioinformatics approaches.
WG2 prioritized two tasks: identifying pancreatic cancer omics databases, standardizing or optimizing omics-related data, and evaluating confidentiality and data-sharing issues (WG2-1); and integrated omics analysis (WG2-2).|
|MLPM: Machine learning for personalized medicine (Karsten Borgwardt)||FP7 – ITN||1/01/13-31/12/16||MLPM is a Marie Curie Initial Training Network, funded by the European Union within the 7th Framework Programme. MLPM started on January 1, 2013 and runs for a period of four years. MLPM is a consortium of several universities, research institutions and companies located in Spain, France, Germany, Belgium, the UK, Switzerland, Israel and the USA. MLPM involves the predoctoral training of 14 young scientists in the research field at the interface of Machine Learning and Medicine. Its goal is to educate interdisciplinary experts who will develop and employ the computational and statistical tools that are necessary to enable personalized medical treatment of patients according to their genetic and molecular properties, and who are aware of the scientific, clinical and industrial implications of this research.|
|Integration and interpretation of “omics” biological data via networks and conditional inference forests (Kyrylo Bessonov)||Regional Grants and Fellowships 2014||Vast amounts of biological “multi-omics” data on complex diseases call for new and efficient methods for data analysis in clinical and experimental settings and for the interpretation of the results. Unfortunately, the development of “-omics” data analysis methods lags far behind the pace of data generation. In my thesis work I propose an integrative multi-omics data analysis method based on conditional inference forests (CIFs) and a network inference framework for useful biological and clinical “knowledge extraction” in the context of human complex diseases. The resulting networks will elucidate hidden structure in complex multi-source datasets (genotypes and expression data) and will provide classification of patients in relation to disease outcomes. The proposed analytical tool aims to increase our understanding of complex diseases and to make significant steps towards personalized medicine in the EU. Networks simultaneously capture all intrinsic interactions existing in the data and allow the application of advanced algorithms to find key drivers of disease and unknown pathways and to identify shared mechanisms. Our preliminary results show that the chosen hybrid CIF-network inference method is superior to other machine-learning based methods such as GENIE3 in terms of precision, accuracy and other performance metrics. In contrast to other feature selection methods based on information measures and impurity indices such as Gini, CIFs have the advantage of providing a solid statistical framework and supporting many diverse data types, which gives them flexibility and wide applicability. The final outcome of this project is a CIF-based network construction analysis pipeline with an easy-to-use and well-documented software tool.
Building on the proposed multi-omics data analysis methodology, we aim to build a CIF-network tool that gives non-experts access to advanced data analysis techniques and is useful in clinical settings.|
|Developing methodologies and a standard protocol for meta Genome-Wide Association Interaction (GWAI) studies (Elena Gusareva)||Regional Grants and Fellowships 2012||Meta-analysis for genome-wide association studies (meta-GWA) of complex traits and common diseases has proven to be a successful approach for the identification of novel susceptibility loci and gene variants. However, there is still a large gap to bridge between our current knowledge about complex diseases and a full understanding of their genetic etiologies through the identification of relevant genetic players. Recent GWA and meta-GWA studies leave no doubt that the complexity of the underlying biology needs to be better accounted for during analysis. The biochemical networks underlying complex diseases naturally create dependencies among the genes in the network, which are realized as epistasis or gene-gene interactions. Therefore, incorporating gene-gene interactions in disease association models via Genome-Wide Association Interaction (GWAI) studies is one way of reflecting underlying biological and biochemical disease mechanisms. The development of sound meta-analytic methodologies that acknowledge the specific characteristics of, and issues with, gene-gene interaction studies, as well as the drafting of comprehensive protocols to promote good standard practice in meta-GWAI studies, are crucial to enhance the success rate and reproducibility of epistasis findings. Given the availability of an extensive biostatistics meta-analysis toolbox, it may be surprising that hardly any publications have had a meta-GWAI as their core topic. This is in part explained by the absence of strict guidelines or best practices for epistasis analysis, and by a number of methodological problems that still need to be resolved. The quest for optimal meta-GWAI strategies is a largely underdeveloped field and is the subject of this project.|
|Development of efficient large-scale gene-gene and gene-environment interaction screening methods (Kristel Van Steen)||Regional – FNRS||1/01/12-31/12/15||Current research is mainly focused on finding single genetic susceptibility factors in complex genetic diseases. This is mainly due to the lack of computer power and of algorithms that allow the discovery of interacting factors in genome-wide population studies with optimal efficiency. However, common diseases develop through complex interactions between numerous environmental factors and alleles of many genes. Even when performed at a smaller scale, the success rate of epistasis screens is rather disappointing. Several researchers in the field attribute this to the fact that, so far, complex statistical challenges have often been given inadequate solutions. In this project, we will further develop and evaluate powerful methodologies to detect gene-gene and gene-environment interactions, complementing main effects screens. The efficient use and analysis of whole-genome re-sequencing data will further help in elucidating the contribution of actual causal variants or multiple rare variants to disease etiology. The developed paradigms will be translated and incorporated into a user-friendly IT environment, which facilitates easy and secure data access, data integration, data analysis and reporting. Efficient routines and algorithms will guarantee feasible computation times and will pave the way for large-scale integrated omics analyses in the context of epistasis screening and beyond.|