Managing the Knowledge

New high-throughput systems biology studies are rapidly accumulating huge amounts of ‘omics’-scale data, in addition to the information already stored in many different databases (genes, proteins, phenotypes, public data, literature, etc.). Given the heterogeneity and sheer volume of these data, it is a great challenge to extract as much useful information and knowledge as possible, and to communicate the findings in an intuitive and efficient way. To maximize the knowledge gained from all this information, different levels of information must be combined and mined for patterns. For this, existing knowledge must be semantically integrated and dynamically organized into structured networks that are connected with experimental data (see Figure 1).

Figure 1. Knowledge Management in Systems Biology
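As a small illustration of combining levels of information and mining them for patterns, the sketch below joins prior knowledge (transcription factor-target relations) with experimental data (a set of differentially expressed genes) and asks which regulators have more differentially expressed targets than expected by chance, using a one-sided hypergeometric test. This is a generic over-representation analysis chosen for illustration, not the STATegra analysis pipeline; the function names, the gene universe and the toy inputs are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric X: `draws` genes sampled without
    replacement from `population` genes, of which `successes` are marked."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def regulator_enrichment(tf_targets, de_genes, universe):
    """Test each regulator for over-representation of differentially
    expressed (DE) genes among its targets; rows are sorted by p-value."""
    universe = set(universe)
    de = set(de_genes) & universe
    rows = []
    for tf, targets in tf_targets.items():
        t = set(targets) & universe   # prior knowledge: TF-target relations
        overlap = len(t & de)         # experimental data: DE targets
        p = hypergeom_sf(overlap, len(universe), len(de), len(t))
        rows.append((tf, overlap, len(t), p))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    # Toy, hypothetical inputs; not STATegra data.
    universe = [f"g{i}" for i in range(1, 11)]
    tf_targets = {"TF_A": ["g1", "g2", "g3"], "TF_B": ["g4", "g5", "g6"]}
    de_genes = ["g1", "g2", "g3"]
    for tf, overlap, n_targets, p in regulator_enrichment(tf_targets, de_genes, universe):
        print(f"{tf}: {overlap}/{n_targets} targets DE, p = {p:.4g}")
```

With many regulators, the p-values would normally also be corrected for multiple testing before interpretation.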

True semantic integration requires mapping equivalent meanings and objects across all information types, and relies on an initial, detailed manual description of these concepts according to standardized vocabularies (for more information, see Maier et al. 2011). The BioXM™ software, a user-configurable semantic knowledge management framework, was used to configure the STATegra Knowledge Base (KB).
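The core idea of mapping equivalent objects across sources can be sketched as a synonym index: every name a source might use for a concept points to one canonical concept ID, and incoming records are grouped under that ID. This is a minimal sketch, assuming a simple dictionary vocabulary and case-insensitive matching; it does not reflect how BioXM™ represents concepts internally, and the concept ID `KB:0001` and the records are hypothetical.

```python
from collections import defaultdict


def build_synonym_index(vocabulary):
    """Map each case-normalized synonym to its canonical concept ID.

    `vocabulary` is a dict {concept_id: [synonym, ...]}; a synonym that would
    point to two different concepts is rejected rather than silently merged.
    """
    index = {}
    for concept_id, synonyms in vocabulary.items():
        for name in synonyms:
            key = name.strip().lower()
            if index.get(key, concept_id) != concept_id:
                raise ValueError(f"ambiguous synonym: {name!r}")
            index[key] = concept_id
    return index


def integrate(records, index):
    """Group (source, identifier, payload) records under canonical concepts.

    Identifiers with no vocabulary entry are returned separately so they can
    be curated manually instead of being dropped.
    """
    merged = defaultdict(list)
    unmapped = []
    for source, identifier, payload in records:
        concept = index.get(identifier.strip().lower())
        if concept is None:
            unmapped.append((source, identifier))
        else:
            merged[concept].append((source, payload))
    return dict(merged), unmapped


if __name__ == "__main__":
    # Hypothetical concept ID and toy records, for illustration only.
    vocabulary = {"KB:0001": ["Ikzf1", "Ikaros"]}
    records = [
        ("interaction_db", "Ikaros", "interaction record"),
        ("expression_set", "Ikzf1", "expression record"),
        ("literature", "geneX", "text-mined mention"),
    ]
    merged, unmapped = integrate(records, build_synonym_index(vocabulary))
    print(merged)
    print(unmapped)
```

Case-insensitive matching is a simplification; a real system would usually also need species-aware matching so that homologs remain distinct concepts.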

The STATegra KB is a semantically integrated repository that combines general knowledge, prior knowledge from public resources on B-cell differentiation, and public as well as project-specific experimental data. The prior knowledge includes, among other things, thousands of objects or molecular elements (genes, proteins, metabolites, regulators, etc.) from different organisms, millions of connections (e.g. protein-protein interactions (PPI) and transcription factor-target relations) and several levels of modularity (e.g. signaling pathways or sub-cellular localization). In addition, this network includes context and quality information (e.g. evidence supporting functional relations) and information about gene homologs (mouse, rat, human). Furthermore, genomes with their features and feature coordinates were incorporated to enable associations between NGS peaks and genes. Finally, cell types related to B-cell differentiation, as well as ontologies (GO, NCI Thesaurus, Mouse Anatomy, etc.), represent and map information above the molecular level. This pre-existing, integrated and dynamically organized knowledge serves as a structured background network onto which experimental data are semantically mapped, enabling complex integration and analysis approaches. The experimental data sets include several public data sets on Ikaros and the B-cell differentiation system, as well as project-specific data.
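The peak-to-gene association mentioned above can be sketched as follows, assuming a simple nearest-TSS rule: each peak is assigned to the gene whose transcription start site (TSS) lies closest to the peak midpoint on the same chromosome, within a maximum distance. The 50 kb cut-off, the tuple layouts and the function names `index_tss` and `nearest_gene` are hypothetical choices for illustration; the rule actually used in the STATegra KB is not described here.

```python
import bisect
from collections import defaultdict


def index_tss(genes):
    """Build a per-chromosome sorted TSS index.

    `genes` yields (gene_id, chrom, start, end, strand) with one consistent
    coordinate convention; the TSS is `start` on '+' and `end` on '-'.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end, strand in genes:
        tss = start if strand == "+" else end
        by_chrom[chrom].append((tss, gene_id))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([pos for pos, _ in entries], [gid for _, gid in entries])
    return index


def nearest_gene(peak, index, max_distance=50_000):
    """Return (gene_id, distance) for the TSS closest to the peak midpoint,
    or None if the chromosome is unknown or no TSS is within max_distance."""
    chrom, start, end = peak
    if chrom not in index:
        return None
    positions, gene_ids = index[chrom]
    mid = (start + end) // 2
    i = bisect.bisect_left(positions, mid)  # first TSS at or after the midpoint
    best = None
    for j in (i - 1, i):                    # nearest is one of the two neighbours
        if 0 <= j < len(positions):
            d = abs(positions[j] - mid)
            if best is None or d < best[1]:
                best = (gene_ids[j], d)
    if best is None or best[1] > max_distance:
        return None
    return best


if __name__ == "__main__":
    # Toy gene models and peaks; coordinates are made up.
    idx = index_tss([
        ("geneA", "chr1", 1000, 5000, "+"),
        ("geneB", "chr1", 20000, 30000, "-"),
    ])
    print(nearest_gene(("chr1", 1200, 1400), idx))
```

Binary search over sorted TSS positions keeps each lookup logarithmic in the number of genes per chromosome, which matters when millions of peaks are mapped.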

A browser-based graphical user interface, the Knowledge Base (KB) Portal, containing pre-structured queries, views and reports, was configured on top of the BioXM system to enable easy access to the semantically mapped information. Further functions were also integrated into the portal, such as document management (holding all documents related to the STATegra project together with related information) and overviews of work packages and project participants. The integration and semantic mapping of heterogeneous prior knowledge and data create extremely information-rich structures. Visualization is key to coping with complexity in such information- and knowledge-rich scenarios and can help researchers interpret complex analysis results. The challenge is to create clear, meaningful and integrated visualizations that present the data at the right level of detail, in a cohesive and insightful manner, without overwhelming the researcher with the intrinsic complexity of the data.
A further aim of the KB is therefore to develop appropriate visualization structures for the different types of information it contains. These will mainly be visualization tools based on biological process, network and pathway representations, such as the visualization of expert knowledge at the cellular process level, integrative local molecular networks, or the collection of multiple sources of data and information (functional information, experimental data, regulators, etc.) at gene level in the form of gene cards.
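A local molecular network of the kind described above (similar in spirit to the Ikaros local network in Figure 3) can be approximated by a breadth-first search over typed edges around a seed gene, after which per-gene data layers are attached to the nodes, much like a gene card. This is a minimal sketch, assuming that "local" means "within a fixed number of hops"; the functions `build_network`, `local_network` and `annotate`, the relation labels and the toy interactions are all hypothetical and not part of the KB Portal.

```python
from collections import defaultdict, deque


def build_network(ppi_edges, tf_target_edges):
    """Adjacency map node -> [(neighbour, relation)].

    PPI edges are undirected; each TF-target edge is stored in both
    directions with distinct labels so the search can follow either way.
    """
    adj = defaultdict(list)
    for a, b in ppi_edges:
        adj[a].append((b, "ppi"))
        adj[b].append((a, "ppi"))
    for tf, target in tf_target_edges:
        adj[tf].append((target, "regulates"))
        adj[target].append((tf, "regulated_by"))
    return adj


def local_network(adj, seed, radius=1):
    """Nodes within `radius` hops of `seed` and the edges among them."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand beyond the radius
        for nbr, _ in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    edges = set()
    for u in dist:
        for v, rel in adj.get(u, []):
            if v not in dist or rel == "regulated_by":
                continue  # skip outside nodes and mirrored TF-target entries
            if rel == "ppi":
                u2, v2 = sorted((u, v))  # one orientation per undirected edge
                edges.add((u2, v2, rel))
            else:
                edges.add((u, v, rel))
    return set(dist), edges


def annotate(nodes, *layers):
    """Attach values from named data layers, e.g. ("expression", {...})."""
    return {n: {name: values.get(n) for name, values in layers} for n in nodes}


if __name__ == "__main__":
    # Toy interactions; P1, P2, T1 and T2 are placeholder genes.
    adj = build_network([("Ikzf1", "P1"), ("P1", "P2")],
                        [("Ikzf1", "T1"), ("T2", "Ikzf1")])
    nodes, edges = local_network(adj, "Ikzf1", radius=1)
    print(sorted(edges))
    print(annotate(nodes, ("expression", {"Ikzf1": 2.0})))
```

Keeping the relation type on every edge lets a view color or filter PPI and regulatory links separately, and the annotation layers can carry expression, ChIP-Seq or protein quantification values as in the figure below.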

Future developments will focus on improving the quality of the specific knowledge and extending the integrated data sources related to the system under study. The knowledge base interface will be continuously adapted and extended based on feedback from individual partners and their specific use cases.

Figure 2. B-cell differentiation graph with visualization of analysis results. The size and color of the transcription factors shown indicate their importance for the corresponding cell differentiation stage.

Figure 3. Ikaros local network based on PPI and TF-target relations, overlaid with data analysis results: gene expression in the form of heatmaps, ChIP-Seq evidence indicated by the coloring of gene symbols, and protein quantification indicated by the coloring of protein symbols.

EU Project

Contact

Website: http://stategra.eu/
Email: info@stategra.eu


All images, figures and contents are copyrighted to Stategra Project Partners.