Jonathan Laperle, Simon Hébert-Deschamps, Joanny Raby, David Bujold, David Morais, Michel Barrette, Guillaume Bourque and Pierre-Étienne Jacques
Summary
The Epigenomic Efficient Correlator (EpiGeEC) tool, designed to efficiently compute pairwise correlations across thousands of epigenomic datasets, has been used over the last year to pre-calculate the correlation matrices that are incorporated into the IHEC Data Portal (http://epigenomesportal.ca/ihec/) (Bujold et al.). EpiGeEC has also proven useful to some members of IHEC (Breeze et al.) in demonstrating that, even though the experimental procedures are not necessarily consistent across all projects, the generated datasets are still highly comparable overall, since they tend to cluster by assay type and sample cell type rather than by producing consortium. Moreover, the correlation data were used to identify potentially mislabeled or problematic datasets, as part of a quality control pipeline implemented in the IHEC Data Portal.
In addition to visualizing the pre-computed correlation scores from the Data Portal, users can now also compare their own epigenomic datasets to IHEC ones using a recently launched public version of EpiGeEC. This could be useful, for instance, to help characterize datasets or as a quality control step. The various features of EpiGeEC, integrated into the Galaxy framework of the Genetics and genomics Analysis Platform (GenAP, genap.ca) project, include support for many genomic file formats (bigWig, WIG, bedGraph, BAM), the possibility of computing correlations with different metrics (e.g. Pearson, Spearman) on different subsets of regions (e.g. genes, TSS, user-defined) or on the complete genome, and clustering algorithms to display the results as an annotated heatmap and/or dendrogram. We also provide a user-friendly interface that facilitates the selection of the desired datasets, and we plan to offer datasets from model organisms generated by international consortia such as modENCODE, as well as data downloaded from GEO/SRA and uniformly processed. We will present the design and implementation of EpiGeEC, a performance comparison with other tools, and some of the key results obtained so far.
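To illustrate the kind of computation described above, here is a minimal sketch in Python of a pairwise correlation matrix with a choice of metric (Pearson or Spearman) and an optional restriction to a subset of regions. It is not EpiGeEC's actual implementation: the function names, the `bins` parameter, and the assumption that each dataset has already been reduced to a signal vector over the same fixed genomic bins are all hypothetical and used for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length signal vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant vector has no defined correlation.
    return cov / norm if norm else float("nan")


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(datasets, metric=pearson, bins=None):
    """Pairwise correlations between named signal vectors.

    datasets: dict mapping a dataset name to its binned signal (hypothetical layout).
    bins: optional list of bin indices (e.g. bins overlapping genes or TSS);
          None means the complete genome.
    """
    names = list(datasets)
    # Diagonal set to 1.0 by convention (assumes non-constant signals).
    matrix = {name: {name: 1.0} for name in names}
    for a, b in combinations(names, 2):
        xa, xb = datasets[a], datasets[b]
        if bins is not None:
            xa = [xa[i] for i in bins]
            xb = [xb[i] for i in bins]
        # Each pair is computed once and mirrored, since the matrix is symmetric.
        matrix[a][b] = matrix[b][a] = metric(xa, xb)
    return matrix
```

With toy vectors, a call such as `correlation_matrix({"A": [1, 2, 3, 4], "B": [2, 4, 6, 8]}, metric=spearman, bins=[0, 1, 2])` returns a nested dictionary that could then be passed to a hierarchical clustering routine to draw a heatmap or dendrogram. Computing each pair once and mirroring it halves the work, which matters when the number of datasets reaches the thousands.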