Epiviz(r)

Turning a genome browser into a display device

Hector Corrada Bravo
Center for Bioinformatics and Computational Biology, University of Maryland

Our motivation

Measuring DNA methylation and understanding its role in expression regulation in solid tumors

  • Hansen, et al., Nat. Genetics, 2011
  • Corrada Bravo, et al., BMC Bioinformatics, 2012
  • Timp, et al., Genome Medicine, in press.

What we wanted

  • Data transformation and modeling: data smoothing, region finding (BSmooth, minfi)
  • Genome browsing: search by gene, search by overlap
  • Region analysis: overlap with other data (our own, other labs, UCSC, Ensembl)
  • Regulation: expression data (Gene Expression Barcode)
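
"Search by overlap" comes down to an interval test. The following is a minimal sketch in plain JavaScript, assuming 1-based closed intervals as in GRanges. The function names `overlaps` and `findOverlapping` and the sample regions are hypothetical and not part of Epiviz.

```javascript
// Hypothetical helper: two closed intervals [start, end] on the same
// chromosome overlap when each one starts no later than the other ends.
function overlaps(a, b) {
  return a.seqName === b.seqName && a.start <= b.end && b.start <= a.end;
}

// Return every region in `regions` that overlaps `query` (simple linear scan).
function findOverlapping(query, regions) {
  return regions.filter(function (r) { return overlaps(query, r); });
}

// Made-up example data, for illustration only.
var exampleRegions = [
  { seqName: 'chr1', start: 100, end: 200 },
  { seqName: 'chr1', start: 500, end: 900 },
  { seqName: 'chr2', start: 150, end: 250 }
];
var exampleHits = findOverlapping({ seqName: 'chr1', start: 180, end: 600 }, exampleRegions);
console.log(exampleHits.length); // prints 2
```

A linear scan is enough for a handful of regions; for genome-scale data an indexed structure would likely be needed.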

Analysis era

  • Funding agencies calling for proposals to (strictly) analyze project data
    • Epigenomics Roadmap, ENCODE, TCGA, ...
  • Journals calling for (strictly) analysis papers (e.g., Nature Methods)
  • We have unprecedented ability to measure
  • and lots of publicly available data to contextualize it

[H. Wickham]

Integrative, visual and computational exploratory analysis of genomic data

  • Browser-based
  • Interactive
  • Integration of data
  • Reproducible dissemination
  • Communication with R/Bioc: epivizr package

I want to use a genome browser track as a display device in R!!

e.g.: http://epiviz.cbcb.umd.edu/?ws=45KBV4C7z3u

[Nat. Methods, in press]

Communication with R/Bioc

Using the epivizr package

  • Set up an epivizr session
library(epivizr)
data(tcga_colon_example)
mgr <- startEpiviz(workspace="qyOTB6vVnff")
  • Add a device with GRanges data
blocks_dev <- mgr$addDevice(colon_blocks, "450k blocks")
  • Subset ranges by width
keep <- width(colon_blocks) > 250000
mgr$updateDevice(blocks_dev, colon_blocks[keep,])

Communication with R/Bioc

Using the epivizr package: browse by regions of interest.

  • What's around the widest blocks?
o <- order(-width(colon_blocks))
slideShowRegions <- colon_blocks[o[1:10],]
slideShowRegions <- slideShowRegions + 1e5
mgr$slideshow(slideShowRegions)
  • Close session
mgr$stopServer()
  • More info in the epivizr vignette:
browseVignettes("epivizr")
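
For readers more at home on the JavaScript side, the slideshow preparation above (order by decreasing width, keep the widest blocks, extend each by 100 kb on both sides) can be sketched without GenomicRanges. This assumes 1-based closed intervals, so width is `end - start + 1`. The helper names and the sample blocks are hypothetical, and no bounds checking against chromosome ends is attempted.

```javascript
// Width of a 1-based closed interval, matching the GRanges convention.
function regionWidth(r) {
  return r.end - r.start + 1;
}

// Hypothetical helper: the n widest regions, widest first.
// slice() copies the array so the caller's ordering is left untouched.
function widestRegions(regions, n) {
  return regions.slice()
    .sort(function (a, b) { return regionWidth(b) - regionWidth(a); })
    .slice(0, n);
}

// Extend a region by `pad` bases on both sides,
// which is what `+ 1e5` does to a GRanges object.
function padRegion(r, pad) {
  return { seqName: r.seqName, start: r.start - pad, end: r.end + pad };
}

// Made-up blocks, for illustration only.
var sampleBlocks = [
  { seqName: 'chr1', start: 1000000, end: 1300000 },
  { seqName: 'chr2', start: 5000000, end: 5100000 },
  { seqName: 'chr3', start: 2000000, end: 2900000 }
];
var slideShow = widestRegions(sampleBlocks, 2).map(function (r) {
  return padRegion(r, 1e5);
});
console.log(slideShow);
```

Here the chr3 block comes first because it is the widest, followed by the chr1 block, each padded by 100 kb.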

epivizr uses WebSockets for its connection to the browser, same as shiny. Big, big, big thanks to the @rstudio folks for working on this infrastructure.

Plugins, plugins, plugins

This is how we integrate different data types and add new visualizations.

see: https://gist.github.com/11017650

epiviz.plugins.charts.MyTrack.prototype.draw = function(range, data, slide, zoom) {
  epiviz.ui.charts.Track.prototype.draw.call(this, range, data, slide, zoom);

  // If data is defined, then the base class sets this._lastData to data.
  // If it isn't, then we'll use the data from the last draw call.
  // Same with this._lastRange and range.
  data = this._lastData;
  range = this._lastRange;

  // If data is not defined, there is nothing to draw
  if (!data || !range) { return []; }

  // Using D3, compute a function that maps base-pair locations to chart pixel coordinates
  var xScale = d3.scale.linear()
    .domain([range.start(), range.end()])
    .range([0, this.width() - this.margins().left() - this.margins().right()]);
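
The scale in the excerpt above maps a genomic position to a horizontal pixel offset inside the track margins. As a sketch of the arithmetic only, here is a D3-free equivalent. `makeLinearScale` is a hypothetical stand-in for `d3.scale.linear()`, and the window and track sizes are made up.

```javascript
// Hypothetical stand-in for a D3 linear scale: linearly interpolate from
// domain [d0, d1] (base pairs) to range [r0, r1] (pixels). No clamping.
function makeLinearScale(domain, range) {
  var d0 = domain[0], d1 = domain[1];
  var r0 = range[0], r1 = range[1];
  return function (x) {
    return r0 + (x - d0) * (r1 - r0) / (d1 - d0);
  };
}

// A 1 Mb window drawn on a 1000 px track with 20 px left and right margins.
var trackWidth = 1000, marginLeft = 20, marginRight = 20;
var xScaleSketch = makeLinearScale(
  [1000000, 2000000],
  [0, trackWidth - marginLeft - marginRight]
);

console.log(xScaleSketch(1000000)); // 0
console.log(xScaleSketch(1500000)); // 480
console.log(xScaleSketch(2000000)); // 960
```

Positions outside the visible window map to offsets outside the drawable area, so a track would typically skip or clip them.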

Plugins, plugins, plugins

see: https://gist.github.com/c41a2df3671395d8e4ad

goog.provide('epiviz.plugins.data.UCSCDataProvider');

epiviz.plugins.data.UCSCDataProvider = function (id, endpoint) {
  epiviz.data.DataProvider.call(this, id || epiviz.plugins.data.UCSCDataProvider.DEFAULT_ID);

  this._endpoint = endpoint;

  this._refGene = new epiviz.measurements.Measurement(
    'refGene', // The column in the data source table that contains the values for this feature measurement
    'refGene', // A name not containing any special characters (only alphanumeric and underscores)
    epiviz.measurements.Measurement.Type.RANGE,
    'refGene', // Data source: the table/data frame containing the data
    'ucsc_refGene', // An identifier for use to group with other measurements from different data providers
    // that have the same seqName, start and end values
    this.id(), // Data provider
    null, // Formula: always null for measurements coming directly from the data provider
    'any', // Default chart type filter

Plugins, plugins, plugins

Datatypes

  • Based on "three-table" design
  • Scripts can define coordinate space
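
One way to picture a "three-table" layout, assuming it means the arrangement familiar from Bioconductor containers: one table of feature rows with genomic coordinates, one table of sample columns, and a value matrix indexed by both. The names and numbers below are hypothetical and do not reproduce the Epiviz data model.

```javascript
// Table 1: rows, one per genomic feature, with its coordinates.
var rowTable = [
  { id: 'f1', seqName: 'chr1', start: 100, end: 200 },
  { id: 'f2', seqName: 'chr1', start: 300, end: 450 }
];

// Table 2: columns, one per sample, with its annotation.
var colTable = [
  { id: 's1', tissue: 'normal' },
  { id: 's2', tissue: 'tumor' }
];

// Table 3: values, where valueTable[i][j] belongs to feature i in sample j.
var valueTable = [
  [0.82, 0.35],
  [0.10, 0.12]
];

// Hypothetical lookup joining the three tables by id.
function lookupValue(featureId, sampleId) {
  var i = rowTable.findIndex(function (r) { return r.id === featureId; });
  var j = colTable.findIndex(function (c) { return c.id === sampleId; });
  if (i < 0 || j < 0) { return undefined; }
  return valueTable[i][j];
}

console.log(lookupValue('f1', 's2')); // 0.35
```

Keeping coordinates in the row table means the same value matrix can be re-plotted in a different coordinate space by swapping only the rows.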

Project status

What's coming very soon

  • Standalone version (no internet required, JavaScript code provided in epivizr)
  • Browse your favorite genome:
library(epivizr)
library(Mus.musculus)

mgr <- startStandalone(geneInfo=Mus.musculus, geneInfoName="mm10",
                          keepSeqlevels=paste0("chr",c(1:19,"X","Y")))
  • Support for BigWigFile, BamFile through epivizr (initially targeted to RNA-seq workflows)

Check it out:

Acknowledgements

Florin Chelaru, UMD

  • CBCB@UMD: my group
  • JHU/Harvard: Kasper Hansen, Winston Timp, Rafael Irizarry, Andy Feinberg
  • Genentech: Michael Lawrence
  • RStudio: Joe Cheng, et al.
  • Funding: NIH, Genentech