Tutorial

In the examples below, gr1 and gr2 are GRanges objects.

gr1 %^% gr2

returns a logical vector of length length(gr1): element i is TRUE if gr1[i] overlaps at least one interval in gr2, and FALSE otherwise.
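A minimal sketch of %^% on toy data. The coordinates, the seqname "1", and the object contents are made up for illustration. GRanges() and IRanges() come from the Bioconductor GenomicRanges package.

```r
library(GenomicRanges)
library(gUtils)

# toy intervals (made up for illustration): [100,149], [500,549], [900,949]
gr1 <- GRanges("1", IRanges(start = c(100, 500, 900), width = 50))
# toy intervals: [120,219] overlaps the first gr1 interval, [2000,2099] overlaps nothing
gr2 <- GRanges("1", IRanges(start = c(120, 2000), width = 100))

# one logical value per gr1 interval: does it overlap anything in gr2?
hits <- gr1 %^% gr2
print(hits)
```

The result has one value per gr1 interval no matter how many gr2 intervals each one hits. For unstranded ranges like these, GenomicRanges::overlapsAny(gr1, gr2) should give the same answer.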

gr1 %*% gr2

returns a GRanges of length n (where n <= length(gr1) * length(gr2)) holding all pairwise overlaps of gr1 and gr2, with the metadata of both objects merged.
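The sketch below runs %*% on toy data. It assumes that the merged output keeps the original metadata column names (gene from gr1, score from gr2) when they do not clash. Only the first gr1 interval overlaps anything, and it overlaps two gr2 intervals, so the result should have two entries.

```r
library(GenomicRanges)
library(gUtils)

# toy query: gene A spans [100,199], gene B spans [500,599]
gr1 <- GRanges("1", IRanges(start = c(100, 500), width = 100), gene = c("A", "B"))
# toy subject: two small intervals inside gene A and one far away
gr2 <- GRanges("1", IRanges(start = c(150, 180, 5000), width = 20), score = c(1, 2, 3))

# all pairwise overlaps, carrying metadata from both inputs
ov <- gr1 %*% gr2
print(ov)
```

Because n counts overlapping pairs, it can exceed length(gr1) when a single gr1 interval hits several gr2 intervals, as gene A does here.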

gr1 %$% gr2

returns a GRanges of length length(gr1) in which each interval of gr1 is annotated with the metadata values of the gr2 intervals it overlaps: numeric columns are averaged, and character columns are concatenated.
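A toy sketch of %$% that mirrors how it is applied to the coverage data later in the tutorial. The column name mean and all coordinates are assumptions for illustration. Whether the real coverage.rds uses a column called mean is not stated here. The two coverage bins overlapping G1 have equal widths, so a plain mean and a width-weighted mean both give 15.

```r
library(GenomicRanges)
library(gUtils)

# toy "genes": G1 = [1,1000], G2 = [1001,2000]
genes <- GRanges("1", IRanges(start = c(1, 1001), width = 1000),
                 gene_name = c("G1", "G2"))
# toy coverage bins of 500 bp, each with a numeric "mean" column
cov <- GRanges("1", IRanges(start = c(1, 501, 1001), width = 500),
               mean = c(10, 20, 30))

# one entry per gene; "mean" now holds the average over the overlapping bins
genes_cov <- genes %$% cov
print(genes_cov)
```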

gr1 %Q% (expression)

subsets gr1 by the indices or logical values that result from evaluating expression, which can refer to the metadata columns of gr1 by name.

gr1 %Q% (gene == "EGFR")

returns the entries of gr1 whose metadata column gene equals "EGFR".
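A few more %Q% filters on a toy GRanges (all values are made up). The expression can combine conditions with & and |, use %in% to match several values at once, or evaluate to integer indices, for example via which().

```r
library(GenomicRanges)
library(gUtils)

# toy GRanges with a character and a numeric metadata column
gr1 <- GRanges("1", IRanges(start = c(100, 500, 900), width = 50),
               gene = c("EGFR", "TP53", "EGFR"), score = c(5, 1, 9))

egfr   <- gr1 %Q% (gene == "EGFR")               # logical condition
strong <- gr1 %Q% (gene == "EGFR" & score > 6)   # combined conditions
listed <- gr1 %Q% (gene %in% c("EGFR", "TP53"))  # match a set of values
picked <- gr1 %Q% (which(score > 4))             # integer indices
```

The %in% form is convenient for selecting a whole list of genes in one call, such as the COSMIC gene symbols used in the tutorial below.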

library(gTrack)
library(gUtils)

## tutorial on using the above operations

## set the working directory to the folder where the COSMIC TSV was saved
setwd("~/Downloads")

## load the data. First download the COSMIC Cancer Gene Census dataset from http://cancer.sanger.ac.uk/census
COSMICgenes <- read.delim("cosmicdata.tsv")

## store the gene symbols (first column) as a character vector
geneSymbols <- as.character(COSMICgenes[,1])

## loading gene definitions which we can use to plot windows (GRanges)
genes = readRDS('files/genes.rds')

## load coverage data from a cancer cell line (a GRanges object; it has to be wrapped in a gTrack before plotting)
cov = readRDS('files/coverage.rds')

## load the GENCODE gene model gTrack
gt.ge = track.gencode()

# subset genes to just the genes in the COSMIC dataset
genez <- genes %Q% (gene_name %in% geneSymbols)

# add the average coverage metadata column for each gene
genez <- genez %$% cov

gt.cov = gTrack(genez, y.field = 'mean', circles = TRUE, col = 'blue', name = 'Cov')

# set the plot window: the RB1 gene padded by 2e6 bp (2 Mb) on each side, so neighboring genes are shown too
window = genez[genez$gene_name == "RB1"] + 2e6

plot(c(gt.ge , gt.cov) , window)