Data manipulationΒΆ

R provides a number of data structures for storing genomic data, each with its advantages and drawbacks.

The most useful structures for this purpose are:

GRanges
Store ranges along with metadata, sequences and the coordaintes of the reference genome.
GRangesList
Store groups of ranges, with additional metadata belonging to the group.
data.table
Fast and efficient general-purpose container similar to data.frame, but with significant performance improvements.

In gUtils functions, we often manipulate the data to move between these data structures where one is more useful than another. A key example is in gr.findoverlaps, which converts the input GRanges into data.table objects to take advantage of the blazing fast foverlaps util. For the most part, these conversions should be invisible to the user.

However, often there are data structures conversions that may be useful to the end user. This includes unlisting GRangesList objects into GRanges, making data.table objects from GRanges, and binding together multiple GRanges or GRangesList objects, among others. This section will describe and demonstrate the functionality gUtils provides for manipulating these data structures.

ref19 <- readRDS(system.file("extdata","refGene.hg19.gr.rds", package="gUtils"))
gr <- GRanges(1, IRanges(c(2,5,10), c(4,9,16)), seqinfo=Seqinfo("1", 20))
dt <- data.table(seqnames=1, start=c(2,5,10), end=c(3,8,15))

grbind

## add metadata to one field
mcols(gr)$score = 3
## try to concatenate
c(gr,gr2)  ## ERROR
## with grbind
grbind(gr, gr2) ## SUCCESS. Adds NA for missing fields
## GenomicRanges::c does this already for GRangesList