VOISIN Greg
May 19, 2015
The Bioconductor team has developed several specific data structures:
ExpressionSet/MethylSet: matrix-like dataset plus experiment/sample/feature metadata.
SummarizedExperiment: analogous to ExpressionSet, but with features defined in genomic coordinates.
GRanges: genomic coordinates and associated qualitative and quantitative information, e.g., gene symbol, coverage, p-value.
S3 and S4 classes: two of R's object systems, used to build and manage complex data structures.
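The difference between the two systems can be sketched in base R. This is a minimal illustration only: the class names gene_s3 and GeneS4 and their fields are hypothetical and are not part of any Bioconductor package.

```r
library(methods)

# S3: an ordinary list tagged with a class attribute.
# Nothing checks that the fields exist or have the right type.
gene_s3 <- structure(list(symbol = "TP53", chrom = "chr17"), class = "gene_s3")

# An S3 method is just a function named <generic>.<class>.
print.gene_s3 <- function(x, ...) {
  cat("Gene", x$symbol, "on", x$chrom, "\n")
  invisible(x)
}
print(gene_s3)

# S4: a formal class definition with typed slots.
# Slot types are checked when the object is created.
setClass("GeneS4", representation(symbol = "character", chrom = "character"))
gene_s4 <- new("GeneS4", symbol = "TP53", chrom = "chr17")
gene_s4@symbol   # slots are read with @ (or, preferably, accessor functions)

# Creating an object with a wrong slot type fails instead of passing silently.
bad <- tryCatch(new("GeneS4", symbol = 42, chrom = "chr17"),
                error = function(e) e)
inherits(bad, "error")
```

The formal slot checking is one reason why containers that combine data with metadata are usually written as S4 classes.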
Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan M and Carey V (2013).
“Software for Computing and Annotating Genomic Ranges.”
PLoS Computational Biology, 9.
seqnames: the name of the chromosome (or other sequence) carrying the element.
ranges: the boundaries (start/end) of the element (gene, exon, TFBS…).
strand: +, - or * (strand not specified).
Note 1: the start is always the leftmost position and the end the rightmost, even when the range is on the minus strand.
Note 2: seqnames, ranges and strand are the fields used when GRanges objects are compared.
Metadata columns: extra columns of information, accessed with mcols().
GRanges objects are considered vector-like objects
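As a minimal sketch of the components listed above, the following builds a small GRanges object and reads each part back. It assumes the GenomicRanges package is installed; the coordinates and the metadata column names (symbol, score) are made up for illustration.

```r
library(GenomicRanges)

# Three ranges: seqnames, ranges and strand, plus two metadata columns.
# Extra named arguments to GRanges() become metadata columns.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 200, 300), end = c(150, 260, 340)),
  strand   = c("+", "-", "*"),
  symbol   = c("geneA", "geneB", "geneC"),   # hypothetical labels
  score    = c(0.5, 1.2, 3.4)                # hypothetical values
)

seqnames(gr)   # chromosome of each range
start(gr)      # left boundary, also for the range on the minus strand
end(gr)        # right boundary
strand(gr)
mcols(gr)      # the metadata columns as a DataFrame

# Vector-like behaviour: length and subsetting work as on an ordinary vector.
length(gr)
gr[2]
gr[strand(gr) == "-"]
```

Subsetting keeps the metadata columns attached to the selected ranges, so the coordinates and their annotations stay in sync.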
GRangesList: stores a list of compatible GRanges objects.
Compatible: relative to the same genome and carrying the same metadata columns.
GRangesList objects are considered list-like objects.
Some functions work very well with this kind of list: elementLengths(), shift(), reduce(), …
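A short sketch of these list-oriented functions, assuming the GenomicRanges package is installed. The element names tx1 and tx2 and all coordinates are hypothetical. Note that recent Bioconductor releases name the element-counting function elementNROWS(); elementLengths() was its name when these slides were written, so the sketch uses the newer name.

```r
library(GenomicRanges)

# Two compatible GRanges objects (same genome, no metadata columns).
tx1 <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 140, 300), end = c(150, 200, 350)),
  strand   = c("+", "+", "+")
)
tx2 <- GRanges(
  seqnames = "chr2",
  ranges   = IRanges(start = c(500, 700), end = c(600, 800)),
  strand   = c("-", "-")
)

# Group them in a named, list-like container.
grl <- GRangesList(tx1 = tx1, tx2 = tx2)

# Number of ranges in each list element.
n_ranges <- elementNROWS(grl)
n_ranges

# Move every range 10 positions to the right, element by element.
shifted <- shift(grl, 10)

# Merge overlapping ranges within each element:
# in tx1, 100-150 and 140-200 overlap and are fused into one range.
merged <- reduce(grl)
elementNROWS(merged)

# List-like access to a single element returns a GRanges object.
grl[["tx1"]]
```

Each function is applied within the list elements, so ranges from tx1 are never merged with ranges from tx2.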