These notes come from fumbling through a bunch of help pages online. The first page was this one, which lists a bunch of tools (I ended up using bedtools fisher).
- dump out genomic locations for target and background set (I was using CpG sites in gene bodies on the 450k array).
- sort the BED files using sortBed -i from bedtools (equivalently, bedtools sort -i).
- create a genome file and sort it:
  - mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" > hg19.genome
  - thanks to Samad's post
  - then sort that file with plain sort (the sorted copy is the hg19.sorted.genome used below)
- download the track of interest from the UCSC Genome Table Browser in BED format (from biostars)
  - http://genome-test.cse.ucsc.edu/cgi-bin/hgTables?command=start
  - select BED as the output format
- sort the downloaded BED file with sortBed -i from bedtools as well
- run
  - bedtools fisher -a target -b download_tract -g hg19.sorted.genome
  - bedtools fisher -a background -b download_tract -g hg19.sorted.genome
- use the output of both runs to compute a p-value that takes the background results into account (using a hypergeometric test)
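
A minimal sketch of that last step, assuming the target set is a subset of the background set and that the four counts (sites in each set, and sites in each set that overlap the track) have been read off the two bedtools fisher runs. The function name and the example counts are made up for illustration; this is one way to set up the test, not necessarily the only sensible one.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for a hypergeometric draw.

    N: number of background sites
    K: number of background sites overlapping the track
    n: number of target sites (assumed to be drawn from the background)
    k: number of target sites overlapping the track
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Sum the probability mass for k, k+1, ..., min(n, K) overlapping sites.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Illustrative counts only, not real results.
    background_sites = 1000
    background_overlaps = 100
    target_sites = 50
    target_overlaps = 12
    p = hypergeom_upper_tail(target_overlaps, background_sites,
                             background_overlaps, target_sites)
    print(f"enrichment p-value: {p:.3g}")
```

Exact integer arithmetic gets slow with array-sized counts. If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same upper tail much faster.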
It would be nice if there were an app that would do this for me and iterate over a bunch of tracks. There are tools like GREAT, but those seem to be gene-centric.
This was in part inspired by this work on distal SNP-CpG relations. Lemire et al. describe their methods in the "SNP and CpG site annotations" section.
This Galaxy-based tool, the Genomic HyperBrowser, might be an easier way of doing things.
Update: the Forge tool seems to do what I want, but for DNase I sites only.