Ref: ENCODE - Terms and Definitions
Fraction of reads in peaks (FRiP) - Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads. In general, FRiP scores correlate positively with the number of regions. (Landt et al, Genome Research Sept. 2012, 22(9): 1813–1831)
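To make the definition concrete, here is a minimal sketch of the ratio in plain Python. It assumes reads and peaks are given as BED-style half-open `(chrom, start, end)` tuples and that a read counts as "in a peak" when it overlaps a peak by at least 1 bp; the function names `merge_intervals` and `frip` are made up for illustration and are not part of any pipeline.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak (hypothetical helper).

    reads, peaks: lists of (chrom, start, end), BED-style half-open.
    Each read is counted at most once, even if it overlaps several peaks.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    in_peaks = 0
    for chrom, start, end in reads:
        iv = merged.get(chrom)
        if not iv:
            continue
        # Last merged peak that starts before the read ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and iv[i][1] > start:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0


# Toy data (assumed, not from a real experiment).
toy_reads = [("chr1", 100, 150), ("chr1", 140, 190),
             ("chr1", 500, 550), ("chr2", 10, 60)]
toy_peaks = [("chr1", 120, 200), ("chr1", 180, 220)]
toy_frip = frip(toy_reads, toy_peaks)
print(toy_frip)
```

In this toy input, two of the four reads overlap the merged chr1 peak, so the score is 0.5. Real pipelines also filter reads (duplicates, low mapping quality) before counting, which this sketch does not do.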
In the ENCODE ATAC-seq Data Standards and Prototype Processing Pipeline, the FRiP score is calculated from tagAlign files with intersectBed, after converting the BAM files to tagAlign with bamtobed. However, intersectBed can also count intersections directly from BAM files. It has also been argued that one should use featureCounts to get accurate results (https://www.biostars.org/p/337872/#337890).
I compared different ways to calculate FRiP scores below.
1. prepare
Equal read counts in peak regions were obtained from either the BAM file or the tagAlign file. Although counting from BAM takes more time, one does not need to convert BAM to tagAlign.
For featureCounts, the result is close to that from intersectBed, which is expected. featureCounts counts the number of fragments, while intersectBed counts the number of reads, and the number of reads is twice the number of fragments (considering only properly mapped reads).
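The read-versus-fragment relationship can be illustrated with a small sketch. It assumes paired-end data on a single chromosome, half-open `(start, end)` intervals, and that a fragment is counted when either mate overlaps a peak; that last rule is a simplification chosen for the example and not a statement of how featureCounts assigns fragments. The names `overlaps` and `count_reads_and_fragments` are hypothetical.

```python
def overlaps(interval, peaks):
    """True if a half-open (start, end) interval overlaps any peak by >= 1 bp."""
    start, end = interval
    return any(ps < end and pe > start for ps, pe in peaks)


def count_reads_and_fragments(pairs, peaks):
    """Count reads and fragments in peaks for a list of (mate1, mate2) pairs.

    Assumption for this sketch: a fragment is "in a peak" if at least one
    of its two mates overlaps a peak.
    """
    reads_in_peaks = 0
    fragments_in_peaks = 0
    for mate1, mate2 in pairs:
        hits = int(overlaps(mate1, peaks)) + int(overlaps(mate2, peaks))
        reads_in_peaks += hits
        if hits:
            fragments_in_peaks += 1
    return reads_in_peaks, fragments_in_peaks


# Toy data (assumed): one peak and three properly paired fragments.
toy_peak_list = [(100, 300)]
toy_pairs = [
    ((110, 160), (220, 270)),  # both mates inside the peak
    ((250, 299), (350, 400)),  # only the first mate overlaps the peak
    ((500, 550), (600, 650)),  # neither mate overlaps the peak
]
n_reads, n_fragments = count_reads_and_fragments(toy_pairs, toy_peak_list)
print(n_reads, n_fragments)
```

When both mates of every counted fragment fall inside a peak, the read count is exactly twice the fragment count. The second pair shows where the two can drift apart: a fragment straddling a peak boundary contributes one fragment but only one read, which is one possible reason the two tools give close rather than identical numbers.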
# from intersectBed
featureCounts may be more accurate when assigning reads that span multiple features, but the difference may not be worth it.
So, in practice, I prefer to use intersectBed to calculate the FRiP score directly from BAM files.
Change log
- 20190320: create the note.