Ref: ENCODE - Terms and Definitions
Fraction of reads in peaks (FRiP) - Fraction of all mapped reads that fall into the called peak regions, i.e. usable reads in significantly enriched peaks divided by all usable reads. In general, FRiP scores correlate positively with the number of regions. (Landt et al., Genome Research, Sept. 2012, 22(9): 1813–1831)
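The definition above reduces to a simple ratio. Here is a minimal sketch in Python; the function name `frip_score` and the read counts are hypothetical and chosen only for illustration, they are not taken from ENCODE or from any real sample.

```python
def frip_score(reads_in_peaks: int, total_reads: int) -> float:
    """Return usable reads in peaks divided by all usable reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_in_peaks <= total_reads:
        raise ValueError("reads_in_peaks must be between 0 and total_reads")
    return reads_in_peaks / total_reads


# Made-up counts: 3 million of 10 million usable reads fall in peaks.
example = frip_score(3_000_000, 10_000_000)
print(f"FRiP = {example:.2f}")  # prints "FRiP = 0.30"
```

Both counts must be in the same unit (reads or fragments) for the ratio to be meaningful.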
In the ENCODE ATAC-seq Data Standards and Prototype Processing Pipeline, the FRiP score is calculated by running intersectBed on tagAlign files, which are produced by converting BAM files with bamtobed. However, intersectBed can also take BAM files directly to count the intersections. In addition, it has been argued that one should use featureCounts to get accurate results (https://www.biostars.org/p/337872/#337890).
Below, I compare different ways to calculate the FRiP score.
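To make the counting step concrete, here is a rough pure-Python sketch of what "reads that intersect the peaks" means. It is not how intersectBed is implemented; it only mimics the idea of counting each read at most once, however many peaks it touches. The function names and the toy intervals are hypothetical, and coordinates are assumed to be BED-style (0-based, half-open).

```python
from bisect import bisect_left
from collections import defaultdict


def build_peak_index(peaks):
    """Merge overlapping peaks per chromosome into sorted, disjoint intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def count_reads_in_peaks(reads, index):
    """Count reads overlapping at least one peak; each read counts once."""
    n = 0
    for chrom, start, end in reads:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Last merged peak that begins before the read ends.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            n += 1
    return n


# Toy data (made up): two overlapping peaks on chr1, one peak on chr2.
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
reads = [
    ("chr1", 90, 140),   # overlaps the first peak
    ("chr1", 180, 230),  # overlaps both chr1 peaks, still counted once
    ("chr1", 300, 350),  # starts where the peak ends: no overlap
    ("chr2", 0, 50),     # ends where the peak starts: no overlap
    ("chr2", 79, 120),   # overlaps by one base
    ("chr3", 1, 50),     # chromosome without peaks
]
in_peaks = count_reads_in_peaks(reads, build_peak_index(peaks))
print(in_peaks, in_peaks / len(reads))  # prints "3 0.5"
```

Merging the peaks first is a design choice for this sketch: it keeps the lookup to one binary search per read and guarantees that a read spanning several peaks is never counted twice.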
1 | 1. prepare |
The read counts in peak regions were identical whether they were obtained from the BAM file or from the tagAlign file. Counting from BAM takes more time, but there is no need to convert the BAM file to tagAlign first.
As expected, the featureCounts result is close to those from intersectBed. The raw counts differ because featureCounts counts fragments while intersectBed counts reads, and the number of reads is twice the number of fragments (considering only properly mapped read pairs).
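The relationship between read-level and fragment-level counting can be illustrated with a toy calculation. This is only a sketch under stated assumptions: each fragment contributes exactly two mates, and a fragment is treated as "in a peak" when at least one mate overlaps a peak. That rule is a simplification for illustration and is not claimed to be the exact assignment logic of featureCounts. The function name and the flags are hypothetical.

```python
def frip_reads_vs_fragments(pairs):
    """Compare FRiP computed per read and per fragment.

    pairs: list of (mate1_in_peak, mate2_in_peak) booleans, one per fragment.
    Returns (frip_by_reads, frip_by_fragments).
    """
    total_fragments = len(pairs)
    total_reads = 2 * total_fragments  # two mates per fragment
    reads_in_peaks = sum(int(m1) + int(m2) for m1, m2 in pairs)
    # Assumption for this sketch: one overlapping mate is enough.
    fragments_in_peaks = sum(1 for m1, m2 in pairs if m1 or m2)
    return reads_in_peaks / total_reads, fragments_in_peaks / total_fragments


# When both mates always agree, the factor of two cancels out.
concordant = [(True, True), (True, True), (False, False), (False, False)]
print(frip_reads_vs_fragments(concordant))  # prints "(0.5, 0.5)"

# One fragment straddles a peak boundary: the two scores drift apart.
straddling = [(True, True), (True, True), (True, False), (False, False)]
print(frip_reads_vs_fragments(straddling))  # prints "(0.625, 0.75)"
```

In other words, doubling both the numerator and the denominator leaves the score unchanged, so any difference between the two approaches comes from pairs whose mates are assigned differently.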
1 | # from intersectBed |
featureCounts may be more accurate when assigning reads that span multiple features, but the improvement may not be worth it.
So, in practice, I prefer to use intersectBed to calculate the FRiP score directly from BAM files.
Change log
- 20190320: create the note.