Case study

To assess the accuracy of WBSA's methylation site identification and advanced analysis results, we downloaded a published dataset from NCBI and used one sample from it (SRA accession SRX006782, 447M reads). The data come from Lister et al., who presented the first genome-wide, single-base-resolution maps of methylated cytosines in a mammalian genome, from both human embryonic stem cells and fetal fibroblasts. WBSA took about one week to complete all calculations and return results.

We compared our annotation results with those of Lister et al. and observed good consistency. The bisulfite conversion rate estimated by WBSA is 99.7%, close to the value reported in the paper (99.3%). Among the identified methylcytosines, non-CG sites account for more than 20% of the total, which is consistent with the published data (Figure 1). The methylation level distribution shows that most mCG sites are highly methylated, again consistent with the published results (Figure 2). We did not find local sequence enrichment for mCG, but did find a preference for TA dinucleotides upstream of non-CG methylated regions. In addition, the base following a non-CG methylcytosine was most commonly an A, with a T also observed relatively frequently. These preferences match those reported in the paper (Figure 3). Finally, the methylcytosine distribution along each chromosome has almost the same shape as in the Lister et al. paper (Figure 4).
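The context statistics discussed above (the share of non-CG methylcytosines and the base that follows them) can be illustrated with a minimal sketch. This is not WBSA's implementation: the function names `cytosine_context`, `non_cg_fraction`, and `following_base_counts` are hypothetical, the sequence and methylated positions are toy values, and only forward-strand cytosines are considered for simplicity.

```python
from collections import Counter


def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at seq[pos] as CG, CHG, or CHH (H = A, C, or T).

    Returns 'unknown' when the downstream bases are missing or ambiguous.
    Forward strand only (a simplifying assumption of this sketch).
    """
    if seq[pos].upper() != "C":
        raise ValueError("position is not a cytosine")
    downstream = seq[pos + 1 : pos + 3].upper()
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return "unknown"
    return "CHG" if downstream[1] == "G" else "CHH"


def non_cg_fraction(seq: str, methylated_positions):
    """Return per-context counts and the non-CG share of classified sites."""
    counts = Counter(cytosine_context(seq, p) for p in methylated_positions)
    classified = counts["CG"] + counts["CHG"] + counts["CHH"]
    if classified == 0:
        return counts, 0.0
    return counts, (counts["CHG"] + counts["CHH"]) / classified


def following_base_counts(seq: str, methylated_positions):
    """Tally the base immediately 3' of each non-CG methylcytosine."""
    tally = Counter()
    for p in methylated_positions:
        if cytosine_context(seq, p) in ("CHG", "CHH"):
            tally[seq[p + 1].upper()] += 1
    return tally


# Toy example: hypothetical reference fragment and methylated positions.
toy_seq = "ACGTCAGTCTACCGG"
toy_methylated = [1, 4, 8, 12]
toy_counts, toy_fraction = non_cg_fraction(toy_seq, toy_methylated)
toy_following = following_base_counts(toy_seq, toy_methylated)
```

On real data, the positions would come from the methylation calls and the sequence from the reference genome; reverse-strand cytosines would need the reverse complement before classification.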

Figure 1
Figure 2
Figure 3
Figure 4 (one panel per chromosome: Chr1 to Chr22, ChrX, ChrY)