1. How to use WBSA to analyze RRBS data
1.1  Upload raw data

An RRBS analysis starts with uploading the raw data. Only single-end data is accepted (either a T-rich file or an A-rich file). Raw data must be in FastQ format and compressed (*.zip or *.tar.gz). In HTTP mode the compressed file must be no larger than 2 GB; for larger files, please use FTP mode. Input file format:

FastQ format
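
As a rough pre-upload check, the sketch below reads four-line FastQ records and picks an upload mode from the compressed file size. It is not part of WBSA: the function names are hypothetical, and reading "2GB" as 2 * 1024^3 bytes is an assumption.

```python
import io

# Assumption: the 2 GB HTTP limit is interpreted as 2 * 1024**3 bytes.
MAX_HTTP_BYTES = 2 * 1024**3


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FastQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FastQ record near: " + header.strip())
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in " + header.strip())
        yield header[1:].strip(), seq, qual


def upload_mode(compressed_size_bytes):
    """Return 'HTTP' when the compressed file fits the limit, otherwise 'FTP'."""
    return "HTTP" if compressed_size_bytes <= MAX_HTTP_BYTES else "FTP"


if __name__ == "__main__":
    demo = io.StringIO("@read1\nTTGATTGG\n+\nIIIIHHGF\n")
    for name, seq, qual in read_fastq(demo):
        print(name, len(seq), upload_mode(500 * 1024**2))
```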


1.2 Quality analysis of the raw reads

FastQC is integrated to analyze the quality of the bisulfite sequencing data. The results include the quality distribution, nucleotide distribution, GC content distribution, and identification of overrepresented sequences. See demo.

1.3  Filter adaptor sequences and low quality bases of reads

Users can trim low-quality bases from both ends of a read when the base quality value is below a threshold. Adaptor sequences are filtered out if they are provided. If the read is shorter than a preset length after this trimming and filtering, it is discarded.
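
The trimming and filtering described above can be sketched roughly as follows. This is an illustration, not WBSA's implementation: the default thresholds, the Phred+33 quality encoding, exact-match adaptor removal, and the order of the two steps (adaptor first, then quality trimming) are all assumptions.

```python
def phred_scores(quality, offset=33):
    """Convert a FastQ quality string to integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_read(seq, qual, min_quality=20, min_length=30, adaptor=None, offset=33):
    """Return the trimmed (seq, qual), or None if the read should be discarded."""
    # Assumption: the adaptor is matched exactly and everything from its
    # first occurrence onward is removed.
    if adaptor:
        pos = seq.find(adaptor)
        if pos != -1:
            seq, qual = seq[:pos], qual[:pos]

    # Trim bases below the threshold from both ends of the read.
    scores = phred_scores(qual, offset)
    start, end = 0, len(seq)
    while start < end and scores[start] < min_quality:
        start += 1
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    seq, qual = seq[start:end], qual[start:end]

    # Discard reads that became shorter than the preset length.
    if len(seq) < min_length:
        return None
    return seq, qual


if __name__ == "__main__":
    # '#' encodes quality 2 and 'I' encodes quality 40 under Phred+33.
    print(trim_read("ACGTACGT", "##IIII##", min_quality=20, min_length=4))
```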

The minimum quality value is a threshold: a C base with a quality value below it is not considered to be a methylcytosine.

Duplicated reads can be removed (not recommended).

1.4  Reference Preparation

In the current version, reference data for one of the following eight species can be chosen:

  • Homo sapiens
  • Chicken
  • Mouse
  • Rat
  • Cow
  • Dog
  • Pig
  • Zebrafish

If the reference for the species under study is not listed in the menu, the user can contact us to add it to WBSA, or download WBSA's program package from the Downloads page and run jobs with that reference on a local server.

Please upload the Lambda sequence file (the file name must be chrLam.fa) if an unmethylated DNA sequence is used as the control for calculating the conversion rate of bisulfite sequencing. Otherwise, please provide the p-value, i.e., the false rate (p = 1 - conversion rate).
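
To make the relation p = 1 - conversion rate concrete, here is a small worked sketch. It assumes the conversion rate is estimated as the fraction of cytosines in the unmethylated control that are read as converted; the function names and the counts are illustrative, not WBSA output.

```python
def conversion_rate(converted_c, unconverted_c):
    """Fraction of control cytosines that were converted by bisulfite treatment."""
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no cytosine observations in the control")
    return converted_c / total


def false_rate(rate):
    """The p-value as defined in the text: p = 1 - conversion rate."""
    return 1.0 - rate


if __name__ == "__main__":
    # Illustrative counts only: 995 converted and 5 unconverted control cytosines.
    rate = conversion_rate(995, 5)
    print("conversion rate:", rate)
    print("p:", round(false_rate(rate), 6))
```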

1.5  Mapping parameters

WBSA uses the BWA program to map reads to the reference.
For example, the parameters could be -l 32, -k 2, -n 4 if the read length is 80 bp.
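
For users running the mapping on a local server, the sketch below assembles a command line with the example parameters. It assumes the options belong to the `bwa aln` subcommand, where -l is the seed length, -k the maximum number of differences allowed in the seed, and -n the maximum number of differences overall. The file names are placeholders, and the exact command WBSA runs internally is not stated here.

```python
import shlex


def bwa_aln_command(reference, reads, seed_length=32, max_seed_diff=2, max_diff=4):
    """Build (but do not run) a `bwa aln` argument list."""
    return [
        "bwa", "aln",
        "-l", str(seed_length),    # seed length
        "-k", str(max_seed_diff),  # max differences within the seed
        "-n", str(max_diff),       # max differences over the whole read
        reference,
        reads,
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute an indexed reference and a FastQ file.
    cmd = bwa_aln_command("reference.fa", "reads.fq")
    print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` without shell quoting issues.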

1.6  Advanced options

RRBS supports three additional options: the methylation level distribution in transposable elements (TEs), sequence preference analysis of mCG, mCHG, and mCHH, and correlation analysis between gene expression and methylation level.
A transposable element file must be provided if the TE option is chosen. The format of the uploaded TE data file is as follows:

species   TE_id   chromosome   strand   start position (count from 1)   end position (count from 1)
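
A minimal sketch for checking a TE file line against the six columns above before uploading. The tab delimiter is an assumption (the column separator is not specified here), and `TERecord` and `parse_te_line` are hypothetical names.

```python
from typing import NamedTuple


class TERecord(NamedTuple):
    species: str
    te_id: str
    chromosome: str
    strand: str
    start: int  # counted from 1
    end: int    # counted from 1


def parse_te_line(line):
    """Parse one TE line; assumes tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d" % len(fields))
    species, te_id, chromosome, strand, start, end = fields
    record = TERecord(species, te_id, chromosome, strand, int(start), int(end))
    if record.start < 1 or record.end < record.start:
        raise ValueError("invalid coordinates for " + te_id)
    return record


if __name__ == "__main__":
    rec = parse_te_line("Homo sapiens\tTE0001\tchr1\t+\t100\t250\n")
    # With both positions counted from 1, the length is end - start + 1.
    print(rec.te_id, rec.end - rec.start + 1)
```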

The gene expression data must be provided if the correlation analysis option is chosen. The format of the uploaded expression data file is as follows:

gene id   expression value   chromosome   gene start position (count from 0)   gene end position (count from 1)   strand
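
The expression file can be checked in a similar way. The sketch below assumes tab-separated columns and uses hypothetical names (`GeneExpression`, `parse_expression_line`). Because the start is counted from 0 and the end from 1, the gene length works out to end - start.

```python
from typing import NamedTuple


class GeneExpression(NamedTuple):
    gene_id: str
    expression: float
    chromosome: str
    start: int  # counted from 0
    end: int    # counted from 1
    strand: str


def parse_expression_line(line):
    """Parse one expression line; assumes tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 columns, got %d" % len(fields))
    gene_id, value, chromosome, start, end, strand = fields
    record = GeneExpression(gene_id, float(value), chromosome,
                            int(start), int(end), strand)
    if record.start < 0 or record.end <= record.start:
        raise ValueError("invalid coordinates for " + gene_id)
    return record


def gene_length(record):
    """Length in bases for a 0-based start and 1-based end."""
    return record.end - record.start


if __name__ == "__main__":
    rec = parse_expression_line("GENE1\t12.5\tchr2\t999\t2000\t-\n")
    print(rec.gene_id, rec.expression, gene_length(rec))
```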

1.7  Email

RRBS notifies users by email when each step of the task is finished, so users should provide a valid email address here to receive these messages.