1. How to use WBSA to analyze WGBS data?
1.1  Upload raw data

The WGBS (whole-genome bisulfite sequencing) pipeline starts with uploading raw data and accepts both single-end and paired-end data. Raw data must be in FastQ format and compressed (*.zip or *.tar.gz). In HTTP mode, each compressed file must be no larger than 2 GB; in FTP mode, there is no limit on the size of the sequencing data. For paired-end data, the first file is the T-rich file and the second file is the A-rich file. Input file format:

FastQ format

@HWI-ST958:71:8:1101:1212:2193#0/1
ATAAAATATTANTTTTTGTTAGAAAGAATTAGTGNTNAATTTTTNGATTTAATAATGGGTATCGTATTATTAGTTGATGTTTAATCGTATAAAATGTTAGT
+
CC@FDFFFHHH#2AEHIIHIJJJJJJHIJJJJEG#1#0:DFHII#07FHIIIIIGGIJJ=CGHIGIHHGHHFFBEDFFFEEEEEEEDDDDDCDDEDFEEDD
@HWI-ST958:71:8:1101:1246:2245#0/1
TAGTGAGATAAATTAATTGTTAATTTTTTTACGTTTTTAATGGTATTTTGAGTATTTTTGAAGGAAAGGTTTAATTTAATGGCGTATTGTAGAATAATTGA
+
BBBDDDFEHHHHHJJJJJJJIIIJJJJJJJJJJDHIIJHIJJIFFHIJJJJJBBHIJJJJEDHIGHHHH=CFFFFBCCEEEEEDDBDDDEEDEDDCCADD<
@HWI-ST958:71:8:1101:1491:2132#0/1
NTCACGAAATATCGTTTTTTATTTTTCGTTAATATTTTATTTATTTTATAAAGATCAGAATGTAGGAGTTTTAGTGGTGTTAAGTTTTTTTTTTCGGTAAT
+
#1:ABDDDFHD?AEHI@GIECEBFHIIIIIG?BDBDECGHIIIIEH@F>=FAH@;@)7A;CA=?CDE?ABCA@;;>CCBBBBBB>&58?8>
@HWI-ST958:71:8:1101:1321:2152#0/1
GAAAGATATAATTATGTATTTGAATAATTCGTATTAAGTGATGAATGTTGTTTTTGGAATATTTTTTTTACGGATTTTTAACTGGTGATAGTCTGATCGCA
+
@@CFFFFFHHHHHJJJJJJJJJJJJIJJJJJGIJJEIJCGHIIEGIJHIIHJJJJJJIJJIJJJJJJJHFFFF?A>CCD@CDDDDDDDDEDDEEDDEDDDB
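The records above follow the standard four-line FastQ layout: a header line starting with @, the bases, a + separator line, and one quality character per base. The following is a minimal Python sketch of a reader that checks this layout before upload. It handles uncompressed text only and does not support FastQ files whose sequences wrap across several lines. The names read_fastq and FastqRecord are hypothetical and are not part of WBSA.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # header line without the leading '@'
    sequence: str  # bases
    quality: str   # one ASCII quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an uncompressed, 4-lines-per-record FastQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Usage on a tiny in-memory example (two hypothetical reads).
demo = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nTTNA\n+\n#III\n")
for record in read_fastq(demo):
    print(record.header, len(record.sequence))
```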

1.2  Quality analysis of the raw reads

WBSA integrates the FastQC software to assess the quality of bisulfite sequencing data. The report includes the quality distribution, nucleotide distribution, GC content distribution, and identification of overrepresented sequences. See the demo.
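FastQC computes these statistics itself. The sketch below is only a rough illustration and is not FastQC's code. It computes two of these statistics: the mean quality at each read position and the overall base composition. It assumes Phred+33 quality encoding, which matches the example reads above: they contain characters such as # that only occur in Phred+33 data. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality_per_position(qualities: List[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, over the reads long enough to reach it."""
    totals: List[int] = []
    counts: List[int] = []
    for quality in qualities:
        for i, score in enumerate(phred_scores(quality, offset)):
            if i == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


def base_composition(sequences: List[str]) -> Dict[str, float]:
    """Fraction of A, C, G, T and N over all bases of all reads."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no bases to count")
    return {base: counts[base] / total for base in "ACGTN"}
```

For bisulfite-converted data, expect a very low C fraction in T-rich reads and a low G fraction in A-rich reads, because unmethylated cytosines are read as T. This is why GC content plots for WGBS data look unusual compared with those for ordinary genomic libraries.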

1.3  Filter adaptor sequences and low quality bases of reads

Users can trim bases whose quality value is below a threshold from both ends of each read. If adaptor sequences are provided, they are also removed. If a read is shorter than a preset minimum length after trimming and adaptor removal, it is discarded.
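The following sketch shows these three steps for a single read. It is a simplified illustration, not WBSA's implementation, and makes several assumptions:

- qualities use Phred+33 encoding;
- quality trimming happens before adaptor removal;
- the read is cut only at an exact match of the full adaptor.

Dedicated trimmers also handle partial and mismatched adaptor matches at the 3' end. The function trim_read and its parameters are hypothetical.

```python
from typing import Optional, Tuple


def trim_read(seq: str, qual: str, min_qual: int, min_length: int,
              adaptor: Optional[str] = None,
              offset: int = 33) -> Optional[Tuple[str, str]]:
    """Trim one read; return (seq, qual), or None if the read is discarded."""
    # 1. Trim bases below min_qual from both ends of the read.
    start, end = 0, len(seq)
    while start < end and ord(qual[start]) - offset < min_qual:
        start += 1
    while end > start and ord(qual[end - 1]) - offset < min_qual:
        end -= 1
    seq, qual = seq[start:end], qual[start:end]
    # 2. If an adaptor is given, cut the read at its first exact occurrence.
    if adaptor:
        pos = seq.find(adaptor)
        if pos != -1:
            seq, qual = seq[:pos], qual[:pos]
    # 3. Discard reads that are now shorter than min_length.
    if len(seq) < min_length:
        return None
    return seq, qual
```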

The minimum quality value is a threshold for methylation calling: a C base whose quality value is below it is not considered a methylcytosine.
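The sketch below shows where this threshold could enter. It tallies methylated (C) and unmethylated (T) calls at reference cytosines for one read, grouped by CG, CHG and CHH context. It is a hypothetical helper, not WBSA's code, and makes these assumptions:

- the read comes from the forward strand and aligns without gaps;
- qualities use Phred+33 encoding;
- a call below the threshold is skipped entirely rather than counted as unmethylated.

Reads from the reverse strand would instead be evaluated at reference G positions.

```python
from typing import Dict, List, Optional


def cytosine_context(ref: str, i: int) -> Optional[str]:
    """Context of the C at ref[i] on the forward strand: CG, CHG or CHH (H = A, C or T)."""
    if i + 1 < len(ref) and ref[i + 1] == "G":
        return "CG"
    if i + 2 < len(ref):
        return "CHG" if ref[i + 2] == "G" else "CHH"
    return None  # too close to the end of the reference to classify


def call_methylation(ref: str, read: str, qual: str, pos: int,
                     min_qual: int, offset: int = 33) -> Dict[str, List[int]]:
    """Tally [methylated, unmethylated] calls per context for one forward-strand read.

    The read is assumed to align to ref without gaps, starting at 0-based position pos.
    """
    counts = {ctx: [0, 0] for ctx in ("CG", "CHG", "CHH")}
    for j, base in enumerate(read):
        i = pos + j
        if i >= len(ref) or ref[i] != "C":
            continue  # only reference cytosines carry methylation information
        if ord(qual[j]) - offset < min_qual:
            continue  # below the threshold: the call is not used at all
        ctx = cytosine_context(ref, i)
        if ctx is None:
            continue
        if base == "C":    # C survived bisulfite conversion -> methylated
            counts[ctx][0] += 1
        elif base == "T":  # C was converted to T -> unmethylated
            counts[ctx][1] += 1
    return counts
```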

Duplicated reads can optionally be removed (recommended).
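Duplicate removal keeps one copy of reads that share the same key. The sketch below uses the read sequence as the key (or, for paired-end data, the pair of sequences) and keeps the first occurrence. This is an assumption for illustration: many pipelines instead mark duplicates by mapping position after alignment. The function name is hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def remove_duplicates(records: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Keep the first record for each distinct key, preserving input order."""
    seen = set()
    unique: List[T] = []
    for record in records:
        k = key(record)
        if k in seen:
            continue  # a duplicate of an earlier read
        seen.add(k)
        unique.append(record)
    return unique


# Single-end: records are (header, sequence, quality); duplicates share a sequence.
single = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "####"), ("r3", "TTTT", "IIII")]
print(remove_duplicates(single, key=lambda rec: rec[1]))

# Paired-end: records are (mate1, mate2); duplicates share both sequences.
pairs = [(("p1/1", "ACGT", "IIII"), ("p1/2", "GGAA", "IIII")),
         (("p2/1", "ACGT", "IIII"), ("p2/2", "GGAT", "IIII"))]
print(len(remove_duplicates(pairs, key=lambda p: (p[0][1], p[1][1]))))
```

Keeping the first occurrence is the simplest choice. Keeping the copy with the highest mean quality would be a reasonable alternative.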

1.4  Reference Preparation

In the current version, users can choose the reference genome of one of the following 10 species:

  • Homo sapiens
  • Chicken
  • Mouse
  • Rat
  • Cow
  • Dog
  • Pig
  • Zebrafish
  • Rice
  • Arabidopsis

If the reference genome of the species under study is not listed in the menu, users can contact us to add it to WBSA. Alternatively, they can download the WBSA program package from the Downloads page and run jobs with their own reference on a local server.

If an unmethylated DNA fragment (lambda DNA) is used as the control for calculating the bisulfite conversion rate, please upload the lambda sequence file (the file name must be chrLam.fa). Otherwise, please provide the p-value, i.e., the false rate (p = 1 - conversion rate).
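Lambda DNA is unmethylated, so every C call at a lambda cytosine reflects a failed conversion (or a sequencing error). p can therefore be estimated from the C and T counts at those positions. The sketch below shows that arithmetic. It also shows one common use of p in WGBS analysis: a binomial test of whether a site has more C calls than non-conversion alone would explain. This document does not confirm that WBSA uses that test, so treat it as an assumption. Function names are hypothetical.

```python
from math import comb


def false_rate(unconverted_c: int, converted_t: int) -> float:
    """Estimate p = 1 - conversion rate from reads mapped to the lambda control.

    unconverted_c: number of C calls at lambda cytosines (conversion failed)
    converted_t:   number of T calls at lambda cytosines (conversion succeeded)
    """
    total = unconverted_c + converted_t
    if total == 0:
        raise ValueError("no calls at lambda cytosines")
    conversion_rate = converted_t / total
    return 1.0 - conversion_rate


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more C calls from non-conversion alone."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, 5 C calls and 995 T calls at lambda cytosines give a conversion rate of 0.995 and p = 0.005. Under the assumed binomial test, a site covered by n reads with k C calls would be called methylated when binomial_tail(k, n, p) falls below a chosen significance level.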

1.5  Mapping Parameters

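WBSA uses the BWA program to map reads to the reference.
For example, for 80 bp reads the parameters could be -l 32 -k 2 -n 4. These correspond to the bwa aln options: -l sets the seed length, -k the maximum edit distance in the seed, and -n the maximum edit distance in the whole alignment.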
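BWA itself is not bisulfite-aware, so bisulfite pipelines commonly align in a three-letter space, the strategy used by tools such as Bismark:

1. Reads and reference are converted in silico (C to T, and G to A for the other strand).
2. The converted reads are aligned to the converted reference.
3. The original bases are compared afterwards to call methylation.

This document does not describe how WBSA prepares its input for BWA, so the sketch below only illustrates the general strategy. Function names are hypothetical.

```python
from typing import Dict


def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion of the top strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """The same conversion seen from the bottom strand: every G becomes A."""
    return seq.upper().replace("G", "A")


def convert_reference(ref: str) -> Dict[str, str]:
    """Two three-letter copies of the reference, each to be indexed by an ordinary aligner."""
    return {"CT": c_to_t(ref), "GA": g_to_a(ref)}


def convert_read(read: str, mate: int = 1) -> str:
    """Mate 1 (T-rich file) is C->T converted; mate 2 (A-rich file) is G->A converted."""
    if mate not in (1, 2):
        raise ValueError("mate must be 1 or 2")
    return c_to_t(read) if mate == 1 else g_to_a(read)
```

The converted reads could then be aligned to a converted index with a command such as bwa aln -l 32 -k 2 -n 4 <index> <reads.fq>, where <index> and <reads.fq> are placeholders.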

1.6  Advanced Options

The WGBS pipeline supports three additional options: methylation level distribution in transposable elements (TEs); sequence preference analysis of mCG, mCHG, and mCHH sites; and correlation analysis between gene expression and methylation level. A TE file must be provided if the TE option is chosen. The TE data file format is as follows:

species   TE_id   chromosome   strand   start position (count from 1)   end position (count from 1)
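The following is a minimal sketch of reading this file and averaging the methylation level within one TE. It is not WBSA's implementation, and the names are hypothetical. It assumes:

- tab-separated columns with no header line (the separator is not specified above, and species names such as "Homo sapiens" may contain spaces);
- per-site methylation levels keyed by chromosome and 1-based position.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class TERecord(NamedTuple):
    species: str
    te_id: str
    chrom: str
    strand: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_te_line(line: str) -> TERecord:
    """Parse one tab-separated line of the TE file."""
    species, te_id, chrom, strand, start, end = line.rstrip("\r\n").split("\t")
    return TERecord(species, te_id, chrom, strand, int(start), int(end))


def mean_level_in_te(te: TERecord,
                     levels: Dict[Tuple[str, int], float]) -> Optional[float]:
    """Average methylation level of the cytosines inside one TE.

    levels maps (chromosome, 1-based position) to a methylation level in [0, 1].
    Returns None when no covered cytosine falls inside the TE.
    """
    inside = [level for (chrom, pos), level in levels.items()
              if chrom == te.chrom and te.start <= pos <= te.end]
    return sum(inside) / len(inside) if inside else None
```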

Gene expression data must be provided if the correlation analysis option is chosen. The expression data file format is as follows:

gene id   expression value   chromosome   gene start position (count from 0)   gene end position (count from 1)   strand
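Unlike the TE file, the expression file's start column counts from 0 while its end column counts from 1. The sketch below assumes tab-separated columns. It converts the start so that both files use 1-based inclusive intervals, then computes a Pearson correlation between expression values and methylation levels. This document does not state which correlation measure WBSA uses, so Pearson is an assumption; a rank-based Spearman correlation is a common alternative. The names are hypothetical.

```python
import math
from typing import List, NamedTuple


class GeneRecord(NamedTuple):
    gene_id: str
    expression: float
    chrom: str
    start: int  # converted to 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str


def parse_expression_line(line: str) -> GeneRecord:
    """Parse one tab-separated line of the expression file."""
    gene_id, value, chrom, start0, end1, strand = line.rstrip("\r\n").split("\t")
    # The file's start counts from 0 and its end from 1, so adding 1 to the
    # start gives the same interval in 1-based inclusive coordinates.
    return GeneRecord(gene_id, float(value), chrom, int(start0) + 1, int(end1), strand)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```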

1.7  Email

The WGBS pipeline sends an email notification when each step of the task is finished, so please provide a valid email address to receive these messages.