ChIP-seq Practice Exercises

 


**NOTE: When working on O2, be sure to run this in /n/scratch2/ rather than your home directory.**

# ChIP-Seq Analysis Workflow

1.  Create a directory called your eCommons ID within /n/scratch2/. Enter that directory and create a new directory called HCFC1_chipseq.
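As a minimal sketch of the directory setup described above: `make_project_dir` is a hypothetical helper name introduced only for illustration, and the example call assumes your O2 login name is the same as your eCommons ID.

```bash
#!/bin/bash
# Sketch: create <scratch_root>/<eCommons ID>/HCFC1_chipseq and print its path.
# make_project_dir is a hypothetical helper, not part of the exercise.
make_project_dir() {
    local scratch_root="$1"
    local ecommons_id="$2"
    local project_dir="${scratch_root}/${ecommons_id}/HCFC1_chipseq"
    # -p creates the eCommons ID directory as well if it does not exist yet
    mkdir -p "$project_dir" && echo "$project_dir"
}

# On O2 this might be used as (assumption: $USER is your eCommons ID):
# cd "$(make_project_dir /n/scratch2 "$USER")"
```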

...

      1. Run FastQC
      2. Align reads with Bowtie2 using the parameters we used in class.
        NOTE: For the Bowtie2 index you will need to point to the hg19 index files from: /n/groups/shared_databases/igenome/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/
      3. Change alignment file format from SAM to BAM (can be done using samtools or sambamba)
      4. Sort the BAM file by read coordinate locations (can be done using sambamba or with samtools)
      5. Filter to keep only uniquely mapping reads (this will also remove any unmapped reads and duplicates) using sambamba

**NOTE: The script will require positional parameters, and using basename will help with the naming of output files**
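The sketch below shows one way a positional parameter and `basename` could drive the output names for steps 2 to 5. It only prints the commands instead of running them. The function name, the index basename `genome`, the thread count, the `--local` setting, and the output file suffixes are all assumptions for illustration; use the parameters from class where they differ.

```bash
#!/bin/bash
# Sketch only: prints (does not run) the commands for steps 2-5 for one FASTQ file.
# Assumptions: index basename "genome", 6 threads, --local alignment, and the
# output naming scheme are illustrative and not taken from the exercise.

index=/n/groups/shared_databases/igenome/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome

print_alignment_steps() {
    local fq="$1"                     # first positional parameter: path to a FASTQ file
    local base
    base=$(basename "$fq" .fastq)     # strips the directory and the .fastq suffix

    # 2. Align reads with Bowtie2
    echo "bowtie2 -p 6 -q --local -x $index -U $fq -S ${base}.sam"
    # 3. Convert SAM to BAM
    echo "samtools view -h -S -b -o ${base}_unsorted.bam ${base}.sam"
    # 4. Sort the BAM file by coordinate
    echo "sambamba sort -t 6 -o ${base}_sorted.bam ${base}_unsorted.bam"
    # 5. Keep uniquely mapping reads; drop unmapped reads and duplicates
    echo "sambamba view -h -t 6 -f bam -F '[XS] == null and not unmapped and not duplicate' ${base}_sorted.bam > ${base}_aln.bam"
}

# In a real script the function would receive the script's own first argument:
# print_alignment_steps "$1"
```

Because every output name is built from `${base}`, the same script can be reused for each of the input files without editing it.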

...

c. Create a separate job submission script to run the shell script you created in b) on all four .fastq files. You can run the jobs in serial or in parallel. Take a look at the automation lesson from the RNA-Seq session for help with setting up the job submission script.
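For the parallel option, one possible shape is a loop that submits one SLURM job per FASTQ file. The sketch below is a dry run that prints the `sbatch` commands rather than submitting them. The script name `chipseq_analysis.sh`, the function name `submit_all`, and the partition, time, and core values are placeholders assumed for illustration.

```bash
#!/bin/bash
# Sketch: print one sbatch command per FASTQ file (parallel submission, dry run).
# Assumptions: chipseq_analysis.sh is a hypothetical name for the script from b);
# the partition (-p), time limit (-t), and core count (-c) are placeholder values.
submit_all() {
    local fq base
    for fq in "$@"; do
        base=$(basename "$fq" .fastq)
        # Printing instead of running keeps this a dry run; paste or eval to submit.
        printf 'sbatch -p short -t 0-2:00 -c 6 --job-name chipseq-%s --wrap="sh chipseq_analysis.sh %s"\n' \
            "$base" "$fq"
    done
}

# Example (file location is hypothetical):
# submit_all raw_data/*.fastq
```

A serial version would instead place the loop inside a single submitted script and call the analysis script directly on each file in turn.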

...

e.  Use phantompeakqualtools to check the quality of the ChIP signal for HCFC1-rep1 and HCFC1-rep2. Report the NSC, RSC, and QualityTag values. What can you conclude about the quality of the data based on these quality measures?

f. **OPTIONAL:** If you feel very ambitious, get X11 set up on your personal account and try running ChIPQC to create a report for HCFC1. Discuss the quality of the data using these metrics. If you run into problems here, please reach out to the HMSRC folks; this is known to be tricky!
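For the phantompeakqualtools step in e), the three requested values can be pulled out of the result file with a short helper. This sketch assumes the 11-column, tab-delimited layout that `run_spp.R` writes when given `-out=<file>`, in which columns 9, 10, and 11 hold NSC, RSC, and QualityTag; `report_quality` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: extract NSC, RSC, and QualityTag from phantompeakqualtools result files.
# Assumes the tab-delimited run_spp.R output layout:
#   column 1 = file name, column 9 = NSC, column 10 = RSC, column 11 = QualityTag.
report_quality() {
    awk -F'\t' '{ printf "%s\tNSC=%s\tRSC=%s\tQualityTag=%s\n", $1, $9, $10, $11 }' "$@"
}

# Example (result file names are hypothetical):
# report_quality HCFC1-rep1.qual HCFC1-rep2.qual
```

As a rough guide, QualityTag is a thresholded version of RSC ranging from -2 (very low) to 2 (very high), so the three values should be read together rather than in isolation.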

g. Sort each of the narrowPeak files using:

...

**NOTE 2: Just perform the 1 step we did in class to generate the IDR stats; there is no need to do the remaining 2 steps in the IDR workflow. (However, you can do them optionally if you are interested and we are happy to help you troubleshoot if need be.)**

i. These high confidence peaks from IDR can be used to explore downstream tools for functional enrichment and motif discovery. Use ChIPseeker to do the following:

Plot the read count frequency relative to the TSS (use a window of +/- 1000 bp). Upload this figure.

...

GREAT to annotate the peaks and write those annotations to file.

...

Plot a pie chart using the genomic region annotations. Which genomic feature is the most represented? 

...

Take a look at the binding site locations relative to the TSS. What does this tell us?

Evaluate the GO enrichment analysis from GREAT. Which terms are over-represented?

j. **OPTIONAL:** Try motif analysis using the MEME suite and comment on the results you find.

...