Answer key can be found here.

# Logging in and starting an interactive job

1. Log into Orchestra and start a single core interactive job.

# Practicing Unix commands and using vim

2. Make a new directory called homework inside your ~/ngs_course directory. Change directories to your homework directory.

  • What is the absolute path to your homework directory?
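As a minimal sketch of the commands involved, the block below creates a nested directory, moves into it, and prints its absolute path. It works inside a temporary directory (a hypothetical stand-in for your home directory) so that it can be run anywhere without touching your real ~/ngs_course folder.

```shell
# Hypothetical stand-in for the home directory; the assignment uses ~/ngs_course.
base_dir=$(mktemp -d)

# -p creates any missing parent directories as well.
mkdir -p "$base_dir/ngs_course/homework"

# Move into the new directory and record its absolute path.
cd "$base_dir/ngs_course/homework"
abs_path=$(pwd)
echo "$abs_path"
```

An absolute path always begins with "/" and does not depend on the directory you are currently in.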

3. Using vim, create a file named mov10_fastq_reads.txt. On the first line, type the sentence “Read counts from irrelevant siRNA, Mov10 knock down and Mov10 over-expression samples.” Save and exit vim.

4. Change directories into the /groups/hbctraining/ngs-data-analysis2016 directory. Locate and enter the rnaseq folder, then enter the other folder.

  • What is the absolute path of your current directory?
  • Copy both of the Mov10_kd files to the homework directory inside of your ~/ngs_course directory. 

5. Change directories to your home directory. Locate and enter the raw_data folder (Hint: this was from the rna-seq portion of Session I).

...

6. Change directories to your homework directory. Pipe together the grep and wc commands to determine the number of reads generated for Mov10_oe_1.subset.fastq. Look at your FastQC report for this sample and check its statistics to determine whether your pipe command produced the correct number of reads. If not, fix your pipe command so that it gives the same number of reads as the FastQC report.

  • How many reads were obtained for this sample?
  • What was the command you used for determining the number of reads in your file?
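To illustrate why a first attempt at a pipe can disagree with FastQC, here is a small sketch on a made-up two-read FASTQ file (the file name and read names are hypothetical). Each FASTQ record spans four lines, and a quality line may itself begin with "@", so a pattern that is too loose can overcount. Dividing the total line count by four offers an independent cross-check.

```shell
# Build a toy FASTQ file with two reads in a temporary directory.
workdir=$(mktemp -d)
toy_fastq="$workdir/toy.fastq"
printf '%s\n' \
  '@read1' 'ACGT' '+' 'IIII' \
  '@read2' 'GGCA' '+' '@III' \
  > "$toy_fastq"

# Loose pattern: also matches the quality line of read2, which starts with "@".
naive_count=$(grep '^@' "$toy_fastq" | wc -l | tr -d ' ')

# Cross-check: every record is four lines long.
total_lines=$(wc -l < "$toy_fastq" | tr -d ' ')
record_count=$((total_lines / 4))

echo "naive: $naive_count, records: $record_count"
```

On the toy file the two counts differ, which is the kind of mismatch the FastQC comparison is meant to reveal.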

# Practicing more advanced commands, including for_loops, redirection, and scripts

7. Using the pipe command from the previous question, write a for_loop to determine the number of reads obtained for every fastq sample in the homework folder.

  • Within the for_loop, echo the filename, then, use your pipe command from Question #6 to determine the read counts for each file.

Your output should be in the format:

Mov10_oe_1.fastq

100

Mov10_oe_2.fastq

50…
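The general shape of such a loop can be sketched on toy data. The example below assumes two made-up files (sample_a.fastq and sample_b.fastq) whose read names all start with "@r"; the grep pattern is only a placeholder for whatever pipe you settled on in Question #6.

```shell
# Create two toy FASTQ files in a temporary directory (hypothetical names).
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' > "$workdir/sample_a.fastq"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > "$workdir/sample_b.fastq"

# For each file: print its name, then the read count from a grep | wc pipe.
loop_output=$(
  for fastq_file in "$workdir"/*.fastq; do
    basename "$fastq_file"
    grep '^@r' "$fastq_file" | wc -l | tr -d ' '
  done
)
echo "$loop_output"
```

The shell expands `*.fastq` in sorted order, so the files are reported alphabetically.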

8. Open vim and write a new script called fastq_read_counts.sh. This script will contain the for_loop you created in the previous question.

  • How many reads were generated for each of the files?

9. You should have seen the results of your script displayed in your shell. Now open your fastq_read_counts.sh script in vim. Edit the script so that, for every file, the results are appended to the file you created in Question #3 of this homework: mov10_fastq_reads.txt.

Hint: Use “>>” to redirect the output. What does redirecting with “>>” do (Google is your friend)?
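As a small illustration of the two redirection operators, the sketch below writes to a throwaway file (results.txt, a hypothetical name) in a temporary directory, so it does not touch mov10_fastq_reads.txt.

```shell
workdir=$(mktemp -d)
results_file="$workdir/results.txt"

# ">" creates the file, or overwrites it if it already exists.
echo "header line" > "$results_file"

# ">>" adds to the end of the file and keeps what is already there.
echo "first result" >> "$results_file"
echo "second result" >> "$results_file"

line_count=$(wc -l < "$results_file" | tr -d ' ')
cat "$results_file"
```

Because the last two commands use “>>”, the header line written first is still present at the top of the file.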

  • Run your script for all fastq files in your homework directory. Your output in the mov10_fastq_reads.txt should look something like this:

Read counts from irrelevant siRNA, Mov10 knock down and Mov10 over-expression samples.

Mov10_oe_1.fastq

100

Mov10_oe_2.fastq

50…

  • Upload the mov10_fastq_reads.txt file to your folder on the course wiki homework page (see link above).

# Practicing FastQC

10. Exit the interactive session (compute node) and go back to the login node (loge or mezzanine).

  • Copy the mov10_fastqc.lsf script created during Session I into your homework directory and name the copy hw_fastqc_run.lsf. Modify this newly copied script to perform FastQC on the following samples in parallel:

Irrel_kd_2.subset.fq, Mov10_kd_2.subset.fq, Mov10_oe_2.subset.fq
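One possible way to process several files at once is FastQC's `-t` (threads) option, which sets how many files are handled simultaneously. The sketch below is a dry run: it only assembles and prints the command for the three samples rather than running FastQC, and it assumes the Session I script calls `fastqc` directly. The LSF directives at the top of the script are not shown; the number of cores requested there (for example with `#BSUB -n`) would need to match the thread count.

```shell
# The three samples named in this question.
samples="Irrel_kd_2.subset.fq Mov10_kd_2.subset.fq Mov10_oe_2.subset.fq"

# Count the files so that each one gets its own thread.
n_files=0
for sample in $samples; do
  n_files=$((n_files + 1))
done

# Dry run: build and print the command instead of executing it.
fastqc_cmd="fastqc -t $n_files $samples"
echo "$fastqc_cmd"
```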

...

# Metadata

11. Enter the rnaseq folder, and create a README file (hint: use vim to create the file), as described in the Exercise section of the Data Management lesson. Give a short description of the project and brief descriptions of the types of files you will be storing within each of the sub-directories.

...

# Create an account on O2, if you don't already have one

1. To use O2, you must first create your own account by following the instructions below (note that this can take several days):

  • First, check that you have an eCommons ID and password: the eCommons login is required to create your account on O2. If you are unsure whether you have an account, or you have forgotten your password, please check using this self-service link on the eCommons website: https://ecommons.med.harvard.edu/.
  • After making sure that you have an eCommons login, please do the following:
      1. Go to: https://rc.hms.harvard.edu/#cluster
      2. Click the red “Account Request” button. This brings up a web form for requesting a user account.
      3. Please fill out the required fields (Name, eCommons ID, HMS (or affiliated) email address, and Organization/Department you belong to).
  • Once the account is created, you will receive a confirmation email from HMS Research Computing.

If you have any questions about the account creation process, please email rchelp@hms.harvard.edu.

# Pre-practice

2. Run through all the Workshop Lessons from start to finish.

# Practice Exercises

3. Run through the Practice Exercises.

4. Check against the Answer Key.