...
...
...
...
...
...
...
...
...
Answer key can be found here.
# Logging in and starting an interactive job
1. Log into Orchestra and start a single core interactive job.
# Practicing Unix commands and using vim
2. Make a new directory called homework inside your ~/ngs_course directory. Change directories to your homework directory.
- What is the absolute path to your homework directory?
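As a minimal sketch of the directory steps above, the commands below create a homework directory, enter it, and print its absolute path. A temporary directory stands in for your home directory here (an assumption, so the example can be run safely anywhere); on the cluster you would work under ~/ngs_course directly.

```shell
# Stand-in for your home directory (assumption: on the cluster, use ~ instead)
base_dir=$(mktemp -d)

# -p also creates ngs_course if it does not exist yet
mkdir -p "$base_dir/ngs_course/homework"

# Move into the new directory
cd "$base_dir/ngs_course/homework"

# Print the absolute path of the current directory
hw_path=$(pwd)
echo "$hw_path"
```

An absolute path always starts with `/` and does not depend on which directory you are currently in.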
3. Using vim, create a file entitled mov10_fastq_reads.txt. On the first line, type the sentence “Read counts from irrelevant siRNA, Mov10 knock down and Mov10 over-expression samples.” Save and exit vim.
4. Change directories into the /groups/hbctraining/ngs-data-analysis2016 directory. Locate and enter the rnaseq folder, then enter the other folder.
- What is the absolute path of your current directory?
- Copy both of the Mov10_kd files to the homework directory inside of your ~/ngs_course directory.
5. Change directories to your home directory. Locate and enter the raw_data folder (Hint: this was from the rna-seq portion of Session I).
...
6. Change directories to your homework directory. Pipe together the grep and wc commands to determine the number of reads generated for Mov10_oe_1.subset.fastq. Look at your FastQC report for this sample and examine the statistics to determine whether your pipe command generated the correct number of reads for the sample. If not, fix your pipe command so that it gives you the same number of reads as the FastQC report.
- How many reads were obtained for this sample?
- What was the command you used for determining the number of reads in your file?
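One reason a grep and wc pipe can disagree with FastQC is that the `@` character may also appear in quality lines, not only in read headers. The sketch below illustrates this on a toy FASTQ file with 3 reads; the file contents and the `@SEQ` header prefix are hypothetical, so check the header prefix in your own files before reusing the pattern.

```shell
# Build a toy FASTQ file with 3 reads (hypothetical data, for illustration only)
work_dir=$(mktemp -d)
toy="$work_dir/toy.fastq"
printf '%s\n' \
  '@SEQ:1' 'ACGT' '+' 'IIII' \
  '@SEQ:2' 'TTGA' '+' '@III' \
  '@SEQ:3' 'GGCA' '+' 'II@I' > "$toy"

# Naive count: every line containing "@", which includes two quality lines
naive=$(grep "@" "$toy" | wc -l)

# Stricter count: only lines that start with the header prefix
strict=$(grep "^@SEQ" "$toy" | wc -l)

echo "naive: $naive"
echo "strict: $strict"
```

On this toy file the naive pattern matches 5 lines, while the anchored pattern matches the 3 actual read headers.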
# Practicing more advanced commands, including for loops, redirection, and scripts
7. Using the pipe command from the previous question, write a for loop to determine the number of reads obtained for every fastq sample in the homework folder.
- Within the for loop, echo the filename, then use your pipe command from Question #6 to determine the read counts for each file.
Your output should be in the format:
Mov10_oe_1.fastq
100
Mov10_oe_2.fastq
50…
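The general shape of such a loop can be sketched as follows. The two toy FASTQ files, their names, and the `@SEQ` header prefix are hypothetical and exist only so the example can be run on its own; substitute the pipe command you settled on in Question #6.

```shell
# Create two toy FASTQ files (hypothetical data, for illustration only)
work_dir=$(mktemp -d)
printf '%s\n' '@SEQ:1' 'ACGT' '+' 'IIII' '@SEQ:2' 'TTGA' '+' 'IIII' > "$work_dir/sample_a.fastq"
printf '%s\n' '@SEQ:1' 'GGCA' '+' 'IIII' > "$work_dir/sample_b.fastq"

cd "$work_dir"

# For each FASTQ file in the directory: print its name, then its read count
for fq in *.fastq
do
  echo "$fq"
  grep "^@SEQ" "$fq" | wc -l
done
```

The variable name `fq` is arbitrary; inside the loop, `$fq` holds one filename per iteration.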
8. Open vim and write a new script called fastq_read_counts.sh. This script will contain the for loop you created in the previous question.
- How many reads were generated for each of the files?
9. You should have seen the results of your script displayed in your shell. Now open your fastq_read_counts.sh script in vim. Edit your script such that, for every file, you append the results to the file you created in Question #3 of this homework: mov10_fastq_reads.txt.
Hint: Use “>>” to redirect the output. What does redirecting with “>>” do (Google is your friend)?
- Run your script for all fastq files in your homework directory. Your output in the mov10_fastq_reads.txt file should look something like this:
Read counts from irrelevant siRNA, Mov10 knock down and Mov10 over-expression samples.
Mov10_oe_1.fastq
100
Mov10_oe_2.fastq
50…
- Upload the mov10_fastq_reads.txt file to your folder on the course wiki homework page (see link above).
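The difference between the two redirection operators can be illustrated with a small sketch. The file name counts.txt and its contents are hypothetical; the example writes to a temporary directory so that it does not touch your homework files.

```shell
# Work in a temporary directory (assumption: keeps the example self-contained)
work_dir=$(mktemp -d)
out="$work_dir/counts.txt"

# ">" creates the file, or overwrites it if it already exists
echo "Header line" > "$out"

# ">>" appends to the end of the file, keeping what is already there
echo "sample_a.fastq" >> "$out"
echo "2" >> "$out"

# Show the result: the header followed by the two appended lines
cat "$out"
```

Because `>>` never truncates the file, running a script that appends twice will add the results twice.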
# Practicing FastQC
10. Exit the interactive session (compute node) and go back to the login node (loge or mezzanine).
- Make a copy of the mov10_fastqc.lsf script in your homework directory and give it the name hw_fastqc_run.lsf. Modify this newly copied script to perform FastQC on the following samples in parallel:
Irrel_kd_2.subset.fq, Mov10_kd_2.subset.fq, Mov10_oe_2.subset.fq
...
# Metadata
11. Enter the rnaseq folder, and create a README file (hint: use vim to create the file), as described in the Exercise section of the Data Management lesson. Give a short description of the project and brief descriptions of the types of files you will be storing within each of the sub-directories.
...
# Create an account on O2, if you don't already have one
1. To use O2, you must first create your own account by following the instructions below (note that this can take several days):
- First, check that you have an eCommons ID and password. The eCommons login is required to create your account on O2. If you are unsure whether you have an account, or if you forgot your password, please check using this self-service link on the eCommons website: https://ecommons.med.harvard.edu/.
- After making sure that you have an eCommons login, please do the following:
- Go to: https://rc.hms.harvard.edu/#cluster
- Click the “Account Request” button (red). This brings up a web form for the user account request.
- Please fill out the required fields (Name, eCommons ID, HMS (or affiliated) email address, and Organization/Department you belong to).
- Once the account gets created, you will get an email from HMS Research Computing with a confirmation.
If you have any questions about the account creation process, please email rchelp@hms.harvard.edu.
# Pre-practice
2. Run through all the Workshop Lessons from start to finish.
# Practice Exercises
3. Run through the Practice Exercises.
4. Check against the Answer Key.