UNIX and HPC exercises
Answer key can be found here.
# Logging in and starting an interactive job
1. Log into Orchestra and start a single core interactive job.
# Practicing Unix commands and using vim
2. Make a new directory called homework
inside your ~/ngs_course
directory. Change directories to your homework
directory.
- What is the absolute path to your
homework
directory?
3. Using vim, create a file entitled mov10_fastq_reads.txt
. On the first line type the sentence “Read counts from irrelevant siRNA, Mov10 knock down and Mov10 over-expression samples.” Save and exit vim.
4. Change directories into the /groups/hbctraining/ngs-data-analysis2016
directory. Locate and enter the rnaseq
folder, then enter the other
folder.
- What is the absolute path of your current directory?
- Copy both of the
Mov10_kd
files to thehomework
directory inside of your ~/ngs_course directory.
5. Change directories to your home directory. Locate and enter the raw_data folder (Hint: this was from the rna-seq portion of Session I).
- What is the absolute path of your current directory?
- Copy all of the fastq files from your current directory to your
homework
directory. You should now have 8 fastq files in yourhomework
directory.
6. Change directories to your homework
directory. Pipe together the grep
and wc
commands to determine the number of reads generated for Mov10_oe_1.subset.fastq
. Look at your FastQC report for this sample and examine the statistics to determine whether your pipe command generated the correct number of reads for the sample. If not, fix your pipe command so that it gives you the same number of reads as the FastQC.
- How many reads were obtained for this sample?
- What was the command you used for determining the number of reads in your file?
# Practicing more advanced commands, including for_loops, redirection, scripts.
7. Using the pipe command from the previous question, write a for_loop
to determine the number of reads obtained for every fastq sample in the homework
folder.
- Within the
for_loop
, echo the filename, then, use your pipe command from Question #6 to determine the read counts for each file.
Your output should be in the format:
Mov10_oe_1.fastq
100
Mov10_oe_2.fastq
50…
8. Open vim and write a new script called fastq_read_counts.sh.
This script will contain the for_loop
you created in the previous question.
- How many reads were generated for each of the files?
9. You should have seen the results of your script displayed in your shell. Now open your fastq_read_counts.sh
script in vim. Edit your script, such that for every file you append the results to the file you created in Question #3 of this homework: mov10_fastq_reads.txt.
Hint: Use “>>” to redirect the output. What does redirecting with “>>” do (google is your friend)?
- Run your script for all fastq files in your
homework
directory. Your output in themov10_fastq_reads.txt
should look something like this:
Read counts from irrelevant siRNA, Mov10 knock down and Mov10 over-expression samples.
Mov10_oe_1.fastq
100
Mov10_oe_2.fastq
50…
- Upload the
mov10_fastq_reads.txt
script to your folder on the course wiki homework page (see link above).
# Practicing FastQC
10. Exit the interactive session (compute node) and go back to the login node (loge or mezzanine).
- Make a copy of the
mov10_fastqc.lsf
homework
directory and give it the namehw_
fastqc_run.lsf
. Modify this newly copied script to perform FastQC on the following samples in parallel:
Irrel_kd_2.subset.fq,
Mov10_kd_2.subset.fq,
Mov10_oe_2.subset.fq
- Use Filezilla to download the zipped files to your own computer and examine the FastQC reports.
- Which samples have quality scores that drop below a score of 25?
- Do any samples have adapter contamination based on the information in the FastQC report?
- Upload the
hw_fastqc_run.lsf
script to your folder on the course wiki homework page (see link above).
# Metadata
11. Enter the rnaseq folder, and create a README file (hint: use vim
to create the file), as described in the Exercise section of the Data Management lesson. Give a short description of the project and brief descriptions of the types of files you will be storing within each of the sub-directories.
12. Upload your answers to the questions on this assignment to your folder on the course wiki homework page (see link above).