Introduction to Unix, Orchestra and RNA-Seq

General Information

This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data.

We will use genomics data to demonstrate how to use the Linux/UNIX command-line interface to perform basic text manipulations and manage datasets effectively. In addition, you will learn how to use a high-performance compute environment on the Orchestra compute cluster (HMS-RC) in the context of an RNA-Seq workflow. We will not be teaching any particular bioinformatics tools, but rather the foundational skills that will allow you to conduct any analysis and interpret the output of a genomics pipeline.

By the end of the workshop, you should be able to manage and analyze data more effectively and to apply the tools and approaches directly to your ongoing research.

No prior computational experience is required, though we do expect familiarity with genomics concepts.

Requirements: Participants using their own laptops need to have a few specific software packages installed (listed below under Setup).

Contact: Please email hbctraining@hsph.harvard.edu for more information. For any Orchestra-specific questions, please email rchelp@hms.harvard.edu.


Setup

1. Please have the following programs installed on your laptop: 

Mac users

1. Java

2. FileZilla (make sure you get the FileZilla Client, not the Server)

3. Integrative Genomics Viewer (IGV)

4. TextWrangler or Sublime Text

Windows users

1. Git Bash

2. Java

3. FileZilla (make sure you get the FileZilla Client, not the Server)

4. Integrative Genomics Viewer (IGV)

5. Notepad++

2. Temporary training accounts on Orchestra will be available for you to use for the duration of the workshop. If you do not already have an Orchestra account and would like to create your own for use during class, please follow the instructions below:

  • First, check that you are able to sign in with your eCommons ID and password. An eCommons login is required to create an account on Orchestra. If you are unsure whether you have an account, or if you have forgotten your password, please check using the self-service link on the eCommons website: https://ecommons.med.harvard.edu/.
  • After making sure that you have an eCommons login, please do the following:
      1. Go to: https://rc.hms.harvard.edu/#orchestra
      2. Click the red “Account Request” button. This opens a web form for requesting a user account.
      3. Fill out the required fields (Name, eCommons ID, HMS (or affiliated) email address, and the Organization/Department you belong to).
  • Once the account has been created, you will receive a confirmation email from HMS Research Computing.

Schedule

******************************************** Day 1 Schedule ********************************************

Time           Session                                        Instructor
9:00 - 9:20    Welcome and Introductions                      All
9:20 - 9:40    Introduction to workshop                       Radhika
9:40 - 10:30   Unix 1: Introduction to the shell              Mary
10:30 - 10:45  Coffee
10:45 - 11:35  Unix 1: Introduction to the shell (continued)  Radhika
11:35 - 12:20  Unix 2: Searching and redirection              Mary
12:20 - 13:20  Lunch
13:20 - 14:35  Unix 3: "for" loop and shell scripts           Radhika
14:35 - 15:15  Unix 4: Permissions and environment variables  Mary
15:15 - 15:30  Coffee
15:30 - 16:15  Project and data management                    Radhika
16:15 - 17:00  Intro to RNA-Seq                               Radhika


******************************************** Day 2 Schedule ********************************************

Time           Session                                        Instructor
9:00 - 9:45    Introduction to High Performance Computing     Radhika
9:45 - 10:10   Data QC - Intro and FastQC                     Mary
10:10 - 10:20  Coffee
10:20 - 11:10  Data QC - Intro and FastQC (continued)         Mary
11:10 - 11:50  Data QC - Trimming                             Radhika
11:50 - 12:50  Lunch
12:50 - 14:00  RNA-Seq workflow - Alignment                   Mary
14:00 - 14:20  RNA-Seq workflow - Counting                    Radhika
14:20 - 14:30  Coffee
14:30 - 15:05  Introduction to Orchestra (Part 2)             Kris Holton, HMS-RC
15:05 - 16:40  Automating RNA-Seq workflow                    Radhika
16:40 - 17:00  Wrap Up                                        Radhika


Acknowledgements, Support & License:

These lessons have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Some of the materials used in this lesson are adapted from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).

 


Copyright © 2024 The President and Fellows of Harvard College