Introduction to RNA-Seq with HPC

General Information

This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data.

We will use genomics data to demonstrate how to use the Linux/UNIX command line interface to perform basic text manipulations and manage datasets effectively. In addition, you will learn how to use a high-performance compute environment, the O2 compute cluster (HMS-RC), in the context of an RNA-seq workflow. We will not teach any particular bioinformatics tool in depth; instead, we focus on the foundational skills that will allow you to conduct any analysis and interpret the output of a genomics pipeline.

By the end of the workshop you should be able to manage and analyze data more effectively, and to apply these tools and approaches directly to your ongoing research.

No prior computational experience is required, though we do expect familiarity with genomics concepts.

Requirements: Participants using their own laptops need to install a few specific software packages (listed below under Setup).

Contact: Please email hbctraining@hsph.harvard.edu for more information. For any O2-specific questions, please email rchelp@hms.harvard.edu.


Setup

1. Please have the following programs installed on your laptop: 

Mac users

1. Java

2. FileZilla (make sure you get the FileZilla Client, not the Server)

3. Integrative Genomics Viewer (IGV)

4. TextWrangler or Sublime Text

Windows users

1. Git Bash

2. Java

3. FileZilla (make sure you get the FileZilla Client, not the Server)

4. Integrative Genomics Viewer (IGV)

5. Notepad++

2. Temporary training accounts on O2 will be available for the duration of the workshop. If you do not have an O2 account and would like to create your own for use during class, please follow the instructions below:

  • First, check that you are able to sign in with your eCommons ID and password. The eCommons login is required to create your account on O2. If you are unsure whether you have an account, or have forgotten your password, please check using the self-service link on the eCommons website: https://ecommons.med.harvard.edu/.
  • After making sure that you have an eCommons login, please do the following:
      1. Go to: https://rc.hms.harvard.edu/#cluster
      2. Click the red “Account Request” button. This opens a web form for requesting a user account.
      3. Please fill out the required fields (Name, eCommons ID, HMS (or affiliated) email address, and Organization/Department you belong to).
  • Once the account has been created, you will receive a confirmation email from HMS Research Computing.

Schedule

********** Day 1 Schedule **********

Time           Topic                                          Instructor
9:00 - 9:15    Welcome and Introductions                      All
9:15 - 9:40    Introduction to workshop                       Radhika
9:40 - 10:40   UNIX 1: Introduction to the shell              Radhika
10:40 - 10:55  Coffee
10:55 - 11:25  UNIX 1: Introduction to the shell (continued)  Meeta
11:25 - 12:00  UNIX 2: Searching and redirection              Mary
12:00 - 13:00  Lunch
13:00 - 13:20  UNIX 3: Using the text editor "Vim"            Mary
13:20 - 14:35  UNIX 4: "for" loop and shell scripts           Meeta
14:35 - 15:05  UNIX 5: Permissions and Environment variables  Radhika
15:05 - 15:20  Coffee
15:20 - 16:00  Project organization and data management       Meeta
16:00 - 17:00  Introduction to RNA-seq                        Radhika


********** Day 2 Schedule **********

Time           Topic                                                  Instructor
09:00 - 09:45  Introduction to High-Performance Computing             Radhika
09:45 - 10:30  Data QC - Intro and FastQC                             Mary
10:30 - 10:45  Coffee
10:45 - 11:15  Data QC - Intro and FastQC (continued)                 Mary
11:15 - 12:00  RNA-seq workflow - Alignment and Counting              Meeta
12:00 - 13:00  Lunch
13:00 - 13:45  RNA-seq workflow - Alignment and Counting (continued)  Meeta
13:45 - 14:45  Automating RNA-seq workflow                            Radhika
14:45 - 15:00  Coffee
15:00 - 15:15  RNA-seq analysis methods                               Radhika
15:15 - 16:45  Quantifying expression using alignment-free methods    Mary/Meeta
16:45 - 17:00  Wrap Up                                                Radhika


Acknowledgements, Support & License:

This workshop is sponsored by the Harvard Medical School Tools and Technology Committee (TnT), the Harvard NeuroDiscovery Center (HNDC), and the Harvard Stem Cell Institute (HSCI).

These lessons have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Some of the materials used in this lesson are adapted from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).

 


Copyright © 2024 The President and Fellows of Harvard College