General Information

This hands-on workshop teaches basic concepts, skills and tools for working more effectively with data.

We will use genomics data to demonstrate how to use the Linux/UNIX command line interface to perform basic text manipulations and manage datasets effectively. In addition, you will learn how to use a high-performance compute environment, the Orchestra compute cluster (HMS-RC), in the context of an RNA-Seq workflow. We will not be teaching any particular bioinformatics tools, but rather the foundational skills that will allow you to conduct any analysis and analyze the output of a genomics pipeline.

By the end of the workshop you should be able to manage and analyze data more effectively, and to apply the tools and approaches directly to your ongoing research.

No prior computational experience is required, though we do expect familiarity with genomics concepts.

Requirements: Participants using their own laptops need a few specific software packages installed (listed below under Setup).

Contact: Please email hbctraining@hsph.harvard.edu for more information. For any Orchestra-specific questions, please email rchelp@hms.harvard.edu.

Setup

1. Please have the following programs installed on your laptop:

Mac users

1. Java

2. FileZilla (make sure you get the FileZilla Client, not the Server)

3. Integrative Genomics Viewer (IGV)

4. Text Wrangler or Sublime Text

Windows users

1. Git Bash

2. Java

3. FileZilla (make sure you get the FileZilla Client, not the Server)

4. Integrative Genomics Viewer (IGV)

5. Notepad++

2. Temporary training accounts on Orchestra will be available for the duration of the workshop. If you do not have an Orchestra account and wish to create your own for use during class, please follow the instructions below:

First, check that you are able to sign in with an eCommons ID/Password. The eCommons login is required to create your account on Orchestra. If you are unsure whether you have an account, or have forgotten your password, please check using the self-service link on the eCommons website: https://ecommons.med.harvard.edu/.

After making sure that you have an eCommons login, please do the following:

  1. Go to: https://rc.hms.harvard.edu/#orchestra
  2. Click the red “Account Request” button. This brings up a web form for the user account request.
  3. Fill out the required fields (Name, eCommons ID, HMS (or affiliated) email address, and the Organization/Department you belong to).

Once the account has been created, you will receive a confirmation email from HMS Research Computing.

Schedule

****************************************** Day 1 Schedule ******************************************
9:00 - 9:20	Welcome and Introductions	All
9:20 - 9:40	Introduction to workshop	Radhika
9:40 - 10:30	Unix 1: Introduction to the shell	Mary
10:30 - 10:45	Coffee
10:45 - 11:35	Unix 1: Introduction to the shell (continued)	Radhika
11:35 - 12:20	Unix 2: Searching and redirection	Mary
12:20 - 13:20	Lunch
13:20 - 14:35	Unix 3: "for" loop and shell scripts	Radhika
14:35 - 15:15	Unix 4: Permissions and environment variables	Mary
15:15 - 15:30	Coffee
15:30 - 16:15	Project and data management	Radhika
16:15 - 17:00	Intro to RNA-Seq	Radhika

****************************************** Day 2 Schedule ******************************************
9:00 - 9:45	Introduction to High Performance Computing	Radhika
9:45 - 10:10	Data QC - Intro and FastQC	Mary
10:10 - 10:20	Coffee
10:20 - 11:10	Data QC - Intro and FastQC (continued)	Mary
11:10 - 11:50	Data QC - Trimming	Radhika
11:50 - 12:50	Lunch
12:50 - 14:00	RNA-Seq workflow - Alignment	Mary
14:00 - 14:20	RNA-Seq workflow - Counting	Radhika
14:20 - 14:30	Coffee
14:30 - 15:05	Introduction to Orchestra (Part 2)	Kris Holton, HMS-RC
15:05 - 16:40	Automating RNA-Seq workflow	Radhika
16:40 - 17:00	Wrap Up	Radhika

Acknowledgements, Support & License:

These lessons have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Some of the materials used in this lesson are adapted from work that is Copyright © Data Carpentry (http://datacarpentry.org/). All Data Carpentry instructional material is made available under the Creative Commons Attribution license (CC BY 4.0).

Bioinformatics Training at Harvard Chan Bioinformatics Core

Introduction to Unix, Orchestra and RNA-Seq
