Next Generation Data Analysis Marathon

December 12-16, 2013

This 5-day workshop is for users who want to acquire the skills needed to analyze next generation sequence (NGS) and other large-scale data sets independently and proficiently. Most workshop modules will use the R/Bioconductor data analysis environment, which has become the lingua franca of data-driven research. No prior knowledge of R is required, but beginners should sign up for the introductory sections (Thu and Fri), which cover the basics needed for the applied data analysis sections of this event. In addition, users will learn to run command-line tools under Linux, such as NGS aligners and assemblers, an indispensable skill for analyzing modern genome-scale data sets. The last module (Mon afternoon) will introduce Galaxy, a web-based NGS data analysis environment that requires no special computer knowledge.

Genomics Lecture Hall, UC Riverside [ Map ]
Parking: Non-UCR participants can obtain parking information and permits at the entrance to the UCR campus.


Thu, Dec 12, 2013

09:00 AM-12:00 PM - (1) Introduction to R (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]
12:00-01:00 PM - Lunch Break
01:00-06:00 PM - (2) Programming in R (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]

Fri, Dec 13, 2013

09:00 AM-12:00 PM - (3) Visualizing and Clustering High-Throughput Data (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]
12:00-01:00 PM - Lunch Break
01:00-04:00 PM - (4) Linux Part I: Linux Essentials (Instructors: Jordan Hayes & Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]  
04:00-06:00 PM - (5) Linux Part II: Using IIGB's Linux Cluster (Instructor: Jordan Hayes)    [ Slides ]   [ Manual ] 

Sat, Dec 14, 2013

09:00 AM-12:30 PM - (6) Basics on Analyzing Next Generation Sequencing Data with R/Bioconductor (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]
12:30-01:30 PM - Lunch Break
01:30-06:00 PM - (7) Analysis of RNA-Seq Data with R/Bioconductor (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]

Sun, Dec 15, 2013

09:00 AM-12:30 PM - (8) Analysis of ChIP-Seq Data (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]
12:30-01:30 PM - Lunch Break
01:30-06:00 PM - (9) Analysis of SNP/Var-Seq Data (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]

Mon, Dec 16, 2013

09:00-11:00 AM - (10a) Cheminformatics of Drug-like Small Molecules (Instructor: Thomas Girke)    [ Slides ]   [ Exercises ]   [ Manual ]
11:00 AM-12:00 PM - (10b) Analysis of High-Throughput Compound Screens (Instructor: Tyler Backman)    [ Slides ]   [ Exercises ]   [ Manual ]
12:00-01:30 PM - Lunch Break
01:30-05:30 PM - (11) Web-based Analysis of Next Generation Sequence Data (Instructor: Neerja Katiyar)    [ Slides ]   [ Exercises ]   [ Manual ]

Laptop and Software Requirements

Laptop Requirements
  • Users are expected to bring a laptop with a functional wireless connection and a recent internet browser version (e.g. Firefox, Chrome or Safari) preinstalled. Wireless guest accounts will be provided for non-UCR participants. Also, don't forget to bring a power supply for your laptop to run it for an entire day!
  • If your laptop has special firewall settings (e.g. company-owned laptops), please make sure you know how to administer them (e.g. lower the restrictions or turn the firewall on/off) to allow ssh connections and biocLite installs.
  • In addition, please follow the software install instructions for each event as outlined below. If you encounter problems, please email Jordan Hayes for assistance with the installation.
Software Installs for R Events
  • Install latest R Version 3.0.2 from here:
  • Next, install RStudio from here: 
  • IGV (Integrative Genomics Viewer) will be used in some parts of the NGS analysis sections:
  • To install the R libraries required for the different course modules, copy & paste the following commands into the RStudio (or the R) console and execute them with the enter key:
    • Modules: Introduction to R and Programming in R
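source("http://bioconductor.org/biocLite.R") # Loads BiocInstaller, which provides biocVersion() and biocLite()
biocVersion() # Note: this should return Bioc Version '2.13' or higher!!
sessionInfo() # Note: this needs to return R Version 3.0.2!!
install.packages(c("ggplot2", "lattice"))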
    • Modules: Visualizing and Clustering High-Throughput Data
source("http://bioconductor.org/biocLite.R") # Loads BiocInstaller, which provides biocVersion() and biocLite()
biocVersion() # Note: this should return Bioc Version '2.13' or higher!!
install.packages(c("ggplot2", "lattice", "ape", "pvclust", "biclust", "modeltools", "som", "flexclust", "cluster", "scatterplot3d", "gplots", "e1071", "kernlab"))
    • Modules: Basic NGS, RNA-Seq, ChIP-Seq, SNP-Seq, and Cheminformatics
source("http://bioconductor.org/biocLite.R") # Loads BiocInstaller, which provides biocVersion() and biocLite()
biocVersion() # Note: this should return Bioc Version '2.13' or higher!!
biocLite(c("ShortRead", "Biostrings", "IRanges", "BSgenome", "rtracklayer", "biomaRt", "chipseq", "ChIPpeakAnno", "Rsamtools", "BayesPeak", "PICS", "GenomicRanges", "DESeq", "edgeR", "leeBamViews", "GenomicFeatures", "DEXSeq", "BCRANK", "VariantAnnotation", "SRAdb", "ChemmineR", "fmcsR", "QuasR", "Rbowtie", "AnnotationDbi", "bioassayR", "cellHTS2", "RCurl", "ape"))
biocLite(c("gmapR", "VariantTools"), type="source") # These two packages are not supported on Windows!
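
After running the install commands, it can be useful to confirm that the packages actually ended up in your R library before the workshop starts. The following is only a minimal sketch and not part of the official course materials: the helper name missing_pkgs and the shortened package list are illustrative assumptions, so substitute the full list for the modules you plan to attend.

```r
# Hypothetical helper (not from the workshop materials): returns the names
# of the requested packages that are not found in the local R library.
missing_pkgs <- function(pkgs) {
  installed <- rownames(installed.packages())  # names of all installed packages
  setdiff(pkgs, installed)                     # requested but not installed
}

# Illustrative subset of the package names listed above; extend as needed.
required <- c("ggplot2", "lattice", "ShortRead", "Biostrings")

still_missing <- missing_pkgs(required)
if (length(still_missing) == 0) {
  message("All listed packages are installed.")
} else {
  message("Still missing: ", paste(still_missing, collapse = ", "))
}
```

If any packages are reported as missing, rerunning the corresponding install command and reading its error output is usually the quickest way to find the cause (e.g. a firewall blocking the download).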

Software Installs for Linux Events
  • On Windows Laptops
    • Please make sure you have an ssh client installed, which is usually the case.