Overview

This webpage contains teaching materials I have been using for courses. Each topic is suitable for 3 to 6 hours of an intensive course. The labs consist of .Rnw or .Rmd files, which require the installation of several packages. The packages needed for most of them can be conveniently installed by running the following script within R:

## script location
script <- "http://www-huber.embl.de/users/klaus/EMBO.R"
## installation 
source(script)

The necessary packages are also listed at the beginning of each lab. The acknowledgements reference the material used, often with links to additional documents suitable for further in-depth study of the topic.

Most of the labs have been built using knitr and BiocStyle in connection with Sweave-like .Rnw documents.

Licensing – MIT licence

The material on this page is licensed under an MIT licence, the full text of which you can find here. In short, you can do anything you like with the software and the associated materials.

Introduction to R using the tidyverse

This lab gives an introduction to R using many packages from the tidyverse. You can find its git repository here.

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rmd .Rmd file

Introduction to R

This material gives a concise introduction to R.

Acknowledgements

Large parts of this material are based on the contributed documentation on CRAN, notably “Applied Statistics for Bioinformatics Using R”, “IcebreakeR” and “A (very) short Introduction to R”, as well as on the “Best first R tutorial” and introductory material from Laurent Gatto.

An in-depth resource for the details of R-programming is Advanced R.

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rnw .Rnw file

Data handling and graphics with R using the tidyverse

This lab introduces essential data handling techniques and graphics in R. You can find its git repository here.

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rmd .Rmd file

Data handling with R

This material gives an introduction to data handling and data reshaping with R, including a lot of data handling techniques using the dplyr package and reshaping using the tidyr and reshape2 packages. We also introduce chaining with the magrittr package.
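As a rough illustration of the kind of chained data handling this lab covers, here is a minimal sketch. It is not taken from the lab itself: the data frame `expr` and its columns are hypothetical toy data, and it uses `pivot_longer()` from recent tidyr versions for the reshaping step (the lab also covers reshape2). The `%>%` operator comes from magrittr and is re-exported by dplyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per sample, one column per gene
expr <- data.frame(
  sample    = c("s1", "s2", "s3", "s4"),
  condition = c("ctrl", "ctrl", "treated", "treated"),
  geneA     = c(10, 12, 30, 34),
  geneB     = c(5, 7, 6, 4)
)

# Reshape from wide to long: one row per sample/gene combination
long <- expr %>%
  pivot_longer(cols = c(geneA, geneB), names_to = "gene", values_to = "count")

# Chain dplyr verbs: filter rows, group, summarise and sort
summary_tbl <- long %>%
  filter(count > 4) %>%
  group_by(gene, condition) %>%
  summarise(mean_count = mean(count), .groups = "drop") %>%
  arrange(gene, condition)

print(summary_tbl)
```

Each step passes its result on to the next verb, so the pipeline reads top to bottom in the order the operations are applied.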

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rnw .Rnw file

Acknowledgements

The material on dplyr is to some extent based on tutorials by Kevin Markham and Dirk Schuhmacher. The illustration of the dplyr verbs is adapted from a presentation by H. Wickham.

Example data were provided by Michele Christovao, Elisabeth Zielonka and Ina Kalinina.

Exploratory data analysis and ggplot2

Typical summary statistics for location and scale as well as common diagnostic plots are presented. The plots are given both in base R as well as using ggplot2 commands.
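To make "location and scale" concrete, the following base R sketch computes a few common summaries on a small made-up vector (the values are hypothetical and not from the lab). It contrasts classical statistics (mean, standard deviation) with robust ones (median, MAD, IQR) in the presence of a single outlier.

```r
# Hypothetical measurements with one obvious outlier (40)
x <- c(9.1, 9.8, 10.2, 10.5, 11.0, 11.3, 40)

# Location: the mean is pulled towards the outlier, the median is not
location <- c(mean = mean(x), median = median(x))

# Scale: sd is inflated by the outlier, MAD and IQR are more robust
scale_stats <- c(sd = sd(x), mad = mad(x), iqr = IQR(x))

print(round(location, 2))
print(round(scale_stats, 2))
```

The corresponding base R diagnostic plots can be obtained with `hist(x)`, `boxplot(x)` and `qqnorm(x)` followed by `qqline(x)`.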

Acknowledgements

Large parts of this material are based on the plots used in “Applied Statistics for Bioinformatics Using R”.

Lars Velten contributed the protein exercise using ggplot. (Note that the data used in the protein example have been simulated and do not correspond to any real experimental data.)

The ggplot explanations have been inspired by the ggplot2 book as well as the ggplot2 intro by Josef Fruehwald.

Wolfgang Huber provided nice thoughts & slides on color usage for graphics.

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rnw .Rnw file

Statistical methods for Bioinformatics

This is a very concise introduction to important statistical methods in bioinformatics: dimensionality reduction, clustering and regression.

These techniques are illustrated in the context of the analysis of (single-cell) RNA-Seq data.
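The three techniques can be sketched in a few lines of base R. The example below is only an illustration under stated assumptions: the matrix `mat` is simulated toy data (40 hypothetical "cells" by 5 "genes", with two groups shifted apart in the first two genes), not the RNA-Seq data used in the lab, and k-means is just one of several possible clustering methods.

```r
set.seed(42)

# Simulated toy data: 40 "cells" x 5 "genes"; group b is shifted in genes 1 and 2
group <- rep(c("a", "b"), each = 20)
mat <- matrix(rnorm(40 * 5), nrow = 40)
mat[group == "b", 1:2] <- mat[group == "b", 1:2] + 4

# Dimensionality reduction: PCA on centered and scaled data
pca <- prcomp(mat, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:2]  # coordinates of each cell on the first two components

# Clustering: k-means with two clusters on the first two components
km <- kmeans(pcs, centers = 2, nstart = 10)
print(table(cluster = km$cluster, group = group))

# Regression: linear model of gene 1 on the group label
fit <- lm(mat[, 1] ~ group)
print(coef(fit))
```

The cross-table compares the clusters found by k-means with the known group labels, and the `groupb` coefficient estimates the simulated shift between the two groups.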

Acknowledgements

The material on statistical distributions is based on “Applied Statistics for Bioinformatics Using R”.

I adopted the use of the bodyfat data as an example data set for multivariate models such as regression and PCA from Michael Lavine’s Introduction to Statistical Thought, which is an excellent introductory statistics textbook in itself.

The slides are partially based on material by Wolfgang Huber and John Marioni (EMBL-EBI).

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rnw .Rnw file

Hypothesis Testing

This is an introduction to hypothesis testing, including multiple testing as well as advanced topics such as regularized t-statistics, independent filtering and empirical null estimation.
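As a small illustration of why multiple testing matters, the base R sketch below runs one Welch t-test per feature on simulated data and then adjusts the p-values with the Benjamini-Hochberg method. The data are hypothetical (200 features, 5 versus 5 samples, with a shift built into the first 20 features), and the regularized t-statistics, independent filtering and empirical null estimation covered in the lab are not shown here.

```r
set.seed(7)

n_features <- 200
# Simulated data: rows are features, columns 1-5 are controls, 6-10 are treated
x <- matrix(rnorm(n_features * 10), nrow = n_features)
x[1:20, 6:10] <- x[1:20, 6:10] + 3  # first 20 features are truly different

# One two-sample t-test (Welch, the t.test default) per feature
pvals <- apply(x, 1, function(row) t.test(row[1:5], row[6:10])$p.value)

# Benjamini-Hochberg adjustment to control the false discovery rate
padj <- p.adjust(pvals, method = "BH")

cat("raw p < 0.1:     ", sum(pvals < 0.1), "\n")
cat("adjusted p < 0.1:", sum(padj < 0.1), "\n")
```

Thresholding the raw p-values typically lets through a number of the 180 unchanged features, while the adjusted p-values keep the expected fraction of false discoveries among the reported hits at the chosen level.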

Acknowledgements

The material on the basic tests is based on “Applied Statistics for Bioinformatics Using R”.

Almost all of the slides are by Wolfgang Huber.

The material on tests for categorical data is mainly based on the book Introductory Statistics with R, which is also great for learning R and statistics at the same time.

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rnw .Rnw file

Hypothesis Testing – concise

This is a concise introduction to hypothesis testing, including only the most widely used tests but also explaining the idea of permutation tests, since these can be very useful in practice.

The material also covers multiple testing as well as advanced topics such as regularized t-statistics, independent filtering and empirical null estimation.
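The idea behind a permutation test can be sketched in base R as follows. This is a minimal example on made-up numbers (the vectors `ctrl` and `trt` are hypothetical), using the difference in group means as the test statistic and random relabelling to approximate the null distribution.

```r
set.seed(123)

# Hypothetical measurements for two groups
ctrl <- c(4.1, 5.0, 4.7, 5.3, 4.4, 4.9)
trt  <- c(5.8, 6.1, 5.5, 6.4, 5.2, 6.0)

values <- c(ctrl, trt)
labels <- rep(c("ctrl", "trt"), each = 6)

# Observed test statistic: difference in group means
obs <- mean(values[labels == "trt"]) - mean(values[labels == "ctrl"])

# Null distribution: recompute the statistic after shuffling the labels
n_perm <- 5000
perm <- replicate(n_perm, {
  shuffled <- sample(labels)
  mean(values[shuffled == "trt"]) - mean(values[shuffled == "ctrl"])
})

# Two-sided p-value; adding 1 to numerator and denominator avoids p = 0
p_value <- (sum(abs(perm) >= abs(obs)) + 1) / (n_perm + 1)

cat("observed difference:", obs, "\n")
cat("permutation p-value:", p_value, "\n")
```

Because the labels are exchangeable under the null hypothesis, no distributional assumption about the measurements is needed, which is what makes this approach useful in practice.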

Acknowledgements

The permutation test explanations were inspired by Tim Hesterberg’s excellent review on resampling for undergraduates.

Files

Type Link
Slides Slides
Lab Lab pdf
R script R–script
Rnw .Rnw file

Differential expression analysis of RNA-Seq data with DESeq2

This material introduces a complete workflow for DE analysis of RNA-Seq data starting from the raw FASTQ files. It performs a re-analysis of the RNA-Seq data analyzed in Uslu et al. – Long-range enhancers regulating Myc expression are required for normal facial morphogenesis, 2014.

Acknowledgements

The material is largely based on the documentation of the DESeq2 package on Bioconductor by Mike Love, Simon Anders and Wolfgang Huber.

The first part of the lab, from FASTQ files to the count table, closely follows Anders et al. - Count-based differential expression analysis of RNA sequencing data using R and Bioconductor, 2013.

Simon Anders also provided the slides.

Type Link
Slides Slides DESeq2 - Wolfgang Huber
Slides Slides DESeq2
Slides Slides HT Sequencing
Lab Lab pdf
R script R–script
Rnw .Rnw file

Differential expression analysis of RNA-Seq – Predoc Course 2014

This material introduces a workflow for DE analysis of RNA-Seq data starting from the gene count table. It is similar to the workflow above and performs a re-analysis of the RNA-Seq data analyzed in Uslu et al. – Long-range enhancers regulating Myc expression are required for normal facial morphogenesis, 2014.

It has been created for the DNA/RNA module of the 2014 EMBL predoc course and uses html instead of LaTeX.

Acknowledgements

The material is largely based on the documentation of the DESeq2 package and the rnaseqGene workflow on Bioconductor by Mike Love, Simon Anders and Wolfgang Huber.

Type Link
Lab Lab html
R script R–script
Rmd .Rmd file

Machine Learning – Predoc Course 2014

This is an introduction to machine learning; it is still a work in progress.

Acknowledgements

This is heavily based on material from S. Arora (Bioc Seattle, Oct 14) and VJ Carey (Brixen 2011).

Type Link
Lab Lab html
R script R–script
Rmd .Rmd file

Analysis of a high throughput microscopy data set

This material discusses the analysis of a high content screening data set. It has been prepared for the EMBO course High-Throughput Microscopy for Systems Biology in October 2016.

You can find its git repository here.