This webpage contains teaching materials I have been using for courses. Each topic covers roughly 3 to 6 hours of an intensive course. The labs consist of .Rnw or .Rmd files, which require several packages to be installed. The packages needed for most of them can be conveniently installed by running the following script within R:

```
## script location
script <- "http://www-huber.embl.de/users/klaus/EMBO.R"
## installation
source(script)
```

The required packages are also listed at the beginning of each lab. The acknowledgments reference the material used; these references often link to additional documents suitable for further in-depth study of the topic.

Most of the labs were built with knitr and BiocStyle from Sweave-like .Rnw documents.

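The material on this page is licensed under the MIT license, the full text of which you can find here. In essence, it allows you to do anything you like with the software and the associated materials, as long as the copyright and permission notice are included in all copies or substantial portions.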

This lab gives an introduction to R using many packages from the tidyverse. You can find its git repository here.

This material gives a concise introduction to R.

Large parts of this material are based on the contributed documentation on CRAN, notably “Applied Statistics for Bioinformatics Using R”, “IcebreakeR” and “A (very) short Introduction to R”, as well as on the “Best first R tutorial” and introductory material from Laurent Gatto.

An in-depth resource for the details of R-programming is Advanced R.

This lab introduces essential data handling techniques and graphics in R. You can find its git repository here.

This material gives an introduction to data handling and data reshaping with R, covering many data handling techniques based on the dplyr package and reshaping with the tidyr and reshape2 packages. We also introduce chaining with the magrittr package.

The material on dplyr is to some extent based on tutorials by Kevin Markham and Dirk Schuhmacher. The illustration of the dplyr verbs is adapted from a presentation by H. Wickham.

Example data were provided by Michele Christovao, Elisabeth Zielonka and Ina Kalinina.

This material presents typical summary statistics for location and scale as well as common diagnostic plots. The plots are given both in base R and with ggplot2 commands.
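As a minimal sketch of the kind of summaries this lab covers (not code taken from the lab itself), the base R functions below compute common location and scale statistics and draw two standard diagnostic plots. The vector `x` is invented for illustration and contains one deliberate outlier.

```r
## invented example data with one outlier (12.0)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 12.0)

## location: the median is far less affected by the outlier than the mean
loc <- c(mean = mean(x), median = median(x))

## scale: standard deviation, interquartile range and
## median absolute deviation (scaled by 1.4826 by default)
scl <- c(sd = sd(x), IQR = IQR(x), MAD = mad(x))

print(round(loc, 2))
print(round(scl, 2))

## common diagnostic plots in base R
boxplot(x, main = "Boxplot of x")
qqnorm(x)
qqline(x)
```

In ggplot2, the corresponding plots would be built with layers such as `geom_boxplot()` and `geom_qq()`.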

Large parts of this material are based on the plots used in “Applied Statistics for Bioinformatics Using R”.

Lars Velten contributed the protein exercise using ggplot. (Note that the data used in the protein example are simulated and do not correspond to any real experimental data.)

The ggplot explanations have been inspired by the ggplot2 book as well as the ggplot2 intro by Josef Fruehwald.

Wolfgang Huber provided helpful thoughts and slides on the use of color in graphics.

This is a very concise introduction to important statistical methods in bioinformatics: dimensionality reduction, clustering and regression.

These techniques are illustrated in the context of the analysis of (single-cell) RNA-Seq data.

The material on statistical distributions is based on “Applied Statistics for Bioinformatics Using R”.

I adapted the use of the bodyfat data as an example data set for multivariate models such as regression and PCA from Michael Lavine’s Introduction to Statistical Thought, which is an excellent introductory statistics textbook in its own right.

The slides are partially based on material by Wolfgang Huber and John Marioni (EMBL-EBI).

This is an introduction to hypothesis testing, including multiple testing as well as advanced topics such as regularized t-statistics, independent filtering and empirical null estimation.
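Base R's `p.adjust()` provides standard multiple testing corrections. The sketch below, using invented p-values rather than data from the lab, contrasts the Bonferroni correction, which controls the family-wise error rate, with the Benjamini-Hochberg procedure, which controls the false discovery rate.

```r
## invented raw p-values from five hypothetical tests
p <- c(0.001, 0.008, 0.02, 0.035, 0.3)

## Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p, method = "bonferroni")

## Benjamini-Hochberg: p_(i) * m / i, made monotone over the sorted p-values
p_bh <- p.adjust(p, method = "BH")

alpha <- 0.05
data.frame(raw = p, bonferroni = p_bonf, BH = p_bh,
           reject_bonf = p_bonf <= alpha, reject_BH = p_bh <= alpha)
```

With these numbers, Benjamini-Hochberg rejects four hypotheses at the 5% level while Bonferroni rejects only two, which illustrates why FDR control is the less conservative choice when many tests are run.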

The material on the basic tests is based on “Applied Statistics for Bioinformatics Using R”.

Almost all of the slides are by Wolfgang Huber.

The material on tests for categorical data is mainly based on the book Introductory Statistics with R, which is also great for learning R and statistics at the same time.

This is a concise introduction to hypothesis testing that covers only the most widely used tests, but also explains the idea of permutation tests, since these can be very useful in practice.
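The idea of a permutation test can be sketched in a few lines of base R. This is a minimal illustration, not the lab's code: the two groups of measurements are invented, the statistic is the difference in group means, and the permutation distribution is approximated by random relabelling rather than full enumeration.

```r
## invented measurements for two groups, for illustration only
group_a <- c(12.1, 14.3, 11.8, 15.2, 13.9, 14.8)
group_b <- c(10.2, 11.5, 9.8, 12.0, 10.9, 11.1)

## observed test statistic: difference of group means
obs_diff <- mean(group_a) - mean(group_b)

pooled <- c(group_a, group_b)
n_a <- length(group_a)

set.seed(1)       ## make the random permutations reproducible
n_perm <- 10000

## under the null hypothesis the group labels are exchangeable, so we
## reshuffle the labels and recompute the statistic each time
perm_diffs <- replicate(n_perm, {
  idx <- sample(length(pooled), n_a)
  mean(pooled[idx]) - mean(pooled[-idx])
})

## two-sided p-value; the small tolerance guards against floating-point
## ties, and adding 1 to numerator and denominator counts the observed
## labelling as one of the permutations, so the estimate is never zero
tol <- sqrt(.Machine$double.eps)
p_value <- (sum(abs(perm_diffs) >= abs(obs_diff) - tol) + 1) / (n_perm + 1)
p_value
```

With only 12 observations there are just 924 distinct splits into two groups of six, so an exact test enumerating them with `combn()` would also be feasible here; random relabelling is the approach that scales to larger samples.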

The material also covers multiple testing as well as advanced topics such as regularized t-statistics, independent filtering and empirical null estimation.

The permutation test explanations were inspired by Tim Hesterberg’s excellent review on resampling for undergraduates.

This material introduces a complete workflow for DE analysis of RNA-Seq data starting from the raw FASTQ files. It performs a re-analysis of the RNA-Seq data analyzed in Uslu et al., Long-range enhancers regulating Myc expression are required for normal facial morphogenesis, 2014.

The material is largely based on the documentation of the DESeq2 package on Bioconductor by Mike Love, Simon Anders and Wolfgang Huber.

The first part of the lab, from FASTQ files to the count table, closely follows Anders et al., Count-based differential expression analysis of RNA sequencing data using R and Bioconductor, 2013.

Simon Anders also provided the slides.

Type | Link |
---|---|
Slides | Slides DESeq2 - Wolfgang Huber |
Slides | Slides DESeq2 |
Slides | Slides HT Sequencing |
Lab | Lab pdf |
R script | R-script |
Rnw | .Rnw file |

This material introduces a workflow for DE analysis of RNA-Seq data starting from the gene count table. It is similar to the workflow above and performs a re-analysis of the RNA-Seq data analyzed in Uslu et al., Long-range enhancers regulating Myc expression are required for normal facial morphogenesis, 2014.

It was created for the DNA/RNA module of the 2014 EMBL predoc course and uses HTML instead of LaTeX.

The material is largely based on the documentation of the DESeq2 package and the rnaseqGene workflow on Bioconductor by Mike Love, Simon Anders and Wolfgang Huber.

Type | Link |
---|---|
Lab | Lab html |
R script | R-script |
Rmd | .Rmd file |

This is an introduction to machine learning; it is still a work in progress.

This is heavily based on material from S. Arora (Bioc Seattle, Oct 14) and VJ Carey (Brixen 2011).

Type | Link |
---|---|
Lab | Lab html |
R script | R-script |
Rmd | .Rmd file |

This material discusses the analysis of a high-content screening data set. It was prepared for the EMBO course High-Throughput Microscopy for Systems Biology in October 2016.

You can find its git repository here.