Preliminary Program

Because participants will likely have little to no experience with the workshop topic, the learning material, presentations, and exercises must remain consistent throughout the workshop to minimize confusion. To this end, the workshop staff will consist of a core set of presenters, who will work closely together to produce the main materials, and a group of teaching assistants (~1 TA per 4 – 6 people), who will work directly with participants to answer questions and assist with exercises.


Presenters

Kylie Bemis (Northeastern University)
Ryan Benz (Seer Inc.)
Meena Choi (Northeastern University)
Heath Patterson (Vanderbilt University)
Olga Vitek (Northeastern University)

Teaching Assistants (1 TA per 4 - 6 people)

Dan Guo (Northeastern University)
Jeffrey Jones (SoCal Bioinformatics)
Ting Huang (Northeastern University)
Mateusz Staniak (University of Wroclaw)

The information below describes the main topics that will be covered during the sessions. Specific topics may be adjusted as the workshop material is developed. Each 1 hr. 30 min. session will be divided into roughly 45 min. of lecture/presentation and 45 min. of hands-on exercises, except for Session 8, which will be devoted to lectures only.

Saturday, November 14

8:00 - 9:00 am, Registration & Continental Breakfast

9:00 - 10:30 am, Session 1: R Fundamentals

  • Introduce RStudio and its main components
  • Basic R syntax: math expressions, variables, assignment, common functions
  • Working with vectors and data frames

10:30 - 11:00 am, Break

11:00 am - 12:30 pm, Session 2: Basic Data Manipulation

  • Reading data into R, read_* functions
  • dplyr data manipulation verbs
  • Using dplyr to transform tabular data (i.e. data frames, tibbles)
  • Rearranging data, wide vs. long data formats, moving between them

12:30 - 1:30 pm, Group Photo & Lunch

1:30 - 3:00 pm, Session 3: Basic Data Visualization with ggplot2

  • Building up a graphic from individual components
  • Mapping variables in a data set to the aesthetic properties of a graphic
  • The ggplot2 graphing template
  • Making a basic plot

3:00 - 3:30 pm, Break

3:30 - 5:00 pm, Session 4: Reproducible Data Analysis, Part 1

  • What is reproducible data analysis and why is it important?
  • Using scripts (or literate programming) to capture all stages of an analysis
  • Using functions to reduce code duplication, what makes a good function, how to write functions

5:00 - 6:00 pm, Happy Hour / Networking

6:00 pm, Dinner (on your own)

Sunday, November 15

8:00 - 9:00 am, Continental Breakfast

9:00 - 10:30 am, Session 5: Reproducible Data Analysis, Part 2

  • What is RMarkdown and why is it useful?
  • Markdown overview, adding code blocks to mix R with Markdown in a single document including plots, tables, and commentary/documentation
  • Using RMarkdown to create reports that encompass a complete analysis
  • Parameterized RMarkdown documents to create reusable report templates

10:30 - 11:00 am, Break

11:00 am - 12:30 pm, Session 6: Working Session with a Case Study

  • Given an example data set, create an RMarkdown report that characterizes/summarizes the data, including plots, tables, and commentary/documentation

12:30 - 1:30 pm, Group Lunch

1:30 - 3:00 pm, Session 7: Research Project Organization, Workflows and Best Practices

  • Practical guidance on how to structure and work on research projects in a way that supports reproducible work
  • Workflows that capture the entire process of a research project, from accessing raw data all the way to final reporting
  • Practice of essential skills: reading data from formatted text files and Excel spreadsheets, spotting and troubleshooting data import problems

3:00 - 3:30 pm, Break

3:30 - 5:00 pm, Session 8: Overview of R Packages for General Research and Mass Spectrometry

  • Tidyverse highlights: stringr, lubridate
  • Visualization: patchwork, UpSetR, working with heatmaps and PCA, exporting plots to PowerPoint
  • MS-specific: MSstats, Cardinal, MSnbase