Bioinformatics Foundations
On this page:
- Class 0: Getting oriented.
- Class 1: Transcriptomics and the analysis of RNA-Seq data.
- Class 2: RNA-Seq analysis mini-project.
Class 0: Getting oriented
Topics:
Course introduction, learning goals and expectations, meet the instructional team. Setting up your computer with the required software. Refresh your knowledge of basic UNIX and R.
Goals:
- Understand course scope, expectations, logistics and ethics code.
- Set up your computer for this course.
- Become familiar with the major R data structures (vectors, data.frames and lists).
- Understand the basics of using R functions (arguments, vectorization and recycling).
- Be able to install R packages from CRAN and Bioconductor.
- Use UNIX command-line tools for file system navigation and text file manipulation.
Supporting material:
- Handout: Class Structure,
- Computer Setup Instructions.
Optional Recap Videos from BGGN213:
- 0.1.1 - Introduction to bioinformatics (what, where and why of bioinformatics),
- 0.1.2 - Major bioinformatics resource providers (NCBI and EBI),
- 0.1.3 - A quick tour of the GENE, UniProt, GO, OMIM, PDB and PFAM databases,
- 0.2.1 - Major R data structures, data types, and using functions,
- 0.2.2 - Introduction to ggplot,
- 0.2.3 - Introduction to CRAN & Bioconductor,
- 0.2.4 - Quick introduction to RMarkdown,
- 0.3.1 - Essential UNIX for bioinformatics I,
- 0.3.2 - Essential UNIX for bioinformatics II,
- 0.3.3 - Manipulating files on UNIX machines,
- 0.3.4 - UNIX superpowers: using pipes and connecting to remote machines.
Class 1: Transcriptomics and the analysis of RNA-Seq data
Topics: Analysis of RNA-Seq data with R, differential expression tests, RNA-Seq statistics, counts and FPKMs, normalizing for sequencing depth, DESeq2 analysis. Gene finding and functional annotation from high-throughput sequencing data, the functional databases KEGG, InterPro and GO, and functional enrichment analysis.
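As a rough illustration of the "counts and FPKMs" and "normalizing for sequencing depth" topics, the sketch below converts raw counts to CPM and FPKM in plain Python. The gene names, counts and lengths are made up for illustration, and this is not the normalization DESeq2 itself applies (DESeq2 estimates per-sample size factors with a median-of-ratios approach). The class labs themselves work in R.

```python
def cpm(counts):
    """Counts per million: divide each count by the library size, then scale by 1e6."""
    library_size = sum(counts.values())
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = count * 1e9 / (library_size * gene_length_in_bp)
    """
    library_size = sum(counts.values())
    return {
        gene: c * 1e9 / (library_size * lengths_bp[gene])
        for gene, c in counts.items()
    }


# Hypothetical toy sample: three genes, 1000 mapped fragments in total.
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
toy_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 3000}

toy_cpm = cpm(toy_counts)
toy_fpkm = fpkm(toy_counts, toy_lengths)
```

CPM corrects only for sequencing depth, while FPKM also divides by gene length, so geneC has the highest CPM here but the same FPKM as geneA once its longer length is taken into account.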
Goals:
- Given an RNA-Seq dataset, find the set of significantly differentially expressed genes and their annotations.
- Gain competency with data import, processing and analysis with DESeq2 and other Bioconductor packages.
- Understand the structure of the count data and metadata required for running the analysis.
- Be able to extract, explore, visualize and export results.
- Perform a GO analysis to identify the pathways relevant to a set of genes (e.g. identified by a transcriptomic study or a proteomic experiment). Use both Bioconductor packages and online tools to interpret gene lists and annotate potential gene functions.
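One common way to ask whether a GO term is over-represented in a gene list is a one-sided hypergeometric test. The sketch below shows that calculation in plain Python. The function name and the example numbers are hypothetical, and the Bioconductor packages and online tools used in class may apply different statistics and will normally add a multiple-testing correction.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: number of genes tested in the experiment
    n_in_term:  number of those genes annotated with the GO term
    n_selected: number of significantly differentially expressed genes
    n_overlap:  number of selected genes that carry the GO term
    """
    max_overlap = min(n_in_term, n_selected)
    total_draws = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total_draws


# Hypothetical example: 20000 genes tested, 200 annotated with the term,
# 500 called significant, 15 of which carry the term (about 5 expected by chance).
p_example = enrichment_p_value(20000, 200, 500, 15)
```

A small p-value suggests the term appears in the gene list more often than random sampling would explain. In practice this test is repeated for every term, so the p-values need adjusting for multiple comparisons.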
Videos:
- 1.1 - Differential expression analysis of RNA-Seq data,
- 1.2 - Differential expression tests and pathway analysis,
- 1.3 - Installing Bioconductor and DESeq2,
- 1.4 - Principal component analysis (PCA) Pt. 1.
Supporting material:
- Slides: Large PDF,
- Lab: Hands-on section worksheet,
- Detailed Bioconductor setup instructions,
- WebApp: Introduction to PCA,
- Muddy point assessment.
Readings:
- Excellent review article: Conesa et al. A survey of best practices for RNA-seq data analysis. Genome Biology 17:13 (2016).
- An oldie but a goodie: Soneson et al. “Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences.” F1000Research 4 (2015).
Homework:
- Submit your completed PDF lab report to GradeScope.
Class 2: RNA-Seq analysis mini-project
Topics: Differential expression analysis project, working with GEO and DESeq2, followed by gene enrichment and functional annotation with KEGG and GO.
Supporting material:
- Lab: DESeq2 analysis mini-project,
- Muddy point assessment.
Homework:
- Submit your completed PDF lab report to GradeScope.