Learn Metagenomics

Last spring I taught an introductory course on metagenomics at Princeton University. I wanted to share that the lecture notes and lab exercises are now available online here. Check them out if you’re looking to learn more about metagenomics!



For the past few months I have been working with a microbial ecology toolkit called mothur. So far, it is the most flexible tool I have found for calculating distance matrices from 16S alignments and then clustering those sequences into OTUs. Many people have asked me how to use mothur, so this post will serve as a tutorial.
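To make the two steps concrete, here is a minimal Python sketch of the idea: compute pairwise distances from aligned sequences, then group them into OTUs by furthest-neighbor clustering at a 0.03 cutoff. This is not mothur's code or its exact distance calculation. The function names, the toy sequences, and the simple mismatch-fraction distance are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Fraction of mismatched alignment columns (columns where both are gaps are skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    return float(mismatches) / compared if compared else 0.0


def cluster_otus(aligned, cutoff=0.03):
    """Furthest-neighbor clustering: merge two clusters only while every
    cross-cluster pair of sequences is within the cutoff."""
    names = list(aligned)
    dist = {
        frozenset((p, q)): pairwise_distance(aligned[p], aligned[q])
        for p, q in combinations(names, 2)
    }
    clusters = [[n] for n in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Distance between clusters = largest distance between their members.
            d = max(dist[frozenset((p, q))] for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Toy alignment (hypothetical data): seqA and seqB are identical, seqC differs at 3 of 10 columns.
toy = {
    "seqA": "ACGTACGTAC",
    "seqB": "ACGTACGTAC",
    "seqC": "TTTTACGTAC",
}
otus = cluster_otus(toy, cutoff=0.03)
```

With this toy input, seqA and seqB end up in one OTU and seqC stays on its own. Real 16S data sets are far too large for this brute-force loop, which is one reason to use a dedicated tool.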



In my last post, I described what a De Bruijn graph assembler is. Here I will go through a short tutorial on how to begin using MetaVelvet. MetaVelvet is an extension of the popular single-genome De Bruijn graph assembler, Velvet, and is optimized to handle the varying coverage and diversity of genomes in metagenomic samples. It is executed in 3 steps: velveth, velvetg, and meta-velvetg.
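As a rough sketch of how the three steps fit together, the Python helper below builds the three command lines in order. It only prints them. The helper name, output directory, k-mer size, read file name, and insert length are placeholder assumptions, and the options shown (-fastq, -shortPaired, -exp_cov auto, -ins_length) assume interleaved paired-end FASTQ reads, so check them against the Velvet and MetaVelvet manuals for your own data.

```python
def metavelvet_commands(out_dir, kmer, reads_fastq, insert_length):
    """Return the three assembly commands, in the order they should be run."""
    return [
        # Step 1: hash the reads into k-mers.
        ["velveth", out_dir, str(kmer), "-fastq", "-shortPaired", reads_fastq],
        # Step 2: build the initial De Bruijn graph.
        ["velvetg", out_dir, "-exp_cov", "auto", "-ins_length", str(insert_length)],
        # Step 3: let MetaVelvet split the graph and produce contigs/scaffolds.
        ["meta-velvetg", out_dir, "-ins_length", str(insert_length)],
    ]


# Hypothetical values, for illustration only.
commands = metavelvet_commands("assembly_out", 51, "reads_shuffled.fastq", 260)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to subprocess.check_call once Velvet and MetaVelvet are installed and on your PATH.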


Additions for your terminal

Something I realized when I began to work with the command line is that there are a few extra additions that can be installed to make everything run smoothly within the terminal (Mac OS X 10.9).

1. Xcode 5.0 – This is an integrated development environment made by Apple that provides a nice space to write and debug code. More importantly, Xcode contains the Command Line Tools that are needed to properly compile and install packages you may need in the future. Note that you may need to have an Apple Developer ID to access the download site.

2. Xcode’s Command Line Tools – This package can be added via Xcode or downloaded directly from here. Note that you may need to have an Apple Developer ID to access the download site.

3. MacPorts – This allows for easy installation of over 17,000 packages (e.g. emacs, SciPy, NumPy) that you may end up needing, using a simple <sudo port install packagename> command.

4. Python – Python is a programming language that is extremely good at handling sequence data. Most Macs are already set up to use Python version 2.7. To test whether you have Python, simply open the terminal and type “python”.
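To give a feel for what handling sequence data in Python looks like, here is a small sketch that reads FASTA records and reports the GC content of each sequence. The function names and the example records are made up for illustration. In practice you would pass an open file handle instead of the list of lines used here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return float(seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical records; a sequence may span several lines.
example = [">seq1 sample", "ACGT", "GGCC", ">seq2", "ATAT"]
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq), gc_content(seq))
```

Because read_fasta is a generator, it walks through the input one record at a time, which matters once the files get large.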


The Command Line & Some Tricks

Almost all metagenomic tools are executed from the command line so, unsurprisingly, learning how to use the command line was the first (and essential!) step for me to be able to analyze metagenomic data. Not only does the command line allow you to write and execute scripts, but it also saves time and frustration when viewing and moving through large fasta/fastq files.

For those reading, I am executing all commands through the Mac Terminal on OS X version 10.9.
