There were some really cool sessions at ISME today. The morning started off with a plenary talk by Lars Peter Nielsen about cable bacteria, and I sampled the sessions on “Unusual strategies of microbial energy acquisition,” “Network microbial ecology,” and “Meta-ome information to microbial ecology.” For each session I have picked a favorite talk, starting with Victoria Orphan’s talk on the archaeal-bacterial partnerships responsible for sulfate-coupled anaerobic oxidation of methane (AOM). Continue reading
This week I am at the International Society for Microbial Ecology (ISME15) conference in Seoul, South Korea! I want to share with you some of my favorite talks from each day. Today, I spent most of my time in the “Microbiomes of marine ecosystems” session and made some stops at other sequencing-related talks. Perhaps my favorite talk of the day goes to… Antje Boetius! Her talk focused on the microbial diversity of the surface of seafloor sediment and Continue reading
The World Cup is underway and I am back to the blog! I was recently inspired by the World Cup Panini stickers to write a post about sequencing depth. In sequencing, the problem we often face is whether enough sequences have been generated to be representative of a population. The tools we usually use to decide whether we have sampled enough are rarefaction curves and the Chao estimate, but how many sequences would we need to generate in order to capture a 16S sequence from every organism present in an environment? Continue reading
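One classical way to make the sticker analogy concrete is the coupon collector's problem. The sketch below is only an illustration of that framing, and it assumes (unrealistically for real communities) that all `n_taxa` organisms are equally abundant, so every read is equally likely to come from any of them. Under that assumption the expected number of reads needed to see every organism at least once is n·H_n, where H_n is the n-th harmonic number. The function name and the example richness values are hypothetical.

```python
def expected_reads_to_see_all(n_taxa):
    """Expected number of reads to observe every taxon at least once.

    Assumption: all taxa are equally abundant (uniform coupon collector).
    The closed form is n * H_n, with H_n = 1 + 1/2 + ... + 1/n.
    """
    if n_taxa < 1:
        raise ValueError("n_taxa must be a positive integer")
    harmonic = sum(1.0 / i for i in range(1, n_taxa + 1))
    return n_taxa * harmonic


if __name__ == "__main__":
    # Hypothetical community richness values, for illustration only.
    for n in (10, 100, 1000):
        print(n, round(expected_reads_to_see_all(n)))
```

Real communities are far from even, and rare organisms push the required depth above this equal-abundance figure, so the result is best read as a lower bound.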
In my last post, I described what a De Bruijn Graph assembler is; here I will go into a short tutorial on how to begin using MetaVelvet. MetaVelvet is an extension of the popular single-genome De Bruijn Graph assembler, Velvet, and is optimized to handle the varying coverage and diversity of genomes in metagenomic samples. It is executed in 3 steps: velveth, velvetg, and meta-velvetg.
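The three steps can be sketched as a small Python helper that only builds the command lines, without running them. The output directory, read file name, k-mer size (51), and insert length (260) are placeholder assumptions, and the flags shown (`-fastq`, `-shortPaired`, `-exp_cov auto`, `-ins_length`) should be checked against the Velvet and MetaVelvet manuals for your installed versions.

```python
def metavelvet_commands(out_dir, reads_fastq, kmer=51, ins_length=260):
    """Return the three command lines of a basic MetaVelvet run.

    All argument values are illustrative assumptions; nothing is executed.
    """
    return [
        # Step 1: hash the reads into k-mers (paired-end, interleaved FASTQ).
        ["velveth", out_dir, str(kmer), "-fastq", "-shortPaired", reads_fastq],
        # Step 2: build the initial De Bruijn graph with Velvet.
        ["velvetg", out_dir, "-exp_cov", "auto", "-ins_length", str(ins_length)],
        # Step 3: let MetaVelvet split the graph and produce the final contigs.
        ["meta-velvetg", out_dir, "-ins_length", str(ins_length)],
    ]


if __name__ == "__main__":
    for cmd in metavelvet_commands("assembly_out", "reads_interleaved.fastq"):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the tools are on your `PATH`.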
The N50 is a metric that is often used to summarize the length of contigs after assembly. I find it best to think of it as a weighted median where longer contigs are given a greater weight. The N50 is defined as follows: Continue reading
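As a rough sketch of the usual definition: sort the contigs from longest to shortest and add up their lengths until the running total reaches at least half of all assembled bases; the length of the contig that gets you there is the N50. The function name and the toy contig lengths below are made up for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    The N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy assembly: nine contigs with lengths 2 through 10.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because each contig contributes its full length to the running total, a few long contigs can set the N50 even when most contigs are short, which is why it behaves like a length-weighted median.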
When our lab got its first metagenomic dataset, the first thing we did was upload our QC-filtered and merged paired-end Illumina reads (mean length 160 bp) onto MG-RAST for annotation. However, when the annotations came back, some organisms whose genomes were known to be present in both our sample and the m5nr reference dataset were missing and, for those sequences that were annotated, the designated e-values centered around 1e-10. In order to improve the annotation of our data, we decided to perform an assembly. Searching the literature, I found that a class of assemblers, called De Bruijn Graph assemblers, was the popular choice for assembly of short-read metagenomic data; however, the intuition behind how these assemblers worked was a little less clear. Continue reading
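For a bit of intuition, here is a toy sketch of the graph-building step only, not of any real assembler. Each read is chopped into overlapping k-mers, and every k-mer becomes an edge from its first k-1 bases to its last k-1 bases. The function name, the reads, and the choice of k are illustrative assumptions; real assemblers also deal with sequencing errors, reverse complements, and coverage.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it.

    Every k-mer found in a read contributes one edge: prefix -> suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads that share the k-mer "CGT".
    graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

Overlapping reads end up sharing nodes and edges, so walking the graph from node to node spells out a longer sequence than any single read.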
For those who only know a bow tie as something worn by hipsters or really fancy people — Bowtie is a very powerful bioinformatics tool that has a diverse array of applications. Most often I will use Bowtie to map RNA transcripts back to a known genome; however, you can also use Bowtie to assess how well your assembly performed or for any instance where you want to find how many of your high throughput sequences map back to a [longer] sequence or genomes of interest.
What makes Bowtie special is that it requires little RAM (it can easily run on your laptop) and is very fast, or as the creators of Bowtie declare, ultrafast (aligning more than 25 million reads to the human genome in 1 CPU hour). Continue reading
Python is a great programming language for processing genomic data. Instead of having to waste mindless hours copying, pasting and clicking through Excel spreadsheets, this easy-to-learn language has given me a way to write my own scripts to quickly organize, analyze, and process large genomic datasets. Some of the reasons I love Python are:
1. Biopython package – Within this package lie the tools to easily manipulate and process your sequence data. Time and time again I have found myself needing to count the number of bases, find the lengths of the sequences, etc. in a sequence file. Bio.SeqIO can do just that. SeqIO.parse is a Biopython function that iterates over all the sequences in a FASTA file, or a file in another sequence format you may need, so that you can apply your own code to each record. Check out the example in the following code. Here, I can open a FASTA file containing thousands of sequences and retrieve the sequence name and the number of adenines in each sequence in a matter of seconds, without ever leaving the terminal!
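A minimal sketch of that kind of per-sequence adenine count is below. It is written with only the standard library so it runs without Biopython installed; `read_fasta` and `count_adenines` are hypothetical helper names. With Biopython, `SeqIO.parse(handle, "fasta")` would take the place of `read_fasta` and give you records with `.id` and `.seq` attributes.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle.

    Stand-in for Biopython's SeqIO.parse(handle, "fasta").
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""  # keep the ID, drop the description
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_adenines(handle):
    """Return {sequence name: number of A bases}, ignoring case."""
    return {name: seq.upper().count("A") for name, seq in read_fasta(handle)}


if __name__ == "__main__":
    # In-memory toy FASTA; swap in open("your_file.fasta") for a real file.
    toy = io.StringIO(">seq1 example\nAACG\nTA\n>seq2\nccc\n")
    for name, n_a in count_adenines(toy).items():
        print(name, n_a)
```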
Something I realized when I began to work with the command line is that there are a few extras you can install to make everything run smoothly within the terminal (Mac OS X 10.9).
1. Xcode 5.0 – This is an integrated development environment made by Apple that provides a nice space to write and debug code. More importantly, Xcode contains the Command Line Tools that are needed to properly compile and install packages you may need in the future. Note that you may need an Apple Developer ID to access the download site.
5. Python – Python is a programming language that is extremely good at handling sequence data. Most Macs are already set up to use Python version 2.7. To test whether you have Python, simply open the terminal and type “python”.