Today was a busy day, but at the end of it the one talk that stands out in my mind is Ruth Ley’s talk on the human microbiome. Personally, I find this topic extremely interesting, and Ruth’s presentation did not disappoint. The talk began with an introduction to a pivotal event in human history: the advent of agriculture. We can all imagine that during the shift from hunting and gathering to farming, the human microbiome changed to adjust to the new diet. Perhaps one of the most important changes was the introduction of starch-heavy foods that were not the basis of hunter-gatherer diets. One gene in particular, AMY1, codes for Continue reading
There were some really cool sessions at ISME today. The morning started off with a plenary talk by Lars Peter Nielsen about cable bacteria, and I sampled the sessions on “Unusual strategies of microbial energy acquisition,” “Network microbial ecology,” and “Meta-ome information to microbial ecology.” For each session I have picked a favorite talk, starting with Victoria Orphan’s talk on the archaeal-bacterial partnerships responsible for sulfate-coupled anaerobic oxidation of methane (AOM). Continue reading
This week I am at the International Society for Microbial Ecology (ISME15) Conference in Seoul, South Korea! I want to share with you some of my favorite talks from each day. Today, I spent most of my time in the “Microbiomes of marine ecosystems” session and made some stops at other sequencing-related talks. Perhaps my favorite talk of the day goes to… Antje Boetius! Her talk focused on the microbial diversity of the surface of seafloor sediment and Continue reading
The World Cup is underway and I am back to the blog! I was recently inspired by the World Cup Panini stickers to write a post about sequencing depth. In sequencing, the problem we often face is whether enough sequences have been generated to be representative of a population. The tools we typically use to judge whether we have sampled enough are rarefaction curves and the Chao estimate, but how many sequences would we need to generate in order to capture a 16S rRNA gene sequence from every organism present in an environment? Continue reading
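As a rough sketch of what the Chao estimate does, here is the classic Chao1 richness calculation in a few lines of Python (the function name and the toy OTU counts are my own, not from the post):

```python
from collections import Counter

def chao1(abundances):
    """Classic Chao1 richness estimate: S_obs + F1^2 / (2 * F2),
    where F1 = number of singleton OTUs and F2 = number of doubletons."""
    counts = Counter(abundances)  # abundance -> number of OTUs with that abundance
    s_obs = len(abundances)
    f1 = counts[1]
    f2 = counts[2]
    if f2 == 0:
        # bias-corrected form avoids division by zero when there are no doubletons
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)

# hypothetical OTU table: each number is the read count for one OTU
otus = [10, 4, 1, 1, 1, 2, 2, 5]
print(chao1(otus))  # 8 observed, 3 singletons, 2 doubletons -> 8 + 9/4 = 10.25
```

The more singletons you see relative to doubletons, the more unseen diversity the estimator infers is still out there.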
In my last post, I described what a De Bruijn Graph assembler is; now I will walk through a short tutorial on how to begin using MetaVelvet. MetaVelvet is an extension of the popular single-genome De Bruijn Graph assembler Velvet, optimized to handle the varying coverage and diversity of genomes in metagenomic samples. It is run in three steps: velveth, velvetg, and meta-velvetg.
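As a sketch, the three steps might look like this on the command line (the k-mer size, file names, and insert length below are placeholders, not recommendations; check the MetaVelvet documentation for your data):

```shell
# Step 1: velveth builds the k-mer hash table (k = 51 is a placeholder)
velveth out_dir 51 -fastq -shortPaired reads.fastq

# Step 2: velvetg builds the initial De Bruijn graph
velvetg out_dir -exp_cov auto -ins_length 260

# Step 3: meta-velvetg partitions the graph by coverage peaks and
# writes the final metagenomic contigs into out_dir
meta-velvetg out_dir -ins_length 260
```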
The N50 is a metric that is often used to describe the length of contigs post-assembly. I find it best to think of it as a weighted median in which longer contigs are given greater weight. The N50 is defined as follows: Continue reading
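To make the weighted-median intuition concrete, here is a minimal sketch (the function name and example lengths are mine): sort the contigs from longest to shortest and walk down the list until the running total reaches half the assembly; the contig length you stop at is the N50.

```python
def n50(contig_lengths):
    """N50: length of the contig at which the cumulative sum of
    descending-sorted contig lengths first reaches half the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

lengths = [80, 70, 50, 40, 30, 20]  # total = 290, half = 145
print(n50(lengths))  # 80 + 70 = 150 >= 145, so N50 = 70
```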
When our lab got its first metagenomic dataset, the first thing we did was upload our QC-filtered, merged paired-end Illumina reads (mean length 160 bp) to MG-RAST for annotation. However, when the annotations came back, some organisms whose genomes were known to be present in both our sample and the m5nr reference dataset were missing, and for the sequences that were annotated, the e-values centered around 1e-10. To improve the annotation of our data, we decided to perform an assembly. Searching the literature, I found that a class of assemblers called De Bruijn Graph assemblers was the popular choice for assembling short-read metagenomic data; however, the intuition behind how these assemblers work was a little less clear. Continue reading
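The core idea can be sketched in a few lines: reads are chopped into k-mers, each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs correspond to walks through the resulting graph. A toy illustration (the function name, reads, and k value are mine):

```python
from collections import defaultdict

def de_bruijn_edges(reads, k):
    """Toy De Bruijn graph: each k-mer contributes an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

# two overlapping toy reads; a real dataset has millions
graph = de_bruijn_edges(["ATGGC", "TGGCA"], k=3)
print(dict(graph))
```

Note how the two reads' shared k-mers (TGG, GGC) land on the same edges, which is exactly how overlap is detected without all-vs-all read comparison.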
For those who only know a bow tie as something worn by hipsters or really fancy people: Bowtie is a very powerful bioinformatics tool with a diverse array of applications. Most often I use Bowtie to map RNA transcripts back to a known genome; however, you can also use it to assess how well your assembly performed, or in any instance where you want to find how many of your high-throughput sequences map back to a [longer] sequence or genome of interest.
What makes Bowtie special is that it requires little RAM (it can easily run on your laptop) and is very fast, or, as the creators of Bowtie declare, ultrafast (aligning more than 25 million reads to the human genome in one CPU hour). Continue reading
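A typical invocation looks something like this (the index and file names are placeholders; consult the Bowtie manual for the options available in your version):

```shell
# one-time step: build the Burrows-Wheeler index from a reference genome
bowtie-build genome.fa genome_index

# align single-end reads against the index, writing SAM output (-S)
bowtie -S genome_index reads.fastq alignments.sam
```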
Python is a great programming language for processing genomic data. Instead of wasting mindless hours copying, pasting, and clicking through Excel spreadsheets, this easy-to-learn language has given me a way to write my own scripts to quickly organize, analyze, and process large genomic datasets. Some of the reasons I love Python are:
1. BioPython Package – Within this package lie the tools to easily manipulate and process your sequence data. Time and time again I have found myself needing to count the number of bases, find the lengths of the sequences, etc., in a sequence file. Bio.SeqIO can do just that: SeqIO.parse is a function built into BioPython that lets you iterate over all the sequences in a fasta file (or another common sequence format). Check out the example in the following code. Here, I can open a fasta file containing thousands of sequences and retrieve the sequence name and the number of adenines in each sequence in a matter of seconds, without ever leaving the terminal!
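A minimal version of that example might look like this (the file name and the tiny two-record fasta are placeholders I've made up to stand in for a file with thousands of sequences):

```python
from Bio import SeqIO

def adenine_counts(fasta_path):
    """Yield (sequence name, adenine count) for every record in a fasta file."""
    for record in SeqIO.parse(fasta_path, "fasta"):
        yield record.id, str(record.seq).count("A")

# tiny stand-in for a real fasta file with thousands of sequences
with open("example.fasta", "w") as handle:
    handle.write(">seq1\nATGAAC\n>seq2\nGGCCTT\n")

for name, count in adenine_counts("example.fasta"):
    print(name, count)
```

Because SeqIO.parse is a generator, it reads one record at a time, so even very large fasta files can be processed without loading everything into memory.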