The Command Line & Some Tricks

Almost all metagenomic tools are run from the command line so, unsurprisingly, learning how to use it was the first (and essential!) step for me toward analyzing metagenomic data. Not only does the command line let you write and execute scripts, it also saves time and frustration when viewing and moving through large FASTA/FASTQ files.

For those reading, I am executing all commands through the Mac Terminal on OS X version 10.9.


What we know so far…


A bacterial biofilm growing on nutrient-rich fracture water 1.3 km below the surface in Beatrix Gold Mine, South Africa. NOTE: This image does not represent the bacterial communities I sample.

To study life kilometers below the earth’s surface, subterranauts travel underground through deep mine shafts around the globe. These scientists collect and analyze fracture waters that have been locked away for thousands of years, completely removed from the sun. The deepest and best-studied mines are located in South Africa. Scientists who study these deep sites use a “prawn” or similar device (see below) attached to a borehole to sample water that hides meters beyond the mine’s walls.

One of the most recognized discoveries of deep subsurface research is the unprecedented identification of “an ecosystem of one”. Here, scientists performed a metagenomic study on a 2.8 km deep fracture water community and found that a novel bacterium, Candidatus Desulforudis audaxviator, accounted for >99.9% of the microbial community (Chivian et al., 2008). In order to survive on its own, the genome of D. audaxviator reveals …

Welcome!

The subterranaut (pronounced: sub·terrain·not) was inspired by my work on high-throughput sequence data of the deep terrestrial subsurface.

Coming from a background in biochemistry, I realized that there was a steep learning curve to begin working with high-throughput sequence data. Raw data files were larger than my iTunes library, and scanning through endless rows in an Excel file felt like a hopeless task. I knew that there must be a better way to handle and analyze my large datasets. This blog contains the steps I took to understand my sequence data. Many of the posts are direct answers to the wide range of questions that I had as I began to enter this field. I hope others can use this blog as a roadmap to ease their transition from working in the lab to working on high-throughput sequence data.

I hope you enjoy!