In my last post, I described what a De Bruijn Graph assembler is. In this post, I give a short tutorial on how to begin using MetaVelvet.  MetaVelvet is an extension of the popular single-genome De Bruijn Graph assembler, Velvet, and is optimized to handle the varying coverage and diversity of genomes in metagenomic samples. It is run in 3 steps: velveth, velvetg, and meta-velvetg.

To begin, download the Velvet source code from here.  Decompress the tarball, change to the Velvet directory, and compile the Velvet executables.

~$ cd Velvet-vX.X.XX

Velvet-vX.X.XX$ make


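Velvet-vX.X.XX$ make ['MAXKMERLENGTH=k']

The ‘MAXKMERLENGTH’ setting is optional and sets the longest k-mer (hash length) the compiled executables will accept.  The default is 31.  Write it without spaces around the equals sign.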

Next, you will need the MetaVelvet extension, which can be found here.  As with the Velvet source code, decompress the tarball, change to the MetaVelvet directory, and compile the MetaVelvet executables.

~$ cd MetaVelvet-vX.X.XX

MetaVelvet-vX.X.XX$ make


MetaVelvet-vX.X.XX$ make ['MAXKMERLENGTH=k']

As with Velvet, the ‘MAXKMERLENGTH’ setting is optional and defaults to 31.

If you are new to installing software from source, a good way to check that the code compiled properly is to run one of the executables (e.g. ./velveth).  If everything is working, you will see a long list of options describing how to format the velveth command.  It may also be helpful to put your Velvet and MetaVelvet executables in a single directory or move them to your /usr/bin directory.
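Once the executables have been moved somewhere on your PATH, a quick check along the following lines can confirm that the shell finds all three. This is only a sketch: the helper name check_tools is made up for illustration and is not part of Velvet or MetaVelvet.

```shell
#!/bin/sh
# Report which of the named programs can be found on the current PATH.
# check_tools is a hypothetical helper, not something shipped with Velvet.
# Prints one line per program and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool"); then
            printf 'found    %s -> %s\n' "$tool" "$location"
        else
            printf 'missing  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the three executables used in this tutorial.
check_tools velveth velvetg meta-velvetg ||
    echo "Add the build directories to PATH and try again."
```

If any line says "missing", either run that program with its full path (as in ./velveth) or add its directory to PATH.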

Hopefully, at this point your Velvet and MetaVelvet executables are working properly and you can begin your first assembly.  As mentioned earlier, you will need to run a series of commands (velveth, velvetg, and meta-velvetg) to complete the assembly of your FASTA or FASTQ sequence files (the sam, bam, eland, and gerald formats are also supported).  The first step is the velveth command:

velveth directory hash_length {[-file_format][-read_type][-separate|-interleaved] filename1 [filename2 …]} {…} [options]

This step takes your sequence file(s), builds a hash table, and writes 2 output files to your designated directory, “Sequences” and “Roadmaps”. Both are needed by the velvetg command, which constructs and traverses the De Bruijn Graph to generate a contigs file.  Notably, to be able to run meta-velvetg afterwards, you need to add an extra option (“-exp_cov auto”) to the velvetg defaults so that it generates a “Graph2” file.

velvetg directory -exp_cov auto [options]

The final step of the MetaVelvet pipeline is to run meta-velvetg, which decomposes the De Bruijn Graph built by velvetg into individual subgraphs in order to resolve and lengthen the contigs of individual species' genomes.  To execute:

meta-velvetg directory [options] | tee logfile
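Putting the three commands together, a run might look like the sketch below. The directory name, k-mer length, and read file are placeholder values, and the -fastq and -short velveth options assume a single file of unpaired short reads in FASTQ format; adjust them to your own data. The script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch of the three-step pipeline. OUT_DIR, KMER and READS are placeholder
# values (assumptions for illustration), not settings taken from this post.
OUT_DIR=${OUT_DIR:-assembly_out}
KMER=${KMER:-31}
READS=${READS:-reads.fastq}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = run them

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" -ne 1 ]; then
        "$@"
    fi
}

# Run the steps in order, stopping at the first one that fails.
metavelvet_pipeline() {
    run velveth "$OUT_DIR" "$KMER" -fastq -short "$READS" &&
    run velvetg "$OUT_DIR" -exp_cov auto &&
    run meta-velvetg "$OUT_DIR"
}

metavelvet_pipeline
```

Chaining the steps with && means velvetg is not attempted if velveth fails, which saves time on a long-running job.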

After completing these 3 steps, your directory will contain 2 important file types: the contigs.fa files and the stats.txt files.  You can check various stats on your contigs.fa file using scripts like the ones listed here. With the stats.txt file, you can easily visualize coverage across your contigs by installing R and the plotrix library and running the following R commands:

(R) > library(plotrix)

(R) > data = read.table("stats.txt", header=TRUE)

(R) > weighted.hist(data$short1_cov, data$lgth)
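For a quick look without R, roughly the same length-weighted distribution can be tabulated on the command line. The sketch below assumes stats.txt is a whitespace-separated table whose header contains the lgth and short1_cov columns used in the R commands above; the function name coverage_hist is made up for illustration.

```shell
#!/bin/sh
# Tabulate total contig length per integer coverage bin from a stats file.
# coverage_hist is a hypothetical helper. Columns are located by header name,
# so their position in the file is not assumed.
coverage_hist() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("lgth" in col) || !("short1_cov" in col)) {
                print "expected lgth and short1_cov columns" > "/dev/stderr"
                exit 1
            }
            next
        }
        {
            bin = int($(col["short1_cov"]) + 0.5)   # round to nearest integer
            total[bin] += $(col["lgth"])            # weight each bin by contig length
        }
        END {
            for (bin in total) printf "%d\t%d\n", bin, total[bin]
        }
    ' "$1" | sort -n
}

# Usage (prints "coverage<TAB>summed length", lowest coverage first):
#   coverage_hist directory/stats.txt
```

Bins with large summed lengths correspond to the peaks you would see in the weighted histogram.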


In the resulting histogram for our data, there are multiple peaks at coverages of around 14, 22, 36, and 58.  Each individual peak should contain sequences derived from individual genomes or groups of genomes at similar abundance.  Manually setting your coverage peaks can greatly improve your assembly, so I recommend that you define them when running meta-velvetg, as follows:

meta-velvetg directory -exp_covs 58_36_22_14
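If you keep your peak values in a list, a small helper can build the underscore-separated string for you. This is a sketch: the name format_exp_covs is made up, and the largest-first ordering simply copies the example above rather than any documented requirement.

```shell
#!/bin/sh
# Join coverage peak values into the underscore-separated form shown above,
# largest first (the ordering is copied from the example, not from a spec).
# format_exp_covs is a hypothetical helper, not part of MetaVelvet.
format_exp_covs() {
    printf '%s\n' "$@" | sort -rn | paste -s -d '_' -
}

format_exp_covs 14 22 36 58   # prints 58_36_22_14
```

The result can then be passed straight to the command, for example: meta-velvetg directory -exp_covs "$(format_exp_covs 14 22 36 58)"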

A side note: for the assembly of our metagenomes (roughly 20-30 million reads), I did not use my personal computer.  Constructing the De Bruijn Graph requires a large amount of memory, so I used a Princeton University cluster instead.


