Alignment with Bowtie (and Bowtie2)

For those who only know a bow tie as something worn by hipsters or really fancy people: Bowtie is a very powerful bioinformatics tool with a diverse array of applications. Most often I use Bowtie to map RNA transcripts back to a known genome; however, you can also use it to assess how well an assembly performed, or in any situation where you want to find how many of your high-throughput sequencing reads map back to a [longer] sequence or genome of interest.

What makes Bowtie special is that it requires little RAM (it can easily run on your laptop) and is very fast, or, as its creators put it, ultrafast (aligning more than 25 million reads to the human genome in one CPU hour). To achieve such speed with low memory, Bowtie first indexes the reference genome (or set of sequences) using a scheme based on the Burrows-Wheeler Transform (BWT) and the FM-index, and then searches that index for inexact alignments. In sequence alignment an inexact match is desirable because the reads may contain sequencing errors or small genetic variations relative to the reference sequence. However, although Bowtie is a powerful tool, it may not always be the best choice for your alignment needs; for example, Bowtie 1 only finds ungapped alignments, whereas Bowtie2 supports gapped alignment.
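To make the index idea a little more concrete, here is a toy bash sketch of the Burrows-Wheeler Transform itself: build every rotation of a short sequence (ending in a unique '$' sentinel), sort the rotations, and read off the last character of each. This only illustrates what the transform produces. bowtie-build does not materialize the full rotation matrix like this (that would be far too memory-hungry for a genome), and the FM-index search Bowtie layers on top is not shown. The function name bwt is hypothetical.

```shell
#!/usr/bin/env bash
# Toy Burrows-Wheeler Transform: sort every rotation of the text and
# keep the last character of each sorted rotation. The text should end
# with a unique sentinel ('$') that sorts before A, C, G and T.
bwt() {
  local text="$1" n i
  n=${#text}
  for ((i = 0; i < n; i++)); do
    # Rotation of the text starting at position i.
    printf '%s\n' "${text:i}${text:0:i}"
  done | LC_ALL=C sort | while IFS= read -r rotation; do
    # Emit the last column of the sorted rotation matrix.
    printf '%s' "${rotation: -1}"
  done
  printf '\n'
}

bwt 'ACAACG$'   # prints GC$AAAC
```

LC_ALL=C makes sort compare raw bytes, so '$' sorts before the nucleotides. Notice how the output groups the three A's into a run; this tendency to cluster characters is part of what makes the BWT compress well, and the FM-index built on it lets a search match a read one character at a time, from right to left.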

If you are interested in learning more about how the Bowtie algorithm works, stay tuned to this blog; for now, let me show you the general commands used to run Bowtie:

For this example I will use the dataset provided in the Bowtie download (version 1.0.0). If you are using Bowtie for the first time, see the section at the bottom of this post on how to get Bowtie running on a Mac. The zip file you download contains the executables (bowtie and bowtie-build) as well as “genomes”, “reads”, and “indexes” folders. The “genomes” folder contains the complete genome of E. coli 536, which was used to build the indexes found in the “indexes” folder.

The creators of Bowtie built these indexes with the bowtie-build command. You will need to build a Bowtie index for each reference you intend to use. To do this, enter your Bowtie directory and type the command:

bowtie-build [options] <reference_in> <ebwt_outfile_prefix>

or in the case of this example

bowtie-build -f  genomes/NC_008253.fna  e_coli_example

**Note: if the executables have not been moved to a directory on your PATH, you need to run them from the Bowtie directory as “./bowtie-build” and “./bowtie”.

After this command completes, you will have six files:

e_coli_example.1.ebwt

e_coli_example.2.ebwt

e_coli_example.3.ebwt

e_coli_example.4.ebwt

e_coli_example.rev.1.ebwt

e_coli_example.rev.2.ebwt
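If you want to confirm that bowtie-build actually produced all six of these files (for instance, after a build that was interrupted), a small helper like the one below can check them by prefix. It is a sketch that assumes a standard-size index using the .ebwt suffix, as in this example; check_bowtie_index is a hypothetical name, not part of Bowtie.

```shell
#!/usr/bin/env bash
# Check that all six Bowtie 1 index files exist and are non-empty for a
# given prefix. Assumes a standard-size index with the .ebwt suffix.
check_bowtie_index() {
  local prefix="$1" suffix status=0
  for suffix in 1.ebwt 2.ebwt 3.ebwt 4.ebwt rev.1.ebwt rev.2.ebwt; do
    if [ ! -s "${prefix}.${suffix}" ]; then
      echo "missing or empty: ${prefix}.${suffix}" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, from the Bowtie directory:

check_bowtie_index e_coli_example && echo "index complete"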

These six index files are used by the next command, which runs Bowtie itself. Note that all six share the same prefix, because that prefix is how you tell the “bowtie” command which index to use. To run “bowtie”, we will use the following command options to align the paired-end reads e_coli_1000_1.fa and e_coli_1000_2.fa to the E. coli genome:

[Image: the bowtie command]

and get the output

[Image: the bowtie output]

The -f option indicates that the read files are in FASTA format, and -t prints how long each phase of the alignment takes. “--al” writes the reads that aligned to a file in the same format as the input (for paired-end reads, Bowtie writes one file per mate). Finally, “-1” specifies the “_1” FASTA file, which holds the first mate of each pair, and “-2” specifies the “_2” FASTA file, which holds the second mate. Please refer to the manual for other options you may be interested in using.
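Since the command above appears only as an image, here is a sketch reconstructed from the options just described, not a copy of the original: a small bash function that passes -t, -f, --al, -1 and -2 to bowtie. The function name align_pairs, the BOWTIE variable, and the output file names are hypothetical. The argument order follows Bowtie 1's usage: options, then the index prefix, then the mate files, then an optional output file.

```shell
#!/usr/bin/env bash
# Align paired-end FASTA reads against a Bowtie 1 index.
#   -t     print the time taken by each phase
#   -f     the read files are FASTA
#   --al   write the reads that aligned to this file (one file per mate)
#   -1/-2  files holding the first and second mate of each pair
# Set BOWTIE=./bowtie if the executable is not on your PATH.
align_pairs() {
  local index="$1" mate1="$2" mate2="$3" aligned="$4" hits="$5"
  "${BOWTIE:-bowtie}" -t -f --al "$aligned" "$index" \
    -1 "$mate1" -2 "$mate2" "$hits"
}
```

For example, after defining the function in your shell, from the Bowtie directory:

BOWTIE=./bowtie align_pairs e_coli_example reads/e_coli_1000_1.fa reads/e_coli_1000_2.fa aligned.fa e_coli_example.map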

For First Time Users

To build Bowtie from the provided source code, you will need a GNU-like environment that includes GCC, GNU make, etc. If you are working on a Mac, please refer to the command line tools discussed here.

To build Bowtie, extract the sources (unzip the file), enter your Bowtie directory in the terminal, and type the command “make”. From there, the Bowtie executables can be run. If you want to be able to call Bowtie from any directory in your terminal, you will need to move “bowtie-build” and “bowtie” into a directory on your PATH. You can do this with the command

sudo mv bowtie bowtie-build /usr/local/bin

Now if you simply type “bowtie-build” (or “bowtie”) at the command line, you will see a usage message like this one:

No input sequence or sequence file specified!

Usage: bowtie-build [options]* <reference_in> <ebwt_outfile_base>

    reference_in            comma-separated list of files with ref sequences

    ebwt_outfile_base       write Ebwt data to files with this dir/basename

Options:

…..

This result means that bowtie-build [or bowtie] has been successfully moved to your PATH.
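A slightly more direct check is to ask the shell where it finds each executable. This is a sketch built on the standard command -v builtin; check_on_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report where each named command resolves on PATH; return non-zero
# if any of them cannot be found.
check_on_path() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: not found on PATH" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example:

check_on_path bowtie bowtie-build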
