The N50 is a metric often used to describe the lengths of contigs after assembly. I find it best to think of it as a weighted median in which longer contigs are given greater weight. The N50 is defined as follows:

Given a set of contigs, each with its own length, the *N50* is the “length *N* for which 50% of all bases in the sequences are in a sequence of length *L < N*. This can be found mathematically as follows: Take a list *L* of positive integers. Create another list *L’*, which is identical to *L*, except that every element *n* in *L* has been replaced with *n* copies of itself. Then the median of *L’* is the N50 of *L*. For example: If *L* = {2, 2, 2, 3, 3, 4, 8, 8}, then *L’* consists of six 2’s, six 3’s, four 4’s, and sixteen 8’s; the N50 of *L* is the median of *L’*, which is 6.” (Broad Institute).
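To make the quoted example concrete, here is a literal Python translation of that definition: build the expanded list and take its median. The function name `n50_by_expansion` is just an illustrative choice, and this version is only practical for small inputs, since the expanded list has one entry per base.

```python
from statistics import median


def n50_by_expansion(lengths):
    """N50 per the quoted definition: the median of the list in which
    every length n is replaced by n copies of itself."""
    expanded = []
    for n in lengths:
        expanded.extend([n] * n)  # n copies of the value n
    return median(expanded)


print(n50_by_expansion([2, 2, 2, 3, 3, 4, 8, 8]))  # 6.0
```

The result falls between 4 and 8 because the expanded list has an even number of entries (32), so the median is the average of the 16th and 17th values, which are 4 and 8.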

Based on this definition, I came up with this Python script to calculate the N50 of a contigs file.
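A minimal sketch of such a calculation is given here. It is written from the definition above and is not necessarily the original script; it assumes the contigs file is in FASTA format, and the function names `read_contig_lengths` and `n50` are hypothetical. Rather than building the expanded list, it uses the running total of the sorted lengths to find the contig that holds the middle base (or the two middle bases when the total is even).

```python
from bisect import bisect_right
from itertools import accumulate


def read_contig_lengths(path):
    """Return the length of every record in a FASTA file (assumed format)."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Median of the expanded list, computed without building that list."""
    ordered = sorted(lengths)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no bases to summarise")
    cumulative = list(accumulate(ordered))

    def length_at(k):
        # Length of the contig that holds position k (0-indexed)
        # of the sorted, expanded list.
        return ordered[bisect_right(cumulative, k)]

    mid = total // 2
    if total % 2:
        return length_at(mid)
    # Even number of bases: average the two middle values.
    return (length_at(mid - 1) + length_at(mid)) / 2
```

With a contigs file at a placeholder path such as `contigs.fasta`, the call would be `n50(read_contig_lengths("contigs.fasta"))`. On the example list from the definition, `n50([2, 2, 2, 3, 3, 4, 8, 8])` gives 6.0, matching the quoted result.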
