zooming out

Surprising Degree Of Large-scale Variation In The Human Genome (the Science paper is here).
Researchers at Cold Spring Harbor Laboratory, using ROMA (representational oligonucleotide microarray analysis) to investigate the differences between tumour and normal cells, included a normal-normal control to establish the lower limits of variability. What they found was that the genomes of normal individuals vary not just at the level of the individual nucleotide or even the gene, but also on a much larger scale, with deletions and duplications ranging from 100 kb to 1 Mb (b = base, or more accurately base pair, a single “rung” on the familiar twisted rope ladder image of DNA).
What ROMA does (there’s a good explanatory paper here) is to compare reduced-complexity representations of two genomes. The current average resolution is one probe every 35 kb. The authors say that 10-15 kb is feasible, but the coarser comparison may be more interesting, at least initially, because it shows the “big picture”, like zooming out on a map. (There is a limit to how far out you can zoom, of course; earlier lower-resolution studies found far fewer polymorphisms.)
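To get a feel for what one probe every 35 kb means, here is a rough back-of-the-envelope sketch. It assumes the probes are evenly spaced (real ROMA probes are not, so treat the numbers as order-of-magnitude only) and uses the ~3000 Mb genome size quoted in this post; the function and variable names are just for illustration.

```python
# Rough sketch: how many probes cover a region at a given average spacing.
# Assumption: probes are evenly spaced, which is a simplification.
GENOME_KB = 3000 * 1000      # ~3000 Mb genome, expressed in kb
PROBE_SPACING_KB = 35        # current average resolution from the post


def probes_spanned(region_kb, spacing_kb=PROBE_SPACING_KB):
    """Approximate number of evenly spaced probes falling inside a region."""
    return region_kb / spacing_kb


total_probes = GENOME_KB / PROBE_SPACING_KB
print(f"probes across the whole genome: ~{total_probes:,.0f}")

for size_kb in (100, 1000):
    print(f"{size_kb} kb region: ~{probes_spanned(size_kb):.1f} probes")

# Same regions at the finer 10 kb spacing the authors consider feasible.
for size_kb in (100, 1000):
    print(f"{size_kb} kb region at 10 kb spacing: ~{probes_spanned(size_kb, 10):.0f} probes")
```

On these assumptions a 100 kb change is covered by only about three probes, which gives some intuition for why the smallest events reported are around that size.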
So, how big is 100 kb – 1 Mb? The entire genome is about 3000 Mb and contains about 30,000 genes, so there is on average one gene per 100 kb. Calling that the size of the “average gene” is a bit misleading, since a typical gene has a few hundred to several thousand bases of coding sequence, which may be spread out across hundreds of kb but is more usually contained within, say, a few tens of kb. So, 100-1000 kb is easily big enough to encompass a whole gene, or even quite a few entire genes. Indeed, the authors found variation in some 70 genes, including the gene which causes Cohen syndrome and genes known to be involved in neurodevelopment, leukaemia, drug resistance in breast cancer and body weight regulation.
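The arithmetic in that paragraph can be sketched in a few lines. This uses only the round figures quoted here (3000 Mb, 30,000 genes) and assumes genes are spread uniformly along the genome, which they are not; the names are illustrative.

```python
# Back-of-the-envelope: average gene spacing and genes per CNP-sized region.
# Assumption: genes are uniformly distributed, which is a simplification.
GENOME_MB = 3000
GENE_COUNT = 30_000

avg_spacing_kb = GENOME_MB * 1000 / GENE_COUNT  # kb of genome per gene


def genes_spanned(region_kb):
    """Expected number of genes in a region, at the average gene density."""
    return region_kb / avg_spacing_kb


print(f"average spacing: {avg_spacing_kb:.0f} kb per gene")
for size_kb in (100, 1000):
    print(f"{size_kb} kb region: ~{genes_spanned(size_kb):.0f} gene(s) on average")
```

So at average density a region at the small end of the range holds about one gene and a region at the large end about ten, though actual gene density varies a great deal from place to place.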
The team compared twenty individual genomes and found 76 unique CNPs (copy number polymorphisms, the authors’ name for the large deletions/duplications they are screening for). The average CNP was 465 kb (median 222 kb) and individuals differed from each other by an average of 11 CNPs, so if the sample is representative (and the subjects were from a variety of geographic backgrounds), people differ from each other by around 4-5 million bases out of 3 billion, or roughly 0.13-0.17%. The authors give multiple reasons to expect the observed CNPs to represent only a subset of the total, which they estimate to be 226 CNPs covering 44 Mb, or around 1.5% of the genome.
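For anyone who wants to check the sums, here is a small sketch that redoes them from the figures quoted above. The only assumption is the round 3000 Mb genome size; multiplying the CNP count by the mean or the median size is my own crude way of bracketing the estimate, not the authors' method.

```python
# Re-deriving the per-person difference from the figures quoted in the post.
GENOME_MB = 3000            # round genome size used throughout the post
MEAN_CNP_KB = 465
MEDIAN_CNP_KB = 222
CNPS_PER_PAIR = 11          # average number of CNPs by which two people differ
ESTIMATED_TOTAL_MB = 44     # authors' estimate for all CNPs combined


def percent_of_genome(size_mb):
    """Express a size in Mb as a percentage of the whole genome."""
    return 100 * size_mb / GENOME_MB


# Crude bracket: CNP count times mean size, and count times median size.
mean_based_mb = CNPS_PER_PAIR * MEAN_CNP_KB / 1000
median_based_mb = CNPS_PER_PAIR * MEDIAN_CNP_KB / 1000

print(f"mean-based:   {mean_based_mb:.2f} Mb ({percent_of_genome(mean_based_mb):.2f}%)")
print(f"median-based: {median_based_mb:.2f} Mb ({percent_of_genome(median_based_mb):.2f}%)")
for mb in (4, 5):
    print(f"{mb} Mb is {percent_of_genome(mb):.2f}% of the genome")
print(f"estimated total: {percent_of_genome(ESTIMATED_TOTAL_MB):.2f}% of the genome")
```

The mean-based product comes out a little above 5 Mb and the median-based one nearer 2.5 Mb, so the 4-5 Mb figure sits toward the upper end of that crude bracket.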