What we can learn from sequencing 1 million human genomes with big data

The first draft of the human genome, published 20 years ago in 2001, took nearly three years and cost between US$500 million and $1 billion. The Human Genome Project has allowed scientists to read, almost end to end, the 3 billion pairs of DNA bases – or “letters” – that biologically define a human being.

That project has allowed a new generation of researchers like me, currently a postdoctoral fellow at the National Cancer Institute, to identify novel targets for cancer treatments, engineer mice with human immune systems and even build a webpage where anyone can navigate the entire human genome with the same ease with which you use Google Maps.

The first complete genome was generated from a handful of anonymous donors in an attempt to produce a reference genome that represented more than a single individual. But this fell far short of encompassing the wide diversity of human populations in the world. No two people are the same, and no two genomes are the same, either. To understand humanity in all its diversity, researchers would need to sequence thousands or millions of complete genomes. Now, a project like that is underway.

