This month marks the 10th anniversary of the completion of the Human Genome Project, the project to determine the sequence of the letters that make up human DNA.
Looking back now, Eric Green, director of the National Human Genome Research Institute at the National Institutes of Health, says the milestone opened the door to understanding many diseases. But as far as the science has come in the 10 years since the first human genome was sequenced, Green says there’s so much further to go in terms of finding the genetic causes for disease and using that knowledge to develop new treatments.
“At best, today we have a Cliff’s Notes version of the human genome,” Green said. “It is like a great novel that will take decades and decades to interpret.”
Green’s comments came during the kickoff event Tuesday for the National Consortium for Data Science, a new national big data organization that has launched in Chapel Hill. NCDS’ members include industry, research institutions and universities. The consortium aims to find solutions for the challenges of working with large data sets.
Besides Green’s talk, NCDS also held a “leadership summit” that attracted about 75 data and genomics leaders from around the country. The summit is expected to become an annual event, with each summit addressing a major problem in Big Data. For the first summit, attendees discussed the application of genomic data to health care.
The first human genome was sequenced at a cost of about $1 billion. Getting the cost down to $1,000 was seen by some as an unrealistic goal, Green said. But sequencing technologies have brought down these costs considerably, to the point where they are now around $3,000 or $4,000. New technologies are not only cutting the costs, they’re also shaving the time it takes for genomic sequencing.
“I don’t stay up at night worrying about how we’re going to get to the $1,000 genome,” Green said.
Big Data bottlenecks
But that doesn’t mean that Green doesn’t worry. He sees several bottlenecks that are already slowing the progress of understanding and finding solutions from Big Data. The new technologies are producing data faster than scientists can assimilate it. While the cost of sequencing is coming down, the costs to analyze the data are going up. The sheer volume of data presents challenges in storing and managing the data. And facilities need tremendous bandwidth to push that data from site to site.
Big Data also presents human challenges. Green worries whether the next generation of data scientists is being prepared to address these bottlenecks.
The National Institutes of Health will play a role in addressing biomedical Big Data challenges. Late last year, the NIH launched its “Big Data to Knowledge” initiative, or BD2K for short. The initiative’s goals include improving how biomedical Big Data are shared and used. The NIH created a new position, Associate Director for Data Science, to oversee the effort. At the request of NIH Director Francis Collins, Green took on this role on a temporary basis. But Green said that the search for a permanent Associate Director for Data Science is expected to conclude in the coming weeks.