The analysis of Next Generation Sequencing data lends itself extremely well to “cloud” computing. Cloud services can offer biologists easy-to-use applications, much like Google Earth, Google Maps or web search itself, for sophisticated, domain-specific analyses. A challenge in applying the cloud paradigm to Next Generation Sequencing, however, is that network bottlenecks to commercial clouds can complicate the otherwise favorable economics. CloudHealth Genomics has overcome these bottlenecks by partnering with the Shanghai Supercomputer Center and building a local cloud, the “Next Generation Sequencing, Supercompute Cloud”, to achieve the following:

1. Alleviate the computational bottlenecks brought on by the massive quantities of new data generated by high-throughput biomedical data acquisition platforms and, in particular, Next Generation Sequencing;

2. Help customers understand how to benefit from the cloud paradigm by building a hardened, secure “cloud” environment that should be familiar to users of commercial clouds (e.g. Amazon) while still offering new hardware (such as solid-state disks, GPUs and FPGAs) or software (WX2, Hadoop, etc.);

3. Nurture the development of a critical mass of Next Generation Sequencing informatics users, focused on clinical and translational research, in the otherwise chaotic and decentralized Chinese community.

The CloudHealth Genomics “NGS cloud” will provide the software and hardware building blocks needed to deploy secure, “petascale” informatics pipelines and databases for all of CloudHealth’s NGS instruments throughout our facilities. One application built on the cloud could be a warehouse of genomic variants from both public and private samples, integrated directly with PHRs/EMRs. Other applications could include state-of-the-art data management as well as primary data processing, base-calling, read-placement and variant-detection pipelines for all Next Generation Sequencing platforms of interest to our customers.

CloudHealth Genomics has built a large analytic system with a capacity of more than 5 petabytes to support the bioinformatics infrastructure.