Everyone (or so it seems) is talking about the cloud…
I have the privilege of being on the ground, working every day with these technologies, and I’m seeing the actual transformation – the people who are beginning to embrace it, the scientists who are interested in using it, and some of the problems it has actually solved. Rather than hyping an idea or pushing a particular technology, I thought I’d take this opportunity to discuss some examples of the work we’ve been involved with and where we think it is headed.
Moderate-Scale High Performance Computing
At ORNL we have a number of significant computational platforms that are used to support scientific work around the country. These are certainly impressive platforms and the arsenal is unparalleled world-wide – which might cause one to wonder why we would be interested in cloud computing at all. While there are a number of subtleties to the answer, the key points include: the machines are often fully utilized; many scientific codes do not scale to the degree of parallelism supported by the largest machines; and a number of scientific problems call for a different resource mix (i.e. an extremely different compute-to-communication ratio) than the leadership-class platforms provide. ORNL is home to a number of small- and mid-range clusters, but these are often overloaded and queue times are longer than desired. It is in this space, the lower end of HPC (work targeted at 64-1000 processing cores), that we see some of the greatest interest in cloud computing.
Many of the jobs in this arena are classified as “embarrassingly parallel” or “data parallel” and tend to work very well in current cloud computing environments. Like many others, we have seen computational biology work run well in these environments (BLAST, HUMMR, and others). Post-processing of supercomputer-generated data is another area of growth: data parsing, publishing, image generation, and visualization assembly activities – all based on climate simulations run on traditional supercomputers – have been performed using both the Microsoft Windows Azure and Amazon Web Services platforms. New offerings announced in summer 2010 by cloud vendors such as Amazon Web Services have us excited, as some codes that would traditionally have been considered poor candidates for cloud computing (e.g. algorithmically parallel codes such as computational fluid dynamics and astrophysics codes) are now increasingly viable. Changes such as these are promising because they provide a no-code-change approach to performing scientific research in the cloud with very few technical barriers.
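To make the “embarrassingly parallel” pattern concrete, here is a minimal Python sketch. It is an illustration only, not code from any of the projects mentioned above: the per-item function `gc_fraction` and the sample sequences are hypothetical stand-ins for real per-item scientific work. The point is that each item is processed with no communication between tasks, so the same structure runs unchanged on a handful of local cores or on a larger pool of workers.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def gc_fraction(sequence: str) -> float:
    """Hypothetical per-item work: fraction of G and C bases in one sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)


def run_all(sequences: list[str], executor: Executor) -> list[float]:
    """Score every sequence independently; results come back in input order."""
    return list(executor.map(gc_fraction, sequences))


if __name__ == "__main__":
    # The pool is sized from the local core count by default.
    # No task ever talks to another task, which is what makes the job
    # "embarrassingly parallel".
    with ProcessPoolExecutor() as pool:
        print(run_all(["GATTACA", "GGCC", "ATAT"], pool))
```

Because `run_all` accepts any `Executor`, the scheduling back end can be swapped without touching the scientific function itself, which is the property that makes this class of job a comfortable fit for rented capacity.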
Entryway to Larger-Scale Computing
At the other end of the spectrum are scientists who have been developing codes that have traditionally run on a single workstation and who are now finding that no longer sufficient. Sometimes this situation arises due to pure organic growth of the research initiative (changes to codes impose greater constraints on the machine, the size of the data set increases significantly, etc.) while at other times it is caused by a desire to avoid the complexities inherent in HPC-style programming. In a number of these situations, the simplified development paradigm of some of the Platform as a Service (PaaS) offerings (such as Microsoft’s Windows Azure) has introduced an ease of scale that would have been otherwise out of reach. In one example, based on work done by members of my team this summer, we took Geographic Information Systems (GIS) codes that process multi-mode satellite imagery and developed an end-to-end solution for expressing the problem set, deploying to the cloud, performing the calculations, and returning the data to the user – all of this without requiring the domain scientist to know the intricacies of the particular cloud computing platform. Similar work is currently ongoing for text-analysis and document clustering codes. It is this level of simplicity (abstraction of the cloud, if you will) that excites us about the future of PaaS for science.
The final area where we are seeing significant interest and some early work is the use of cloud computing platforms for data distribution and staging. The data platforms offered by the major cloud vendors expose a number of features that would be difficult for a normal research facility to provide, such as seemingly infinite scale and flexible yet simple content distribution networks. After a large simulation run on one of the leadership-class machines, the project is often tasked with making the results available to collaborators around the world. When these results reach the terabyte range and head towards the petabyte range, this is no small challenge. In comparison, the major cloud vendors allow for relatively easy uploading and distribution of that data. The built-in, contract-free content delivery networks allow researchers to benefit from geo-located caches to improve performance of distribution while reducing the load on their existing infrastructures.
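A rough back-of-the-envelope calculation shows why distribution at this scale is hard. The sketch below is only an estimate under stated assumptions: the link speeds and the `efficiency` factor (the fraction of nominal bandwidth actually achieved) are illustrative values, not measurements from our facility or from any vendor.

```python
def transfer_hours(size_bytes: float, link_bits_per_sec: float,
                   efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_bytes over a single link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 3600


TERABYTE = 10**12  # decimal terabyte, in bytes
PETABYTE = 10**15  # decimal petabyte, in bytes

if __name__ == "__main__":
    # Assumed link speeds, chosen only for illustration.
    for label, bits_per_sec in [("1 Gbit/s", 10**9), ("10 Gbit/s", 10**10)]:
        tb_hours = transfer_hours(TERABYTE, bits_per_sec)
        pb_days = transfer_hours(PETABYTE, bits_per_sec) / 24
        print(f"{label}: 1 TB in {tb_hours:.1f} h, 1 PB in {pb_days:.0f} days")
```

The estimate covers a single copy to a single destination. Serving many collaborators from one site multiplies the load on that site, which is the burden a geographically distributed cache takes off the originating facility.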
Additional early work is being performed using cloud storage as a buffer for moving data in and out of HPC centers just prior to and immediately following large-scale jobs. This sort of activity holds the potential to allow HPC centers to constrain their spending on expensive local scratch space (extremely fast disks and storage platforms) while still allowing users the appropriate time to stage their data and to analyze the results afterwards.
These are exciting times, and the industry is moving fast. As the public cloud vendors do more to remove the barriers between government workers and their offerings, demand for utility-style computing is likely to do nothing but grow. As far as our little corner of the world is concerned, we are anticipating the changes and looking for ways to reduce the time to insight for our researchers and deliver more science in the most economically responsible fashion possible.