Digital Forensics and the Cloud



I work with a research group that generally focuses on text analysis/mining and Bayesian networks, but has recently applied its strengths to the area of digital forensics. Specifically, the group developed tools that local police departments use to aid in the prosecution of child pornographers. In one of our more recent meetings, we began discussing the role that cloud computing can play in this problem domain: how can it help, how can it hurt, and what work needs to be done to address the resulting issues? While our collaboration is still in its nascent stages, we’ve established a handful of “knowns” that are worthy of broader conversation. As with most technologies, there is both a good and a dark side to the use of cloud computing. My goal in this article is not to paint the cloud with a black brush, but rather to highlight some unique issues and call to mind challenges that exist and must be dealt with.

Before digging in, I’d like to define some terms. For the purposes of this article, cloud computing refers to large on-demand compute and storage platforms such as Amazon Web Services, Microsoft Windows Azure, Rackspace and others. Cloud-based storage solutions such as DropBox, SkyDrive and others could be included as well. The term criminal follows the definitions provided by the US Computer Emergency Readiness Team (US-CERT) and refers primarily to those classified as hackers and organized crime groups, although all of the US-CERT categories could apply to one degree or another. For a more specific example, we can refer to the original target of the research group: those involved in the distribution and consumption of child pornography.

1. Digital forensics can be greatly assisted by cloud computing. While this statement may seem like marketing hype, it is actually quite true. Many of the steps involved in digital forensics are tasks that benefit from large-scale compute, something that isn’t readily available at most local police departments. Three examples come to mind. The first is log processing at scale. If you are tasked with analyzing a few TB of log files in an attempt to find correlations between a suspect and specific content networks, a platform such as Hadoop (Map/Reduce) could be quite useful. Companies such as Amazon Web Services offer Hadoop-on-demand platforms (Elastic MapReduce) that give an investigator access to a Hadoop cluster when it is needed, for just as long as it is needed, without burdening the unit with managing the associated hardware and software. The second example is that of cracking encryption algorithms. Many times the algorithms used by criminals are difficult but not impossible to crack given the requisite knowledge and patience. Platforms such as the recently announced Amazon EC2 Cluster GPU Instances offer investigators hardware that is well suited to this type of work at a reasonable cost; a simple example is Thomas Roth’s use of these instances to illustrate the weaknesses of SHA1. Finally, a number of the operations involved in processing digital evidence are simply time-intensive. This, combined with an increasing load, results in a significant backlog of data to be analyzed, delaying justice for many and increasing the likelihood that others will escape capture. The sheer scale of cloud computing environments allows agencies to spread their analysis across many machines rather than waiting while a few local machines crunch through some of the data and the rest sits in the queue.
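As a rough illustration of the log-processing example, here is a minimal Python sketch of the map/reduce pattern, run locally rather than on a Hadoop cluster. The log format (timestamp, client address, remote host), the sample lines, and every function name are assumptions made for illustration; real logs and a real Elastic MapReduce job would look different.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (client address, remote host)

# Hypothetical log lines in an assumed "<timestamp> <client> <host>" format.
SAMPLE_LOG = [
    "2011-01-03T10:15:00 10.0.0.5 cdn.example.net",
    "2011-01-03T10:16:30 10.0.0.9 files.example.org",
    "2011-01-03T10:17:12 10.0.0.5 cdn.example.net",
    "2011-01-03T10:18:45 10.0.0.5 files.example.org",
    "malformed line",
]


def map_line(line: str) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((client, host), 1) for each well-formed line."""
    parts = line.split()
    if len(parts) != 3:
        return  # skip lines that do not match the assumed format
    _, client, host = parts
    yield (client, host), 1


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group emitted values by key, as the framework would between steps."""
    grouped: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reduce step: total the contacts seen for one (client, host) pair."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[Key, int]:
    """Run map, shuffle, and reduce in a single local process."""
    mapped = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    for (client, host), count in sorted(run_job(SAMPLE_LOG).items()):
        print(client, host, count)
```

On a cluster, the map and reduce steps would run on many machines over different slices of the logs, which is where the on-demand scale pays off; the logic of each step stays this simple.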

If you’ve gotten to this point and you are wondering about the legal ramifications of shipping data off to a cloud platform for forensic analysis, you are on the right track. Some of the use cases I just described are not currently viable options due to chain-of-custody issues and territorial boundaries. See point #4 for more on this topic.

2. Law enforcement agencies at the local, state, federal, and international levels must work now to develop tools and expertise in the area of cloud computing. Many criminals are technically savvy and, in an effort to avoid capture and further their interests, they are often early adopters of new and possibly untested technology. Unfortunately, this often puts law enforcement agencies at a disadvantage (funding, expertise, etc.) and they can find themselves playing a constant game of cat and mouse. Market and technology indicators predict that cloud computing is here to stay, and if it is available, it will likely be used for both good and evil. A quick Internet search returns a plethora of stories about agencies, both local and international, that want to advance their cyber abilities but are blocked by a lack of funding or available tools. While it is certain that technology alone will not solve the problems, the complete lack of tools for gathering forensic evidence in cloud computing environments places the “good guys” at an insurmountable disadvantage. In his book Digital Forensics for Network, Internet, and Cloud Computing: A Forensic Evidence Guide for Moving Targets and Data, author Terrence Lillard paints a bleak picture of the effectiveness of current digital forensic tools when applied to a cloud environment. The current generation of tools is designed for local-to-the-network forensics and has little effect in a black-box, heavily virtualized and sandboxed cloud environment.

3. Cloud computing can be used to enable new methods of crime. I’ve spent the last few years preaching the benefits of cloud computing, including features such as instant/massive scale, programming interfaces (APIs) for dynamically deploying entirely new platforms, security features that allow you to know with certainty that when you turn off a machine, it is gone forever, and many others. However, like so many things in life, if you look at the features I just described with malicious intent, you can see the foundations for a fairly robust evasion network as well as a platform that is full of plausible deniability. As a thought exercise, a colleague and I sat down and attempted to figure out how we would develop a platform to distribute illegal digital content using cloud computing technologies to help avoid capture. What was most disturbing about the exercise was that, not long into it, we became convinced that such an operation was not only plausible, but relatively simple to construct. It is now possible to get a collection of servers, in a handful of geo-dispersed locations, all provisioned with a few scripts, and (assuming you are using a stolen credit card) all done with complete anonymity. The ease of use of the APIs allows operators to “move” their entire operation via a handful of script calls as soon as they feel any pressure from law enforcement (or they could simply move on a pre-defined schedule to help avoid detection). Combined with existing and well-known evasion techniques such as Fast Flux DNS, cloud technologies can make capturing and successfully prosecuting perpetrators extremely difficult. Further adding to the weight of the situation was a meeting a few weeks later with actual law enforcement officials, who confirmed that they had seen digital hints that at least parts of our theory were already being used by suspects.

4. As cloud computing becomes an increasingly used tool by criminals, laws must be adjusted to deal with the ways it cuts across traditional territorial boundaries. If you have read this far and have any knowledge of US law, you are aware of the litany of issues that have been hinted at or glossed over by some of my prior comments. I am certainly no legal expert, but I view this aspect of the problem as possibly more challenging than the technology issues. Take, for example, the conversation I mentioned earlier: the law enforcement officials we spoke with were from one state, while the digital fingerprints they uncovered pointed to a cloud service that physically resides in a different state. The question that immediately arises is that of jurisdiction. If this becomes a federal case, how do the local agencies convince the feds of the importance of the case? Further, a benefit of the technologies involved is that it is possible to continue to trace digital “hints” to develop a network of individuals involved in a particular crime. However, these hints will likely carry the investigation not only across state lines, but also across national borders. One wonders how many cases are simply dropped or drastically limited due to jurisdiction or other legal issues. While there are well-established reasons for national and state autonomy, cloud computing introduces an ease of cross-border crime that presents unique challenges to the existing laws.

Additionally, enhanced communication and cooperation between agencies is paramount, and any barriers to this need to be addressed. The officials I spoke with described partnerships and collaboration activities that occur among some agencies, but this could certainly be expanded. In one simple example, a cyber security analyst discussed a hacking case in which they knew who had attacked them, but it wasn’t until after they spoke with other partner organizations that they were able to gather enough evidence to successfully prosecute.


As a research community, it is incumbent upon us to support law enforcement agencies with the development of tools and methodologies by which they can effectively and efficiently prosecute those who are perpetrating crimes. Tools are needed that can correlate data from disparate systems and assemble a legally solid history of activity. Digital fingerprints must be captured from confiscated machines to support highly detailed subpoena requests, so that vendors asked to comply can provide only the data pertinent to the investigation and protect the privacy and rights of the rest of their users. Key to this work, however, is respecting the delicate balance between the benefits of given technologies, the laws governing the use of such technologies, and the desire to avoid stifling innovation.
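To make the idea of correlating data from disparate systems slightly more concrete, here is a minimal Python sketch that merges events from two sources into a single time-ordered history. The `Event` record, the source names, and the sample entries are all hypothetical and invented for illustration; the sketch says nothing about chain of custody or what would make such a history legally solid.

```python
import heapq
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """One observed action; the fields are an assumed, simplified schema."""
    when: datetime
    source: str
    detail: str


def build_timeline(*sources: Iterable[Event]) -> List[Event]:
    """Merge events from several systems into one list ordered by time."""
    # Sort each source first, since heapq.merge expects sorted inputs.
    ordered = [sorted(src, key=lambda e: e.when) for src in sources]
    return list(heapq.merge(*ordered, key=lambda e: e.when))


# Hypothetical records from a confiscated machine and a provider's logs.
seized_machine = [
    Event(datetime(2011, 1, 3, 10, 17), "seized-machine", "upload tool launched"),
    Event(datetime(2011, 1, 3, 10, 5), "seized-machine", "user logged in"),
]
provider_log = [
    Event(datetime(2011, 1, 3, 10, 18), "provider-log", "object stored by account A"),
    Event(datetime(2011, 1, 3, 10, 10), "provider-log", "API key for account A used"),
]

timeline = build_timeline(seized_machine, provider_log)

if __name__ == "__main__":
    for event in timeline:
        print(event.when.isoformat(), event.source, event.detail)
```

A real tool would also have to reconcile clock skew and time zones between systems and record where each entry came from, which this sketch only hints at with the `source` field.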

To learn more, join Rob for his webcast Cloud Computing: Beyond the Buzz, Thursday, January 20th from 2-3 EST. Register here.
