This feature first appeared in the Spring 2015 issue of Certification Magazine.
Big Data. It's a buzz phrase, a marketing term, an IT framework and an employment category. That's a heavy load for two words to carry, but Big Data is all about dealing with large burdens. The need to wrangle large datasets has been around for decades, particularly in the theoretical science and engineering communities. Why then has Big Data only recently become a major IT industry phenomenon?
Put simply, Big Data couldn't exist until the technology that made it possible became widely available and affordable to those who could benefit from it. Once that happened, it was just a matter of showing these organizations what Big Data was (and wasn't), and how they could use it to achieve greater success. Big Data is very new, but there are already a couple of developments on the horizon that will fundamentally change what Big Data is, and who will (or won't) be employed in its specialty.
A Brief History Lesson
Back in the 1960s and '70s, very large datasets could only be processed using supercomputers: monster machines manufactured by companies like Cray Research and IBM. Supercomputers were wildly expensive to house, operate, and maintain, putting them out of the price range of most organizations. This resulted in a limited number of supercomputers in the world, with any number of government departments, corporations and universities all fighting each other to get access to processing time on the few models out there.
While supercomputer price tags would become more reasonable as time passed, they remained very expensive to own and operate, keeping them out of reach for several groups that could have benefitted from their use. In 1994, a pair of NASA computer scientists figured out how to connect a group of regular PCs (often referred to as commodity hardware) so that they could perform the same massive parallel processing as a supercomputer. This was the first "Beowulf cluster," and it was a total game changer for power computing.
A Beowulf cluster consists of a local area network made up of standard PC clients, each client running a UNIX-based operating system and additional software that enables it to share processing duties with every other client in the network. This combination of inexpensive hardware and open source software made it possible to create a supercomputing system at a fraction of the cost.
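The divide-and-combine idea behind this kind of cluster can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the threads below stand in for networked PCs, the function names (split_into_chunks, node_task, cluster_sum_of_squares) are hypothetical, and a real Beowulf cluster would pass messages between separate machines rather than share one machine's memory.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, node_count):
    """Divide the workload into roughly equal slices, one per node."""
    if not values:
        return []
    chunk_size = -(-len(values) // node_count)  # ceiling division
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def node_task(chunk):
    """Work done independently by one node: sum the squares of its slice."""
    return sum(v * v for v in chunk)


def cluster_sum_of_squares(values, node_count=4):
    """Hand one chunk to each worker, then combine the partial results.

    Assumption: threads are a stand-in for the cluster's PCs.
    """
    chunks = split_into_chunks(values, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_sums = list(pool.map(node_task, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1, 1001))
    print(cluster_sum_of_squares(data))
```

Each worker sees only its own slice of the data, and only the small partial results travel back to be combined. That pattern is what lets a room full of inexpensive machines tackle a job that would otherwise need a single, far more costly one.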
Traditional supercomputers also continued to drop in price over time. In 2008, Cray and Microsoft released the CX1, a "personal supercomputer" with a relatively inexpensive $25,000 price tag. The CX1 offered a viable supercomputing option for organizations with smaller budgets.
One big piece of the Big Data puzzle was the creation of Apache Hadoop in 2005. Hadoop is an open source software platform used to work with massive data sets distributed across multiple commodity servers. Hadoop works particularly well when dealing with a mix of structured and complex data. All of these developments contributed to the creation of what we now call Big Data. But how did Big Data become a successful IT industry specialty?
Turning Big Data into Big Money
Turning raw data into information is not a new challenge - businesses and governments have been performing this trick for decades. What Big Data fundamentally improved are these key elements:
- The amount of data that can be worked with
- The sophistication of the analysis that can be performed
- The accuracy of the information produced
- The cost of the required infrastructure
As noted earlier, the rush to get involved in Big Data was sparked by the technology that made it possible. In order to truly take off, however, Big Data had to be turned into a product that could be sold to large corporations and other potential clients. Big Data was given market legitimacy by industry powerhouses like IBM, SAP, Cloudera, Microsoft and Amazon Web Services. These and other vendors created the value proposition behind the Big Data buzz, turning it into a product that could be understood by potential buyers.
Big Data adoption was helped greatly by the technology wave that preceded it: cloud computing. The concepts and benefits of cloud computing had already been accepted by several industries by the time that Big Data really began to gain traction. For many organizations, the addition of Big Data was a natural extension of their existing cloud computing services.
Obviously, for Big Data to be beneficial, you need to have...well, big data. As it is, more data is generated and captured in today's world than at any other time in history. The mobile computing boom in particular has created a massive data collecting engine that captures multiple events from our daily lives. A smartphone creates new data points every second that it's powered on.
This mobile data honeypot will grow even larger as the nascent wearables market matures. Smartwatches, fitness bands, health monitors and other small devices that track what their owners are doing, and where they are doing it, are growing in popularity. The wearables market will likely see a huge boost from the recent release of the Apple Watch.
Then there is the so-called Internet of Things, also known as the Internet of Everything. As more everyday objects become internet-enabled, they will all add more data to the mix. All of this activity is generating a huge glut of gigabytes for Big Data specialists to spin into gold, or possibly Bitcoin. But where have today's Big Data specialists come from?
Big Data Miners
The growth of Big Data has driven the evolution of several traditional IT industry job roles. While the job titles have changed, the Big Data job descriptions are similar to those of their more familiar counterparts, but with a few important tweaks.
One new Big Data job role is that of data scientist. Now, if you were to compare a data scientist to a traditional data analyst, you would not be totally off the mark. Both roles require a deep knowledge of mathematics, statistics, computer modeling and analytics. The data scientist role, however, has some upgraded responsibilities.
Data scientists are expected to have high levels of business acumen, the better to help them focus on the most strategically important questions an organization has. Data scientists must also be able to effectively communicate with business executives and department leaders, using the information they've generated to recommend specific courses of action. Some companies using Big Data have gone a step further and created a new C-level position: the Chief Analytics Officer. This business executive has responsibility for all actions taken based on the recommendations of the data scientist(s) working under them.
As with other technology frameworks, Big Data requires a number of software developers and hardware experts to provide support. Hadoop developers are currently in demand on job boards. Organizations that want to host their own Big Data solution need to have hardware engineers and support technicians who are knowledgeable in Big Data clustering infrastructure.
Bold Predictions for Big Data
Is Big Data here to stay? Yes and no. Two significant factors will come into play in the not-too-distant future, and they will change Big Data as it exists today.
In the short term, Big Data is going to continue to make its presence felt in a growing number of industries. Big Data's relatively low cost has already empowered many smaller scientific institutions around the world, who can do complex analyses of huge data sets without breaking their modest budgets. The same can be said for startups: new businesses created with limited personal or venture capital will be able to get their hands on Big Data tools for a fraction of the cost of traditional supercomputing.
So, how will Big Data change in the near future? First of all, Big Data will become just Data. As a growing number of people generate greater amounts of data, and as the related technology continues to become less expensive and more accessible, the distinction between Big Data and "regular" data management systems will fade.
In particular, some Big Data tools will end up as consumer-level products. We are already seeing the early stages of this in the personal health tracking industry, where a fitness band and a smartphone can automatically collect a number of vital statistics in real-time, and then perform rudimentary trend analysis on the entire data collection.
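To give a sense of what "rudimentary trend analysis" can mean in practice, here is a minimal Python sketch that fits a straight line to a series of equally spaced readings and reports its slope. The function name trend_slope and the sample step counts are hypothetical, and nothing here is meant to represent how any particular fitness product actually works.

```python
def trend_slope(readings):
    """Least-squares slope of readings taken at equally spaced intervals.

    A positive result means the values are trending upward per interval,
    a negative result means they are trending downward.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to estimate a trend")
    mean_x = (n - 1) / 2  # mean of the positions 0, 1, ..., n-1
    mean_y = sum(readings) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical daily step counts for one week.
    daily_steps = [6200, 6900, 7100, 6800, 7500, 8000, 8300]
    print(f"Average change per day: {trend_slope(daily_steps):.1f} steps")
```

A calculation this simple runs comfortably on a phone, which is part of why this kind of analysis is drifting from specialist tools into consumer products.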
A second important development is that the next wave of data scientists will be machines. The extremely young data scientist job role is very likely already on the path to automation. Many of the duties of the data scientist - trend analysis and prediction, for example - will be performed more quickly, more cheaply and better by future versions of existing Big Data tools. The growing power and sophistication of computer algorithms and machine intelligence will eventually outstrip enough of the data scientist's capabilities to make the role obsolete.
This isn't to say that you shouldn't pursue a career as a data scientist...as long as you know that it could end up with you configuring the software that is about to replace you.