In the not-too-distant past, all-things-to-all-surfers internet portal Yahoo! ran a marketing campaign that asked internet-savvy (and non-savvy) individuals one simple question: Do you, uh, Yahoo!? As Big Data continues to attract attention from businesses and organizations, a similar-sounding question is probably being asked of many IT professionals: Do you, uh, Hadoop?
If you don't, uh, Hadoop, and maybe don't even know what a "Hadoop" is, then a new Massive Open Online Course (MOOC) sponsored by The Linux Foundation and edX could be right up your alley. Starting in early June, LFS103x: Introduction to Apache Hadoop will invite newcomers into the innermost reaches of the framework that's at the heart of modern data storage technology.
Hadoop, which grew out of the Apache Nutch web-crawler project and became a top-level Apache project in 2008, is defined by data analytics firm SAS Institute as "an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs."
If you don't think that data storage and data processing are relevant, consider the following statistics from IBM: There are (roughly) 2.5 quintillion bytes of data created every day. Out of all the data that currently exists in the world, it's estimated that 90 percent has been created just in the past two years.
So a) there's a massive amount of data out there, and b) the global stockpile of information is accumulating at an exponential rate. There are doubtless billions of fascinating needles in that gargantuan haystack, and individuals who are skilled at data storage, data retrieval, data processing and data analysis can expect to be key players in the greatest treasure hunt of all time.
The new course from The Linux Foundation, which is offering the content in connection with its open data platform initiative, ODPi, aims to teach the basics of Hadoop and its associated technologies, including the following:
- The origins of Apache Hadoop and its big data ecosystem
- Deploying Hadoop in a clustered environment within modern enterprise IT
- Building data lake management architectures around Apache Hadoop
- Leveraging the YARN framework to effectively enable heterogeneous analytical workloads on Hadoop clusters
- Leveraging Apache Hive for an SQL-centric view into the enterprise data lake
- An introduction to managing key Hadoop components (HDFS, YARN and Hive) from the command line
- Securing and scaling your data lakes in multi-tenant enterprise environments
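For readers wondering what command-line management of those key components actually looks like, here is an illustrative sketch using the standard Hadoop and Hive CLIs. The file paths, directory names, and table name below are placeholders, and the commands assume a running Hadoop cluster with Hive installed:

```shell
# List the contents of a directory in HDFS
hdfs dfs -ls /user/hadoop

# Copy a local file into HDFS (paths are placeholders)
hdfs dfs -put sales.csv /user/hadoop/data/

# Report overall HDFS capacity, usage, and DataNode health
hdfs dfsadmin -report

# Show the applications currently running under YARN
yarn application -list

# Run an SQL query against a Hive table from the shell
hive -e "SELECT COUNT(*) FROM sales;"
```

These are the sorts of day-to-day operations the course's command-line chapter covers; each tool also has many more subcommands (`hdfs dfs -help`, `yarn -help`) for administration beyond the basics shown here.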
The self-paced course has six chapters, each of which includes a graded quiz upon completion. Taking and completing the course is free, but you can add a stamp to your educational passport, so to speak, by paying a modest $99 fee to get a "verified certificate," the means used by edX to formally document course completion.