When we call someone a data scientist, what exactly are we referring to? A good data scientist is part mathematician, part computer scientist, and part pattern spotter. They are business analysts able to study and digest information from both a business perspective and an IT perspective. They are highly sought-after and generously compensated.
On top of all that, they have perhaps the coolest name in the entire information technology world. (The battle for the coolest-sounding IT job title is probably between data scientists and ethical hackers.)
The emergence of the data scientist profession is a sign of the times. Big Data, computers taking over our thoughts, money being poured into predictions of markets — these are all things the world is focused on. They are things that corporations are focused on.
Data scientists weren't on many people's radars a decade ago. Their sudden popularity reflects the extent to which businesses have become preoccupied with data analysis, machine learning and artificial intelligence.
All of our unwieldy masses of unstructured information can no longer be ignored and forgotten. Data reveals trends, and properly assessing those trends can unlock a virtual gold mine that helps boost revenue, predict markets, and keep employees happy to work for you.
Provided, of course, that there's someone around who can dig in and unearth business insights that no one thought to look for before. This is where the data scientist will reign supreme.
A special approach to science
A data scientist generally starts by looking at a dataset, determining what can be learned from the data, and then picking interesting threads to follow. A more traditional scientist, by contrast, chooses a problem, formulates a hypothesis, and then gathers specific data to either prove or disprove the hypothesis.
This off-kilter approach to data is precisely why interest in, and demand for, data scientists has increased, according to one study, by 70 percent in the last three years. That is an upward spike that, for the moment, no other IT position can match.
Data scientists are a hot commodity because sound data science crosses over from IT into finance, statistics, mathematics, and all of the areas of inquiry where data exists. A good data scientist is like a treasure hunter who can look at trends, anomalies, or statistics and find predictable, recurring patterns that can be turned into predictable, recurring profits.
A day at the office
So what does a data scientist really do? What are the tasks that fill up their hours? The short of it is that they analyze data. More generally, a data scientist is someone who knows how to extract meaning from data and interpret it: how to learn from it.
This requires both tools and methods from statistics and machine learning, as well as human intuition. A successful data scientist spends a lot of time in the process of collecting, cleaning, and munging (or manipulating) data, because data is never clean.
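To make the cleaning and munging step concrete, here is a minimal Python sketch using only the standard library. The sample data, the column names (region, units_sold), and the helper names clean_rows and totals_by_region are all invented for illustration; real pipelines will look different and usually lean on dedicated libraries.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, inconsistent casing,
# a missing value, a non-numeric value, and a duplicate row.
RAW_CSV = """region,units_sold
 North ,120
south,85
SOUTH,85
East,
west,n/a
North,42
"""


def clean_rows(raw_text):
    """Return a list of (region, units_sold) tuples with bad rows dropped."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        region = row["region"].strip().lower()  # normalize whitespace and case
        value = (row["units_sold"] or "").strip()
        if not value.isdigit():  # skip missing or non-numeric values
            continue
        record = (region, int(value))
        if record in seen:  # drop exact duplicates after normalizing
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def totals_by_region(records):
    """Aggregate cleaned records into a {region: total_units} dictionary."""
    totals = {}
    for region, units in records:
        totals[region] = totals.get(region, 0) + units
    return totals


if __name__ == "__main__":
    records = clean_rows(RAW_CSV)
    print(totals_by_region(records))
```

Even in this toy case, half of the raw rows need attention before any analysis can begin, which is roughly why cleaning eats so much of a data scientist's day.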
Stats and mathematics are at the core of a data scientist's abilities, and no matter which department they are in within your company, they will utilize math as their main deductive reasoning tool. Many data scientists began their careers as statisticians or data analysts. As Big Data (and Big Data storage and processing technologies such as Hadoop) began to grow and evolve, those roles evolved as well.
Data is no longer just an afterthought — something for the IT department to handle. It's key information that requires analysis, creative curiosity, and a knack for translating high-tech ideas into new ways to turn a profit.
If a data scientist works in finance, they are likely to be tasked with spotting trends. Algorithms that analyze market data for trading purposes, the domain of quantitative analysts or "quants," are all the rage when it comes to data science. There are some firms out there with above-average returns based on this work, and they wouldn't ever sell their secret, no matter what.
A data scientist involved in agriculture might look at crop yields and predict which plant strains will grow the best year-over-year. I recently talked with an individual who was figuring out, using data, how best to maximize the size of the peas in a pod, across the entire harvest. Wherever you find them, data scientists are explorers and discoverers.
Skills and educational background
The data scientist role also has academic origins. A few years ago, universities began to recognize that employers wanted people who were programmers and team players. Professors tweaked their classes to accommodate this.
Some programs, such as the Institute for Advanced Analytics at North Carolina State University, prepared to churn out the next generation of data scientists. There are now more than 60 similar programs at universities around the country.
Data scientists are highly educated: research indicates that 88 percent have at least a master's degree, and 46 percent have a Ph.D. There are certain to be exceptions, but a strong educational background is typically required for a person to develop the depth of knowledge necessary to be a data scientist.
For the technical side of the skillset, there are simply some things you must know. Chief among these are analytical languages like SAS and R, Python for scripting, and Hadoop for storage and movement of the data you are analyzing, along with strong foundational knowledge of data, both structured and unstructured.
SAS and R both also refer to analytical tools, and in-depth knowledge of one or both is highly recommended (though for data science R is generally preferred). Likewise, beyond its use for scripting, Python is the coding language most commonly required in data science roles, along with Java, Perl, or C/C++.
Aspiring data scientists should also be familiar with the Hadoop platform. Although this isn't always a requirement, it is heavily preferred in many cases. Having experience with Hive or Pig is also a strong selling point, and familiarity with cloud tools such as Amazon S3 can also be beneficial.
Another vital skill is SQL, both database knowledge and coding. Even though NoSQL and Hadoop have become large components of data science, it is still expected that a candidate will be able to write and execute complex queries in SQL. Finally, it is critical that a data scientist be able to work with unstructured data, whatever its source.
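As a rough idea of the kind of SQL query a candidate might be asked to write, the Python sketch below uses the standard library's sqlite3 module with an in-memory database. The tables (customers, orders), the sample rows, and the function names build_demo_db and revenue_by_region are hypothetical and exist only for this example.

```python
import sqlite3

# Join, aggregate, filter on the aggregate, and sort: a typical interview-style query.
QUERY = """
SELECT c.region, COUNT(o.id) AS order_count, SUM(o.amount) AS revenue
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.region
HAVING SUM(o.amount) > ?
ORDER BY revenue DESC
"""


def build_demo_db():
    """Create an in-memory database with made-up customers and orders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada', 'north'), (2, 'Ben', 'south'), (3, 'Cy', 'north');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0), (4, 3, 20.0);
    """)
    return conn


def revenue_by_region(conn, min_revenue):
    """Return (region, order_count, revenue) rows for regions above min_revenue."""
    return conn.execute(QUERY, (min_revenue,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in revenue_by_region(conn, 50):
        print(row)
    conn.close()
```

The threshold is passed as a bound parameter (the ? placeholder) rather than pasted into the query string, a habit worth keeping in any SQL work.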
Data science certification
Just like any other certifications, data science certs build up and refine your skillset, in addition to helping you stay abreast of changing technology. As an aspiring data scientist, you should focus first on programming languages, or take classes that improve your ability to code.
CertifiedAnalytics.org is the premier site for data analytics certifications. They will walk you through getting your Certified Analytics Professional Certification. Becoming a CAP is a great achievement. Follow that up with Hadoop and R certs and classes, and you will be making a huge salary, almost instantly.
Whether you are aiming to become just another data scientist or have aspirations of becoming the best one who ever lived, your goal setting, communication ability and discipline will be the things that get you there. As always, I wish you the best of luck, and happy certifying!