Can you believe that 90 percent of all the data throughout history was generated in the past two years?
The days of storing easy-to-collect, neatly structured data in a series of databases are well behind us. Nowadays, humans are generating larger quantities of data at much faster speeds than ever before, and the variety of this data is far more complex than it was a few decades ago.
This rapid explosion of information is commonly called “big data.” It’s a simple name for something so all-encompassing. But what exactly is big data? Let’s take a look.
Big Data Definition
Big data is a collection of structured, unstructured, and semi-structured data from traditional and digital sources. These sources include databases, text messages, video files, emails, social media platforms, photos, embedded sensors, and much more.
The volume, velocity, and variety of big data are what make it so "big." Once big data is gathered, data science professionals can run it through big data analytics software. This is where the value of big data is determined.
The insight derived from data using big data software can be used to help marketers target their campaigns more strategically, help environmentalists understand sustainability in the future, help healthcare professionals predict epidemics, and much more.
To understand the sheer scale of big data, we first need to look into its history and how far we have come in such a short period of time.
The practice of gathering and storing large amounts of information, and then attempting to make sense of that information, has been around for centuries. For example, the first U.S. census recorded population data in 1790. Fast forward 100 years to the 1890 census, and the newly invented “Tabulating Machine” processed census information on punch cards hundreds of times faster than humans could.
With the “information explosion” of the 1940s, society desperately needed a better way to both store and access large amounts of data. In 1970, IBM Research Labs published the first paper on relational databases – allowing for more efficient ways to locate data in large databases. Think of something similar to an Excel spreadsheet.
The commercialization of the internet in 1995 paved the way for Web 2.0. In its infancy, the internet was information-only, and featured static websites that provided dull user experiences. When Web 2.0 launched in 2004, end-users were now able to generate, distribute, and store their own content in a virtual community.
Internet users flooded social media networks like Facebook and Twitter in the mid-2000s, which led to the distribution of even more data. Around this time, YouTube and Netflix forever changed the ways we would view and stream video content. Data from these platforms provided near-real-time insight into consumer behavior as well.
With the 2011 release of version 1.0 of Hadoop, a powerful open-source framework for storing data and running applications that first appeared in 2006, experts agreed that big data was the next frontier for innovation and competition.
The internet of things (IoT) revolutionized big data in 2014. With an internet-connected world, more businesses decided to shift spending towards big data to reduce operational costs, boost efficiency, and develop new products and services.
Now, the scope of big data is nearly endless. Researchers in “smart cities” are using real-time data to look at electricity consumption, pollution, traffic, and much more. Emerging technologies like artificial intelligence and machine learning are harnessing big data for future automation and helping humans unveil new solutions.
All of these milestones were made possible when the world decided to go digital.
The big data market is growing at a mind-boggling pace. In 2014, big data was just an $18.3 billion market. The most recent Wikibon report on big data forecasts that by 2026, the total revenue generated from hardware, software, and professional services associated with big data will reach $92.2 billion. However, don’t be surprised if that number sharply rises over the coming years.
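To get a feel for that pace, the two figures above can be turned into an implied yearly growth rate. This is only a rough sketch: it assumes the 2014 and 2026 numbers measure the same thing, and the function name is made up for illustration.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# Market size figures from the text, in billions of US dollars.
market_2014 = 18.3
market_2026 = 92.2

rate = compound_annual_growth_rate(market_2014, market_2026, 2026 - 2014)
print(f"Implied annual growth: {rate:.1%}")
```

In other words, the forecast works out to the market growing by roughly a seventh every year for twelve years straight.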
One of the main reasons for this acceleration can be tied to IoT. For better or for worse, humans are constantly engaged with internet-connected devices that contribute to the constant flow of data. By 2021, the average North American is expected to own around 13 internet-connected devices.
The devices we own today include smartphones, laptops, tablets, smart televisions, gaming consoles, smartwatches, smart speakers like the Amazon Echo, and even our vehicles. In the very near future, you can expect smart home devices like toasters, refrigerators, and smart locks to join this mix (in some homes, they already have).
The hardware itself simply allows for more efficient ways to share data, but the real volume of big data comes from the ways we interact with these devices. For example, a wearable device, like a smartwatch, may gather all types of data on you. This device can track heart rate, sleep quality, blood-glucose levels, and even fertility cycles.
In turn, data from your smartwatch can be shared with healthcare providers for more personalized patient care. Theoretically, insurance companies can also use this data (with your consent) to customize your rates. That’s a lot of data from just one device.
But big data is more than just user-to-device interaction. Massive datasets can be fed into a deep learning neural network (think of a digital, artificial super-brain) to understand efficiencies from a business standpoint. An example of this would be analyzing manufacturing machinery for predictive maintenance and power savings.
Making sense of all this data, and using it to derive unique, cost-effective, and potentially groundbreaking discoveries, is where the real value of big data lies.
Big data is certainly not easy to grasp, especially with such vast amounts and varieties of data today. To help make sense of big data, experts have broken it down into three easier-to-understand segments. These segments are referred to as the 3 V’s of big data: volume, velocity, and variety.
The first V of big data is perhaps the most prominent one, and it refers to the “big” volume of data available now and in the future.
There’s a lot of data out there, an almost incomprehensible amount. With 90 percent of all data throughout history generated in the past two years, roughly 2.5 quintillion bytes of data are now created every single day. To put this number into perspective, 2.5 quintillion pennies laid flat would cover the Earth five times.
But if you thought 2.5 quintillion was big, get a load of this: A report commissioned by Seagate and performed by IDC estimates that by 2025, the digital universe will reach 163 zettabytes of data, or 163 trillion gigabytes!
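If the jump from zettabytes to gigabytes feels abstract, the conversion is easy to check. The sketch below assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function name is just for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
ZETTABYTE = 10**21

def zettabytes_to_gigabytes(zettabytes):
    """Convert whole zettabytes to gigabytes using decimal (SI) units."""
    return zettabytes * ZETTABYTE // GIGABYTE

forecast_gb = zettabytes_to_gigabytes(163)
print(f"{forecast_gb:,} GB")  # prints 163,000,000,000,000 GB (163 trillion)
```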
Let’s look at volume from a social media standpoint, since social media has had a substantial impact on big data. As of 2016, there were nearly 2 trillion total posts on Facebook. Since Facebook first launched in 2004, more than 250 billion photos have been uploaded to the platform.
Facebook has amassed a serious wealth of personal data, and its 2.2 billion users are sharing a staggering amount of it every second of the day. This simply would not be possible without the growth of big data.
The second V of big data refers to the velocity at which the universe of big data is expanding.
Initially, the acceleration of big data can present exciting opportunities. There’s so much data at hand, and when we harness this data, it can be used to uncover new realities.
Sadly, the rate at which data is growing is quickly outpacing our ability to decipher it. A Digital Universe study by IDC revealed that the amount of data in the world is doubling in size every two years. Even more unfortunate is the fact that only 3 percent of the world’s data is organized and “tagged,” with just 0.5 percent actually ready to be analyzed.
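Doubling every two years adds up quickly. Here’s a small sketch of what that rule implies, assuming the doubling rate stays constant (a big assumption) and measuring everything relative to today’s volume. The function and parameter names are made up for illustration.

```python
def projected_volume(current_volume, years_ahead, doubling_period_years=2):
    """Project data volume, assuming it doubles every doubling_period_years."""
    return current_volume * 2 ** (years_ahead / doubling_period_years)

# Growth relative to a baseline of 1 unit of data today.
for years in (2, 4, 10):
    print(f"After {years} years: {projected_volume(1, years):g}x the baseline")
```

At that rate, a decade means five doublings, or 32 times as much data as we started with.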
Big data isn’t just “big,” it’s also growing exponentially. Let’s put this velocity in perspective by continuing our series of astonishing Facebook facts. According to insight from the Social Skinny, there are 510,000 comments posted, 293,000 statuses updated, and 136,000 photos uploaded to Facebook every minute!
I love analogies. So for me, the big data universe is expanding much like our physical universe of stars, planets, galaxies, and dark matter.
Big data technologies and metadata (data about data), paired with AI and machine learning, will need to be used to their fullest potential to give us the best snapshot of future frontiers, much as the Hubble Space Telescope peers into space in search of new and exciting discoveries.
The last V of big data refers to the variety, or many different types, of data that’s being generated today.
Data is big, data is fast, but data is also extremely diverse. Just a few decades ago, data would have most likely been plain text and neatly structured in a relational database. There weren’t a whole lot of options to use this data, aside from simple classification or perhaps finding a trend.
Big data has drastically changed the data landscape. There’s still a place for plain text data, but data formats like digital audio, video, images, geospatial, and many others have come into play.
Each data type is unique in terms of its size and how it’s stored and classified in a cloud, database, etc. What also makes each format unique is how we analyze it to derive valuable insights.
Veracity and Value
But wait, there’s more! Two additional V’s, known as veracity and value, may not be a part of the original 3 V’s, but they have become increasingly important as big data expands.
Veracity simply refers to the accuracy of data. Not all data is precise or consistent, and with the growth of big data, it’s becoming harder to determine which data actually brings value. A good example of inconsistent data is social media data, which is often volatile and trends one way or another. A more consistent example is weather data, which is much easier to predict and track.
Value is the most straightforward V of big data. It’s asking the question, “How can we use all of this data to extract something meaningful for our users and the business?” Big data won’t bring much value if it’s being analyzed without purpose.
We know that the influx of more devices, platforms, and storage options will increase not only the volume of data, but also the variety of data that is out there.
But not all data is created equal. By this I mean that the way you’ll store and search for an ID number in a relational database is completely different from the way you’ll extract value from a piece of video content.
One type of data is what we call structured, and another is called unstructured. But there’s also a third type of data called semi-structured. Let’s examine the differences of each data type.
Structured data, for the most part, is highly organized in a relational database. If you needed to access a piece of information within the database, you could easily do so with a quick search.
Structured data comes in a form that machines can readily read and process. This type of data sits neatly in a fixed field within a record or file.
One of the most common examples of structured data is something you’d see in a spreadsheet. If you’re on the phone with a student loan representative and they ask you for your personal identification, chances are they’re working with structured data.
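To show what that “quick search” looks like in practice, here’s a minimal sketch using Python’s built-in sqlite3 module. The table, column names, and records are all hypothetical, and an in-memory database stands in for whatever system a real loan servicer would use.

```python
import sqlite3

# An in-memory database stands in for a production system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE borrowers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO borrowers (id, name, balance) VALUES (?, ?, ?)",
    [(1001, "Ada Example", 12500.0), (1002, "Sam Sample", 8300.5)],
)

def find_borrower(connection, borrower_id):
    """Look up one record by ID; returns an (id, name, balance) tuple or None."""
    cursor = connection.execute(
        "SELECT id, name, balance FROM borrowers WHERE id = ?", (borrower_id,)
    )
    return cursor.fetchone()

print(find_borrower(conn, 1002))
```

Because every record has the same fixed fields, one short query is enough to find exactly the row you need.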
It would be nice if all data could be neatly structured, but human-generated data like photos on social media, voicemails, text messages, and more are highly unstructured.
As a matter of fact, 80 percent of all data is unstructured, which helps explain why we’ve only been able to “tag” 3 percent of the world’s data. But what does unstructured refer to? It means data that isn’t easily interpreted by machines and doesn’t conform to a standard database or spreadsheet.
You may be surprised, but most unstructured data is actually text-heavy. For example, text messages are unstructured because as far as machines are concerned, humans don’t talk or type in a logical way. This is why machine learning and natural language processing are used to dissect human language, slang, jargon, and more.
There’s also machine-generated unstructured data, which is a bit easier for machines to process. An example of this would be satellite imagery used for weather forecasting.
The third type of data, known as semi-structured data, falls somewhere between structured and unstructured.
Things like XML files or emails are examples of semi-structured data, because while they do contain tags such as dates, times, and sender/receiver information, the language used in them isn’t structured.
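An email makes the split easy to see. In this small sketch, Python’s built-in email module pulls the tagged fields out of a made-up message, while the body stays as free-form text. The addresses and message content are invented for illustration.

```python
from email import message_from_string

raw_email = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 05 Mar 2018 09:30:00 -0500
Subject: Lunch?

hey bob, wanna grab lunch l8r? that taco place maybe
"""

message = message_from_string(raw_email)

# Structured part: headers behave like labeled fields.
sender = message["From"]
recipient = message["To"]
sent_at = message["Date"]

# Unstructured part: the body is free-form human language.
body = message.get_payload()

print(sender, "->", recipient, "on", sent_at)
print(body)
```

The headers can be filtered and sorted like database columns, but making sense of “l8r” in the body takes a very different set of tools.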
For a more in-depth look at the differences between structured and unstructured data, feel free to check out our complete guide.
Big data analytics essentially picks up where conventional business intelligence and other analytics platforms leave off, looking at large volumes of structured and (mostly) unstructured data. Let’s do a quick comparison of the two.
BI software helps businesses make more calculated decisions by analyzing data within an organization’s data warehouse. The focus of BI is more on data management and increasing overall performance and operations.
Big data analytics, on the other hand, looks at more raw data in an attempt to uncover patterns, market trends, and customer preferences to make informed predictions. There are a number of ways in which big data analytics does this.
Descriptive analysis creates simple reports, graphs, and other visualizations which allow companies to understand what happened at a particular point. It’s important to note that descriptive analysis only pertains to events that happened in the past.
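A descriptive report can be as small as a handful of summary statistics. Here’s a minimal sketch using made-up monthly sales figures; the function name and the numbers are purely illustrative.

```python
from statistics import mean, median

# Hypothetical monthly sales figures (units sold).
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 90, "Apr": 160, "May": 150, "Jun": 175}

def describe(values):
    """Summarize what already happened: totals, averages, and extremes."""
    return {
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

summary = describe(list(monthly_sales.values()))
best_month = max(monthly_sales, key=monthly_sales.get)

print(summary)
print("Best month:", best_month)
```

Notice that nothing here says why March dipped or what July will look like. It only reports what has already happened.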
Diagnostic analysis gives deeper insight into a specific problem, whereas descriptive analysis is more of an overview. Companies can use diagnostic analysis to understand why a problem occurred. This analysis is a bit more complex, and may even incorporate aspects of AI or machine learning.
Predictive analysis pairs advanced algorithms with AI and machine learning so that companies may be able to predict what will likely happen next. Being able to give an informed answer about the future can obviously bring a ton of value to a business. This insight is useful for trend forecasting and uncovering patterns.
Prescriptive analysis is extremely complex, which is why it is not yet widely incorporated. While other analytic tools can be used to draw your own conclusions, prescriptive analysis provides you with actual answers. A high level of machine learning usage is needed for this type of report.
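At its simplest, predicting what happens next means fitting a trend to past data and extending it. The sketch below uses a plain least-squares line rather than anything resembling real machine learning, and the sales figures are invented (and deliberately tidy) so the result is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales for months 1 through 6, rising by 10 units a month.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept
print(f"Forecast for month 7: {forecast_month_7:.0f} units")
```

Real-world data is never this clean, which is where more advanced models come in, but the idea is the same: learn a pattern from the past and project it forward.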
Data is entwined in nearly every part of our society nowadays. Whether it’s a user updating their Facebook status through a mobile device, or a business harnessing data to improve product functionality, we’re all contributing to the universe of big data.
In a Tableau-sponsored report by the Economist Intelligence Unit, 76 percent of executives agreed that data is essential in their decision-making processes. More data-driven companies across all industries are emerging constantly. Here’s what some industries plan to do with all this data.
With billions of mobile users worldwide, telecom is ripe for big data innovation. Using big data analytics, service providers could recover from a network outage much faster by pinpointing its root cause with real-time data. Analytics can also be applied to discover more accurate and personalized ways to bill customers. Sentiment data from social media, geospatial data, and other mobile data can be used to offer targeted media and entertainment options.
More banks are moving away from being product-centric and are focusing on being customer-centric. Big data can help segment customer preferences through an omnichannel marketing approach. Perhaps the most obvious use of big data in financial services is fraud detection and prevention. Big data analytics and machine learning can study a customer’s tendencies and distinguish them from unusual behavior.
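To make the fraud-detection idea concrete, here’s a deliberately simple sketch. It uses a basic statistical check (a z-score against past spending) as a stand-in for real machine learning, and the function name, threshold, and purchase amounts are all made up for illustration.

```python
from statistics import mean, stdev

def is_unusual(history, new_amount, threshold=3.0):
    """Flag a purchase whose z-score against past spending exceeds threshold."""
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Every past purchase was identical, so any different amount stands out.
        return new_amount != average
    z_score = abs(new_amount - average) / spread
    return z_score > threshold

# Hypothetical purchase history for one customer, in dollars.
past_purchases = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5, 44.0]

print(is_unusual(past_purchases, 46.0))   # a typical amount for this customer
print(is_unusual(past_purchases, 900.0))  # far outside the usual range
```

A production system would weigh many more signals, such as location, merchant, and time of day, but the core idea of comparing new behavior against a customer’s own history is the same.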
We mentioned how smartwatch data can be used for personalized patient care and customized healthcare insurance rates. Predictive analysis can have phenomenal applications in the healthcare industry, allowing for earlier detection of diseases and more accurate associations with certain risk factors.
One educational model doesn’t suit all students. Some are visual learners, others are audio learners. Some prefer online, others thrive during in-person lectures. Big data analytics can be used to build more customized learning models for all students. Big data is also being used on some college campuses to reduce dropout rates by identifying risk factors in students who are falling behind in their classes.
The big data market has undergone massive growth for a reason. More companies are realizing the importance of taking a data-driven approach not only for internal processes, but also for improving the experiences of their customers.
Emerging technologies like AI, machine learning, and NLP are utilizing big data to break ground on new products, user experiences, cost efficiencies, and more.
So where do we go from here? What is the future of big data? Though the picture isn’t fully clear, we do have some sort of idea.
Going by IDC’s research, IoT is driving most of this growth. By 2021, the average U.S. consumer will have 601 interactions with internet-connected devices every day. By 2025, that number jumps to 4,785 interactions. That’s nearly an eightfold jump, or an increase of about 700 percent, over four years!
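The arithmetic behind that comparison is worth a quick check, since “times as many” and “percent increase” are easy to mix up: a value that grows to eight times its size has increased by 700 percent, because the starting value itself counts as the first 100 percent. The function name below is made up for illustration.

```python
def percent_increase(old, new):
    """Return the relative increase from old to new, as a percentage."""
    return (new - old) / old * 100

# Daily device interactions per person, from the figures in the text.
interactions_2021 = 601
interactions_2025 = 4785

growth_factor = interactions_2025 / interactions_2021
increase = percent_increase(interactions_2021, interactions_2025)
print(f"{growth_factor:.2f}x as many interactions, a {increase:.0f}% increase")
```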
One of the main reasons for this spike in interactions is the rise of intelligent assistants and conversational UI. Do you enjoy chatting with Siri or Alexa? Good news: prepare to make many more of these friends in the near future.
But IoT won’t just increase user-to-device interactions, it’ll play a crucial role in machine-to-machine (M2M) interactions as well. Sensors will be a driving technology linking machines to the internet. One way we’ll use data from M2M interactions is to monitor human impact on the environment, forest fires, earthquakes, and other forces of nature.
With the digital universe expected to reach 163 zettabytes by 2025, the focus will slowly shift from volume of data to veracity of data. We not only have to be able to trust the data we’re analyzing, but also ensure it’ll serve a purpose at some point.
This graph from IDC estimates how much data will be critical for day-to-day operations by 2025. While big data will still be crucial for marketing, sales, and product development, the stakes are higher when we rely on data for things like self-driving cars or automated mass transit. This is why veracity is becoming increasingly important.
Wrapping up big data
The emergence of big data has put customer-centricity at the forefront. Big data is helping businesses make faster, more calculated decisions. Through the use of big data analytics, we’re able to predict where future problems may occur, and apply data-driven reasoning to resolve these problems. This just wasn’t a reality a few decades ago.
But the road ahead for big data is still a long one. Advancements in emerging technologies like AI and machine learning will only make big data more valuable. We live in a time where big data is really gaining momentum – which can be both exciting and overwhelming.
For a quick snapshot of big data (and to impress your friends working in IT), here are five fast facts about big data to have at your disposal.