With all the buzz surrounding big data and the ways companies will leverage it, you may find yourself asking, “which type of data are we referring to?”
Structured vs unstructured data
For data newcomers, the first thing to understand is that not all data is created equal. By that, I mean the data generated by our social media applications is completely different from the data generated by sensors on industrial machinery.
Some data is structured, but most is unstructured. The way this data is stored, classified, and analyzed all depends on its type.
But before we go into the different types of data, let’s define data.
What is data
There are many ways to define data, but with regard to computing, data refers to bits of facts and details that have been digitally formatted.
Data is created by internet-connected devices. This includes mobile phones, tablets, laptops, watches, televisions, cars – you name it.
Activity on these devices is monitored and eventually analyzed so data can evolve from seemingly random facts and details to useful information.
The introduction of the commercialized internet, mobile phones, and social media networks led to an explosion of data generated over a short period of time. This explosion is referred to as big data.
The term big data is applied to amounts in the petabyte range or larger. For those unsure of how large a petabyte is, we compiled this data storage cheat sheet:
|Data Storage Units|
For reference, the average laptop has around 500 gigabytes of hard drive storage.
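To put those units in perspective, here is a rough sketch of the arithmetic. It assumes decimal units, where each unit is 1,000 times the previous one (some operating systems report sizes in 1,024-based units instead), and the function names are made up for illustration.

```python
# Decimal (SI) storage units: each step up is 1,000 times the previous unit.
# Assumption: decimal units are used, not binary (1,024-based) ones.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]


def to_bytes(amount: int, unit: str) -> int:
    """Convert an amount in the given unit to bytes."""
    return amount * 1000 ** UNITS.index(unit)


# The 500 GB laptop used as a yardstick in the article.
LAPTOP_BYTES = to_bytes(500, "gigabyte")


def laptops_needed(amount: int, unit: str) -> int:
    """How many 500 GB laptops it would take to hold the given amount."""
    return to_bytes(amount, unit) // LAPTOP_BYTES


print(laptops_needed(1, "petabyte"))   # 2000
print(laptops_needed(1, "zettabyte"))  # 2000000000
```

In other words, a single petabyte would fill about 2,000 of those laptops.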
Research by IDC estimates the digital universe will reach an astonishing 163 zettabytes of data by 2025. This rapid expansion is mainly due to the internet of things, an ecosystem of internet-connected devices.
Data can reveal a lot about a person. For example, conducting transactions on an e-commerce website may reveal age, gender, email addresses, home addresses, phone numbers, browsing history, social media history, and more.
Companies leverage this data to gain a competitive advantage, and may even create customer profiles to give them a better understanding of their audience.
All data, however, has a structure that makes it unique. The way data is structured will determine how it is collected, processed, and then analyzed by those looking to use it.
What is structured data
Structured data is the type of data most of us are probably used to working with. Think of data that fits neatly within fixed fields and columns in relational databases and spreadsheets. Types of structured data include numbers, currency amounts, alphabetical text, names, dates, and addresses.
Structured data is highly organized and easily processed by machines. Those working within relational databases can input, search, and manipulate structured data relatively quickly. This is the most attractive feature of structured data.
The language most commonly used for managing structured data is Structured Query Language, also known as SQL. This language was developed by IBM in the early 1970s and is particularly useful for handling relationships in databases.
Sounds confusing? The picture below should help clear things up.
From the top down, we can see that UserID 1 refers to the customer Alice, who had two OrderIDs of ‘1234’ and ‘5678’.
Next, Alice had two ProductIDs of ‘765’ and ‘987’. Finally, we can see Alice purchased two packages of potatoes and one package of dried spaghetti.
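For readers who want to try this, here is a minimal sketch of the same relationships using Python's built-in sqlite3 module. The table layout, the column names, and the pairing of each order with a product and quantity are assumptions made for illustration; only the IDs, the customer name, and the purchased items come from the example above.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: three tables linked by their ID columns.
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL
);
INSERT INTO users VALUES (1, 'Alice');
INSERT INTO products VALUES (765, 'potatoes'), (987, 'dried spaghetti');
-- Assumed pairing: order 1234 holds the potatoes, order 5678 the spaghetti.
INSERT INTO orders VALUES (1234, 1, 765, 2), (5678, 1, 987, 1);
""")

# Follow the relationships: user -> orders -> products.
rows = conn.execute("""
    SELECT u.name, o.order_id, p.name, o.quantity
    FROM orders AS o
    JOIN users AS u ON u.user_id = o.user_id
    JOIN products AS p ON p.product_id = o.product_id
    WHERE u.user_id = ?
    ORDER BY o.order_id
""", (1,)).fetchall()

for row in rows:
    print(row)
# ('Alice', 1234, 'potatoes', 2)
# ('Alice', 5678, 'dried spaghetti', 1)
```

Because every row follows the same fixed columns, one short query is enough to trace a customer to everything they bought.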
Is this data useful on the surface? Not really. But running it in analytic programs can help unveil patterns and trends about a specific customer or customer base. This type of data is commonly seen in CRM software.
Structured data revolutionized paper-based systems that companies relied on for business intelligence decades ago. While structured data is still useful, more companies are looking to deconstruct unstructured data for future opportunities.
What is unstructured data
Unstructured data is the chaotic brother of structured data, as it cannot be processed and analyzed using conventional tools.
Examples of unstructured data include text files, video files, audio files, mobile activity, social media activity, sensor activity, geolocation activity, satellite imagery, surveillance imagery – honestly, the list goes on and on.
Unstructured data is difficult to make sense of because it has no pre-defined structure that makes it easy to classify in a relational database. Instead, non-relational, or NoSQL, databases are the best fit for managing unstructured data.
An estimated 80 percent of all data generated today is considered unstructured, and this number will likely continue to rise as new internet-connected devices come online.
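To make the contrast concrete, here is a small sketch in which plain Python dictionaries stand in for the records a document-style NoSQL store might hold. No real database is involved, and the records, field names, and helper function are all invented for illustration.

```python
# Three records with three different shapes. A relational table would need
# one fixed set of columns; here each record carries only its own fields.
documents = [
    {"type": "tweet", "user": "alice", "text": "Loving the new pasta!",
     "hashtags": ["dinner"]},
    {"type": "sensor", "machine_id": "press-7", "temperature_c": 81.5},
    {"type": "review", "user": "bob", "text": "Potatoes arrived late.",
     "stars": 2},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [d for d in docs
            if all(d.get(key) == value for key, value in criteria.items())]


# Look records up by a field they happen to have.
sensor_docs = find(documents, type="sensor")
print(sensor_docs[0]["machine_id"])  # press-7

# Not every record has a "text" field, so check before reading it.
texts = [d["text"] for d in documents if "text" in d]
print(len(texts))  # 2
```

The flexibility is the point, but it is also why analysis is harder: nothing guarantees that any two records share the same fields.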
Finding the insight buried within unstructured data isn’t an easy task. It requires new software, processes, and a high level of technical expertise to really make a difference. This can be an expensive shift for many companies.
Those able to harness unstructured data, however, have a competitive advantage. While structured data gives us a bird’s-eye view of customers, unstructured data can give us a much deeper understanding of customer behavior and intent.
For example, applying machine learning software to unstructured data can help companies learn buying habits and timing, patterns in purchases, sentiment toward a specific product, and much more.
Unstructured data is key for predicting next steps, and this doesn’t just apply to customers.
For example, data given off by sensors attached to industrial machinery can notify manufacturers of strange activity ahead of time. With this information at hand, a repair can be made before the machine suffers a costly breakdown.
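As a very simplified sketch of that idea, the snippet below flags sensor readings that stray far from a baseline of normal operation. The readings are made up, the three-standard-deviation threshold is an arbitrary assumption, and real predictive maintenance systems are considerably more sophisticated.

```python
from statistics import mean, stdev


def find_anomalies(readings, baseline_size=10, threshold=3.0):
    """Flag readings that deviate sharply from the first few 'normal' ones.

    The first `baseline_size` readings are assumed to represent healthy
    operation. Any later reading more than `threshold` standard deviations
    from the baseline mean is returned as an (index, value) pair.
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [
        (i, value)
        for i, value in enumerate(readings[baseline_size:], start=baseline_size)
        if sigma > 0 and abs(value - mu) > threshold * sigma
    ]


# Hypothetical vibration readings from one machine, with a spike near the end.
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.51, 0.50, 0.49, 0.52,
             0.51, 0.50, 0.93, 0.49]

print(find_anomalies(vibration))  # [(12, 0.93)]
```

A flagged reading like this one is the kind of early signal that could prompt an inspection before anything actually breaks.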
The future of data
The volume of big data is continuing to rise, but in the near future, sheer volume will matter far less than it does today.
Regardless of whether it’s structured or unstructured data, having the most accurate and relevant data at hand will be key for companies looking to gain an advantage.
Utilizing the right data will allow companies to:
- Reduce operational costs.
- Track current metrics and create new ones.
- Understand their customers on a far deeper level.
- Unveil smarter and more targeted marketing and sales efforts.
- Find new product opportunities and offerings.
Research from IDC states that companies with the right data will see an additional $430 billion in productivity gains by 2020. It’s no wonder IBM estimates there will be roughly 2.72 million data science jobs posted over the next few years.
As more varieties of data are created, new and more advanced algorithms will follow, some of them toeing the line of GDPR compliance.
Here’s an algorithm that might creep you out, courtesy of The Institute:
“Facebook last year filed a patent for an algorithm that attempts to analyze users’ emotions by how they type and compare that to their baseline. If people are tapping their phone’s keyboard harder or typing slower than usual, that could indicate they are angry or depressed.”
Of course, this is a testament to just how powerful big data can be when leveraged in unique ways. At the end of the day, it’s up to the consumer to determine how comfortable they are with the ways their data is used. So stay informed!
For more information about how G2 Crowd utilizes data, check out our webpage on marketing solutions for vendors.