Structured data is data that follows a clear, pre-defined schema, so both humans and machines can interpret it easily. Because the information is organized and standardized, computers can process it, analyze it, and draw insights from it. Common formats for structured data include CSV (Comma-Separated Values), JSON (JavaScript Object Notation), and XML (eXtensible Markup Language).
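As a rough illustration of what a pre-defined schema buys you, the sketch below reads one record from each of the three formats using only the Python standard library. The sample record and its field names (`sensor_id`, `temperature_c`) are made up for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record, expressed in three structured formats.
CSV_TEXT = "sensor_id,temperature_c\nA1,21.5\n"
JSON_TEXT = '{"sensor_id": "A1", "temperature_c": 21.5}'
XML_TEXT = (
    "<reading><sensor_id>A1</sensor_id>"
    "<temperature_c>21.5</temperature_c></reading>"
)


def from_csv(text):
    # The header row names the fields, so each value can be looked up by name.
    row = next(csv.DictReader(io.StringIO(text)))
    return {"sensor_id": row["sensor_id"],
            "temperature_c": float(row["temperature_c"])}


def from_json(text):
    # JSON keys play the same role as the CSV header.
    data = json.loads(text)
    return {"sensor_id": data["sensor_id"],
            "temperature_c": float(data["temperature_c"])}


def from_xml(text):
    # XML element names identify each field.
    root = ET.fromstring(text)
    return {"sensor_id": root.findtext("sensor_id"),
            "temperature_c": float(root.findtext("temperature_c"))}


records = [from_csv(CSV_TEXT), from_json(JSON_TEXT), from_xml(XML_TEXT)]
```

Because every format labels its fields, all three parsers recover the same record without guessing where the values are.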
Unstructured data, on the other hand, is information that is not organized in a pre-defined format, which makes it harder for computers to process and analyze and therefore harder to extract insights from. It is more free-form and can include a wide variety of data types, such as text, images, audio, and video.
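To make the difficulty concrete, here is a minimal sketch that tries to pull a temperature out of free-form notes with a hand-written regular expression. The notes and the pattern are hypothetical; real text would need far more robust handling, such as natural language processing.

```python
import re

# Hypothetical free-form notes; nothing guarantees where, or whether,
# a temperature appears in each one.
NOTES = [
    "Sensor A1 showed 21.5 degrees this morning.",
    "It was about 21.5C near sensor A1, I think.",
    "A1 felt warm today.",
]

# Heuristic pattern: a number followed by "C" or the word "degrees".
TEMPERATURE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:C\b|degrees)")


def extract_temperature(text):
    """Return the first temperature found in the text, or None."""
    match = TEMPERATURE.search(text)
    return float(match.group(1)) if match else None


results = [extract_temperature(note) for note in NOTES]
```

The third note mentions the sensor but gives no number, so the heuristic finds nothing. With no schema to rely on, every new phrasing may need another rule.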
The following table compares structured and unstructured data:
Aspect | Structured Data | Unstructured Data |
---|---|---|
Structure | Pre-defined schema | No pre-defined schema |
Format | Text and numbers | Wide variety of formats, including text, images, video, and sound |
Storage | Relational databases and data warehouses | NoSQL databases and data lakes |
Insights | Easy to search, query, and analyze | Difficult to analyze |
Procurement | Difficult and time-consuming to procure | Easy to procure |
Examples | Sensor data, financial data, internet forms | Internet blogs and websites, images, phone calls |
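The "Storage" and "Insights" rows can be sketched with Python's built-in `sqlite3` module: once data sits in a table with a declared schema, a single query answers an analytical question. The table name, columns, and readings below are assumptions chosen for illustration.

```python
import sqlite3

# In-memory relational database with an explicit, pre-defined schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "sensor_id TEXT NOT NULL, temperature_c REAL NOT NULL)"
)

# Hypothetical sensor readings.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A1", 21.5), ("A1", 22.5), ("B2", 18.0)],
)

# One declarative query computes the average temperature per sensor.
rows = conn.execute(
    "SELECT sensor_id, AVG(temperature_c) "
    "FROM readings GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
conn.close()
```

Answering the same question from blog posts or phone-call recordings would first require turning that content into fields like these.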
Video Explanation
The following video gives an overview of structured and unstructured data and compares the two (runtime: 7 minutes).