Wednesday, February 18, 2015

Blog #2: Big Unstructured Data vs. Structured Relational Data


The differences between unstructured and structured data

Structured data is information, usually text, arranged in titled columns and rows so that data mining tools can easily order and process it. Picture a perfectly organized filing cabinet in which everything is identified, labeled and easy to access.

Unstructured data is information that either does not have a predefined data model or is not organized in a predefined manner.
Common forms of unstructured data:
  • Word documents, PDFs and other text files - Books, letters, other written documents, audio and video transcripts
  • Audio files - Customer service recordings, voicemails, 911 phone calls
  • Presentations - PowerPoints, SlideShares
  • Videos - Police dash cam, personal video, YouTube uploads
  • Images - Pictures, illustrations, memes
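
To make the contrast concrete, here is a minimal Python sketch. The customer rows and the voicemail transcript are invented for illustration, and the keyword check is only a stand-in for real text analysis. The point is that structured rows can be ordered directly by a labeled column, while free text has to have structure imposed on it first.

```python
import csv
import io

# Structured data: titled columns and rows (hypothetical sample).
structured_csv = """customer_id,name,total_spend
3,Carol,120.50
1,Alice,310.00
2,Bob,75.25
"""

rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Every row has the same labeled fields, so ordering takes one line.
by_spend = sorted(rows, key=lambda r: float(r["total_spend"]), reverse=True)
top_customer = by_spend[0]["name"]

# Unstructured data: free text with no predefined fields (hypothetical transcript).
transcript = "Hi, this is Bob. My order arrived late and I would like a refund."

# There is no column to sort on, so we impose structure ourselves,
# here with a crude keyword check.
mentions_refund = "refund" in transcript.lower()

print(top_customer)
print(mentions_refund)
```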



Data types:

+Identity data helps businesses relate all other information to a unique person, group, corporation, institution, digital asset or other entity.
+Descriptive data includes all objective information that is used to describe the identity.
+Activity data records the actions the identity takes.
+Subjective data covers opinions offered by the identity about other identities.
+Relationship data refers to information about how identities relate indirectly to other identities.
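
One way to picture these five categories is as fields on a single record. The sketch below is only an illustration: the class name `CustomerRecord`, its field names, and the sample values are all assumptions, not part of any particular product or schema.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    # Identity data: ties everything else to one unique entity.
    customer_id: str
    # Descriptive data: objective attributes of the identity.
    descriptive: dict = field(default_factory=dict)
    # Activity data: actions the identity performed.
    activity: list = field(default_factory=list)
    # Subjective data: opinions the identity offered about other identities.
    subjective: list = field(default_factory=list)
    # Relationship data: links to other identities.
    relationships: list = field(default_factory=list)


# Hypothetical sample values, one per category.
record = CustomerRecord(customer_id="C-001")
record.descriptive["city"] = "Austin"
record.activity.append({"action": "purchase", "item": "laptop"})
record.subjective.append({"about": "BrandX", "opinion": "reliable"})
record.relationships.append({"related_to": "C-002", "via": "same household"})

print(record)
```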

A data warehouse “is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.” The data in it are organized so that they are relevant and meaningful. The diagram shows a data warehouse’s architecture:




A data warehouse helps businesses collect data at a scale that amounts to big data. On top of that data, businesses can build analytics tools that extract it and make it more meaningful, and can then base their decisions on the results.
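
As a rough sketch of that idea, the Python example below consolidates two made-up transaction sources into one table and then runs an analysis query over it. The table name `sales_fact`, the column names, and the order data are assumptions for illustration. An in-memory SQLite database stands in for the warehouse here; a real one would be a separate, much larger system.

```python
import sqlite3

# Two hypothetical operational sources holding transaction data.
online_orders = [("2015-01-05", "laptop", 1200.0), ("2015-01-06", "mouse", 25.0)]
store_orders = [("2015-01-05", "laptop", 1150.0), ("2015-02-01", "mouse", 20.0)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (order_date TEXT, product TEXT, amount REAL, channel TEXT)"
)

# Load: consolidate both sources into one table, tagging each row with its origin.
for channel, source in (("online", online_orders), ("store", store_orders)):
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(d, p, a, channel) for d, p, a in source],
    )

# Analyze: an aggregate query over the consolidated historical data.
revenue_by_product = dict(
    conn.execute(
        "SELECT product, SUM(amount) FROM sales_fact GROUP BY product ORDER BY product"
    ).fetchall()
)
print(revenue_by_product)
conn.close()
```

The analysis query runs against the consolidated copy rather than the source lists, which mirrors how a warehouse keeps analysis workload apart from transaction workload.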



Benefits of data warehousing

1. Competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.
2. More cost-effective decision-making: data warehousing helps to reduce the overall cost of the product by reducing the number of channels.
3. Increased productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data.

Limitations of data warehousing:

1. Extra Reporting Work
2. Cost/Benefit Ratio
3. Data Ownership Concerns
4. Complexity of integration
5. Underestimation of resources for data loading
6. Required data not captured
7. High maintenance

The role of data warehouses

Data warehouses will become more critical to future business operations. With more than 95% of the world's data unstructured, businesses need warehouses to store that data and process it into meaningful information. This will enable businesses to forge ahead with unprecedented speed and agility. Listed below are the top data warehouse trends for 2014:
1. Hadoop optimizes data warehousing environments by accelerating data transformation.
2. Customer experience (CX) strategies gain real-time insight to improve marketing campaigns.
3. Engineered systems become the de facto standard for large-scale information management activities.
4. On-demand sandbox analytics environments meet rising demand for rapid prototyping and information discovery.
5. In-database analytics simplifies data-driven analysis.



Source:
