Wednesday, February 18, 2015

Blog #2: Big Unstructured Data vs. Structured Relational Data


The differences between unstructured and structured data

Structured data is information, usually text, arranged in titled columns and rows that data mining tools can easily order and process. It can be pictured as a perfectly organized filing cabinet where everything is identified, labeled and easy to access.

Unstructured data is information that does not have a predefined data model or is not organized in a predefined manner.
Common forms of unstructured data:
  • Word documents, PDFs and other text files - Books, letters, other written documents, audio and video transcripts
  • Audio files - Customer service recordings, voicemails, 911 phone calls
  • Presentations - PowerPoints, SlideShares
  • Videos - Police dash cam, personal video, YouTube uploads
  • Images - Pictures, illustrations, memes
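To make the contrast concrete, here is a minimal Python sketch. The customer records and the free-text note are made-up sample data, and the helper names `total_by_city` and `dollar_amounts` are hypothetical. It only illustrates why labeled columns are easy to process while free text needs extra extraction work.

```python
import re

# Structured: every record has the same titled fields (hypothetical sample data).
structured_rows = [
    {"customer": "Ana", "city": "Austin", "order_total": 120.50},
    {"customer": "Ben", "city": "Boston", "order_total": 80.00},
    {"customer": "Cho", "city": "Austin", "order_total": 45.25},
]

def total_by_city(rows, city):
    """Sum order totals for one city; simple because the columns are labeled."""
    return sum(r["order_total"] for r in rows if r["city"] == city)

# Unstructured: similar information buried in free text (hypothetical note).
unstructured_note = (
    "Ana called from Austin about her $120.50 order. "
    "Later Ben emailed; his order was around eighty dollars."
)

def dollar_amounts(text):
    """Extract amounts written like $120.50; amounts spelled out in words are missed."""
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]

austin_total = total_by_city(structured_rows, "Austin")
found_amounts = dollar_amounts(unstructured_note)
print(austin_total)
print(found_amounts)
```

The structured query needs one line of filtering, while the text extraction already misses Ben's order because it was written in words.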



Data types:

+Identity data helps businesses relate all other information to a unique person, group, corporation, institution, digital asset or other entity
+Descriptive data includes all objective information that is used to describe the identity
+Activity data describes the actions an identity takes
+Subjective data is about opinions offered by the identity about other identities
+Relationship data refers to information about how identities relate indirectly to other identities

A data warehouse “is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.” The data it holds are organized so that they are relevant and meaningful. The diagram shows a data warehouse’s architecture:




A data warehouse helps businesses collect data and build up big data. With that data, businesses can build analytics tools that extract what they need and make it more meaningful, and then base their decisions on the results.
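The idea of consolidating several sources into one place for analysis can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the two sources, the `sales_fact` table and all sample rows are hypothetical, and a real warehouse would use a dedicated platform and a proper loading process.

```python
import sqlite3

# Two hypothetical operational sources, each with its own layout.
web_orders = [("2014-11-03", "Ana", 120.50), ("2014-12-19", "Ben", 80.00)]
store_sales = [
    {"day": "2014-12-02", "buyer": "Cho", "amount": 45.25},
    {"day": "2015-01-15", "buyer": "Ana", "amount": 30.00},
]

def build_warehouse(conn):
    """Load both sources into one consistent table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE sales_fact "
        "(sale_date TEXT, customer TEXT, amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, 'web')", web_orders)
    conn.executemany(
        "INSERT INTO sales_fact VALUES (:day, :buyer, :amount, 'store')",
        store_sales,
    )
    conn.commit()

def revenue_by_year(conn):
    """Analytical query over historical data, separate from the source systems."""
    cursor = conn.execute(
        "SELECT substr(sale_date, 1, 4) AS year, SUM(amount) "
        "FROM sales_fact GROUP BY year ORDER BY year"
    )
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
build_warehouse(conn)
yearly = revenue_by_year(conn)
print(yearly)
```

The query runs against the consolidated copy, so the reporting workload never touches the systems that record the transactions.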



Benefits of data warehousing

1. Competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.
2. More cost-effective decision-making: Data warehousing helps to reduce the overall cost of the product by reducing the number of channels.
3. Increased productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data.

Limitations of data warehousing:

1. Extra Reporting Work
2. Cost/Benefit Ratio
3. Data Ownership Concerns
4. Complexity of integration
5. Underestimation of resources for data loading
6. Required data not captured
7. High maintenance

The role of data warehouses

Data warehouses will become more critical to future business operations. With more than 95% of the world's data being unstructured, businesses need warehouses to store that data and process it into meaningful information. This will enable businesses to forge ahead with unprecedented speed and agility. Listed below are the top data warehousing trends of 2014:
1. Hadoop optimizes data warehousing environments by accelerating data transformation.
2. Customer experience (CX) strategies gain real-time insight to improve marketing campaigns.
3. Engineered systems become the de facto standard for large-scale information management activities.
4. On-demand sandbox analytics environments meet rising demand for rapid prototyping and information discovery.
5. In-database analytics simplifies data-driven analysis.




Tuesday, February 3, 2015

Blog Assignment 1: Comparison of 5 BI products

Comparison of 5 BI products

Tableau:
Strengths
·         Tableau can handle a huge amount of complex data.
·         It offers a user-friendly development environment.
·         Visualization is a focus of the product. It has rich interactivity such as click-to-filter, formatted and responsive tooltips, and responsive web layouts.
·         Large number of user forums
Weaknesses
·         Calculations that should be intuitive are painful to do in Tableau.
·         Tableau has only very basic support for database joins and in-memory joins.
·         Tableau doesn’t really have a concept of Dev and Production. There’s no versioning and pushing to production.
·         The Web SDK is pretty bad and effectively useless.

MicroStrategy:
Strengths
·         MicroStrategy provides an easy drag-and-drop semantic layer for building reports based on a database.
·         MicroStrategy has great SQL optimization for a very wide array of platforms which pushes the demand to the database. 
·         Security, tools, statistics and object management enable administrators to keep everything in order. 
·         MicroStrategy has the premier Mobile BI platform in the market.
·         MicroStrategy has a great ability to send out what is seen on the web in HTML or PDF form. 
Weaknesses
·         Development can be done in either a Desktop application or a Web application, but the Web application doesn’t have parity with Desktop. Worse, many options that do exist in both are in different places, with differently styled UIs.
·         Visualization is poor
·         Small number of user forums

Pentaho:
Strengths
·         Pentaho is an intuitive platform, where IT as well as business people can access and visualize data easily.
·         Easy access to data from diverse sources ranging from Excel to Hadoop.
·         Reporting is fast due to in-memory caching techniques. The output can be generated in various formats, as desired.
·         Detailed visualization and easy to understand infographics, with drilling and filters available. Seamless integration with third party applications, such as Google Maps.
·         Supported devices cover almost every platform: Android, iPhone, iPad, Mac, Web-based, Windows.
Weaknesses
·         The products in the Pentaho suite work inconsistently with one another, so it can be inconvenient to get around at first.
·         The metadata layer is cumbersome to use and understand. The documentation is also of little help at times.
·         There is no system of perpetual licensing. The usage rights have to be bought every year, at the same price.
·         Advanced analytics and the corresponding data visualisation need improvement when compared with Tableau.

QlikView:
Strengths
·         Unique technology: associative search. Using "in memory" processing, Qlik is able to analyze data in a more "brain-like" way than traditional BI solutions.
·         Highly differentiated product: The entire architecture of QlikView is built in a way that's completely different from how the traditional BI products handle it.
·         Easy-to-use analytical tools that everyone can understand.
Weaknesses
·         Undiversified product portfolio
·         Lack of integration with existing business-user software.

Logi Analytics:
Strengths
·         Wide array of ways to gather data into the tool
·         Extremely flexible components that are also easy to use for beginners
·         Great support from the company on getting the tool to adhere to your needs
·         Able to run on either Windows or Linux systems
·         Detailed and highly available documentation within the tool on how to use every piece of functionality
Weaknesses
·         Small number of user forums
·         Resulting application file set is heavy

5 Criteria to select a BI product & explanation

·         Ease of use - Make sure technology does not get in the way
·         Interactivity - Flexibility to explore, manipulate and analyze reports
·         Speed of report delivery - No one wants an excessive wait for a report to run. Five seconds at most is a good benchmark on the web.
·         Security - Feeling confident that confidential data will be secure
·         Anywhere, anytime access - Mobile BI and report scheduling

Evaluation & ranking of products


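| Criterion                | Weight | Tableau | MicroStrategy | Pentaho | QlikView | Logi Analytics |
|--------------------------|--------|---------|---------------|---------|----------|----------------|
| Ease of use              | 22%    | 7       | 4             | 5       | 9        | 9              |
| Interactivity            | 10%    | 9       | 5             | 5       | 8        | 7              |
| Speed of report delivery | 15%    | 8       | 4             | 4       | 7        | 6              |
| Security                 | 37%    | 6       | 8             | 5       | 7        | 5              |
| Anywhere, anytime access | 16%    | 8       | 6             | 4       | 7        | 5              |
| Points                   | 100%   | 7.14    | 5.9           | 4.69    | 7.54     | 6.23           |
| Rank                     |        | 2       | 4             | 5       | 1        | 3              |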
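The Points row appears to be a weighted sum: each rating is multiplied by its criterion weight and the results are added up. The short Python sketch below reproduces the points and ranks from the weights and ratings given in the evaluation. The variable and function names are my own, chosen only for illustration.

```python
# Weights in whole percent, in the order the criteria are listed:
# ease of use, interactivity, speed of report delivery, security, access.
weights = [22, 10, 15, 37, 16]

# Ratings per product, in the same criterion order.
ratings = {
    "Tableau":        [7, 9, 8, 6, 8],
    "MicroStrategy":  [4, 5, 4, 8, 6],
    "Pentaho":        [5, 5, 4, 5, 4],
    "QlikView":       [9, 8, 7, 7, 7],
    "Logi Analytics": [9, 7, 6, 5, 5],
}

def weighted_points(product_ratings):
    """Weighted sum of ratings; dividing by 100 converts percent weights to fractions."""
    return sum(w * r for w, r in zip(weights, product_ratings)) / 100

points = {name: weighted_points(r) for name, r in ratings.items()}

# Highest points gets rank 1.
ordered = sorted(points, key=points.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

for name in ordered:
    print(ranks[name], name, points[name])
```

Because security carries the largest weight (37%), a one-point change in a security rating moves the total by 0.37, more than any other criterion.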

·         Security is very important for today’s businesses. A BI tool has to be able to handle and protect a company’s data. Tableau is regarded here as one of the stronger products for security, though in the ratings above MicroStrategy scores highest on this criterion (8 versus Tableau’s 6).
·         Ease of use and interactivity have become important factors in today’s business solutions. They help a business stay on top of its own data and information while spending less time learning how to use the tool. QlikView and Logi Analytics do a very good job of providing user-friendly software.
·         Anywhere, anytime access is mostly about mobile access. It lets clients reach their data on the go.
·         Speed of report delivery is very important for large businesses, where the volume of information and data is huge. Tableau and QlikView are the two leading products on this criterion.


