Monday, March 30, 2015

Moore's Law & Business Intelligence/ Data Warehouse

The data stack is a new data economy that draws people to participate because it offers a better data experience. It describes the capabilities for delivering data to those who use it in multiple ways, through subscriptions or through APIs. In other words, the data stack refers to a set of categories that describe the different capabilities needed to transform data into more valuable forms. Moore’s Law ties into this idea because growing processing power lets businesses handle more data, and the more data, the more value.


The other idea is that the data warehouse will become virtual, with bits and pieces of the data spread across the landscape and owned by numerous and sundry services. The different kinds of queries might be satisfied by spreading the work across all of these presumably asymmetric processors, with effective orchestration managing the federation of the queries to provide the kind of service that the applications require.



Machine learning (ML) and statistical techniques are the key to transforming big data into actionable knowledge.
MLbase is a novel system that harnesses the power of machine learning for both end-users and ML researchers. MLbase provides:
(1) A simple declarative way to specify ML tasks
(2) A novel optimizer to select and dynamically adapt the choice of learning algorithm
(3) A set of high-level operators to enable ML researchers to scalably implement a wide range of ML methods without deep systems knowledge
(4) A new run-time optimized for the data-access patterns of these high-level operators.







The reason behind all of these ideas is money. Businesses want to understand consumers’ behavior in order to make strategic decisions. Many people want to explore Big Data because the more data they can process, the more value they might get. All of these reasons will make data warehouses and business intelligence better. In other words, the world will understand itself better.


Source:

Thursday, March 5, 2015

How to visualize data

Most businesses today have to deal with a huge amount of data. If they don’t know how to present it in a meaningful way, that valuable data becomes useless. Therefore, more and more Business Intelligence tools have been developed to meet businesses’ needs. These tools help business owners and executives understand data visually through charts, graphs, figures, etc. By understanding their own business’s performance, owners and executives can make strategic decisions for the company.


This blog will discuss how businesses in the accounting, human resource management and insurance areas should present their data.


1. Accounting

Recommendation for optimal method of presentation of a set of Expense and Revenue data:
-These data should be shown together in one chart. Business owners need to see both at the same time, because knowing only one or the other would not make sense. The graph should then present profit and loss, calculated from expense and revenue. This step makes the data more meaningful and helps the owners know how their business is doing.
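
The profit and loss calculation described here can be sketched in a few lines of Python. This is only an illustration: the monthly figures and the function name `profit_and_loss` are made up for the example, and a real report would read the numbers from the accounting system and pass the result to a charting tool.

```python
# Hypothetical monthly figures, for illustration only.
revenue = {"Jan": 12000, "Feb": 9500, "Mar": 14000}
expense = {"Jan": 10000, "Feb": 11000, "Mar": 9000}


def profit_and_loss(revenue, expense):
    """Return revenue minus expense for each period present in both inputs."""
    return {
        period: revenue[period] - expense[period]
        for period in revenue
        if period in expense
    }


pnl = profit_and_loss(revenue, expense)

# Print the three series side by side, the way one chart would show them.
for period, value in pnl.items():
    label = "profit" if value >= 0 else "loss"
    print(f"{period}: revenue={revenue[period]} expense={expense[period]} {label}={abs(value)}")
```

A negative value marks a loss for that period, which is the signal a combined chart makes easy to spot.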

Provide Illustration of the method







2. Human Resource Management

Recommendation for optimal method of presentation of some set of data
-The HR department has to deal with data about its own employees, such as salary, PTO, and working hours per day/week/month. From those data, it calculates the efficiency of each employee and how those employees contribute to the company’s revenue as a whole.

-For a set of PTO data, a company can show the PTO of all employees in a line chart to get an overview. Then, it can use a dashboard to show PTO by individual or by department. This method gives HR an overview of the entire company as well as of each employee, so that managers and executives can balance the workload and possibly gauge employees’ stress levels.
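
As a rough sketch of the overview and per-department views, the snippet below aggregates PTO records in Python. The employee names, departments and day counts are invented for the example, and the helper names are hypothetical; a dashboard tool would do the same grouping behind the scenes.

```python
from collections import defaultdict

# Hypothetical records: (employee, department, PTO days taken this year).
pto_records = [
    ("Ana", "Sales", 5),
    ("Ben", "Sales", 12),
    ("Chi", "Engineering", 3),
    ("Dev", "Engineering", 8),
    ("Eli", "Engineering", 1),
]


def pto_by_department(records):
    """Sum PTO days per department (the drill-down view)."""
    totals = defaultdict(int)
    for _employee, department, days in records:
        totals[department] += days
    return dict(totals)


def average_pto(records):
    """Average PTO days per employee (the company-wide overview)."""
    return sum(days for _, _, days in records) / len(records)


company_avg = average_pto(pto_records)
dept_totals = pto_by_department(pto_records)

print(f"Company average: {company_avg} days")
for department, days in dept_totals.items():
    print(f"{department}: {days} days")
```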

Provide Illustration of the methods






3. Insurance

Recommendation for optimal method of presentation of some set of data
-Insurance companies often deal with data about how much customers pay and how much customers claim. These data should be presented together in one graph to show the comparison. This graph will help insurance companies determine whether they are doing well or not.

-Another set of data is how satisfied customers are with the different types of insurance. It can be presented in a stacked graph, so that insurance companies can determine which products are doing well and which products need to be improved, for example in customer service.

Provide Illustration of the methods







Source:




Wednesday, February 18, 2015

Blog#2: Big Unstructured Data v/s Structured Relational Data


The differences between unstructured and structured data

Structured data is information, usually text files, displayed in titled columns and rows which can easily be ordered and processed by data mining tools. This could be visualized as a perfectly organized filing cabinet where everything is identified, labeled and easy to access.

Unstructured data is information that either does not have a pre-defined data model or is not organized in a predefined manner.
Common forms of unstructured data: 
  • Word docs, PDFs and other text files - Books, letters, other written documents, audio and video transcripts
  • Audio files - Customer service recordings, voicemails, 911 phone calls
  • Presentations - PowerPoints, SlideShares
  • Videos - Police dash cam, personal video, YouTube uploads
  • Images - Pictures, illustrations, memes
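
To make the difference concrete, here is a small Python sketch. The sample records and the transcript sentence are invented for the example. The structured data can be queried by column name, while the free text needs ad hoc parsing even for a simple question.

```python
import csv
import io
import re

# Structured: titled columns and rows, so each field can be read by name.
structured = io.StringIO(
    "customer,product,amount\n"
    "alice,auto,120\n"
    "bob,home,340\n"
)
rows = list(csv.DictReader(structured))
total = sum(int(row["amount"]) for row in rows)

# Unstructured: free text with no predefined model. Pulling out the same
# amounts takes a pattern match, and nothing says what each number means.
transcript = (
    "Alice called about her auto policy and paid 120. "
    "Bob asked about a 340 home bill."
)
amounts = [int(number) for number in re.findall(r"\d+", transcript)]

print(total)
print(amounts)
```

The regular expression only finds the numbers. Linking each one to a customer or product would need further text analysis, which is why unstructured data is harder for data mining tools to process.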



Data types:

+Identity data helps businesses to relate all other information to a unique person, group, corporation, institution, digital asset or otherwise
+Descriptive data includes all objective information that is used to describe the identity
+Activity data is for actions
+Subjective data is about opinions offered by the identity about other identities
+Relationship data refers to information about how identities relate indirectly to other identities

A data warehouse “is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.” The data are organized so that they are relevant and meaningful. The diagram shows a data warehouse’s architecture:




A data warehouse helps businesses collect data and build up big data. With that big data, businesses can build analytics tools to extract data and make it more meaningful. Businesses can then make decisions based on these data.



Benefits of data warehousing

1. Competitive advantage is gained by allowing decision-makers access to data that can reveal previously unavailable, unknown, and untapped information on, for example, customers, trends, and demands.
2. More cost-effective decision-making: Data warehousing helps to reduce the overall cost of the product by reducing the number of channels
3. Increased productivity of corporate decision-makers by creating an integrated database of consistent, subject-oriented, historical data

Limitations of data warehousing:

1. Extra Reporting Work
2. Cost/Benefit Ratio
3. Data Ownership Concerns
4. Complexity of integration
5. Underestimation of resources for data loading
6. Required data not captured
7. High maintenance

The role of data warehouses

Data warehouses will become more critical to future business operations. With more than 95% of the world’s data unstructured, businesses need warehouses to store and process those data into meaningful information. This will enable businesses to forge ahead with unprecedented speed and agility. Listed below are the top data warehouse trends of 2014:
1. Hadoop optimizes data warehousing environments by accelerating data transformation.
2. Customer experience (CX) strategies gain real-time insight to improve marketing campaigns.
3. Engineered systems become the de facto standard for large-scale information management activities.
4. On-demand sandbox analytics environments meet rising demand for rapid prototyping and information discovery.
5. In-database analytics simplifies data-driven analysis



Source:

Tuesday, February 3, 2015

Blog Assignment 1: Comparison of 5 BI products

Comparison of 5 BI products

Tableau:
Strengths
·         Tableau can handle a huge amount of complex data
·         It is a user-friendly software development environment
·         Visualization is one of the focuses of the product. It has rich interactivity such as click-to-filter, formatted and responsive tooltips, and responsive web layouts.
·         Large number of user forums
Weaknesses
·         Tableau is painful to use for calculations that ought to be intuitive.
·         Tableau has only very basic support for database joins and in-memory joins
·         Tableau doesn’t really have a concept of Dev and Production. There’s no versioning and no pushing to production
·         The Web SDK is pretty bad and effectively useless.

MicroStrategy:
Strengths
·         MicroStrategy provides an easy drag-and-drop semantic layer to build reports based on a database.
·         MicroStrategy has great SQL optimization for a very wide array of platforms, which pushes the demand to the database.
·         Security, tools, statistics and object management enable administrators to keep everything in order.
·         MicroStrategy has the premier Mobile BI platform in the market.
·         MicroStrategy has a great ability to send out what is seen on the web in HTML or PDF form.
Weaknesses
·         Development can be done in either a Desktop application or a Web application, but the Web application doesn’t have parity with Desktop. Worse, many options that do exist in both are in different places with differently styled UIs.
·         Visualization is poor
·         Small number of user forums

Pentaho:
Strengths
·         Pentaho is an intuitive platform, where IT as well as business people can access and visualize data easily.
·         Easy access to data from diverse sources ranging from Excel to Hadoop.
·         Reporting is fast due to in-memory caching techniques. The output can be generated in various formats, as desired.
·         Detailed visualization and easy-to-understand infographics, with drilling and filters available. Seamless integration with third-party applications, such as Google Maps.
·         The devices supported cover almost every platform: Android, iPhone, iPad, Mac, Web-based, Windows.
Weaknesses
·         The products in the Pentaho suite are inconsistent in the way they work. It can be inconvenient to get around, initially.
·         The metadata layer is cumbersome to use and understand. The documentation is also of little help at times.
·         There is no system of perpetual licensing. The usage rights have to be bought every year, at the same price.
·         Advanced analytics and the corresponding data visualisation need more improvement when compared with those in Tableau.

QlikView:
Strengths
·         Unique technology -- associative search. Using "in memory" processing, Qlik is able to analyze data in a more "brain-like" way than traditional BI solutions.
·         Highly differentiated product: The entire architecture of QlikView is built in a way that's completely different from how the traditional BI products handle it.
·         Easy to use, with analytical tools that everyone can understand.
Weaknesses
·         Undiversified product portfolio
·         Lack of integration with existing business-user software.

Logi Analytics:
Strengths
·         Wide array of ways to gather data into the tool
·         Extremely flexible components that are also easy to use for beginners
·         Great support from the company on getting the tool to adhere to your needs
·         Able to run on either Windows or Linux systems
·         Detailed and highly available documentation within the tool on how to use every piece of functionality
Weaknesses
·         Small number of user forums
·         Resulting application file set is heavy

5 Criteria to select a BI product & explanation

·         Ease of use – Make sure technology does not get in the way
·         Interactivity - Flexibility to explore, manipulate and analyze reports
·         Speed of report delivery – no one wants an excessive wait for their report to run.  Five seconds max is a good benchmark on the web.
·         Security – feeling confident that confidential data will be secure
·         Anywhere, anytime access - Mobile BI and Report scheduling

Evaluation & ranking product


Criterion                   Weight   Tableau   MicroStrategy   Pentaho   QlikView   Logi Analytics
Ease of use                 22%      7         4               5         9          9
Interactivity               10%      9         5               5         8          7
Speed of report delivery    15%      8         4               4         7          6
Security                    37%      6         8               5         7          5
Anywhere, anytime access    16%      8         6               4         7          5
Points                      100%     7.14      5.9             4.69      7.54       6.23
Rank                                 2         4               5         1          3
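
The Points row is a weighted sum: each score is multiplied by its criterion weight and the results are added up. The Python sketch below reproduces that arithmetic using the weights and scores from the evaluation table; the variable and function names are only for illustration.

```python
# Criterion weights in percent, in table order:
# ease of use, interactivity, speed of report delivery,
# security, anywhere/anytime access.
weights = [22, 10, 15, 37, 16]

# Scores per product, in the same criterion order as the weights.
scores = {
    "Tableau": [7, 9, 8, 6, 8],
    "MicroStrategy": [4, 5, 4, 8, 6],
    "Pentaho": [5, 5, 4, 5, 4],
    "QlikView": [9, 8, 7, 7, 7],
    "Logi Analytics": [9, 7, 6, 5, 5],
}


def weighted_score(product_scores):
    """Weighted sum of scores, with weights given in percent."""
    return sum(w * s for w, s in zip(weights, product_scores)) / 100


totals = {name: weighted_score(values) for name, values in scores.items()}

# Rank products from highest to lowest total.
ranking = sorted(totals, key=totals.get, reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {totals[name]:.2f}")
```

Because Security carries the largest weight (37%), a one-point change in a security score moves the total more than a one-point change in any other criterion.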

·         Security is very important for today’s businesses. A BI tool has to be able to handle and protect a company’s data. Tableau has been one of the top products that can guarantee secure software.
·         Ease of use and interactivity have become important factors for today’s business solutions. They help businesses stay on top of their own data and information while spending less time learning how to use the tool. QlikView and Logi Analytics do a very good job of providing user-friendly software.
·         Anywhere, anytime access focuses mainly on mobile access. It helps clients access their data on the go.
·         Speed of report delivery is very important for large businesses, where the amount of information and data is huge. Tableau and QlikView are the two leading products that can provide that solution.



Sources: