Category Archives: Data Warehousing
A Understanding Data Warehousing Look That’s Entirely New
store.theartofservice.com/itil.html
Understanding Data Warehousing
Introduction
Data has always been an essential ingredient in decision-making and, in modern business, the need to obtain, store, and use data has increased dramatically as the complexity and scope of the global marketplace have expanded.
Data warehousing is an environment established for the sole purpose of gathering, integrating, and delivering data from across multiple data sources for use in enterprise decision-making. However, its effectiveness can be expanded to support any person, process, or system needing current and historical data which is consistent and relatable.
Defining Data Warehouse
A data warehouse is a computing environment composed of several technologies and products, including:
Data Acquisition
Data Management
Data Modeling
Data Quality
Data Analysis
Metadata Management
Development Tools
Storage Management
Applications
Administrative Functions
Defining Data Warehouse (Part 2)
Data Warehousing is about managing the data. The following data features are key reasons for having a data warehouse:
Subject Orientation
Data Integration
Non-volatile
Time Variance
Data Granularity
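Two of the features above, non-volatility and time variance, can be illustrated with a small sketch. It assumes a hypothetical append-only history of customer records; the function names `load` and `as_of` and the field names are illustrative, not taken from any particular product.

```python
from datetime import date

# Hypothetical append-only history of customer records.
# Non-volatile: rows are only added, never updated or deleted.
history = []

def load(customer_id, city, effective):
    """Append a new version instead of overwriting the old one."""
    history.append({"customer_id": customer_id, "city": city, "effective": effective})

def as_of(customer_id, when):
    """Return the version in effect on the given date (time variance)."""
    versions = [r for r in history
                if r["customer_id"] == customer_id and r["effective"] <= when]
    return max(versions, key=lambda r: r["effective"]) if versions else None

load(1, "Boston", date(2010, 1, 1))
load(1, "Denver", date(2012, 6, 1))

past = as_of(1, date(2011, 3, 15))
current = as_of(1, date(2013, 1, 1))
print(past["city"], current["city"])
```

Because earlier versions are kept rather than overwritten, the same question can be answered for any point in the past.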
Benefits of Data Warehousing
Data Warehousing provides the following benefits:
A comprehensive and integrated perspective of the enterprise
Availability of current and historical information for strategic decision making
Mitigating operational risks related to supporting the decision-making process
Providing a flexible and interactive source of information
Introducing Business Intelligence
Business Intelligence is a set of disciplines designed specifically to establish a consistent decision-making environment.
Business Intelligence does not replace Data Warehousing, but uses it extensively in its processes.
Business Intelligence can be described as a two-step process:
Transforming data into information
Transforming information into knowledge
Functional Components of a Data Warehouse
Physical Components of a Data Warehouse
Source Data
Data Sources can include:
Operational Data
Internal Data
Archived Data
External Data
Data can consist of structured or unstructured, prepared or raw formats.
Data Staging
The activities of data staging are:
Extracting data from the data sources
Transforming the data into usable information
Loading the data and metadata into data storage
ETL (Data Extraction, Transformation, and Loading) is considered the most time-consuming and labor-intensive activity in data warehousing.
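The three staging activities can be sketched as a minimal pipeline. This is an illustration only: the source records, the `orders` table, and the function names are assumptions, and an in-memory SQLite database stands in for data storage. Loading of metadata is not covered.

```python
import sqlite3

def extract():
    """Extract: hypothetical raw records from an operational system."""
    return [
        {"order_id": "1001", "amount": " 19.99", "region": "ca"},
        {"order_id": "1002", "amount": "5.00 ", "region": "NY"},
    ]

def transform(rows):
    """Transform: normalize types and codes into one consistent format."""
    return [(int(r["order_id"]), float(r["amount"].strip()), r["region"].upper())
            for r in rows]

def load(conn, rows):
    """Load: write the cleaned rows into the storage table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT order_id, amount, region FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping each step in its own function makes it easier to test the transformation rules separately from the source and target systems.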
Data Quality
One purpose of Data Staging is to raise the quality of the data used in decision making: bad data will lead to bad decisions.
Data Quality is influenced by:
Inadequate database designs
Aging of data
Dummy or absent data
Non-unique identifiers
Ineffective primary keys
Violation of business rules
Lack of policies and procedures
Input errors
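A few of the factors above (dummy or absent data, non-unique identifiers) can be checked automatically during staging. The sketch below is one possible approach; the record layout, the field names, and the set of placeholder values are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical staged customer records; field names are illustrative.
records = [
    {"id": "C1", "email": "ann@example.com", "birth_year": 1980},
    {"id": "C2", "email": "", "birth_year": 1975},
    {"id": "C2", "email": "bob@example.com", "birth_year": 9999},
]

DUMMY_YEARS = {0, 9999}  # assumed placeholder values for "unknown"

def quality_report(rows):
    """List the ids affected by each kind of data quality problem."""
    id_counts = Counter(r["id"] for r in rows)
    return {
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "missing_email": [r["id"] for r in rows if not r["email"]],
        "dummy_birth_year": [r["id"] for r in rows if r["birth_year"] in DUMMY_YEARS],
    }

report = quality_report(records)
print(report)
```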
Data Storage
Organizations must establish the storage requirements for:
Data staging
Corporate data warehouse
Individual data marts
OLAP-based multidimensional databases
Information Delivery
The requirements for Information Delivery reside in expectations related to:
Query types and frequencies
Report types and frequencies
Types of analysis
Distribution of information
Real-time requirements
Applications for decision support
Potential growth and expansion
Metadata
The core of the data warehouse is its metadata.
What is a Data Mart?
A data mart is a subset of a data warehouse. A data warehouse will typically contain data relevant to the entire enterprise, while a data mart contains data relevant to a line of business or department within the enterprise.
Deployment of data warehouses and data marts will usually take one of the following approaches:
Top-down (data warehouse first, data marts second)
Bottom-up (data marts first, data warehouse second)
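The idea of a data mart as a departmental subset of the warehouse can be sketched with a database view, which loosely mirrors the top-down approach. The table `warehouse_sales`, the view `marketing_mart`, and the sample rows are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Enterprise-wide warehouse table (hypothetical schema).
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", [
    ("marketing", "campaign-a", 100.0),
    ("finance", "audit", 250.0),
    ("marketing", "campaign-b", 75.0),
])

# The data mart exposes only the rows relevant to one department.
conn.execute("CREATE VIEW marketing_mart AS "
             "SELECT product, amount FROM warehouse_sales WHERE dept = 'marketing'")

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product").fetchall()
print(mart_rows)
```

In practice a data mart is often a separate physical store rather than a view; the view is used here only to show the subset relationship.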
Data Warehouse Architecture
There are five basic architectures in data warehousing:
Centralized Data Warehouse – one data warehouse with no data marts.
Independent Data Marts – several autonomous data marts with no central data warehouse.
Federated Data Marts – several data marts operating under standardized controls with no central warehouse.
Hub-and-Spoke – several data marts with a central data warehouse.
Data-Mart Bus – several data marts are created and conform to the standards and controls of the original data mart.
Why Data Warehousing?
What does a data warehouse provide the user?
Ability to run simple queries and reports against current and historical data
Ability to perform “what if” scenarios
Ability to iteratively query and analyze deeper into the data
Ability to identify historical trends and apply them effectively to future situations.
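The ability to query iteratively and analyze deeper into the data can be sketched as two successive queries: a yearly summary, followed by a breakdown of one year by region. The `sales` table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2000-03-10", "East", 100.0),
    ("2000-07-22", "West", 50.0),
    ("2001-03-05", "East", 120.0),
    ("2001-03-18", "West", 80.0),
])

# First pass: a simple summary over current and historical data.
by_year = conn.execute(
    "SELECT strftime('%Y', sale_date) AS yr, SUM(amount) "
    "FROM sales GROUP BY yr ORDER BY yr").fetchall()

# Second pass: drill into one year and break it down by region.
by_region_2001 = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE strftime('%Y', sale_date) = '2001' "
    "GROUP BY region ORDER BY region").fetchall()

print(by_year)
print(by_region_2001)
```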
Challenges in Data Acquisition
The typical challenges facing data acquisition activities are:
Large number of data sources
Disparate data sources
External data sources
Ongoing data feeds
Different computing platforms
Data replication
Data integration
Data cleansing
Complex data transformations
Challenges in Data Storage
The typical challenges facing data storage activities are:
Large data volumes
Large data sets
New data types
Data storage in staging area
Multiple index types
Parallel processing
Data archiving
Tool compatibilities
Challenges in Information Delivery
The typical challenges facing information delivery activities are:
Multiple user types
Multiple query types
Complex queries
OLAP
Multidimensional analysis
Web-enablement
Metadata management
Tools from multiple vendors
Relevant Data Warehouse Standards
Relevant standards for data warehousing, specifically metadata, are provided through:
Meta Data Coalition
Object Management Group
OLAP Council for Multi-dimensional Application Programmers Interface (MDAPI)
Basic Project Plan
The basic plan for a data warehouse project is:
Planning
Defining requirements
Design
Build
Deploy
Maintain
The Toolkit
The Toolkit is designed to be holistic to the enterprise’s relationship with data, not just data warehousing. As part of its scope, a second presentation is available to introduce Data Analytics and Data Mining, which is related to the second step of Business Intelligence.
The goal of the Data Warehouse/Analytics Toolkit is to define the contributing factors, major components, and their relationships, while providing the basic tools to take action based on the organization’s needs.
Moving Forward
The participant can take two directions in using the toolkit at this point. To continue with the data warehouse discussion, the next document of interest is Developing Warehouse Capabilities, which is intended to be a step-by-step guide to creating a Big Data foundation in your organization. To learn more about data-related activities within an enterprise, see the presentation Introduction to Data Analytics and Mining. Multiple templates have been created to support the process and aid organizations in their efforts to improve their Data Warehouse and Data Analytics capabilities.
The importance of data warehousing in the age of Big Data
In a sense, a data warehouse is the central repository for all the collected and stored information that a business creates and compiles. Moreover, analysis and reporting have always been a facet of most contemporary data warehousing strategies. So it can be perplexing to draw a distinction between what might be categorized as a “data warehousing” strategy and a “Big Data” strategy. After all, both are approaches to amassing data and analyzing it.
The rise of Big Data actually refers to the situation that most businesses and organizations find themselves in these days; namely, that data is pouring in so fast that there’s confusion over where to put it. In other words, traditional data warehousing methods are being seriously strained when it comes to dealing with the constant onslaught of information being accumulated. Naturally, companies that specialize in dealing with Big Data are jumping at the opportunity to solve this problem, and we’re seeing quite a bit of growth across the board in this area as well (which is also good for the IT sector in general). However, this doesn’t mean that data warehousing itself is going the way of the dinosaur; if anything, it’s forcing everyone to reevaluate their systems and look for ways to integrate new tools.
Before we go any further, let’s identify what a “data warehouse” actually is. According to Wikipedia:
“It is a central repository of data which is created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.”
Basically, a data warehouse is a vital component of any business or organization and it is responsible for routinely providing critical insights to individuals in management positions. Likewise, data warehouses tend to be mostly on-site; meaning, companies set up and run them in a very direct, hands-on manner. But, as previously stated, problems are popping up because of the influence of Big Data.
So what’s the solution? In a nutshell, we’re seeing businesses turn toward integrating Big Data solutions into their data warehouse schemes. This is an interesting concept, especially when you consider that more and more people are looking at cloud-based approaches to Big Data storage and analysis. Here’s a snippet from Gartner research on the subject:
“In 2012 Gartner recorded a significant increase in inquiries from organizations seeking to deploy data warehouses and analytic data stores for the first time,” the report said. “This might sound incredible over 25 years into the data warehousing era, but we asked these clients several questions to confirm that they were contemplating truly ‘greenfield’ data warehousing initiatives.”
This suggests that demand for data warehousing, including third-party data warehousing services, is on the rise. So how does Big Data fit into all of this? If organizations are interested in warehousing and analysis, it follows that they would want to look at developments in cloud-based Big Data services. One example is Google BigQuery. In short, BigQuery is an SQL-based analysis and storage service that offers users very fast results as well as hundreds of terabytes of storage space. It is also a “metered” service, so you pay only for what you actually use, which means costs can be much lower than the investment needed to build and maintain on-site resources.
Tools like Google’s BigQuery can (and probably should) be used in tandem with a more traditional approach to data warehousing. In other words, institutions could divert their data overflow to services like BigQuery while keeping “business-crucial” data on-site, closer to home. Alternatively, an integrated approach might be attempted in which everything flows through the data warehouse first and is then diverted to the cloud-based services used for processing and storing Big Data.
Regardless, this is good news for IT professionals, because it means that there is a growing demand for people who know how to deal with data warehousing and/or Big Data. It might even be advisable to complete a training and certification course in both areas, if for no other reason than to become more versatile across a larger career space.
Storing Information Via Business Intelligence Data Warehousing
A company has so many things to do and so many outputs to consider that it is fair to ask where it can find the time and the means to organize every little (but important) piece of information it has ever processed. The answer lies in business intelligence data warehousing. A business intelligence data warehouse is the main repository of an organization’s historical data. One can even say that it serves as a silo of an enterprise’s corporate memory. It holds, among other things, the raw material that management draws on for decision support systems. A critical factor behind the use of data warehousing is that a data analyst can perform complex queries and analyses, including data mining, on many types of information without needlessly slowing down the operational systems or their many interwoven parts.
How does it work? For example, a data warehouse might be used to find out on which day of the week a company sold a particular software product during May 2001. It can also show the difference between the number of sick days taken by employees in California and by employees in New York during the week or two before the winter season in the years 1998-2001. Data warehousing is important because it serves not just the managers of the company, but other members of the organization as well.
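The first example above can be sketched as a small aggregation over historical sales records. This is only an illustration: the records and the product name `software-x` are made up, and the question is assumed to be which weekday had the most units sold in May 2001.

```python
from collections import defaultdict
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical sales facts: (sale date, product, units sold).
sales = [
    (date(2001, 5, 1), "software-x", 3),
    (date(2001, 5, 4), "software-x", 7),
    (date(2001, 5, 8), "software-x", 2),
    (date(2001, 5, 11), "software-x", 6),
    (date(2001, 6, 1), "software-x", 50),  # outside May, so it is ignored
]

def units_by_weekday(rows, product, year, month):
    """Total units of one product per weekday within one month."""
    totals = defaultdict(int)
    for day, name, units in rows:
        if name == product and day.year == year and day.month == month:
            totals[DAYS[day.weekday()]] += units
    return dict(totals)

totals = units_by_weekday(sales, "software-x", 2001, 5)
best_day = max(totals, key=totals.get)
print(totals, best_day)
```

Because the warehouse keeps historical records separately from the operational systems, a query like this can run without slowing down day-to-day processing.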