In a sense, a data warehouse is the central repository for all the information that a business collects and stores. Moreover, analysis and reporting have always been a facet of most data warehousing strategies. So it’s a bit perplexing (for most people) when it comes time to draw a distinction between a “data warehousing” strategy and a “Big Data” strategy. After all, both are approaches to amassing data and analyzing it.
The rise of Big Data actually refers to the situation that most businesses and organizations find themselves in these days; namely, that data is pouring in so fast that there’s confusion over where to put it. In other words, traditional data warehousing methods are being seriously strained when it comes to dealing with the constant onslaught of information being accumulated. Naturally, companies that specialize in dealing with Big Data are jumping at the opportunity to solve this problem, and we’re seeing quite a bit of growth across the board in this area as well (which is also good for the IT sector in general). However, this doesn’t mean that data warehousing itself is going the way of the dinosaur; if anything, it’s forcing everyone to reevaluate their systems and look for ways to integrate new tools.
Before we go any further, let’s pin down what a “data warehouse” actually is. Wikipedia defines it this way:
“It is a central repository of data which is created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.”
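To make the "quarterly comparisons" in that definition a little more concrete, here is a minimal sketch of the kind of rollup a warehouse report might run. It uses Python's built-in sqlite3 module purely as a stand-in for a real warehouse; the sales table, its columns, and the quarterly_totals function are hypothetical names chosen for illustration.

```python
import sqlite3


def quarterly_totals(rows):
    """Sum sale amounts per (year, quarter).

    rows is an iterable of (ISO date string, amount) pairs.
    The 'sales' table and its columns are illustrative assumptions,
    not a real warehouse schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sold_on TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on
    # (SQLite performs integer division on integer operands).
    query = """
        SELECT strftime('%Y', sold_on) AS year,
               (CAST(strftime('%m', sold_on) AS INTEGER) + 2) / 3 AS quarter,
               SUM(amount) AS total
        FROM sales
        GROUP BY year, quarter
        ORDER BY year, quarter
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Calling quarterly_totals with a handful of dated amounts returns one row per year and quarter, which is the raw material for a quarter-over-quarter or year-over-year comparison.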
Basically, a data warehouse is a vital component of any business or organization, responsible for routinely providing critical insights to people in management positions. Data warehouses also tend to be mostly on-site; that is, companies set up and run them in a very direct, hands-on manner. But, as previously stated, the influx of Big Data is putting these setups under strain.
So what’s the solution? In a nutshell, we’re seeing businesses turn toward integrating Big Data solutions into their data warehouse schemes. This is an especially interesting concept when you consider that more and more people are looking at cloud-based approaches to Big Data storage and analysis. Here’s a little snippet from Gartner research on the subject:
“In 2012 Gartner recorded a significant increase in inquiries from organizations seeking to deploy data warehouses and analytic data stores for the first time,” the report said. “This might sound incredible over 25 years into the data warehousing era, but we asked these clients several questions to confirm that they were contemplating truly ‘greenfield’ data warehousing initiatives.”
This information pretty much speaks for itself: the demand for third-party data warehousing services is definitely on the rise. So how does Big Data fit into all of this, you ask? Well, if organizations are interested in warehousing and analysis, it follows logically that they would want to take a look at some of the amazing developments in cloud-based Big Data services. A great example of this would be Google BigQuery. In short, BigQuery is an SQL-based analysis and storage solution that offers users extremely fast results as well as hundreds of terabytes of storage space. Additionally, it is a “metered” service, so you only pay for what you actually use, and the costs are likely to be much lower than the investment required to build and maintain on-site resources.
Tools like Google’s BigQuery can (and probably should) be used in tandem with a more traditional approach to data warehousing. In other words, institutions could look at diverting their data overflow to services like BigQuery while at the same time keeping “business-crucial” data on-site, and closer to home. Alternatively, it’s entirely reasonable to expect that some organizations will attempt a more integrated approach, where everything flows through the data warehouse first and is then diverted to the cloud-based services used for processing and storing Big Data.
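The "overflow" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Dataset class, the route function, and the capacity figure are hypothetical, and a real deployment would weigh factors such as compliance, latency, and transfer costs that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: float
    business_critical: bool


def route(datasets, onsite_capacity_gb):
    """Map each dataset name to 'on-site' or 'cloud'.

    Hypothetical policy: business-critical data always stays on-site;
    other data stays on-site only while capacity remains, and the
    overflow is sent to a cloud service.
    """
    placement = {}
    remaining = onsite_capacity_gb
    # Handle critical datasets first, then the rest from smallest to largest.
    ordered = sorted(datasets, key=lambda d: (not d.business_critical, d.size_gb))
    for ds in ordered:
        if ds.business_critical or ds.size_gb <= remaining:
            placement[ds.name] = "on-site"
            # May go negative if critical data alone exceeds capacity,
            # in which case nothing else is kept on-site.
            remaining -= ds.size_gb
        else:
            placement[ds.name] = "cloud"
    return placement
```

For instance, with 500 GB of on-site capacity, a 400 GB business-critical dataset and a 50 GB non-critical one would both stay on-site, while a 900 GB non-critical dataset would be routed to the cloud.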
Regardless, this is pretty good news for IT professionals, because it means that there is a growing demand for people who know how to deal with data warehousing and/or Big Data. In truth, it is worth considering training and certification in both areas, if for no other reason than to become more versatile across a larger career space.