Skip to main content

Data warehouse operations


Introduction

Running the daily operations of a data warehouse is no easy task. Operators need a lot of knowledge on the internals of the ‘ELT’ processes to perform their work correctly. In most cases no monitoring- or auditing-facilities are available. What is available, is provided by the ELT-tool vendor or has been created by the operators themselves and mostly consist out of a bunch of SQL-scripts which they try to interpret.
Result is an error prone and diffuse manner of maintaining the operations. Only one, maybe two operators have an in-depth understanding of the processing.



Why do we implement data warehouses like that? Why is the operations-aspect almost ignored in the design and development of a data warehouse?
Well, I think we focus too much on what ‘hot’ technologies ELT-toolsets can give us, we do not adhere enough to the philosophy that everything we implement also needs to be maintained in production.
We lack standards and a framework for operations and maintenance of data warehouses.


Plea for an operations framework for data warehouses

No doubt, data warehouses are expensive beasts.

 


Not only building them, but also maintaining them. If we are not careful, the operations (over years) can be much more expensive then creating them.  Currently, in our striving to reduce costs we focus primarily on the implementation: agile management, automation, data vault etc. But after release into production we lose focus. We care less for the struggle operators face every day and the amount of time they spend on running and maintaining the system.
I think it is more than time to start using an operations-framework in our creation of data warehouses. A framework that is easy to use in the implementation and that provides a standard method of operations in production. Including a standard monitoring and auditing facilities.
 

 



Do we have standards or a definition for such an operations-framework? Not really, with luck some architects, designers introduce some tools for logging and/or monitoring and introduce some notion around separating data-flows from process-flows.

In a following article I will go into my definition of an advanced operation framework for data processing. Over the years, a framework that I have used in several projects. A definition of a framework that may help you start thinking more in the philosophy of ‘Operations start with the design’.






























Comments

Popular posts from this blog

Connecting SQL server to IFTTT

I know I promised a follow up on 'Aspects of Data warehousing' but I got side tracked. I have a fond interest in IoT and all those new technologies. Especially technologies like IFTTT and MQTT. The challenge is to bring these in relation to my work field: data warehousing and business intelligence. I just came up with the idea of linking an event in a database to blink a hue-lamp in my house. Crazy or not! But think of it, something happens during the nightly processing of the data warehouse and my hue-lamp at home turns red. Funny and useful. How to crack this challenge ... 1) we need a kind of FIFO in the database to register our events. 2) we want to use IFTTT, because IFTTT can control my hue-lights. 3) we need to create a bridge to link the dequeuing of the FIFO to IFTTT (built in Java) 4) we will use Microsoft SQL server as database (just as example) Lets define a table for our FIFO: CREATE TABLE [ttm].[ADM_TELEMETRY_LOG](     [ID] [numeric](8, 0) IDENTITY(1,1

Aspects of Data warehousing

When I google around, reading about data warehousing, I am amazed about the many aspects related to this subject and ditto discussions. Do we still know why we have all these discussions? What is so important about architecture, agile, data lakes, data vault 2.0, DevOps etc, etc? In my opinion we discuss and implement all these theories, methodologies and frameworks to be able to design and implement 'standardized' data warehouse solutions which are maintainable and sustainable in the long run. As an exercise over Christmas 2017, I created a mind-map containing many “keywords” buzzing around on fora, LinkedIn and articles related to data warehousing. It is not my intention to discuss the details of this mind-map; to be complete or to be correct. But I do like to point out something. For a long time, I have a nagging feeling that there are aspects which are ignored or not much talked about. Aspects which are at least so important as where the buzzing is about right now.