Skip to main content

Aspects of Data warehousing

When I google around, reading about data warehousing, I am amazed about the many aspects related to this subject and ditto discussions. Do we still know why we have all these discussions? What is so important about architecture, agile, data lakes, data vault 2.0, DevOps etc, etc?

In my opinion we discuss and implement all these theories, methodologies and frameworks to be able to design and implement 'standardized' data warehouse solutions which are maintainable and sustainable in the long run.

As an exercise over Christmas 2017, I created a mind-map containing many “keywords” buzzing around on fora, LinkedIn and articles related to data warehousing.


It is not my intention to discuss the details of this mind-map; to be complete or to be correct. But I do like to point out something. For a long time, I have a nagging feeling that there are aspects which are ignored or not much talked about. Aspects which are at least so important as where the buzzing is about right now.

For unknown reasons we do not have firm theories about Control-management. How do we make systems that are easy to maintain. Where are the execution-frameworks? Which methodology to follow to implement an 'abstraction' for the control of process-execution dependencies, restart-ability, crash-recovery etc, etc?

And, we talk a lot about data quality and control, but where are the frameworks? Where are the theories and methodologies to implement into a solution?

I am curious: am I the only one who sees this? Are you also recognizing these 'neglected' aspects and what are you doing about them?

I know a lot of '?'-signs in this article. My intention is to publish more on these 'neglected' aspects and fuel the discussion.



Comments

Popular posts from this blog

Data warehouse operations

Introduction Running the daily operations of a data warehouse is no easy task. Operators need a lot of knowledge on the internals of the ‘ELT’ processes to perform their work correctly. In most cases no monitoring- or auditing-facilities are available. What is available, is provided by the ELT-tool vendor or has been created by the operators themselves and mostly consist out of a bunch of SQL-scripts which they try to interpret. Result is an error prone and diffuse manner of maintaining the operations. Only one, maybe two operators have an in-depth understanding of the processing. Why do we implement data warehouses like that? Why is the operations-aspect almost ignored in the design and development of a data warehouse? Well, I think we focus too much on what ‘hot’ technologies ELT-toolsets can give us, we do not adhere enough to the philosophy that everything we implement also needs to be maintained in production. We lack standards and a framew

Connecting SQL server to IFTTT

I know I promised a follow up on 'Aspects of Data warehousing' but I got side tracked. I have a fond interest in IoT and all those new technologies. Especially technologies like IFTTT and MQTT. The challenge is to bring these in relation to my work field: data warehousing and business intelligence. I just came up with the idea of linking an event in a database to blink a hue-lamp in my house. Crazy or not! But think of it, something happens during the nightly processing of the data warehouse and my hue-lamp at home turns red. Funny and useful. How to crack this challenge ... 1) we need a kind of FIFO in the database to register our events. 2) we want to use IFTTT, because IFTTT can control my hue-lights. 3) we need to create a bridge to link the dequeuing of the FIFO to IFTTT (built in Java) 4) we will use Microsoft SQL server as database (just as example) Lets define a table for our FIFO: CREATE TABLE [ttm].[ADM_TELEMETRY_LOG](     [ID] [numeric](8, 0) IDENTITY(1,1