Introduction
Running the daily operations of a data
warehouse is no easy task. Operators need extensive knowledge of the internals
of the ‘ELT’ processes to do their work correctly. In most cases no
monitoring or auditing facilities are available. What is available is either
provided by the ELT-tool vendor or has been created by the operators themselves,
and mostly consists of a bunch of SQL scripts which they try to interpret.
The result is an error-prone and diffuse way
of maintaining operations. Only one, maybe two operators have an in-depth understanding of
the processing.
Why do we implement data warehouses like
that? Why is the operations aspect almost ignored in the design and development
of a data warehouse?
Well, I think we focus too much on what ‘hot’
technologies ELT toolsets can give us, and we do not adhere enough to the philosophy
that everything we implement also needs to be maintained in production.
We lack standards and a framework for the operations and
maintenance of data warehouses.
Plea for an operations framework for data warehouses
No doubt, data warehouses are expensive
beasts, not only to build but also to
maintain. If we are not careful, operating them over the years can be much
more expensive than creating them. Currently,
in our striving to reduce costs, we focus primarily on the implementation: agile
management, automation, data vault, etc. But after release into production we lose
focus. We care less about the struggle operators face every day and the amount of
time they spend on running and maintaining the system.
I think it is high time to start using
an operations framework when we create data warehouses: a framework that is
easy to use in the implementation and that provides a standard method of
operations in production, including standard monitoring and auditing facilities.
Do we have standards or a definition for such an
operations framework? Not really. With luck, some architects or designers introduce tools for logging and/or monitoring, along with some notion of separating data flows from process flows.
In a following article I will go into my
definition of an advanced operations framework for data processing, a
framework that I have used in several projects over the years. It is a
definition that may help you start thinking more in line with the philosophy of ‘Operations
start with the design’.