Data Modelling in Practice

I try to keep my hand in at creating data models- for whatever strange reason I still enjoy sitting down with a set of requirements documents and working out the logical and then physical data model to meet those requirements (As a quick aside, I still prefer Oracle as my target database, but that is just because the oil business has been dominated by it for so long that it has become my ‘comfort food’ of database systems. Either that or the fact that there was a particularly hot woman who worked there and was one of my (many) not girlfriends. But I like to think it is the former reason)

I studied the theory behind data modelling whilst at University, but for the past 9 years, my approach has pretty much been do it this way (or that way) because it ‘feels’ right. If I had to put a formal qualification on my natural data modelling tendancies then I would have figured it was probably somewhere around 3rd Normal Form. Except that some recent work has caused me to reconsider- or at least hit the textbooks again to verify what level I normalise to.

First of all I had to compare the way an application was storing its data compared to how it used to in the old version. My first reaction was that the new way was simply correct and that is how I would have modelled it (in fact I could find an example of doing exactly that in one of my models). Imagine my surprise when I find out that the vendor’s model is actually in 5th Normal Form. Wow, so my gut feel approach actually skips over 3rd, BCNF and 4th and goes straight to 5th. Cool.

Except that now I have just finished reviewing a guideline document and apparently the simple act of adding the effective_date, expiry_date type of information that we all do means that I am now creating a 6th Normal Form. Even more cool, I am doing really smart things, without really knowing about it.

Except that when I go back to my textbooks and read up on these really rather academic discussions I am not entirely sure that I have even addressed all the subtleties of 5th Normal Form. Then I can’t even find any mention on 6NF in the text book (it was only published in 1994 after all, so there have been a number of advances since then) and have to hit the internet to find out what it means. Again, I find that I am not sure if what I do is actually meeting the subtleties of 6NF.

So I come to 1 of 2 conclusions:

  1. I am not normalising to 5NF or 6NF in which case I am being sold a bunch of horse…. by the vendors and technical authors who are trying to justify their existence, but don’t actually know what they are talking about. This is highly likely
  2. I am normalising to 5NF and 6NF and the academic understanding behind what I do is beyond my comprehension. I just understand that I do what I do

Option 2 is quite possible, after all, it is highly unlikely that Wayne Gretzky could explain why he was so much better at doing what he did on a hockey rink than anyone else, likewise for Pele on a football field or Picasso in front of a canvas etc etc. In which case I must be afflicted with a data modellers equivalent of the Photographer’s Eye. Which may well be the case- I have worked with people (1 person in particular) and we just saw the solution to a modelling problem in the same way. We couldn’t necessarily explain why we saw it that way, but invariably the solution stood up to whatever tests were thrown at it and eventually got accepted by the naysayers.

So, to bring this back to modelling in practice; if you find someone who is good at data modelling (and don’t take their word for it, get a few other data modellers to verify their work) they are probably doing many, if not all, of the things to create a model that is in 5th, 6th or even more Normal Form. They just won’t know why they are doing it that way. But, and this is crucial to the practicality bit: when I created the temporal aspect of a database, I occasionally had to go against every principle in my body and not only create effective_date and expiry_date but also an active_flag. Now, obviously there are situations where a row can be effective, not yet expired but not active- but in my case I had to add the extra attribute because the user could then simply click a check box in the UI and a trigger in the DB would populate the relevant date columns. So was I in 6th NF or not had I not even made it to 3NF? At the end of the day it didn’t matter, because the app met the user requirements, the developers could code against it and the DBA could manage the tables. And all of those issues combined is the true test of a practical data model