Database geek observation

bflynn

Final Approach
Joined
Apr 24, 2012
Messages
9,543
Location
KTTA
Display Name

Display name:
Brian Flynn
Warning, this is a true geek joke.

My son is taking a summer course, this one on database design. They're in a section on schema, relationships, etc and of course the notion of normalized forms has come up, specifically 3rd normal form (3NF), a highly theoretical way of arranging data so you never duplicate values.

The "simple" explanation is that when values repeat - for example, in addresses, combinations of city / state / zip - you move that data off to a new table and refer to it by a key value, so that you never repeat that data anywhere except in that new table. It greatly complicates the tables in the database, but it does save disk space.

After struggling with it, reading multiple web pages, and watching multiple videos he concluded that whatever disk space has been saved by arranging data in 3NF has more than been lost to all the documentation explaining what 3NF is.

The more I think about that, the funnier it gets.

In 25 years of working with databases, I've never seen a database in full 3NF. The reason is that it just isn't worth it.
 
Warning, this is a true geek joke.

My son is taking a summer course, this one on database design. They're in a section on schema, relationships, etc and of course the notion of normalized forms has come up, specifically 3rd normal form (3NF), a highly theoretical way of arranging data so you never duplicate values.

The "simple" explanation is that when values repeat - for example, in addresses, combinations of city / state / zip - you move that data off to a new table and refer to it by a key value, so that you never repeat that data anywhere except in that new table. It greatly complicates the tables in the database, but it does save disk space.

After struggling with it, reading multiple web pages, and watching multiple videos he concluded that whatever disk space has been saved by arranging data in 3NF has more than been lost to all the documentation explaining what 3NF is.

The more I think about that, the funnier it gets.

In 25 years of working with databases, I've never seen a database in full 3NF. The reason is that it just isn't worth it.

It seems like for some of those items, the key values might be longer than the data itself, depending on how they are constructed.
 
Bad joke. And the better reason for 3NF is to make updates easier. Without it you have to know all the places the data is in order to update it, and do multiple updates in order to update them all.

Bottom line, there are pros and cons to every approach, and the “mission” determines which is best, except when none are best, and they all have their up/down side.

we used to make fun of the old flat file databases that had to use Stone Age techniques like partitioning to be performant. Now, we call them nosql databases and they are all the rage again.
 
Well 3NF is not about saving disk space, if that's what you think you're missing the point.
 
My opinion - this falls into the category of the perfect (theoretically speaking) vs. the pragmatic. Depends on the database you are designing. In my experience, some databases sacrifice 'perfection' for a better mapping to the domain(s) the database is supporting. Not a one-size-fits-all approach out there. Have witnessed and mediated (and been the final voice on) some good verbal brawls between data architects, programmers/developers and systems analysis over how databases should be designed. Again, my experience here only - it usually ends up being a compromise that is not technically 'perfect' but gets the system built to specifications on time and on budget.
 
Thank you for not posting this in the joke thread.

This could go in the joke thread....

Ok ok ok ok... stop me if you heard this one... have you heard about this?

Why did the database administrator leave his wife?...........
....... wait for it.........
......... She had one-to-many relationships.

OH! Hey thanks folks! That's all for my set, but I'll be back tomorrow night with all new material. Great audience!!
 
Wait! Don’t forget about 4NF, BNF and 5NF!! those are lots more fun.
 
No, no it’s 5G that causes Covid (and simultaneously tracks you if you’ve been microchipped with the vax). It’s why I only use a 4G iPad for foreflight — so The Man can’t track me.
 
Warning, this is a true geek joke.

My son is taking a summer course, this one on database design. They're in a section on schema, relationships, etc and of course the notion of normalized forms has come up, specifically 3rd normal form (3NF), a highly theoretical way of arranging data so you never duplicate values.

The "simple" explanation is that when values repeat - for example, in addresses, combinations of city / state / zip - you move that data off to a new table and refer to it by a key value, so that you never repeat that data anywhere except in that new table. It greatly complicates the tables in the database, but it does save disk space.

After struggling with it, reading multiple web pages, and watching multiple videos he concluded that whatever disk space has been saved by arranging data in 3NF has more than been lost to all the documentation explaining what 3NF is.

The more I think about that, the funnier it gets.

In 25 years of working with databases, I've never seen a database in full 3NF. The reason is that it just isn't worth it.

Every table in the database in third normal form? No, not generally. The transactional part of the database? Yes, generally, with a few exceptions.

What I'm working on right now is a way to synchronize the data in our customer prospect tables with an external email manager system. I'm pulling data from tables that contain information on prospects who have inquired about our product, which are in third normal form, and writing it to a denormalized table, which will be where the updates that go to the email manager will be pulled from. That denormalized table then gets update daily with changes from the transactional tables. The only reason for pulling all this into the denormalized table is so that we can quickly assemble what's needed to go to the email manager, as it runs every five minutes.

I'm very surprised to hear someone as experienced as yourself not recognizing the benefits of third normal form.
 
I'm very surprised to hear someone as experienced as yourself not recognizing the benefits of third normal form.

The clients don't care. The emphasis is on doing the job (2 of quick/accurate/cheap), not necessarily conforming to academic schema models. It's all a cost and whether it's 3rd or nth nf, nobody cares. It's hard to justify any extra work to achieve a standard that doesn't have an impact. The number of times I've had to update records can be counted on one hand.

So, I'll amend the joke - any savings from having databases in 3NF has been lost in discussing and documenting what 3NF is.
 
So many nerds in here. :D

This observation is why I have to deal with jank like Mongo. There's no spaghetti like unstructured "document database" JSON spaghetti.
 
So many nerds in here. :D

You're welcome.

This observation is why I have to deal with jank like Mongo. There's no spaghetti like unstructured "document database" JSON spaghetti.

All part of Eroom's Law - every 18 months, programmers get half as efficient. It's the counter to Moore's Law.

Honestly, the state of software development today makes me cringe to be in the industry. There's some brilliant stuff. But there's a whole lot of garbage and much of it is so insecure that it shouldn't be allowed within 230' of a wifi signal.
 
Every table in the database in third normal form? No, not generally. The transactional part of the database? Yes, generally, with a few exceptions.

What I'm working on right now is a way to synchronize the data in our customer prospect tables with an external email manager system. I'm pulling data from tables that contain information on prospects who have inquired about our product, which are in third normal form, and writing it to a denormalized table, which will be where the updates that go to the email manager will be pulled from. That denormalized table then gets update daily with changes from the transactional tables. The only reason for pulling all this into the denormalized table is so that we can quickly assemble what's needed to go to the email manager, as it runs every five minutes.

I'm very surprised to hear someone as experienced as yourself not recognizing the benefits of third normal form.
Denormalized table ---- aka "excel spreadsheet"
 
So many nerds in here. :D

This observation is why I have to deal with jank like Mongo. There's no spaghetti like unstructured "document database" JSON spaghetti.
True....but the real issue is what the data is, and how it's used. If everything is highly structured (e.g. phone book info, sales, billing, demographics like student records, etc) then relational DB is better. If very unstructured, then NoSQL (mongodb, hadoop, any of the other 4 categories of NoSQL) works best. It's like the "what's the best programming language?" question - answer? There's really 2:

1) the one that is best suited to the problem
2) the one they're paying me to use

you can really tell I teach this stuff.....

And I am NOT a nerd...I am a Geek.

https://en.wikipedia.org/wiki/Nerd

https://en.wikipedia.org/wiki/Geek

"Age of the geek, baby"
 
Denormalized table ---- aka "excel spreadsheet"


That's pretty much what I'm making. It gets saved as a .csv and dropped on the email manager's FTP site, they use it to create email that gets sent to people who have requested information on our product. It's very straightforward to build, if a bit time consuming because there are something like 120 columns in the table.
 
Back
Top