Use of foreign keys

Posted Friday Friday, May 21, 2010 by Ian Pettman

I partake in several forums related to software development, database design and problem resolution. It ultimately helps improve our service to our Temp Agency and Staff Bank software customers.

Occasionally there are comments or threads that are (in my opinion) really well balanced appraisals of fundamentals. Kenneth Wilhelmsson recently wrote one such contribution and has kindly given permission for its reproduction here. I believe I am correct in saying Kenneth has been contributing wisdom to at least one of the major forums in the SQL community for around a decade.  I have taken the liberty of correcting the very occasional typo as this was initially contributed 'on the fly' to a SQL forum. 

Thank you Kenneth


Kenneth writes on use of foreign keys

Well, this is indeed an interesting subject. :o)
I'll try to ramble a bit in the philosophical vein...
This is also just my .02, so feel free to agree or disagree =;o)
First and foremost, I want to make clear that we don't 'use' keys.. They are there, whether we can see them or not.
Keys are a part of the model, so the concept of a key is abstract. A key is *not* an index or a constraint.
Regardless of prefix, such as 'primary', 'foreign', 'alternative' etc etc, a key is - a key - nothing less and nothing more. They are all the same in this respect.
Now, usually the debate is around how to *implement* keys in the best way. This is where we stumble into the realms of indexes, constraints et al. 
A quick demo: consider my table here:

CREATE table myTab ( col1 int not null, col2 int not null, col3 not null )


Now, if I were to say that this table has one primary key and one foreign key... Would I be lying?
No, I wouldn't. It is so, because I have designed the table and have also decided where my keys are.
The problem is though, that one can't see this without directly asking the designer (me).
This isn't very helpful, so it'd be better to document the keys in some way.
Also, there isn't anything visible at the table level that actually helps to enforce that my keys really do behave like keys.
How to do it then?
If nothing more is done, the keys are still there.  (They are part of the model/design remember?)
It's however a great risk that the table will become corrupted very quickly.
'Corruption' in this sense is when you end up with data that isn't conforming to the keys present, i.e. a duplicate or an orphan or any anomaly not intended.
So, one better need to implement these keys in some way, by placing the rules wanted at some level between the data and the 'other end' - i.e. input/output side.


At this point there may be a debate concerning 'performance', and you now have the two camps noted earlier.
I would argue that performance is always secondary to integrity, so better do it 'right'... or 'corruption' will happen.


Now, if we decide to place the rules far away from the data, then this will probably work for the purpose of keeping integrity...  as long as this path is *ALWAYS* used! In reality though, the farther away from the data DRI are placed, the higher the risk that someday, something, willingly or unwillingly something will slip into our database without passing those rules. This is one of the major downsides to 'application enforced DRI' - distance. Another downside is that the tables usually look like my example above. It doesn't tell you much information:  you have to go to the upper layer and start to unwind the app to figure out what the intentions really are. And, there's also the chance of actually *missing* on performance gains (regarding FK's, more on this later).
Perhaps it make for better sleep at night if the DRI is placed as close to the actual data as possible?
There's usually no debate regarding the key that is designated as 'primary'.
CREATE table myTab ( col1 int not null, col2 int not null, col3 not null )
CREATE unique index myInd on myTab(col1)
This would suffice as far as DRI on my designated PK is concerned.
It's however still not very informative. One can't really be sure what this is, except that it now behaves like a key.
Another thing that is now demonstrated is that our most efficient tool to implement a key is with the help of a unique index. (Clustered or not has absolutely no relevance in this regard.)
So, to better communicate our intentions, we can declare a constraint.
CREATE table myTab ( col1 int not null, col2 int not null, col3 not null )
Now there's no question about what kind of key it is.
The declaration clearly informs that it's designated as 'primary' it is a 'key' and it uses an index to enforce the behaviour such a key should have.
Ok, so similar concept for other kinds of keys, such as Foreign Keys.
This tells us clearly that there is a relation here with a foreign key (information) and it's tightly coupled to the table (reliable)
In essence a FK constraint is a CHECK constraint, and this is where it gets interesting performance-wise...
A check constraint (i.e. our FK here) can be one of two things: 'trusted' or 'not trusted'.
Try issuing this query:


SELECT OBJECTPROPERTY(object_id('FK_name'), 'CnstIsNotTrusted')
If it returns a zero (false), all is well. If it returns a 1 (true) then we have some issues.
Very briefly, if you create/enable a FK constraint *with validation* of existing data, then it will be 'trusted'
In all other cases it will be 'NOT trusted'
This is a huge issue for the optimizer among other things, and if done properly, may yield performance benefits as an added bonus.
For the absolute most parts, one would like to have all FK constraints to be trusted.
It means we can rely on it, and that there are no orphans present, for sure.

If it's not trusted, then we won't know for sure, and neither will the optimizer. It may even disregard the constraint altogether when it looks for a plan, resulting in a less optimal plan overall than otherwise.

If it is trusted, the optimizer also trusts this intelligence, and as a side effect may produce a plan that doesn't even touch the table that the FK references, even though the actual query may say something like:


SELECT a.col2 FROM myTab a JOIN someOtherTab b on a.col3 = b.colX


It depends on what is needed to be retrieved.
If a.col2 is sufficient, then why bother go see if a.col3 ties up with b.colX if the constraint which is present has already done this check? (This is assuming it can be trusted of course.)

So, for possible performance benefits, this is one that may happen in certain circumstances. At any rate, it's a benefit. It also would never ever occur when the DRI is placed at the application level. So, to end this, if anyone has persisted this far.. =;o)

I'm pro the camp that proclaims DRI enforcement at the database level.
It makes me sleep better =;o)

If you found this information useful, please share it!

0 comments for “Use of foreign keys”

    Comments are closed for this post

Contact Information

To find out more about Ava solutions you can contact us in a number of ways:
Follow Us...