I partake in several forums related to software development,
database design and problem resolution. It ultimately helps
improve our service to our Temp Agency
and Staff Bank software customers.
Occasionally there are comments or threads that are (in my
opinion) really well balanced appraisals of fundamentals. Kenneth
Wilhelmsson recently wrote one such contribution and has kindly
given permission for its reproduction here. I believe I am correct
in saying Kenneth has been contributing wisdom to at least one of
the major forums in the SQL community for around a
decade. I have taken the liberty of correcting the very
occasional typo as this was initially contributed 'on the fly' to a
SQL forum.
Thank you Kenneth
Kenneth writes on use of foreign keys
Well, this is indeed an interesting subject. :o)
I'll try to ramble a bit in the philosophical vein...
This is also just my .02, so feel free to agree or disagree
=;o)
First and foremost, I want to make clear that we don't 'use'
keys.. They are there, whether we can see them or not.
Keys are a part of the model, so the concept of a key is abstract.
A key is *not* an index or a constraint.
Regardless of prefix, such as 'primary', 'foreign', 'alternative'
etc etc, a key is - a key - nothing less and nothing more. They are
all the same in this respect.
Now, usually the debate is around how to *implement* keys in the
best way. This is where we stumble into the realms of indexes,
constraints et al.
A quick demo: consider my table here:
CREATE table myTab ( col1 int not null, col2 int not null, col3
not null )
Now, if I were to say that this table has one primary key and
one foreign key... Would I be lying?
No, I wouldn't. It is so, because I have designed the table and
have also decided where my keys are.
The problem is though, that one can't see this without directly
asking the designer (me).
This isn't very helpful, so it'd be better to document the keys in
some way.
Also, there isn't anything visible at the table level that
actually helps to enforce that my keys really do behave like
keys.
How to do it then?
If nothing more is done, the keys are still there. (They are
part of the model/design remember?)
It's however a great risk that the table will become corrupted
very quickly.
'Corruption' in this sense is when you end up with data that isn't
conforming to the keys present, i.e. a duplicate or an orphan or
any anomaly not intended.
So, one better need to implement these keys in some way, by
placing the rules wanted at some level between the data and the
'other end' - i.e. input/output side.
At this point there may be a debate concerning 'performance',
and you now have the two camps noted earlier.
I would argue that performance is always secondary to integrity,
so better do it 'right'... or 'corruption' will happen.
Now, if we decide to place the rules far away from the data,
then this will probably work for the purpose of keeping
integrity... as long as this path is *ALWAYS* used! In
reality though, the farther away from the data DRI are placed, the higher
the risk that someday, something, willingly or unwillingly
something will slip into our database without passing those rules.
This is one of the major downsides to 'application enforced DRI' -
distance. Another downside is that the tables usually look like my
example above. It doesn't tell you much information: you have
to go to the upper layer and start to unwind the app to figure out
what the intentions really are. And, there's also the chance of
actually *missing* on performance gains (regarding FK's, more on this later).
Perhaps it make for better sleep at night if the DRI is placed as close to the
actual data as possible?
There's usually no debate regarding the key that is designated as
'primary'.
CREATE table myTab ( col1 int not null, col2 int not null, col3
not null )
CREATE unique index myInd on myTab(col1)
This would suffice as far as DRI on my designated PK is
concerned.
It's however still not very informative. One can't really be sure
what this is, except that it now behaves like a key.
Another thing that is now demonstrated is that our most efficient
tool to implement a key is with the help of a unique index.
(Clustered or not has absolutely no relevance in this
regard.)
So, to better communicate our intentions, we can declare a
constraint.
CREATE table myTab ( col1 int not null, col2 int not null, col3
not null )
ALTER TABLE myTab ADD CONSTRAINT myInd PRIMARY KEY (col1)
Now there's no question about what kind of key it is.
The declaration clearly informs that it's designated as 'primary'
it is a 'key' and it uses an index to enforce the behaviour such a
key should have.
Ok, so similar concept for other kinds of keys, such as Foreign
Keys.
ALTER TABLE myTab ADD CONSTRAINT myFK FOREIGN KEY (col3)
REFERENCES someOtherTable(colX)
This tells us clearly that there is a relation here with a foreign
key (information) and it's tightly coupled to the table
(reliable)
In essence a FK constraint is a CHECK constraint, and this is
where it gets interesting performance-wise...
A check constraint (i.e. our FK here) can be one of two things:
'trusted' or 'not trusted'.
Try issuing this query:
SELECT OBJECTPROPERTY(object_id('FK_name'),
'CnstIsNotTrusted')
If it returns a zero (false), all is well. If it returns a 1
(true) then we have some issues.
Very briefly, if you create/enable a FK constraint *with
validation* of existing data, then it will be 'trusted'
In all other cases it will be 'NOT trusted'
This is a huge issue for the optimizer among other things, and if
done properly, may yield performance benefits as an added
bonus.
For the absolute most parts, one would like to have all FK
constraints to be trusted.
It means we can rely on it, and that there are no orphans present,
for sure.
If it's not trusted, then we won't know for sure, and neither
will the optimizer. It may even disregard the constraint altogether
when it looks for a plan, resulting in a less optimal plan overall
than otherwise.
If it is trusted, the optimizer also trusts this intelligence,
and as a side effect may produce a plan that doesn't even touch the
table that the FK references, even though the actual query may say
something like:
SELECT a.col2 FROM myTab a JOIN someOtherTab b on a.col3 =
b.colX
It depends on what is needed to be retrieved.
If a.col2 is sufficient, then why bother go see if a.col3 ties up
with b.colX if the constraint which is present has already done
this check? (This is assuming it can be trusted of course.)
So, for possible performance benefits, this is one that may
happen in certain circumstances. At any rate, it's a benefit. It
also would never ever occur when the DRI is placed at the
application level. So, to end this, if anyone has persisted this
far.. =;o)
I'm pro the camp that proclaims DRI enforcement at the database
level.
It makes me sleep better =;o)
/Kenneth