Public Identifiers

You may have noticed in my examples that the public identifiers I use are URNs, not SGML-style public identifiers as used for the standard DITA modules.

This is because public identifiers are nothing more than magic strings, so it absolutely doesn't matter what syntax you use as long as it's reasonably likely to be globally unique. The only XML-defined requirement is that the public ID consist of the characters allowed by the production "PubidChar" in the XML standard: space, carriage return, line feed, ASCII letters and digits, and a fixed set of punctuation. That set covers most, though not all, of the characters used in URIs.
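The PubidChar constraint is easy to check mechanically. The following is a minimal sketch in Python, not part of any DITA tooling; the function name and the example identifiers are hypothetical and chosen only for illustration. The character set is the one listed by the PubidChar production in XML 1.0.

```python
import re

# PubidChar in XML 1.0: space, CR, LF, ASCII letters and digits,
# and the punctuation -'()+,./:=?;!*#@$_%
PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def is_valid_public_id(public_id: str) -> bool:
    """Return True if every character of public_id is a PubidChar."""
    return PUBID_CHARS.fullmatch(public_id) is not None


# Hypothetical identifiers, for illustration only.
urn_style = "urn:pubid:example.org:doctypes:dita:dtd:my-topic.dtd"
sgml_style = "-//EXAMPLE//DTD My Topic//EN"

print(is_valid_public_id(urn_style))
print(is_valid_public_id(sgml_style))
# "~" is legal in a URI but is not a PubidChar, so this one is rejected.
print(is_valid_public_id("urn:example:a~b"))
```

Both the URN and the SGML-style string pass, which is the point: XML itself does not care which syntax you choose.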

I prefer URNs over SGML-style public IDs for two reasons:

It's the 21st century and DITA and XML are Web standards. URIs (and thus URNs) are the Web way of giving globally-unique names to resources.
For XSDs you have to use URIs, so why not be consistent in your global naming syntax?

The use of public identifiers is pretty standard practice in XML and in the DITA community especially. However, in XML, public identifiers are completely pointless.

In XML, you must always have a system identifier. Even if you have a public identifier, you must also have a system identifier. That immediately raises the question: why have a public identifier at all?

Why indeed?

I used to argue exactly that: public IDs were pointless because there is no useful difference between having a public ID and using a URN as your system ID. Neither can be resolved directly, so both require some sort of mapping, and entity resolution catalogs can map both public and system IDs with equal facility. This is all true.
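To make the "equal facility" point concrete, here is a small sketch that reads an OASIS XML catalog and collects its public and system entries. The catalog content, the URN, and the target path are hypothetical; only the catalog namespace and the `public`/`system` element and attribute names come from the OASIS XML Catalogs format. A real resolver handles much more (rewrite rules, delegation, `nextCatalog`), so treat this as an illustration rather than a resolver.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the same module mapped once by public ID
# and once by a URN used as the system ID.
CATALOG_XML = f"""
<catalog xmlns="{CATALOG_NS}">
  <public publicId="urn:pubid:example.org:dita:dtd:myTopic.dtd"
          uri="dtd/myTopic.dtd"/>
  <system systemId="urn:pubid:example.org:dita:dtd:myTopic.dtd"
          uri="dtd/myTopic.dtd"/>
</catalog>
""".strip()


def load_catalog(xml_text):
    """Return (public_map, system_map) built from <public> and <system> entries."""
    root = ET.fromstring(xml_text)
    public_map = {e.get("publicId"): e.get("uri")
                  for e in root.iter(f"{{{CATALOG_NS}}}public")}
    system_map = {e.get("systemId"): e.get("uri")
                  for e in root.iter(f"{{{CATALOG_NS}}}system")}
    return public_map, system_map


public_map, system_map = load_catalog(CATALOG_XML)
name = "urn:pubid:example.org:dita:dtd:myTopic.dtd"
# Both lookups land on the same file; the catalog treats them alike.
print(public_map[name], system_map[name])
```

From the catalog's point of view the two entries differ only in the element name used to declare them.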

In addition, in an environment where document type shells and modules will be deployed to many locations (many different Toolkit instances) it is absolutely necessary that all references to shells and module components be indirect and that everything be properly mapped. Thus having directly-resolvable system IDs would be counterproductive: you want system IDs that cannot be resolved directly so that any mapping configuration bugs cause early and immediate failure in your development environment. (This is why I make a point of ensuring that the system IDs in all of my shell document types consist of just the filename of the target module, regardless of where it might be relative to the using module. This ensures the system ID won't be resolvable and so can't mask a catalog mapping bug.)
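The fail-early behavior described above can be sketched as a lookup that goes through the catalog mappings and nothing else. Everything here is hypothetical (the function, the exception class, the identifiers, and the paths), and the lookup order is simplified. Real entity resolvers typically fall back to resolving the system ID relative to the referencing file, which is exactly the fallback a bare-filename system ID is meant to defeat.

```python
class UnmappedEntityError(LookupError):
    """Raised when neither identifier is mapped by the catalog."""


def resolve_entity(public_id, system_id, public_map, system_map):
    """Resolve through the catalog only; never fall back to the file system."""
    if system_id in system_map:
        return system_map[system_id]
    if public_id in public_map:
        return public_map[public_id]
    raise UnmappedEntityError(
        f"no catalog entry for public ID {public_id!r} or system ID {system_id!r}"
    )


# Hypothetical mappings: only the public ID is mapped. The system ID in the
# shell is a bare filename that points nowhere useful on its own.
PUBLIC_MAP = {"urn:pubid:example.org:dita:dtd:myTopic.mod": "plugins/org.example/dtd/myTopic.mod"}
SYSTEM_MAP = {}

print(resolve_entity("urn:pubid:example.org:dita:dtd:myTopic.mod",
                     "myTopic.mod", PUBLIC_MAP, SYSTEM_MAP))

try:
    # A typo in the public ID surfaces immediately instead of being hidden
    # by a system ID that happens to resolve.
    resolve_entity("urn:pubid:example.org:dita:dtd:myTopik.mod",
                   "myTopic.mod", PUBLIC_MAP, SYSTEM_MAP)
except UnmappedEntityError as err:
    print("catalog bug caught early:", err)
```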

Yet you will notice that in all the examples in this book I use public identifiers. Why?

The answer is simply that the use of public identifiers is so ingrained and, in some cases, required by tools even when it shouldn't be (especially tools with an SGML legacy), that it proved too quixotic to stick to my principle and not provide or use public IDs. So I use them even though they are totally pointless. But I use URNs partly to subtly make the point that they are pointless: it's more obvious that a public ID that is a URN is functionally identical to a system ID that is a URN, because they both use the same syntax and both require mapping in order to be resolved.

Whatever you do, do not use URLs for public identifiers. Doing so runs the risk of systems trying to resolve them. Always use URNs or SGML-style public IDs.
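If you want to enforce this rule in a build or review step, a check along the following lines may be enough. It is a sketch under stated assumptions: the function name and the list of schemes treated as "resolvable" are my own choices for illustration, not something defined by XML or DITA.

```python
from urllib.parse import urlsplit

# Assumption: these are the schemes a tool might try to fetch directly.
RESOLVABLE_SCHEMES = {"http", "https", "ftp", "file"}


def looks_like_url(public_id: str) -> bool:
    """Return True if public_id uses a scheme a tool might try to dereference."""
    return urlsplit(public_id).scheme in RESOLVABLE_SCHEMES


# Hypothetical identifiers, for illustration only.
for candidate in (
    "urn:pubid:example.org:dita:dtd:myTopic.dtd",
    "-//EXAMPLE//DTD My Topic//EN",
    "http://example.org/dtd/myTopic.dtd",
):
    verdict = "avoid" if looks_like_url(candidate) else "ok"
    print(f"{verdict}: {candidate}")
```

The URN and the SGML-style identifier pass; only the `http:` identifier is flagged.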

And please remember that in DITA the public ID for a document type shell or module means nothing. The only thing that matters is the value of a document's @domains attribute. DTDs and XSDs are just a convenience for authoring and (weak) validation and nothing more.

Any DITA tool (or, for that matter, any XML system generally) that puts too much emphasis on public IDs, and especially on the public IDs of document types, is fundamentally broken because it reflects a misunderstanding of what DTDs do and don't represent and, in DITA especially, a misunderstanding of what constitutes a DITA document type.