DTD or XSD?

DITA vocabulary modules can be implemented using DTDs or XML schema documents (XSD). Which should you use?

At the time of writing, most DITA users use DTD-based vocabulary modules. However, you can use XSD-based modules if you prefer them, or if the tools you're using or the demands of a particular user community require them. For example, the Syntext Serna editor uses only XSDs to drive in-editor tag awareness (it can use DTDs for validation, but not for in-editor tag awareness).

That raises the question: should you use DTDs, XSDs, or both?

The short answer, for now, is "use DTDs unless you absolutely have to use XSDs." I hope this answer will change in the future.

The reason is simple: DITA's XSDs currently depend on the redefine feature of XSD. Unfortunately, the definition of redefine in the XSD 1.0 specifications is ambiguous, to the point that different conforming XSD processors can produce different results for the same set of XSD documents. For this reason, the redefine feature is deprecated in XSD 1.1 (under development at the time of writing). As currently formulated, DITA's XSDs depend on one particular interpretation of redefine: the one implemented by the Xerces 2.x parser.

As a result, some conforming XSD processors will consider DITA XSDs invalid and will be unable to process them. That in turn means you cannot reliably interchange XSD-based vocabulary modules and document type shells unless all the interchange partners use compatible XSD processors.

However, you can still create environments where XSD-based DITA documents can be processed reliably. The Xerces parser implements the interpretation of redefine that DITA's XSDs rely on, many tools use Xerces or can use it (including the Open Toolkit), and all the major commercial DITA-aware editors support DITA XSDs. What you cannot expect is that every schema-aware XML processor that is not specifically DITA-aware will be able to process XSD-based DITA documents.
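To make the redefine dependency concrete, here is a minimal Python sketch that scans a schema document for redefine elements and reports which components each one redefines. The schema text, the file name base-groups.xsd, and the names phrase and my-phrase are hypothetical illustrations, not taken from the DITA distribution, and find_redefines is a helper invented for this example. It only detects the construct; it does not tell you how a given XSD processor will interpret it.

```python
import xml.etree.ElementTree as ET

# Namespace URI for XML Schema 1.0 and 1.1 schema documents.
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical shell schema (an assumption for illustration only).
# It redefines a group from another schema document so that the
# group also allows a new, specialized element.
SHELL_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="base-groups.xsd">
    <xs:group name="phrase">
      <xs:choice>
        <xs:group ref="phrase"/>
        <xs:element ref="my-phrase"/>
      </xs:choice>
    </xs:group>
  </xs:redefine>
  <xs:element name="my-phrase" type="xs:string"/>
</xs:schema>
"""


def find_redefines(xsd_text):
    """Return a list of (schemaLocation, [redefined component names]),
    one entry per xs:redefine element found in the schema document."""
    root = ET.fromstring(xsd_text)
    results = []
    for redefine in root.iter(f"{{{XS}}}redefine"):
        # Direct children with a name attribute are the redefined
        # groups, attribute groups, or types.
        names = [child.get("name") for child in redefine if child.get("name")]
        results.append((redefine.get("schemaLocation"), names))
    return results
```

Calling find_redefines(SHELL_XSD) returns one entry, pairing base-groups.xsd with the redefined group phrase. A schema set that yields an empty list does not use redefine in that document, although included or imported documents would need the same check.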

I find this frustrating because, except for the redefine issue, I prefer XSDs to DTDs, for the following main reasons:
  1. XSDs are fully namespace aware and namespaces are good.
  2. XSDs are more expressive than DTDs.
  3. Because XSDs are XML documents, you can embed complete XML-based documentation in them in a way you cannot in DTDs.
  4. Because XSDs are XML documents, they are more convenient to process than DTDs (although there are DTD parsers that make processing DTDs almost as convenient as processing XSDs).
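Reasons 3 and 4 can be illustrated together. The following Python sketch uses only the standard library's ordinary XML parser to pull embedded documentation out of a schema document, something that would need a dedicated DTD parser for a DTD. The sample schema, its element names, and the element_docs helper are all hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI for XML Schema documents.
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema document (an assumption for illustration only),
# with documentation embedded in one element declaration.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="my-phrase" type="xs:string">
    <xs:annotation>
      <xs:documentation>A hypothetical specialized phrase element.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="undocumented" type="xs:string"/>
</xs:schema>
"""


def element_docs(xsd_text):
    """Map each top-level element declaration's name to the text of its
    xs:documentation, or to None when it has no documentation."""
    root = ET.fromstring(xsd_text)
    docs = {}
    for decl in root.findall(f"{{{XS}}}element"):
        doc = decl.find(f"{{{XS}}}annotation/{{{XS}}}documentation")
        has_text = doc is not None and doc.text
        docs[decl.get("name")] = doc.text.strip() if has_text else None
    return docs
```

Here element_docs(SAMPLE_XSD) maps my-phrase to its documentation string and undocumented to None. The sketch reads only plain text from the documentation element; real schema documentation can contain nested XML markup, which would need additional handling.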

DITA 1.x can't use namespaces, which cancels out the namespace advantage of XSDs. (DITA's modularity features are the functional equivalent of how one can use namespace-based XSDs to create true document type modules; they just don't use namespaces to do it.)

The XSD 1.1 specification, at working draft stage as of December 2009, defines a new feature, "override", that is intended to replace the redefine feature and provide a more flexible extension mechanism, one that is a better match for what DITA needs. The DITA TC has provided input to the XSD Working Group on DITA's specific requirements for the override feature. If override works as we want it to and is implemented by most or all XSD-aware processors, it will be possible to rework the DITA XSDs to use override rather than redefine. At that point, XSDs can become the obvious better choice for vocabulary module implementation.

Until that time, however, the easiest and safest route is to stick with DTD-based shells and vocabulary modules.