DITA Document Types, Configuration, and Specialization

In DITA a "DITA document type" is nothing more or less than a unique set of vocabulary and constraint modules used together in a document. For example, a <concept> document that uses the highlight and indexing domain modules (and no others) reflects the DITA document type consisting of the concept topic type module and the highlight and indexing domain modules. This combination can be expressed by the string "topic concept hi-d indexing-d", read as "the concept topic type, which extends the topic topic type, integrated with the highlight and indexing domains".

This simple list of module names tells you everything you need to know about the document's processing requirements and about whether elements from another DITA document are consistent with the elements in this one.

In DITA documents, the set of modules used is declared by the required @domains attribute, for example:
<concept id="topic-id"
  domains="(topic concept hi-d indexing-d)"
>
 <title>My Concept</title>
</concept>

Note that you don't need the actual DTD or XSD declarations for the modules; you only need to know the module names.
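As a rough illustration of how far the module names alone can take you, the following Python sketch parses a @domains value into a set of module names and compares two documents. The function names parse_domains and can_reuse are hypothetical, and the subset rule used for reuse is a simplifying assumption for illustration, not a statement of the standard's exact compatibility rules.

```python
import re


def parse_domains(value):
    """Return the set of module names listed in a @domains value."""
    # Accept one or more parenthesized groups of space-separated names.
    groups = re.findall(r"\(([^()]*)\)", value)
    return frozenset(name for group in groups for name in group.split())


def can_reuse(source_domains, target_domains):
    """Assumed rule: source content fits the target if the target
    declares every module that the source uses."""
    return parse_domains(source_domains) <= parse_domains(target_domains)


concept_doc = "(topic concept hi-d indexing-d)"
plain_topic = "(topic hi-d)"

print(sorted(parse_domains(concept_doc)))
# ['concept', 'hi-d', 'indexing-d', 'topic']
print(can_reuse(plain_topic, concept_doc))   # True
print(can_reuse(concept_doc, plain_topic))   # False
```

No DTD or XSD is consulted anywhere in this check; the declared module names are the only input.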

One implication of this is that DITA documents do not need to have literal DOCTYPE declarations or XSD schema associations as long as they specify the set of vocabulary modules they use. Likewise, when a document does have a DOCTYPE or schema association, it doesn't matter what DTD file or XSD document it uses as long as that DTD or XSD accurately reflects the set of modules the document declares it uses.

This means that DITA processors should never depend on the use of a specific DTD or XSD file because the use of a specific file means nothing. Two DTD or XSD document type shells that reflect the same set of modules define identical DITA document types. This is a fundamental difference between DITA and traditional XML and SGML applications, where the only thing you could know for sure was the specific DTD or XSD file a document used.
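The point that the shell file itself carries no meaning can be sketched in Python. In this minimal example, one document has no DOCTYPE declaration at all and the other points at a made-up local shell named my-local-concept-shell.dtd. Both declare the same modules (in a different order), so a processor that looks only at @domains treats them as the same DITA document type. The document_type function is a hypothetical helper, and treating the type as an unordered set follows the "unique set of modules" definition given earlier.

```python
import xml.etree.ElementTree as ET


def document_type(xml_text):
    """Return the declared module set of a DITA document as a frozenset."""
    root = ET.fromstring(xml_text)
    value = root.get("domains", "")
    # Strip the grouping parentheses and split on whitespace.
    return frozenset(value.replace("(", " ").replace(")", " ").split())


doc_without_doctype = """<concept id="a" domains="(topic concept hi-d indexing-d)">
  <title>A</title>
</concept>"""

# The DTD file name is hypothetical; the parser never opens it.
doc_with_local_shell = """<!DOCTYPE concept SYSTEM "my-local-concept-shell.dtd">
<concept id="b" domains="(topic concept indexing-d hi-d)">
  <title>B</title>
</concept>"""

same = document_type(doc_without_doctype) == document_type(doc_with_local_shell)
print(same)  # True
```

The comparison never reads the DOCTYPE, which is the behavior a general DITA-aware processor should have.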

For this reason, any system that claims to be a general DITA-aware processor that also requires or expects the use of specific DTD or XSD files is fundamentally broken because it demonstrates a lack of understanding of how DITA document types work.

(But do keep in mind that the DITA way of viewing document types is so different from traditional XML practice that it's no surprise that tools and many practitioners get it wrong, especially tools that reflect an SGML heritage, where the DTD was everything. Some of these tools reflect architectural decisions made decades ago that are difficult or impossible to undo in order to fully support DITA's way of thinking about document types. That doesn't mean those tools are not useful or even compelling, just that they will be harder to adapt to locally-defined document types and to vocabulary modules not defined by the standard.)

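There are three types of modules that can be used to define a DITA document type:
  • Structural modules, which define topic types or map types.
  • Domain modules, which define element types or attributes that can be integrated with topic or map types.
  • Constraint modules, which restrict what existing vocabulary modules allow.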

In this module-based approach to vocabulary management there are two things you can do to create DITA document types: configuration and specialization.

The DITA standard defines specific structural, naming, and coding requirements for document type shells and modules that help ensure consistency of design and implementation and make it easy to combine modules into new document types. While these patterns are not strictly needed technically (they have no bearing on the syntactic validity or processability of DITA documents), they make it easier to use and re-use modules and generally keep things consistent. Once you understand the patterns and how the pieces fit together, you will see that creating new specializations and configurations is remarkably easy.

DITA is about interchange, and that includes interchange of knowledge and of implementation components as well as interchange of content. DITA's modular vocabulary approach is designed in part to make the interchange of vocabulary as reliable as the interchange of content. A large part of this is simply standardizing implementation details so that, having learned how DITA vocabulary implementation works, you can quickly apply that knowledge to any conforming DITA vocabulary, no matter how specialized.

Configuration

Configuration is the task of taking existing vocabulary and constraint modules and combining them together to define a specific DITA document type.

You do configuration by creating new document type shells, that is, DTD or XSD files that serve essentially as manifests of the vocabulary modules that make up the DITA document types.

Configuration can also involve the creation of new constraint modules.

As an implementation activity, the creation of new document type shells is an entirely mechanical process that anyone can perform even if they have no knowledge of DTD or XSD syntax. These tutorials demonstrate the mechanical process. Likewise, because the process is entirely mechanical (meaning it requires no creative thought or invention), it can be automated, as it has been by Jarno Elovirta and his DITA DTD Generator (http://dita-generator.appspot.com/).

The development of constraint modules requires a bit more DTD or XSD knowledge, but it is also a largely mechanical process because it is always about removing or constraining existing things, not adding new things, so it does not require invention, only analysis of requirements and modification of existing declarations.

Specialization

Specialization is the process of creating new structural or domain vocabulary modules that provide new markup for specific requirements.

The essential aspect of specialization is that every element type or attribute defined in a vocabulary module must be based on and consistent with an element type or attribute defined in a more-general vocabulary module or in the base topic or map type.

This requirement ensures that any element, no matter how specialized, can always be mapped back to some known type and therefore understood and processed in terms of that known type. This ensures that all DITA documents, no matter how specialized, can always be processed in some way. That is, new markup should never break existing specialization-aware DITA processing.

Every element type exists in a specialization hierarchy, which goes from the base module (topic or map) through any intermediate modules to the element itself.

For example, if you defined a specialization of <concept> called <myConcept>, its specialization hierarchy would be <topic> -> <concept> -> <myConcept>. A processor given a <myConcept> document would be able to process it either as a concept topic or as a generic topic, as appropriate.

The magic of specialization is the @class attribute.

Every DITA element must have a @class attribute. The value of the class attribute is the specification of the specialization hierarchy for the element. The syntax of the @class attribute is:
  • A leading "-" or "+" character: "-" for structural types, "+" for domain types.
  • One or more space-separated module/element-type pairs: "topic/p", "topic/body", "hi-d/i", etc.
  • A trailing space character, which ensures accurate string matching on the last term in the hierarchy.
For the <myConcept> topic type the @class value would be:
"- topic/topic concept/concept myConcept/myConcept "
You read this value right to left as:
The <myConcept> element in the "myConcept" module, which specializes <concept> from the "concept" module, which in turn specializes <topic> from the "topic" module.
If the <myConcept> topic type defined a specialized body element, say <myConceptBody>, then its @class value would be:
"- topic/body concept/conbody myConcept/myConceptBody "
Looking at an instance of the <myConcept> element you would find these @class attributes:
<myConcept id="topicid"
  class="- topic/topic concept/concept myConcept/myConcept "
>
  <title>My Concept</title>
  <myConceptBody
    class="- topic/body concept/conbody myConcept/myConceptBody "
  />
</myConcept>
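The @class syntax described above is simple enough to handle with plain string operations. The following Python sketch splits a @class value into its parts and tests whether an element is, or specializes, a given type. The function names parse_class and is_a are hypothetical. The last lines show why the delimiting spaces, including the trailing one, matter for matching.

```python
def parse_class(value):
    """Split a @class value into its kind and its (module, element) pairs."""
    kind = {"-": "structural", "+": "domain"}[value[0]]
    pairs = [tuple(token.split("/")) for token in value[1:].split()]
    return kind, pairs


def is_a(class_value, module_and_type):
    """True if the element is, or specializes, the given module/element pair."""
    # Matching on the space-delimited term avoids partial-name matches.
    return f" {module_and_type} " in class_value


body_class = "- topic/body concept/conbody myConcept/myConceptBody "
kind, pairs = parse_class(body_class)

print(kind)                              # structural
print(pairs[-1])                         # ('myConcept', 'myConceptBody')
print(is_a(body_class, "topic/body"))    # True
print(is_a("- topic/pre ", "topic/p"))   # False
print("topic/p" in "- topic/pre ")       # True: a naive match is wrong
# Without the trailing space, the last term can no longer be matched.
print(is_a("- topic/body concept/conbody", "concept/conbody"))  # False
```

Because every term is surrounded by spaces, one substring test works for any position in the hierarchy, first, middle, or last.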

Note that these are attributes of element instances. While we tend to think of the @class attribute as something that is set in DTDs or XSDs, that is merely a convenience. What's really important is that the attributes are available to XML processors, which will be the case whether they are defaulted in DTDs or specified explicitly in instances—the two are identical to XML processors.
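A small Python sketch can make this equivalence concrete. It parses one instance that specifies @class explicitly and one that gets it from an attribute default. To keep the example self-contained, the default is declared in an internal DTD subset rather than in a real document type shell; that arrangement is an assumption made for illustration only.

```python
import xml.etree.ElementTree as ET

CLASS_VALUE = "- topic/topic concept/concept myConcept/myConcept "

# Instance that specifies @class explicitly.
explicit = (
    f'<myConcept id="topicid" class="{CLASS_VALUE}">'
    "<title>My Concept</title></myConcept>"
)

# Instance that relies on a default declared in the internal DTD subset.
defaulted = f'''<!DOCTYPE myConcept [
  <!ATTLIST myConcept class CDATA "{CLASS_VALUE}">
]>
<myConcept id="topicid"><title>My Concept</title></myConcept>'''

explicit_class = ET.fromstring(explicit).get("class")
defaulted_class = ET.fromstring(defaulted).get("class")

# The application sees the same value either way.
print(explicit_class == defaulted_class == CLASS_VALUE)  # True
```

As far as I know, Python's standard parser does not fetch external DTD files, so defaults that are declared only in an external shell would need a parser configured to load the DTD.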

The magic of the @class attribute is that specialized DITA documents can "just work" when processed by general-purpose specialization-aware processors, such as the DITA Open Toolkit.
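The "just works" behavior can be sketched as a fallback lookup: a processor walks the @class hierarchy from the most specialized term to the most general one and uses the first term it knows how to handle. This Python example is only an illustration of the idea, not a description of how the DITA Open Toolkit is implemented. The handler table and the choose_handler function are hypothetical.

```python
# Hypothetical processing rules known to a general-purpose processor.
HANDLERS = {
    "topic/topic": "render as generic topic",
    "concept/concept": "render as concept",
    "topic/body": "render as generic body",
}


def choose_handler(class_value, handlers=HANDLERS):
    """Return the rule for the most specialized type the processor knows."""
    terms = class_value.split()[1:]  # drop the leading "-" or "+"
    for term in reversed(terms):     # most specialized term first
        if term in handlers:
            return handlers[term]
    return None


print(choose_handler("- topic/topic concept/concept myConcept/myConcept "))
# render as concept
print(choose_handler("- topic/body concept/conbody myConcept/myConceptBody "))
# render as generic body
```

The processor has never heard of <myConcept> or <myConceptBody>, yet both elements get reasonable processing through their ancestors in the hierarchy.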

One implication of this magic is that you can define new markup without the need to also implement all the different forms of processing that might be applied to that markup—it will just work. To the degree that your specialized markup doesn't require any specialized processing, then you will never need to implement any new processing for it.

If your specialized markup does require specific processing, DITA-aware tools will tend to make adding that processing easier because they tend to be modular themselves. For example, the DITA Open Toolkit provides a general plugin mechanism that makes it easy to implement and deploy specialization-specific processing that extends the out-of-the-box processing using the smallest amount of custom code possible.