Topic Specialization Step 1: Design The Topic Element Types

When designing a topic specialization, you have to think about several things:

What should the new topic element type name be?
What should be allowed within the topic body?
Should the topic allow any nested topics?
Does the topic require specialized topic-level metadata?

Of course, to answer these questions, you have to first understand the requirements, both for the information content and the information presentation and processing.

For this tutorial, our task is to create a specialization of <concept> that supports the requirements of FAQ information, that is, questions and answers.

Note: If you just want to know what the mechanics are and don't really care about how we arrived at the design used in the rest of the tutorial, you can skip on to step 2 at this point. But once you've worked through the tutorial I'd urge you to come back here and read this step all the way through.

I chose FAQs as the subject of the tutorial because they are familiar to most Web users (and now even non-Web users), they are relatively simple (at least on their face) but not so trivial as to be boring, and they have some potential sophistications that could make for interesting exploration beyond the immediate task of "what are the mechanics of defining and implementing a new topic type?" In addition, there are any number of useful and reasonable ways that FAQs could be constructed using DITA, and what is presented in this tutorial is only one, and not necessarily the best one.

For this tutorial I have decided that each question should be a separate topic, rather than having one topic that contains multiple question/answer pairs. This design follows the general understanding I've arrived at that making <topic> the primary unit of organization and granularity works well, even if it leads to topics that some people might initially or intuitively think were too small. But I wouldn't go to the mat to defend this design decision and won't claim it's necessarily the best. It has a logic I can defend but that's as far as I'll go.

As you work through the tutorial, take the time to ask yourself how you would have done it and why a different way would or wouldn't be better for some reason. This type of analytical thought is all part of understanding your requirements and mapping those requirements to implementations in order to ensure you have the most appropriate solution.

For the purposes of this tutorial, let us define an FAQ as a set of one or more question and answer pairs, where the question is a relatively short statement and the answer may be as short or long as needed. We would like the markup to reflect this essential nature, that is, there should be something named something like "question" and something named something like "answer". There are no particular requirements for the contents of answers themselves. We would like to be able to get a presentation where the question and answer are clearly identified, e.g., "Q. Question statement", "A. Answer response". Default topic presentation would not be sufficient in this case.

Note that these requirements are pretty simple. In any sort of engineering activity, it is best to start off as simply as you can and use iterative refinement to satisfy new requirements as you discover them. In the world of agile methods this is known as "the simplest thing that could possibly work".

This approach does several things: it lets you get something working quickly, it gives you immediate practical experience that will feed back into the design and implementation quickly, and it avoids designing and implementing things that you don't actually need. When designing XML markup it is quite easy to over-design and build complex markup structures that nobody actually wants or needs or, perhaps, can understand how to use. I've certainly done my share of this in the past. I now find it much more effective to start small and build up as needed. Often this refinement process all happens over the course of a few hours as I implement a new document type or specialization and start testing it with real data; sometimes it happens over weeks or months as the new markup design is tested by its target users. In any case, for this tutorial, we will start small and, once we have something working that minimally meets our requirements, we can start thinking about other things we might need.

Another characteristic of agile development methods is "test-driven development", that is, the use of test cases to drive the implementation, rather than implementing first and testing later. The basic idea is that you write the test case first, which will of course initially fail (because there's no code yet) and then you do the implementation until the test case passes, at which point you know you're done. The test cases reflect the requirements as you understand them at the time you write the test case (and if the requirements change, you update the test case to reflect your new understanding).

For markup design, this translates into creating document instances and then implementing the DTD or schema that will validate those instances. When the instances are valid, you know you're done (as long as your instances reflect all the important cases the schema needs to support). This is as opposed to simply going from requirements straight to markup declarations and then only creating instances after the fact, which is the way we had to do it back in SGML days. One of the nice things about XML is that you can have documents with no document type, so you can start with instances and add DTDs or schemas later.

Names are always important and in this case there is a slight problem with the obvious choice of name for the topic element itself, namely "faq". The problem is that the term "FAQ" can be read as either singular or plural (a set of questions) but here we want a single topic to reflect a single question/answer pair. The name "question-and-answer" might work except that it could also imply more of a test-type question environment than an FAQ environment. Thus I have arrived at the name "faq-question". It's technically redundant but fairly clear and not too long:

<faq-question id="q1">
</faq-question>

The topic title will be the question statement and it's probably useful to specialize <title> to <faq-question-statement> to make that clear:

<faq-question id="q1">
  <faq-question-statement>Can I add attributes to specific element types?</faq-question-statement>
</faq-question>

In this case we don't have any particular requirements for the topic body content so we could leave <body> unspecialized, but since the body will be the FAQ answer, it makes sense to rename <body> to <faq-answer>.

Note that just "answer" is probably too generic. One challenge with DITA 1.x is that because you cannot use namespaces, all element type names, including the names of all specialized element types, must be unique. While you can't guarantee that your specialized types won't conflict with somebody else's, you should try to use names that are reasonably specific to your stuff. This can sometimes lead to names that are longer or more cumbersome than they would need to be if we could use namespaces in DITA 1.x. A good example is the element types defined by the DITA 1.2 Learning and Training vocabulary modules, most of which start with "lc" (for "learning content"). The prefix functions as a sort of "namespace prefix" and helps ensure that no other vocabularies will have names that collide with the Learning and Training types.
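To make the prefix convention concrete, here is a small fragment in the style of a Learning and Training overview topic. It is only an illustrative sketch: the element type names (learningOverview, learningOverviewbody, lcIntro, lcAudience) come from the DITA 1.2 Learning and Training vocabulary, but the id and the text content are made up, and the fragment has not been validated against the DTDs.

```xml
<!-- Illustrative only: the id and text content are hypothetical. -->
<learningOverview id="overview-example">
  <title>Course Overview</title>
  <learningOverviewbody>
    <!-- The "lc" prefix marks these as Learning and Training types. -->
    <lcIntro>
      <p>This course introduces topic specialization.</p>
    </lcIntro>
    <lcAudience>
      <p>Information architects who are new to DITA.</p>
    </lcAudience>
  </learningOverviewbody>
</learningOverview>
```

The "faq-" prefix used in this tutorial serves the same purpose on a smaller scale.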

So our topic should look like this:

<faq-question id="question-id">
  <faq-question-statement>Can I add attributes to specific element types?</faq-question-statement>
  <faq-answer>
    <p>No, you can only define global attributes, specialized either from @base
    or @props.</p>
  </faq-answer>
  </faq-answer>
</faq-question>
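Because the task is to specialize <concept>, each new element type has to map back to a base type through its @class attribute. The sketch below shows roughly what a processor would see once those values are in place. It rests on two assumptions that are not part of the design above: that the specialization module is named "faq-question", and that <faq-answer> is specialized from <conbody>. In practice the @class values are supplied as defaults by the DTD or schema rather than typed by authors.

```xml
<!-- Sketch under assumptions: the module name "faq-question" and the
     choice of conbody as the base of faq-answer are hypothetical.
     The class values would normally be defaulted, not authored. -->
<faq-question id="q1"
    class="- topic/topic concept/concept faq-question/faq-question ">
  <faq-question-statement
      class="- topic/title faq-question/faq-question-statement ">Can I add attributes to specific element types?</faq-question-statement>
  <faq-answer
      class="- topic/body concept/conbody faq-question/faq-answer ">
    <p class="- topic/p ">No, you can only define global attributes,
    specialized either from @base or @props.</p>
  </faq-answer>
</faq-question>
```

Reading each @class value from left to right gives the ancestry of the element type, which is what lets a processor that knows nothing about FAQs fall back to treating the document as a concept or a generic topic.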

This design should be sufficient to get us going. Note that we are deferring all issues of how to organize the FAQ questions into FAQs. That will be done using maps, where we know we already have everything we need to create sets of questions and to do things like group questions into titled groups.
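As a rough sketch of what that map-based organization could look like, the following uses only standard map elements, with <topichead> providing the titled groups. The file names, map title, and group titles are all hypothetical.

```xml
<!-- Sketch only: file names and titles are hypothetical. -->
<map>
  <title>DITA Specialization FAQ</title>
  <!-- Each topichead is a titled group of questions. -->
  <topichead navtitle="Attribute specialization">
    <topicref href="faq/q1.dita"/>
    <topicref href="faq/q2.dita"/>
  </topichead>
  <topichead navtitle="Topic specialization">
    <topicref href="faq/q3.dita"/>
  </topichead>
</map>
```

Each <topicref> points to one <faq-question> topic, so the single-question-per-topic design costs nothing when it comes to assembling a complete FAQ.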