Specialize, but not right away

by Bruce Esrig, Information Architect
To adopt DITA, you need to take a position on specialization. What is the best way to start: by defining specialized types right away, or by using DITA without specialization until the right structures can be found?

Ideally, you would define the structure of your content well in advance. But is this possible? Let’s look at how a typical adoption process might go.

Adoption process

1. Hear a lot about DITA.

DITA directly supports topic-oriented authoring. It has simple yet powerful mechanisms for organizing topics. Both hierarchical and cross-topic relationships can be managed outside the topic (in maps and relationship tables), so topics can be reused. Topics come in three basic types (concept, task, and reference), each with a comparatively simple internal structure. Existing solutions provide excellent support for online information delivery, and enhancements are coming that will dramatically improve support for book-like outputs.

2. Evaluate existing content.

Some existing content is nicely divided into well-structured topics, but a lot of it is not. Converting existing content would require some restructuring, but there are options: (a) convert the cleanest content first on a pilot basis, (b) convert lots of additional content, reworking it during the conversion process, (c) convert lots of additional content with a minimum of rework, or (d) keep the old content where it is and develop new content in DITA.

3. Make a conversion plan. 

Identify pools of content and which option to follow for each pool. Determine how to get output from each pool: (a) using open-source software, (b) using a vendor’s solution, or (c) using in-house resources.

4. Select a pilot pool of content and convert it.

This is the magical part we want to discuss!

5. Based on experience, revise the conversion plan. 

Decide in more detail what infrastructure you will need for each pool of content, and what the content design should be for the content being converted.

6. Migrate authors to DITA as their content is converted.


When you consider converting a pilot pool of content, the question of specialization becomes urgent. Is this a good time to specialize?

Let’s review the purpose of specialization by looking at the following paragraph, which is adapted from a draft of the DITA 1.1 Architectural Specification.

Specialization allows you to define new kinds of information. It extends the base DITA language to express specialized information structures or to reflect the needs of specialized subject-matter domains while reusing the default DITA structural features and processing capabilities. Specialization is used to increase consistency or to provide cues for variations in the treatment of output.

For example, if there are output differences between two types of lists, a specialized list element can be defined for each type. To increase consistency, a sequence of specialized elements can be used to define a routine sequence of required and optional sub-headings.
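The fallback that makes this kind of list specialization safe can be sketched in Python. DITA records each element's ancestry in its class attribute (for example, "- topic/ul checklist/checklist "), so a processor that knows nothing about a specialization can still treat the element as its base type. In this sketch the checklist element and its "checklist" module are hypothetical, and the string-returning handlers stand in for real output processing; it is not how any particular DITA toolkit is implemented.

```python
import xml.etree.ElementTree as ET


def class_tokens(el):
    """Return an element's DITA class tokens, most specific first.

    A class value such as "- topic/ul checklist/checklist " starts with a
    "-" (structural) or "+" (domain) marker, followed by "module/element"
    tokens ordered from the base type to the most specialized type.
    """
    tokens = el.get("class", "").split()
    return [t for t in reversed(tokens) if "/" in t]


def render(el, handlers):
    """Use the handler for the most specific class token the processor knows."""
    for token in class_tokens(el):
        if token in handlers:
            return handlers[token](el)
    return "unhandled"


# A generic processor only knows the base list type.
base_only = {"topic/ul": lambda el: "bulleted list"}
# A customized processor adds output for the hypothetical checklist type.
customized = {**base_only, "checklist/checklist": lambda el: "checkbox list"}

doc = ET.fromstring(
    '<body class="- topic/body ">'
    '<ul class="- topic/ul "><li class="- topic/li ">Plain item</li></ul>'
    '<checklist class="- topic/ul checklist/checklist ">'
    '<li class="- topic/li ">Item to tick off</li></checklist>'
    "</body>"
)

print([render(el, base_only) for el in doc])   # ['bulleted list', 'bulleted list']
print([render(el, customized) for el in doc])  # ['bulleted list', 'checkbox list']
```

In this sketch the generic processor still produces a sensible bulleted list for the specialized element, which is one way the authoring side of a specialization could be rolled out before the processing customization is ready.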


There are two huge challenges in creating a specialization:

1. How do you know whether a specialization that you create will be adequate for the entire range of content that you plan to process?

2. Are you prepared to take the steps that would be required to support a specialization?

First, unless you have a thorough understanding of the content that will be converted and the new content that will be written in DITA in the future, you may not be able to plan a specialization. So it may be better to run a pilot, develop best practices and supporting style guides, and then capture the most useful of those practices in a proposed specialization. Once you have the pilot content available in DITA, you’ll be in a better position to review the proposed specialization and determine whether to commit to it.

Second, there are two sides to specialization: authoring and processing. You will need to plan whether to implement the authoring side first, and roll out the processing customizations separately, or whether to roll both out together.

Discussion question

What are your experiences with adopting DITA, and where has specialization fit in for you?

In addition to the authoring and processing "sides of specialization", there's also the integration aspect. Consider an information development (ID) environment where some key information, delivered to the customer via an XML publishing pipeline, originates not as DITA but as another XML vocabulary. The question then becomes how best to merge that content into your DITA stream.

Obviously, a one-way transform immediately comes to mind: just write an XSLT stylesheet to refactor the foreign XML into a DITA topic. Most likely, such a transform is going to be "lossy": the information-specific semantic markup of the foreign XML schema will be converted into the generic markup of DITA, so you will very probably lose some metadata and/or semantic information in the process. So what happens if your information development processes require a round trip for such information? Oops: since the one-way transform was lossy, we can't get back to the original.

In this case, a specialization may be the best way to handle the integration, on all three axes:
  1. specialized DITA elements/structures preserve the full information set represented by the foreign vocabulary and thus enable two-way transforms
  2. specialized DITA elements/structures help authors capture all of the information required by the foreign vocabulary (maybe the DITA authoring tools even become the preferred method of data entry versus the legacy methods of creating the foreign content)
  3. specialized DITA elements/structures provide the processing pipeline with the extra semantic clues to present the foreign information more effectively than would be possible with generic DITA
This is exactly the sort of situation we intend to address in the soon-to-be-formed Semiconductor Industry Specialization Subcommittee. If anyone is interested in participating, please sign up!
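The difference between a lossy one-way transform and a specialization-based round trip can be sketched in Python with xml.etree.ElementTree. The foreign register vocabulary, the "reg" specialization module, and all element names here are hypothetical illustrations, not part of any real semiconductor schema or of the planned subcommittee's work. The sketch assumes every foreign register carries name, offset, and access attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical foreign vocabulary: one register description.
FOREIGN = '<register name="CTRL" offset="0x04" access="rw">Control register</register>'
FIELDS = ("name", "offset", "access")


def to_generic_dita(reg):
    """Lossy one-way transform: the semantics are flattened into a plain <p>."""
    p = ET.Element("p", {"class": "- topic/p "})
    # The access attribute has no obvious home in generic markup and is dropped.
    p.text = f"{reg.get('name')} ({reg.get('offset')}): {reg.text}"
    return p


def to_specialized_dita(reg):
    """Transform into a hypothetical "reg" specialization of section/ph/p."""
    out = ET.Element("register", {"class": "- topic/section reg/register "})
    for field in FIELDS:
        # Each foreign field gets its own specialized phrase element.
        child = ET.SubElement(out, "reg" + field, {"class": f"- topic/ph reg/reg{field} "})
        child.text = reg.get(field)
    desc = ET.SubElement(out, "regdesc", {"class": "- topic/p reg/regdesc "})
    desc.text = reg.text
    return out


def from_specialized_dita(dita):
    """Reverse transform: every foreign field has its own element, so nothing is lost."""
    reg = ET.Element("register")
    for field in FIELDS:
        reg.set(field, dita.findtext("reg" + field))
    reg.text = dita.findtext("regdesc")
    return reg


original = ET.fromstring(FOREIGN)
print(ET.tostring(to_generic_dita(original), encoding="unicode"))
# <p class="- topic/p ">CTRL (0x04): Control register</p>

restored = from_specialized_dita(to_specialized_dita(original))
print(restored.attrib == original.attrib and restored.text == original.text)  # True
```

A real specialization would also need DTD or schema modules and processing overrides; the sketch only shows why keeping each foreign field in its own specialized element makes the reverse transform possible, while the generic version has already discarded the access value.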
Another valid approach is to start from the information product point of view. Try to capture your current information usage requirements and make sure that all currently known semantics are available.
Most valid reasons for specialisation derive from how the information is used. If the base topic types don't provide the information you need to process the content, you'll fail.

The most valid specialisations are those that enable better use of the information created.

