DITA as a Literate Programming Environment

It's the end of the day and I'm waiting forever for my Other Computer to tell me just exactly how wrong I am.  In the meantime, I figured I'd quickly share a little trick.

Literate Programming Defined

Wikipedia does a better job than I ever will so go read the comprehensive definition at http://en.wikipedia.org/wiki/Literate_programming. To summarize:

  • Code and documentation cohabitate in the same files.
  • The system must allow the author to explore/document concepts in a manner that seems to them to "flow".  
  • The system provides for the extraction of the code fragments (tangling) and the extraction of the documentation (weaving) separately.
  • It is not simply the automatic extraction of documentation from commented code (javadoc, Doxygen).  The key point is that the documentation is not forced into the structure imposed by elements of the code.
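
To make tangling and weaving concrete, here is a minimal Python sketch over a noweb-style source, where a line of the form <<name>>= opens a code chunk and a lone @ returns to documentation.  The function names and the sample text are mine, invented for illustration, and unlike the real notangle this sketch does not expand references from one chunk to another.

```python
def split_chunks(source):
    """Split noweb-style text into (kind, text) pairs, kind being 'doc' or 'code'."""
    chunks = []
    kind, lines = "doc", []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("<<") and stripped.endswith(">>="):
            # A chunk header such as <<demo.sql>>= starts a code chunk.
            chunks.append((kind, lines))
            kind, lines = "code", []
        elif stripped == "@":
            # A lone @ switches back to documentation.
            chunks.append((kind, lines))
            kind, lines = "doc", []
        else:
            lines.append(line)
    chunks.append((kind, lines))
    return [(k, "\n".join(body)) for k, body in chunks if body]


def tangle(source):
    """Return only the code, in document order."""
    return "\n".join(text for kind, text in split_chunks(source) if kind == "code")


def weave(source):
    """Return only the documentation, in document order."""
    return "\n".join(text for kind, text in split_chunks(source) if kind == "doc")


# Hypothetical sample document: two lines of prose around one code chunk.
SAMPLE = """\
This query returns the number one.
<<demo.sql>>=
SELECT 1;
@
Nothing else happens.
"""
```

Calling tangle(SAMPLE) gives back just the SQL line, and weave(SAMPLE) gives back just the two prose lines.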

The Setup: using DITA to put database server code into Subversion

Upon inheriting a system composed of code written in a mixture of languages, I set out to put all the code in Subversion.  This was necessary because there were already a few copies and I didn't know which was the most recent.  The source code files were a natural fit of course, as were the various shell scripts.

The big stickler was: "what do I do about server side database code?"  An earlier version of the PL/pgSQL code (for a PostgreSQL database) had been checked into Subversion, but the developer had at some point taken the lazy route and started editing the code directly within the database admin tool (pgAdmin III).  If only there had been just one database instance.  Instead we had several database servers running code with the same function names, but slightly different function bodies.  How was I to get the current version into Subversion?

(Are you still with me?  My Other Computer may have gone to sleep but still has the "busy" indicator on, so there's still hope it will complete and tell me in what way I am wrong.)

This was really when I decided to use DITA for this project.  I decided I'd start by forming a one-to-one relationship between PL/pgSQL functions and Reference Topics.  And like a good reference manual, I made each topic look the same as all the others.  I put a one-sentence <shortdesc> before the body and gave every topic the same sections with the same series of titles: "Overview", "Dependencies", "Pseudocode Presentation" and "PL/pgSQL Presentation".  There was really no need for me to specialize, and since I was just learning DITA I didn't need to make the learning curve worse.
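
For the curious, the shape of one of those topics can be sketched in a few lines of Python.  This is an illustration rather than my actual tooling: the function name build_reference_topic and the demo values are hypothetical, the DOCTYPE declaration is left out, and putting the <codeblock> in the last section is an assumption based on the section titles.

```python
import xml.etree.ElementTree as ET

# The four section titles every topic shares.
SECTION_TITLES = [
    "Overview",
    "Dependencies",
    "Pseudocode Presentation",
    "PL/pgSQL Presentation",
]


def build_reference_topic(topic_id, title, shortdesc, code):
    """Build a skeleton DITA reference topic for one PL/pgSQL function."""
    topic = ET.Element("reference", {"id": topic_id})
    ET.SubElement(topic, "title").text = title
    ET.SubElement(topic, "shortdesc").text = shortdesc
    body = ET.SubElement(topic, "refbody")
    section = None
    for section_title in SECTION_TITLES:
        section = ET.SubElement(body, "section")
        ET.SubElement(section, "title").text = section_title
    # Assumption: the function source goes in the last section.
    codeblock = ET.SubElement(section, "codeblock", {"platform": "sql"})
    codeblock.text = code
    return topic


# Hypothetical example values.
demo_topic = build_reference_topic(
    "demo_fn", "demo_fn", "Returns the number one.", "SELECT 1;"
)
demo_xml = ET.tostring(demo_topic, encoding="unicode")
```

The "Overview", "Dependencies" and "Pseudocode Presentation" sections come out empty, which is the point: they are the blanks that force you to work out what the function really does.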

Breaking the code out in this way, particularly the exercise of generating the "Pseudocode" section, really helped me reverse engineer the system and gain a detailed understanding of what it did.

In my capacity as overlord and dictator, I declared the version of code integrated with DITA to be the reference copy and introduced it to Subversion.

And all was well with the world.  Our code was visible, it existed alongside an explanation of what it did, and it had <related-links> declared in the appropriate places.  The end.  Or maybe not...

Roundtrip: ...and then we started changing things.

(OK I'm starting to really worry about the process on the Other Computer.)

The funny thing about demystifying a system and dragging all its pieces out into the light where you can see them is that you can then start making it work better.  You can adapt it to new situations as they arise.  But this requires editing the DITA file, extracting the revised code, and uploading the code to the server.

(...and then there were the evil pragmatic systems types who preferred just to edit the function in the database's GUI admin tool.  Several public executions later, revised code made it to the DITA files and occasionally started being committed to Subversion.  This is a story for another time....)

To address this need I hearkened back to my days of using notangle and noweave (go read Wikipedia) to write and document Linux device drivers back in '97. Because the code of interest was already encapsulated in a <codeblock>, I thought it should be a relatively simple matter just to extract the contents of that codeblock.  And sure enough, it was.

Given a <codeblock> with a @platform attribute set to "sql" or "bash", the attached "weave.xsl" will produce a plain text file with just the important bits (i.e., the code).  I configured my DITA-grokking XML editor to recognize/apply my transform at will.  The generated files found their way into my Subversion repository and made for easy loading of code into the database using only the command line client.
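
If you would rather not run XSLT, the same extraction can be sketched in Python.  To be clear, this is not the contents of weave.xsl, just the same idea under some assumptions: the names extract_code and SAMPLE_TOPIC are hypothetical, the sample topic is made up, and it carries no DOCTYPE so that the standard library parser can read it without a DTD.

```python
import xml.etree.ElementTree as ET

# Platforms whose codeblocks count as real code to extract.
CODE_PLATFORMS = frozenset({"sql", "bash"})


def extract_code(dita_xml, platforms=CODE_PLATFORMS):
    """Return the text of every <codeblock> whose @platform matches."""
    root = ET.fromstring(dita_xml)
    blocks = []
    for codeblock in root.iter("codeblock"):
        # @platform may hold several space-separated values.
        declared = set(codeblock.get("platform", "").split())
        if declared & platforms:
            # itertext() also picks up text nested in inline markup.
            blocks.append("".join(codeblock.itertext()).strip("\n"))
    return "\n\n".join(blocks) + "\n" if blocks else ""


# Hypothetical topic: only the second codeblock carries platform="sql".
SAMPLE_TOPIC = """\
<reference id="demo_fn">
  <title>demo_fn</title>
  <shortdesc>Returns the number one.</shortdesc>
  <refbody>
    <section>
      <title>Pseudocode Presentation</title>
      <codeblock>return 1</codeblock>
    </section>
    <section>
      <title>PL/pgSQL Presentation</title>
      <codeblock platform="sql">CREATE OR REPLACE FUNCTION demo_fn() RETURNS integer AS $$
BEGIN
    RETURN 1;
END;
$$ LANGUAGE plpgsql;</codeblock>
    </section>
  </refbody>
</reference>
"""

extracted = extract_code(SAMPLE_TOPIC)
```

The pseudocode block has no @platform and is skipped, so the extracted text holds only the function definition, ready to be written to a file and fed to the command line client.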

It is only now that I write this that I realize I should have named my transformation "tangle.xsl" instead of "weave.xsl".  Oh well.

I soon found myself writing separate concept and task topics that relate to the actual code that does my bidding.  It's quite liberating.  The end.  Enjoy the *.xsl and the example reference topic from my repository.

(I think I'm giving up on the Other Computer. Perhaps I will have to divine what is wrong by other means.)

weave.xsl (409 bytes)
pgsql-array_remove.xml (2.4 KB)