DITA

RDF lists and SPARQL

bobdc.blog - Mon, 2014-04-21 12:35
Not great, but not terrible, and a bit better with SPARQL 1.1 Bob DuCharme http://www.snee.com/bobdc.blog
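To give a flavor of what the SPARQL 1.1 improvement looks like, here is a minimal sketch (not taken from the post) of pulling the members of an RDF list with a property path, written in Python with the rdflib library; the vocabulary and data are invented for the example, and note that the path hands you the members but not, by itself, their order.

    # A minimal sketch: query an RDF list with a SPARQL 1.1 property path.
    # The vocabulary and list contents are invented for the example.
    from rdflib import Graph

    TURTLE = """
    @prefix ex: <http://example.org/> .
    ex:menu ex:items ( ex:spam ex:eggs ex:toast ) .
    """

    QUERY = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ex:  <http://example.org/>
    SELECT ?member WHERE {
      # rdf:rest*/rdf:first walks the list's cons cells and returns every
      # member, though not necessarily in the list's original order.
      ex:menu ex:items/rdf:rest*/rdf:first ?member .
    }
    """

    g = Graph()
    g.parse(data=TURTLE, format="turtle")
    for row in g.query(QUERY):
        print(row.member)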
Categories: DITA

Of conferences and webinars on lightweight DITA and such

MP - Fri, 2014-04-18 14:10

First a link to the presentation on DITA and marketing content I co-presented at Intelligent Content:

http://www.slideshare.net/IntelligentContent/dita-for-marketing-content

And then a heads-up that I'll be giving a more detailed overview of lightweight DITA at a webinar on May 14th:

http://www.congility.com/webinar/xdita-and-hdita-marrying-lightweight-di...

And finally, I'll be presenting at the Congility conference in the UK June 18-20. You can get 30% off the registration by using this code:

MPR14SPK


Categories: DITA

Overview of Lightweight DITA (XDITA and HDITA)

MP - Fri, 2014-04-11 16:36

The goal of this proposal is to align a lightweight DITA profile in XML with an equivalent markup specification based on HTML5. This is not a complete specification, just something to get the discussion started. There's still lots of room for change, as well as for adding specific mappings for additional semantics for learning and training content, epubs, or other formats.


Categories: DITA

Tearing down obstacles to OpenStack documentation contributions

JustWriteClick - Thu, 2014-04-10 16:17

Rip. Shred. Tear. Let’s gather up the obstacles to documentation contribution and tear them down one by one. I’ve designed a survey with the help of the OpenStack docs team to determine blockers for docs contributions. If you’ve contributed to OpenStack, please fill it out here:

https://docs.google.com/forms/d/136-BssH-OxjVo8vNoOD-gW4x8fDFpvixbgCfeV1w_do/viewform

I want to use this survey to avoid shouting opinions and instead make sure we gather data first. This survey helps us find the biggest barriers so that we can build the best collaboration systems for documentation on OpenStack. Here are the obstacles culled from discussions in the community:

  • The git/gerrit workflow isn’t in my normal work environment
  • The DocBook and WADL (XML source) tools are not in my normal work environment
  • My team or manager doesn’t value documentation so we don’t make time for it
  • Every time I want to contribute to docs, I can’t figure out where to put the information I know
  • When I’ve tried to patch documentation, the review process was difficult or took too long
  • When I’ve contributed to docs, developers changed things without concern for docs, so my efforts were wasted
  • Testing doc patches requires an OpenStack environment I don’t have set up or access to in a lab
  • I think someone else should write the documentation, not me
  • I would only contribute documentation if I were paid to do so

Based on the input from the survey, I want to gather requirements for doc collaboration.

We have different docs for different audiences:

  • cross-project docs for deploy/install/config: openstack-manuals
  • API docs references, standards: api-site and others

These are written with the git/gerrit method. I want to talk about standing up a new docs site that serves our requirements:

Experience:
Solution must be completely open source
Content must be available online
Content must be indexable by search engines
Content must be searchable
Content should be easily cross-linked by topic and type (priority:low)
Enable comments, ratings, and analytics (or ask.openstack.org integration) (priority:low)

Distribution:
Readers must get versions of technical content specific to version of product
Modular authoring of content
Graphic and text content should be stored as files, not in a database
Consumers must get technical content in PDF, html, video, audio
Workflow for review and approval prior to publishing content

Authoring:
Content must be re-usable across authors and personas (Single source)
Must support many content authors with multiple authoring tools
Existing content must migrate smoothly
All content versions need to be comparable (diff) across versions
Content must be organizationally segregated based on user personas
Draft content must be reviewable in HTML
Link maintenance – Links must update with little manual maintenance to avoid broken links and link validation

Please take the survey and make your voice heard! Also please join us at a cross-project session at the OpenStack Summit to discuss doc contributions. We’ll go over the results there. The survey is open until the first week of May.

Categories: DITA

How to Build OpenStack Docs and Contributors through Community

JustWriteClick - Fri, 2014-03-21 20:25

I’m well past the three year mark, working on a new open source project that grows and grows every six months. I’ve been working closely with Diane Fleming at Rackspace to focus completely on upstream OpenStack. Upstream means that all of our documentation work goes to the open source project itself. So while Rackspace runs OpenStack in production and for our customers’ private clouds, Diane and I focus on documentation that helps any organization run and use OpenStack. We have put together an outline of what we do to make upstream OpenStack documentation better all the time.

Growth, scaling, and related challenges

When I started, there were just two projects with two APIs. Now we have 130 git repositories fostered by over twenty related programs. As you can imagine, this scenario causes scaling difficulties, but we are bravely making our way. Here are some of the challenges we have faced and what we’ve done to lessen the pain of coordinated, collaborative technical documentation in an open source community.


In the face of language and technical barriers, the OpenStack docs team used a combination of IRC meetings, documentation boot camp, Google hang-outs, a busy docs-team mailing list, and other methods to create a flourishing, global team of writers and technical contributors who have immensely improved the OpenStack docs in the last couple of years. For the release that went out in Spring of 2013, three people wrote half of the docs. For the release that went out in Fall of 2013, seven people wrote half the docs. I can’t wait to see what our numbers are for the release going out next, on April 17th.

Team-building tools – the good, the bad, and the ugly

Here are some benefits and pitfalls of these tools:

  • IRC: Pros – clear agenda, follow-through week-to-week, global participation now that we have APAC and North American meetings. IRC meeting bots enable an automated log of minutes. Cons – difficult to find an agreeable time, no face-to-face, hard to introduce new topics.
  • Office hours: Seemed like a good idea but fell by the wayside; we did not have much attendance. As our team grows, people can stop by the IRC channel at any time to get one-on-one help.
  • Google Hangouts: With the video and voice enabled, it is nice to see each other’s faces without having to travel. Hard to find an agreeable time with the round-the-world team.
  • Boot camp: Extremely positive experience – spawned new ideas and new connections. Downside is the cost/time factor. We had great survey responses but decided we didn’t need one every six months.
  • Mailing list: Good way to resolve immediate issues and gather consensus as well as multiple view points.
  • Book sprints: Good way to get a needed book written and distributed. Not a good way to build ongoing community for maintenance, and in some ways you have to be careful not to build a book that no one else thinks they should contribute to. But community is just one part of good docs – this is a good complement to other efforts.

OpenStack docs – before and after

How have the docs changed due to team building?

  • Cleaner, more compact library. We did a huge refactor prior to the Boot Camp, which has made it easier to specify what types of content go where.
  • Better writing. Professional technical writers have done an amazing job avoiding “frankendoc” — by editing, reviewing, and polishing with an agreed-upon style guide, we improve the actual writing to better serve readers and users.
  • Better technical content. In OpenStack, teams have core reviewers, and most teams require that two core reviewers approve a change before it gets built to the published site. As we expand our reviewers (not just core but many reviewers) our docs have improved technically.
  • Better automation. By writing tools that scrape the code for docstrings we are able to keep up with fast-moving projects that release every six months.
  • Timing with releases. We carefully scope what documents are considered tied to a release. The Install Guides and Configuration Reference are the only two books built from a set release. All other documents are continuously published.

OpenStack contributors – before and after

How have the relationships among contributors and contributors’ roles changed due to team building?

  • Much greater communication among contributors – now we know each other on a more personal level, and feel more comfortable working together
  • Contributors have found where their strengths lie in the community. Some people are more tools and gear heads, and they build gates, tests, and build tools. Some people are natural editors and review heavily with suggested edits. Some people are blue-sky visionaries. Some people are heads-down system administrators and architects. There’s a place for everyone, not just writers.

What’s to come?

We want to discuss revisions to our original vision openly. This blog post is a starting point, but we are listening on all available channels. At the OpenStack Summit in Atlanta in May, I want to collaborate on a request for proposals for a new front-end design for our documentation that can help us make changes to how docs are authored. I’d like to find out more about ways to enable non-CLA contributors to the docs. I have lots of ideas and look forward to working with this amazing group to improve our processes and results.


Categories: DITA

A New Brand of Marketing – a must read for executives

Those of you who appreciated Scott Brinker’s Gilbane Conference keynote What is a Marketing Technologist?, and even more importantly those who missed it, should check out Scott’s short new book, A …
Categories: DITA

Easier querying of strings with RDF 1.1

bobdc.blog - Sat, 2014-03-08 15:09
In which a spoonful of syntactic sugar makes the string querying go down a bit easier. Bob DuCharme http://www.snee.com/bobdc.blog
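For readers who have not followed the link, the general idea can be sketched in a few lines (my own example, not from the post): under RDF 1.1 a plain literal and an explicitly typed xsd:string literal are the same term, so a single pattern is meant to match both, where before you had to query for the two forms separately. The sketch assumes Python with rdflib; how strictly a given engine applies the RDF 1.1 rule can vary.

    # A minimal sketch of the RDF 1.1 simplification for string literals;
    # the data is invented, and engines differ in how strictly they apply
    # the "plain literal equals xsd:string literal" rule.
    from rdflib import Graph

    TURTLE = """
    @prefix ex:  <http://example.org/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    ex:a ex:label "dog" .
    ex:b ex:label "dog"^^xsd:string .
    """

    QUERY = """
    PREFIX ex: <http://example.org/>
    SELECT ?s WHERE { ?s ex:label "dog" }
    """

    g = Graph()
    g.parse(data=TURTLE, format="turtle")
    for row in g.query(QUERY):
        # Under RDF 1.1 semantics both ex:a and ex:b should come back;
        # pre-1.1 engines treated the two literal forms as distinct terms.
        print(row.s)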
Categories: DITA

Candid Camera Moment at Intelligent Content 2014

The Content Wrangler - Fri, 2014-03-07 09:19

If you missed the closing presentation at this year’s SOLD OUT Intelligent Content Conference in San Jose, CA, you missed an amazing performance by Candid Camera veteran Durwood Fincher, aka Mr. Doubletalk. Durwood played a few tricks on the conference crowd. Here are a few excerpts for your viewing pleasure.

#ICC2014 – San Jose, CA – February 28, 2014 from Durwood Fincher on Vimeo.

Categories: DITA

[BOOK] “The Language of Content Strategy” by Scott Abel and Rahel Anne Bailie

The Content Wrangler - Thu, 2014-03-06 09:14

It’s a book. It’s an eBook. It’s a website. It’s a deck of terminology cards. In fact, it’s all of these things and an introductory lexicon for our discipline. What is this amazing creation? It’s The Language of Content Strategy, the latest project from The Content Wrangler.

The Language of Content Strategy (February 2014, XML Press) is the gateway to a language that describes the world of content strategy. Co-produced by Scott Abel (The Content Wrangler) and Rahel Anne Bailie (Intentional Design), and with over fifty contributors, all known for their depth of knowledge, this collection of terms forms the core of an emerging profession and, as a result, helps shape the profession. The terminology spans a range of competencies within the broad area of content strategy.

There is a recognition that content strategy is about the care and delivery of content at all points in its lifecycle, from its planning and creation right through to its sunsetting, and all stages in between. This book also recognizes that content gets delivered in many markets, in many languages, and to many devices.

A common vocabulary is an important aspect of the maturation of a discipline. A lexicon helps professionals across all industries, from clients and colleagues who need common terminology to have effective conversations to internal stakeholders who have diverse technical backgrounds. (What do you mean by transclusion?)

A lexicon helps hiring managers explain what they’re looking for in a job candidate and recruiters find the candidates most suited for particular projects. (Here’s what my client expects from a content strategist.) It helps students who are discovering the discipline and helps instructors who need to convey concepts that will be understood in the marketplace. (What your employer will expect in a message architecture.)

We expect this vocabulary to grow, progress, and change as time goes on, the discipline develops, and industry demands more from content strategy. This foundational work allows practitioners to conduct meaningful conversations, engage in healthy debates, and build on existing concepts and ideas. This is an opportunity to expand our vocabulary, our opportunities, and our worlds.

This book, and its companion website and terminology card deck, is an invitation to readers to join the conversation. This is an important step: the beginning of a common language. Using this book will not only help you shape your work, but also encourage you to contribute your own terminology and help expand the depth and breadth of the profession.

Check it out and buy a copy (or three) today!

Categories: DITA

Shockproofing Your Content Marketing: Step Outside The Marketing Comfort Zone

The Content Wrangler - Wed, 2014-03-05 17:58

By Scott Abel, The Content Wrangler

The web is littered with malarkey. Meaningless talk. Nonsense. And not the type offered up as entertainment by comedians like Mr. Doubletalk. More often than not, web-flavored malarkey is designed to attract attention. To get your juices flowing. To engage. To entertain.

There’s nothing wrong with creating juice-inducing, extremely engaging content—it’s every marketer’s goal. But, it helps if the content you create is informed by the world around you.

One of the biggest challenges facing content marketing as a discipline is a lack of understanding of things outside of what I like to call the marketing comfort zone. Things like science and technology often fall outside that zone. Marketers need to be a lot more aware of what’s happening in these disciplines. Advances in science and technology have a direct impact on the work we do, like it or not.

I’m not trying to insult anyone. I’m just pointing out that some of the biggest mouthpieces in the marketing space tend to wax poetic without the benefit of knowledge from folks outside their own discipline. And, that’s got to stop if you want people to take you seriously.

Content Shock? I Don’t Think So

Consider a recent blog post by Mark Schaefer that reasons—based on the economic principle of supply and demand—that we’ll have to pay people to read our content in the future. He says this will be necessary “when exponentially increasing volumes of content intersect our human capacity to consume it.” Simply put, his argument is that there’s so much content (much of it free) that if everyone starts creating it (and giving it away for free), that there will be too much content. He refers to the problem as content shock. Others have dubbed it attention crash, but you likely know this concept already by its now familiar moniker: information overload.

Schaefer is right about one thing. Information overload is a reality. But it’s not a new one, and his argument that we’ll have to pay people to consume content comes from outdated thinking; it assumes that innovations in content marketing won’t provide sufficient value to continue engaging those we hope to attract and retain. Paying customers to consume content is also contrary to the practice of content marketing, the definition of which I provide here as a reminder of what content marketing is actually about.

What is Content Marketing, again?

 According to the smart folks at Content Marketing Institute, content marketing is “a marketing technique of creating and distributing relevant and valuable content to attract, acquire, and engage a clearly defined and understood target audience – with the objective of driving profitable customer action. Content marketing’s purpose is to attract and retain customers by consistently creating and curating relevant and valuable content with the intention of changing or enhancing consumer behavior. It is an ongoing process that is best integrated into your overall marketing strategy, and it focuses on owning media, not renting it.”

Where does it say this is a technique in which we pay consumers to read our content? It doesn’t. And, it’s not going to. Some may try that approach (good luck), but that’s not where the future will lead us. For those of us on the content engineering side of content marketing, we know better.

Now you will, too.

One Example – Infographics

Content engineers—folks who understand how content and technology work together—aim to help create innovative content solutions designed to assist their clients in differentiating themselves from the competition. Engineering-minded content strategists and choreographers look for ways to connect data with documents in innovative and meaningful ways.

Consider infographics. Today, infographics are most often provided by content marketers as a way to attract audience. They’re usually visual snapshots of random facts and figures about a specific topic. They can take the form of timelines, feature location-based data, or attempt to compare and contrast different facts, figures, or approaches. They can be interesting, but they’re not all that engaging. As everyone jumps on the infographic bandwagon, their novelty—and attractiveness—diminishes. That’s likely happening already. But, it doesn’t have to be that way.


Infographics suffer from being dead on arrival. They’re static documents. A picture of a moment in time. They’re more meaningful as a historical record than they are useful as business decision-making tools. And, they decrease in value over time, which means your content marketing machine will need to continue to crank out more and more infographics in order to attract audience.

But what if you could add the magic of programming to the mix? What if your infographics were living documents connected to real-time data feeds? What if they changed and morphed over time—automatically?

The folks at the MIT Simile Project envisioned a solution to this—and other—information visualization problems years before anyone else did (circa 2006). No, they’re not marketers. They’re content engineers who envision ways of presenting information dynamically, most often in a web browser, although their approach does not need to be limited to the web. The examples they make available on the web are designed to spark your imagination. They’re works created to get you thinking about the possibilities.

Here are a few examples:

Billionaires in History

US Cities by Population

Recent United States Senate Bills

JFK Assassination Timeline

Breakfast Cereal Character Guide

Dozens of other organizations exist to help us solve this challenge. The folks at InfoActive are working to create a platform that will help us connect data to infographics.
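As a rough sketch of the idea in its simplest form, the script below pulls numbers from a data feed and regenerates a bare-bones SVG bar chart on every run; the feed URL and field names are hypothetical placeholders, and it falls back to sample figures so the sketch runs even without a live feed. Schedule it, and the graphic stays current instead of freezing at publication time.

    # A minimal sketch of a "living infographic": fetch fresh numbers from
    # a (hypothetical) JSON feed and rewrite a simple SVG bar chart.
    import json
    import urllib.request

    FEED_URL = "https://example.org/api/stats.json"   # hypothetical endpoint

    def fetch_data(url=FEED_URL):
        """Fetch a list of {"label": ..., "value": ...} records."""
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    def render_bar_chart(records, path="infographic.svg", bar_height=24, scale=2):
        """Write a bare-bones SVG bar chart; rerunning refreshes the graphic."""
        width = max(r["value"] for r in records) * scale + 220
        parts = []
        for i, r in enumerate(records):
            y = i * (bar_height + 8)
            parts.append(f'<text x="0" y="{y + 16}">{r["label"]}</text>')
            parts.append(f'<rect x="160" y="{y}" width="{r["value"] * scale}" '
                         f'height="{bar_height}" fill="steelblue"/>')
        svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
               f'height="{len(records) * (bar_height + 8)}">' + "".join(parts) + "</svg>")
        with open(path, "w") as f:
            f.write(svg)

    if __name__ == "__main__":
        try:
            data = fetch_data()
        except OSError:
            # Fall back to sample numbers so the sketch runs without the feed.
            data = [{"label": "Widgets", "value": 120},
                    {"label": "Gadgets", "value": 75}]
        render_bar_chart(data)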

And, one needn’t look only to academics for ideas. While they kick-started the movement, the ideas they offer up as samples have already been implemented in some pretty exciting ways by forward-thinking content marketers.

Here are a few examples:

In the Red Zone (American football fans love this one!)

Honest Tea, National Honesty Index (who is more honest, you or your friend?)

The Reality

We can (and should) borrow ideas, techniques, and tools from the content engineering folks to create exceptional content marketing experiences that keep us connected to our audience and prevent the onset of content shock, attention crash, and information overload. Those who believe that too much information can cripple content marketing are not thinking outside the box. They are trapped in the marketing comfort zone.

If we continue to create the same old, tired, static content as everyone else, we’ll bore the audience to death before we ever overload them. Our audience is not stupid. They have choices. They migrate their attention to content that provides the reward they’re looking for. It’s our job to innovate—to produce exceedingly awesome content designed to amaze and delight. To do so, we’ll need to stop thinking the way we have for decades. We’ll need to become familiar friends with code and with content.

And, we’ll have to make changes to the way we create, manage, and deliver content. We’ll have to think differently.

It’s an exciting time to be a content professional. Let’s see what we can make happen. I’m stoked! Are you?

Categories: DITA

Finding an OpenStack Mentor

JustWriteClick - Mon, 2014-03-03 01:21

Last week I ran an internal “So You Want to be an OpenStack Contributor?” workshop showing the different ways to work on OpenStack. Here’s the slide show so you can see the way I approached it. As the Documentation Program Technical Lead, you’d think I’d steer people straight to the documentation bug backlog, but I try to find out where interests lie before going straight to doc fixes. Reading the docs is definitely a great start.

So You Want to be an OpenStack Contributor from Anne Gentle

You can work on OpenStack in non-code ways, such as bug triaging. Also, the OpenStack Foundation does community marketing and staffs booths at events with help from the community. But a great way to understand the ins and outs of OpenStack-landia is to commit a patch.

I have to admit, I didn’t know much when I first started working on OpenStack at Rackspace. The Swift team was the group I had immediate access to in person. Wow were they patient with me while I made hilarious-in-hindsight errors. I had a patch where I changed “referer” (one r) to “referrer” (two r’s), because duh, that’s how referrer is spelled. Well, as it turns out, that’s not how the W3C HTTP protocol has spelled that request header since 1992 or so, whoops! Then I also managed to change the RST backticks (`) to single quotes (‘), which is absolutely not going to render correctly with Sphinx. Chuck Thier patiently explained the errors I had made and how to correct them. So do not be discouraged if it’s difficult to get the hang of your first patch or two or ten. Code reviewers are happy to help you iterate and revise. I’ve heard of good and bad patch reviewing going on in the community, so I encourage you to find a real person who can help you get helpful reviews.

We also have organized OpenStack mentor programs now. We’ve been participating in the GNOME Outreach Program for Women for three rounds, and we’re a participating organization with the Google Summer of Code program for 2014. There are ideas for projects on the OpenStack wiki:

We have dedicated IRC channels for new contributors – #openstack-101 and #openstack-opw on freenode. Our OPW interns have written great blog entries about getting started with OpenStack (In a nutshell: how OpenStack works) and DevStack (Installing DevStack with Vagrant). Their fresh eyes make for great starting points. I encourage us all to make this work both ways – people of OpenStack, be mentors, and newcomers, seek out the people in OpenStack who want to help you get started. Updated to add: be sure to check out opensource.com for “How to Contribute to OpenStack.”

Categories: DITA

Multichannel content management

In Marketing technology landscape explosion and CMS evolution we looked at two of the major themes of December’s Gilbane Conference. The third major theme that we asked speakers to respond to in our spotlight …
Categories: DITA

Do Content Strategy and Business Priorities Mesh?

The Content Wrangler - Mon, 2014-02-10 17:25

by Cheryl Landes, STC Fellow and founder of Tabby Cat Communications, Vancouver, WA

Content is business. Great content sells. Bad content doesn’t.

No matter how hard we try, “content and usability strategy can’t predict outcomes,” said Jared Spool, the CEO & Founding Principal of User Interface Engineering, at his annual presentation sponsored by the IEEE Computer Society, GBC/ACM, and BostonCHI at Constant Contact in Waltham, MA, on January 16.

“The key is understanding business models. Great business models are designed.”

Business models have strategies, and like content strategy, there are priorities. Spool said there are five priorities for any business strategy:

  1. Increase revenue.
  2. Decrease costs.
  3. Increase new business.
  4. Increase existing business.
  5. Increase shareholder value.

And, he continued, these priorities are the same for content and usability strategies. “Take the money and put it into the (user) experience and see where it gets you,” he said.


An investment in great content goes far. For example, Zappos’ quick, convenient return policy has actually helped the company increase its business. If you’re unhappy with your purchase for any reason, you can return it absolutely free. The return policy is accessible with one click, from a big button in the top right corner of Zappos’ home page. You can read clearly written instructions or watch a video. If you return a product, you can print out the pre-paid UPS shipping label directly from Zappos’ website. The ease of usability actually encourages customers to buy more at Zappos, because they have a delightful shopping experience.


Investing in great content is much more profitable than spending the same money on advertising, Spool says. “Ads don’t work. When you don’t pay for the product, you are the product.” He used Dictionary.com as an example, where users see mostly ads on the page when they look up definitions. The content on Dictionary.com is free, because it’s funded by advertising. But to find the content users want, they have to hunt for it by scrolling down on the page. So what happens? Readers ignore the ads, although they must work to find the content they’re after.

Spool cited several statistics about the performance of ads. There’s a 0.1% click-through rate for 1,707 ads seen per year. Four of 10,000 clicks are for the best ads. And 31 out of 100 ads have never been seen. So the odds that people will actually make a purchase from seeing an ad are slim to none.

“The best performing ads don’t look like ads,” he said. Word-of-mouth is the most powerful tool. “Things that work the best are out of the company’s control.”


On March 28, 2011, the New York Times began a radical business strategy of reducing the number of ads on its website and switching to digital subscriptions, also known as a metered paywall. Readers are allowed to view 10 free articles, videos, slide shows, and other features per month. When the tenth article is viewed, they receive a message that they need to become a digital subscriber. Since then, the Times has earned more money from the metered paywall customers than from advertising.

Why? Quality content. People are willing to pay for great content.

The New York Times played with a business model and the returns to get the best results. And because they followed the five standard priorities of a business model, they’ve survived and thrived at a time when many newspapers are going out of business.

So the lesson here is that delightful content creates a great user experience. And with those great experiences come customer sales and loyalty. Content and usability strategists “create delight by working at the intersection of business and design,” Spool said.

Is your content delightful? Do customers delight in the experience? If they do, you’ll know, Spool says. “The better the content, the better the business.”

That’s the bottom line.

References

Cook, Jonathan E. and Shahzeen Z. Attari. “Paying for What Was Free: Lessons from the New York Times Paywall.” Cyberpsychology, Behavior, and Social Networking, 15(12), 2012, pp. 1-6. Retrieved January 30, 2014.

Dictionary.com

New York Times. “A Letter to Our Readers About Digital Subscriptions,” March 17, 2011. Retrieved January 30, 2014

Zappos return policy

Bio

Cheryl Landes, STC Fellow and Certified Professional Communicator through the Association for Women in Communications Matrix Foundation, founded Tabby Cat Communications in Seattle in 1995. She has 23 years of experience as a technical communicator in several industries: computer software, HVAC/energy savings, marine transportation, manufacturing, retail, and the trade press. She specializes as a findability strategist, helping businesses to organize content so that it flows logically and to make content easier to retrieve online and in print.

Cheryl, who currently lives in Vancouver, WA, has given many presentations and workshops about indexing, technical communication, and marketing services as a solo entrepreneur throughout the United States and Canada. She has written two handbooks on digital indexing in MadCap Flare and Adobe FrameMaker, and more than 100 articles and three books on Northwest travel and history. Her latest book, Embedded Indexing in Adobe InDesign, will be released in early 2014. For more information, visit her website at http://www.tabbycatco.com and follow her on Twitter @landesc.

Categories: DITA

Querying my own MP3, image, and other file metadata with SPARQL

bobdc.blog - Sun, 2014-02-09 16:31
And a standard part of Ubuntu. Bob DuCharme http://www.snee.com/bobdc.blog
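The post presumably uses the metadata store that ships with Ubuntu; as a rough stand-in (not the tool the post describes), the sketch below harvests a little file metadata with Python's standard library, loads it into an rdflib graph under an invented vocabulary, and queries it with SPARQL.

    # A rough stand-in sketch: gather some file metadata, load it into an
    # RDF graph under an invented vocabulary, and query it with SPARQL.
    import os
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/filemeta#")   # invented vocabulary

    g = Graph()
    for name in os.listdir("."):
        if not os.path.isfile(name):
            continue
        subject = URIRef("file://" + os.path.abspath(name))
        g.add((subject, EX.fileName, Literal(name)))
        g.add((subject, EX.byteSize,
               Literal(os.path.getsize(name), datatype=XSD.integer)))

    QUERY = """
    PREFIX ex: <http://example.org/filemeta#>
    SELECT ?name ?size WHERE {
      ?f ex:fileName ?name ; ex:byteSize ?size .
      FILTER (?size > 100000)
    }
    ORDER BY DESC(?size)
    """

    for row in g.query(QUERY):
        print(row.name, row.size)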
Categories: DITA

How I Learned to Stop Worrying and Love Hypertext

The Content Wrangler - Wed, 2014-02-05 15:30

By Mark Baker, Analecta Communications Inc.

The Hypertext Nobbling Committee (HNC), that secret cabal of marketers, publishers, writers, and designers dedicated to breaking the Web, has been busy of late. Their latest ploy: the one page site in which all the content is presented in one continuous scroll and any links simply lead from one part of the page to another.

The aim of the HNC is to reestablish linearity and hierarchy in the creation, management, and consumption of content. The one page site fits this mission perfectly. It makes it harder to find information within the page via search, since the search will land the reader at the top of the page, not the item they are looking for, thus obscuring the information scent the reader is following. And it is bound to make it more complex for a search engine to identify the page as a good source for a particular subject. It tells the reader, in no uncertain terms: you are supposed to come here directly, you are supposed to read all of it, and you are supposed to read it in order.

This is, of course, a profoundly anti-Web message. It is the antithesis of hypertext. But the membership of the HNC has good reason to fear hypertext, and to try to nobble it. In David Weinberger’s words:

Hypertext subverts Hierarchy.

The aim of the marketers, publishers, writers, and designers who make up the HNC is to control the narrative: to control the selection and ordering of ideas that are presented to the reader, so as to influence the reader in some way favorable to the writer. Hypertext strips the writer of the ability to control the selection and ordering of ideas. It gives that power firmly and irrevocably to the reader, and to readers acting collectively in their mutual interest to link the world’s information in ways that are useful to readers.

As the Cluetrain Manifesto noted:

The potential connections are vast. Hyperlinks are the connections made by real individuals based on what they care about and what they know, the paths that emerge because that’s where the feet are walking, as opposed to the highways bulldozed into existence according to a centralized plan.

Weinberger, David; Locke, Christopher; Levine, Rick; Searls, Doc; McKee, Jake (2009-06-30).
The Cluetrain Manifesto: 10th Anniversary Edition (p. 193). Basic Books. Kindle Edition.

The Hypertext Nobbling Committee, of course, is in the business of preventing those connections from being made, of ensuring that the reader’s feet stick to the path bulldozed in accordance with their official content strategy.

Content Strategy Versus Readers

While content strategy usually gives lip service to the idea that the user’s needs are paramount, the fact is that content strategy is a business activity done for business reasons, and the business may not always see the user’s interests as being identical to its own, a point that Paul Bryan makes about the related field of User Experience in his post User Experience Versus Users:

The field of user experience has, from its inception, championed the notion that meeting users’ needs is the path to success for digital products. Recently, however, it seems that user experience is increasingly playing a role in formulating designs that diametrically oppose users’ wants and needs for the sake of generating greater profits

And, of course, there is a legitimate business concern here. UX and Content Strategy both exist to meet business goals, and the principle of making the user’s needs the sole criterion of either practice is not an absolute moral imperative, regardless of its business consequences. It is, in fact, based on the idea that the business’s ends are best served by serving the customer’s ends, and while that is a good principle within bounds, there are genuine cases where serving the customer’s best interest is not in the business’s best interest. In these cases, we can expect that the business’s interests will prevail.

Interests of writers not identical to those of business, or of readers

But there is a big difference between when the business’s interests and the customer’s interests actually diverge, and when people making decisions for the business think they diverge, or when they fail to understand where the customer’s interests actually lie. Sometimes hypertext nobbling is attempted in the sincere (though incorrect) belief that it actually serves the reader’s needs.

Also, the interests of the employee making those choices may not always be well aligned with the interests of the company that employs them. There are many examples of this:

  • A sales person who makes a false promise to a customer to meet quota, even though it damages the company’s long term relationship with the customer.
  • A software developer who wants to pad their resume with experience in a particular tool or language, even if it creates an additional maintenance burden for the company.
  • A writer who wants to compile a magnum opus as a showpiece for their portfolio, or simply as an artifact of their ego, when Every Page is Page One content would better serve the user and the company.
  • The employee in any role who is used to doing things a certain way and is not interested in doing the work to acquire new skills and adopt new standards of quality in a world that has radically changed.
  • A manager who is driven more by the need to feel in control than by the need to let talent work and product flow.

The activities of the Hypertext Nobbling Committee, therefore, may not be aligned with the true business interests of companies whose employees serve on the committee. But that does not stop the committee from having a profound effect on how content is written, organized and presented. In fact, I would suggest that hypertext nobbling is almost always bad for business.

Why do so many companies and so many writers, information architects, content strategists, and designers continue trying to nobble hypertext? The answer, I think, is twofold. First, it involves admitting that the power you are used to wielding has gone. We are attached to our power, and will struggle to keep it or to recover it — often to our own detriment, if it leads us to ignore new sources of power.

Second, hypertext is hard. It upsets our traditional ideas of how information is created and organized, and perhaps worse, it defies traditional approaches to managing information. In no small part, organizations engage in hypertext nobbling simply in order to make their traditional internal management processes and tools work.

Anti-hypertext strategies and their defects

One of the Hypertext Nobbling Committee’s most successful innovations was the native app. Diverting mobile readers away from your website onto a mobile app seems like a great way of ensuring that they follow your path, not their own. If readers can be persuaded to start your app rather than searching Google, there is no danger of losing them to the competition. Thus readers visiting websites from their phones often have to dismiss annoying screens asking them to download the site’s own app.

But no, I DON’T WANT TO DOWNLOAD YOUR APP!

You know why? Because I am on your site only because one page on it looked interesting in my search results. Just because I want to read (or glance at for a couple of seconds) one page on your site in no way means I want an app that shows me your site and nothing else. Nor do I want to clutter my phone with dozens of apps for individual sites.

Apps that do things are great. Apps that aggregate content from many sites according to filters that I create myself (such as Flipboard) are useful. Apps that merely show information from one site are pretty much pointless. Eddie Vassallo reports that the tablet magazine, launched with such fanfare such a short time ago, is sinking fast:

It’s also no secret that tablet magazines are simply not being read – the form factor and technology is basically making the standardized magazine page a near anachronism in a world of dynamic live canvases of the caliber of a Flipboard or Zite.

While Flipboard is, of course, an app, it is a hypertext app, pulling in resources of interest from all over the web. More specifically, Flipboard and its ilk are essentially search engines that allow you to store one search to be executed on a regular basis, and which allow you to continually refine the search terms to tweak your results over time.

The nature of hypertext

It is perhaps useful at this point to say a little more about hypertext. It may be too easy to dismiss hypertext today as a kind of relic of the 80s, to associate it with such abortive projects as the hypertext novel. But what many of the 80s experiments with hypertext missed was that hypertext is not a tool for writers, but a tool for readers.

We don’t tend to say it much these days, but the Web is a hypertext medium. We should remember that the “HT” in HTML and HTTP stands for Hyper Text: they are Hyper Text Markup Language and Hyper Text Transfer Protocol respectively. The links that connect one page to another are properly called hyperlinks.

But hypertext is not just about linking. Hypertext is about the non-linear traversal of information spaces. When hypertext and the Web were being dreamed up and the terms defined, links were the principal mechanism for such traversals. Today, while links remain important, search, social curation, and dynamic content APIs also play key roles.

More importantly, search and social curation make hypertext a tool for readers, a tool that they can use untrammeled by the limits of the links that writers provide, and untrammeled by all the machinations of the Hypertext Nobbling Committee.

There is a reason, after all, why you still need a mobile website, even if you would rather people used your app: content in apps is invisible to search. The kind of content apps that work, like Flipboard, exploit hypertext tools to put the reader in charge of the experience.

The most profound effect that search and social curation have on content, though, is that they make all content hypertext. Before search, it was up to the author to decide if a text was to be a hypertext. With search, the reader can treat any text like a hypertext. If they come to some phrase they want more information about, they can simply highlight it and initiate a search on it. This provides links where the author neglected to provide them. Search makes all texts hypertexts.

This even includes information on paper, as well as movies, TV shows, and billboards. Most of the world is walking around with a search engine in their pocket. If they want to link from the text they are reading, or the video they are viewing, they have the means to do so, and are increasingly habituated to doing so. The only downside of this off-Web content in a hypertext world is that all the links lead outward. No links point back to this content.

Every text that is not designed to be a hypertext is effectively a one way hypertext, with user-generated links leading out but not in. Thus every trick the HNC dreams up for preventing people from leaving your content ends up doing more to keep them out than to keep them in.

On the Web, content that is not designed to be a hypertext, and even content that is deliberately designed to nobble hypertext, is a hypertext nonetheless. Readers can search on the terms it contains, and search and social curation can find it and point to it (unless it is nobbled to the extent of diverting all links to a landing page, or is (God help us!) a tri-pane help system that provides no reliable way to link to individual content pages).

Thus the fundamental problem with the HNC’s tactics is: you can’t prevent people from using your content as hypertext. But you can prevent it from working well for people using it as hypertext, and you can make it difficult to find it in a hypertext environment. Any discussion about the virtues of having your content be a hypertext or not is therefore moot: you can’t prevent the undesirable aspect of hypertext (the ability of the reader to leave), you can only limit the desirable aspect of hypertext (the ability of readers to come in). You can’t nobble the reader; you can only nobble yourself.

Other Hypertext Nobbling Activities

Other notable efforts at nobbling hypertext include endless scroll (no need to go elsewhere; this page will never end) and ebooks (get off the web: download this and read it in isolation). Both these things have their place. Endless scroll makes sense for a feed, such as Twitter. Ebooks make sense for, well, books. (But – newsflash – rebranding your 20 page PDF whitepaper with the word “ebook” doth not an ebook make.) But to impose them on what ought to be regular Web content is just an annoying form of hypertext nobbling that does neither you nor your reader any good.


The techcomm branch of the HNC has its own particularly vicious device (which thankfully has not caught on with the rest of the committee): the tri-pane help system. Conceived as a way to show help in a desktop application, these systems are designed as if they were the whole information universe. In contemporary incarnations, they often have a single TOC that includes the content of dozens of books in a hierarchy that is utterly unnavigable, and often with pages made out of sections of books that make no sense or contain no useful information when viewed individually (something I have dubbed the Frankenbook). This is to say nothing of the problems they cause for linking and for SEO on the Web.

Unfortunately, people are still coming up with new tools for delivering documentation on the Web using this outdated and inappropriate model, and with new authoring tools for creating it.

A schoolroom approach to readers

A big part of the reason that this model persists today is that many writers, particularly in technical communication, it seems, are deeply suspicious of hypertext. Partly this may be born of a fear that user-generated content shared on the Web may put them out of business. But more deeply than this, I believe, they share a belief, most notably expressed by Nicholas Carr in The Shallows, that the Web is shortening attention spans and robbing people of their ability to focus.

But this belief is built on a schoolroom definition of attention and focus, on the assumption that the measure of attention and focus is the ability or willingness to focus on a single piece of content. It is assumed that if the reader is moving rapidly from one piece of content to another, they are not focusing and not maintaining a sustained level of attention to what they are reading.

In Too Big to Know, David Weinberger argues that our civilization’s long dependence on the book has warped our understanding of what it means to know. It has caused us to think that knowledge is shaped like a book, that to know something is to know the texts about it, and therefore that to study is to study texts. While the modern world does not have quite the reverence for texts of earlier centuries, we still largely compose curriculums around textbooks and evaluate students on their ability to learn texts.

In the 20th century classroom, the text, not its subject matter, but the text itself was the object of the lesson. The task was to understand what the text said, and it was evaluated by measuring the student’s comprehension of the text.

This is exactly how many of the studies of reading on the web are conducted today: by watching how readers read individual texts and measuring their comprehension of the text.

But that is not how hypertext works. Hypertext works by allowing the reader to traverse information spaces in search of understanding of a subject. The point is to comprehend the subject, not to comprehend the text. And while writers would love to believe that to understand their text is to understand their subject, such is rarely the case. We learn a subject better by working with it, by talking about it, and by reading multiple different views on it, than we do by reading a single text. That has always been true of how we learn about subjects, but hypertext puts that learning process on overdrive, allowing us to accumulate more divergent perspectives on a subject quicker than ever before, and giving us access to data on which we can run our own experiments and from which we can draw our own conclusions.

A person traversing a hypertext field may spend little time on any one page, but that does not mean they are not focused, or that their attention span is limited. Rather, their attention is focused on the subject they are pursuing, not the individual texts they encounter.

Hypertext and information foraging

A look at information foraging theory is useful to understand this point. Information foraging theory is based on the discovery that the patterns that readers use in seeking information are essentially those used by wild animals in search of food. Foraging is all about the most calories consumed for the least calories used. Thus a foraging pattern that burns the fewest calories while finding the most food is optimal.

This means that how a forager behaves in a rich environment is different from how they behave in an environment where food is scarce. If it is a long way from one berry patch to the next, then the best strategy is to pick the current patch clean, spending the extra time and energy to get to the fruit that is hard to reach and putting up with the thorns that scratch your nose. If it is a short distance from one berry patch to the next, it is more efficient to take the easily-accessible fruit from one patch and move on to the next one.

The forager’s behavior in switching from one berry patch to another in an environment where berries are plentiful is not a sign that the forager lacks focus or has a short attention span. Their focus is on food, and their attention is on consuming the most calories while burning the fewest. An optimal foraging pattern is not a sign of a shortened attention span, but of attention focused on the correct goal.

The Web, the largest hypertext system in existence, is the ultimate rich information foraging ground. Not only is it full of rich information patches, it also makes it easy to move from one information patch to another using links, search, social curation, and dynamic content resources. Information seekers will naturally adapt to this environment by switching more rapidly between different information sources. As with animal foraging, this is not a sign of shortened attention spans, but of attention focused on the correct goal.

Hypertext nobbling practices designed to trap the reader in one information patch will backfire because, ultimately, they offer the reader a less fertile ground for their information foraging. Good information foragers will always gravitate to the richest information foraging ground. Basic evolutionary forces will always favor the organism that gets the most food from the least effort.

Hypertext in local information systems

Of course, there are cases in which the Web is not the richest information field for particular subjects. In some cases, the Web either has less information than another source, or the information is hidden in a thicket of thorns that make it difficult to get to. In these cases, a separate information source, specially cultivated (curated and edited) to keep down the weeds, may make a better foraging ground for a particular class of readers.

Even so, such an information source should still be constructed to work as a hypertext field. Why? First, because hypertext creates a richer information environment which is easier to traverse, and therefore one that is more attractive to information seekers. Second, people’s information seeking habits are increasingly formed by the Web and the way the Web works. Frustrate those habits and your information set becomes harder to use.

And creating such an information source is no small undertaking. In particular, it is not created by putting up walls to keep hypertext out. It is only created by assembling a comprehensive collection of excellent content and linking it well. Fences don’t make orchards. Fruit trees make orchards. Nobbling hypertext won’t keep information seekers in; it will keep them out.

Hypertext nobbling is not good content strategy

Ultimately, therefore, hypertext nobbling is not good content strategy. Our content strategy goals would be better served if we learned to stop worrying and love hypertext. The way to win in a hypertext environment, after all, is to make better hypertext than the next guy. The bears will always come to the bushes with the best fruit. And if hypertext makes it harder to stop eyeballs and attention from wandering off your content, it equally makes it easier to attract them to your content. Do hypertext well and readers will flock to you.

The key to Hypertext: Every Page is Page One

The key to embracing hypertext is to acknowledge that Every Page is Page One. In a hypertext field, readers traverse texts non-linearly looking for the scent of information. To attract such readers, each piece of content needs to act as a new page one: establishing its context, sticking to its subject, conforming to its type, and linking richly to content on related subjects.

It is time, therefore, to stop worrying and learn to love hypertext, and to learn to do it well. That means embracing a world in which readers traverse a web of information, only some of which you own, and focusing on making your individual pages give off a strong information scent, and then following up that scent with satisfying content. You will catch more flies with honey than with vinegar. Stop trying to nobble hypertext. Every Page is Page One.

BIO


Mark Baker is a twenty-five-year veteran of the technical communication industry, with particular experience in developing task-oriented, topic-based content, and technical communication on the Web. He has worked as a technical writer, a publications manager, a structured authoring consultant and trainer, and as a designer, architect, and builder of structured authoring systems. He is currently President and Principal Consultant for Analecta Communications Inc. in Ottawa, Canada. Mark blogs at everypageispageone.com.

The blog is focused on the idea that on the Web, Every Page is Page One. It is Mark’s firm belief that the future of Technical Communications lies on the Web, and that to be successful on the Web, we cannot simply publish traditional books or help systems on the Web; we must create content that is native to the Web.

Categories: DITA

OpenStack Operations Guide Mini Sprint

JustWriteClick - Sun, 2014-02-02 18:19


We held a two-day mini-sprint in Boston at the end of January to update the OpenStack Operations Guide. You may remember the first five-day sprint was in Austin in February 2013. This time, the sprint was shorter with fewer people in Boston and a few remote, but we had quite specific goals:

  • Update from Folsom to Havana (about a year’s worth of OpenStack features)
  • Roadmap discussion about nova-network and neutron, the two software-defined networking solutions implemented for OpenStack
  • Add upgrade instructions from Grizzly to Havana
  • Implement and test the use of parts to encapsulate chapters
  • Address editor comments from our developmental editor at O’Reilly
  • Add a reference architecture using Red Hat Enterprise Linux and neutron for networking

Some quick wins for adding content were:

We added and updated content like mad during the two days:

The two toughest updates are still in progress, and our deadline for handover to O’Reilly is this Wednesday. The first tough nut to crack was getting agreement on adding an example architecture for Red Hat Enterprise Linux. We are nearly there, just a few more fixes to go, at https://review.openstack.org/#/c/69816/. The second is testing the upgrade process from Grizzly to Havana on both Ubuntu and Red Hat Enterprise Linux. That’s still in progress at https://review.openstack.org/#/c/68936/.

The next steps for the O’Reilly edition are proofreading, copyediting, and indexing over the next six weeks or so. I’ll be keeping the O’Reilly edition in sync with our community-edited guide. As always, anyone in the OpenStack community can contribute to the Operations Guide using the steps on our wiki page. This guide follows the O’Reilly style guide rather than our established OpenStack documentation conventions. I’m looking forward to a great future for this guide and we’re all pretty happy with the results of the second mini-sprint.

Thanks to everyone making this a priority! Our host at MIT was Jon Proulx, joined by Everett Toews, who braved airport layovers and snow, and Tom Fifield, who wrote the most patches despite a complete lack of sleep. Joe Topjian worked on edits for months leading up and has been tireless in making sure our integrity and truth lives on through this guide. Thanks too to the hard-working developmental editor at O’Reilly, Brian Anderson, who offered lunch in Boston, joined by Andy Oram. David Cramer got DocBook parts working for us in time for the sprint. Summer Long worked long and hard on the example architecture for Red Hat. Our remote reviewers Matt Kassawara, Andreas Jaeger, and Steve Gordon were so valuable during the process and ongoing. Shilla Saebi gave some nice copyediting this past week. What an effort!

Categories: DITA

DITA without a CMS: Tools for Small Teams

Dr. Macro's XML Rants - Sun, 2014-01-26 23:11
[This is a copy of a post I made to the Yahoo DITA Users list.]

A topic of discussion that comes up quite a bit (it came up at the recent Central Texas DITA User Group meeting) is how to "do DITA" without a CMS, by which we usually mean, how to implement an authoring and production workflow for a small team of authors with limited budget without going mad?

NOTE: I'm using the term "CMS" to mean what are often called Content Component Management (CCM) systems.

This is something I've been thinking about and doing for going on 30 years now, first at IBM and then as a consultant. At IBM we had nothing more than mainframes and line-oriented text editors along with batch composition systems, yet we were able to author and manage libraries of books with sophisticated hyperlinks within and across books and across libraries. How did we do it? Mostly through some relatively simple conventions for creating IDs and references to them and a bit of discipline on the part of writing teams. We were using pre-SGML structured markup back then but the principles still apply today.

As I say in my book, DITA for Practitioners, some of my most successful client projects have not had a CMS component.

Note that I'm saying this as somebody who has worked for and still works closely with a major CMS vendor (RSI Content Solutions). In addition, as a DITA consultant who wants to work with everybody and anybody, I take some risk saying things like this since a large part of the DITA tools market revolves around CMS systems (as opposed to editors or composition systems, where the market has essentially resolved on a small set of mature providers that are unlikely to change anytime soon).

So let me make it clear that I'm not suggesting that you never need a CMS--in an enterprise context you almost certainly do, and even in smaller teams or companies, lighter-weight systems like EasyDITA, DITAToo, Componize, BlueStream XDocs, and DocZone can offer significant value within the limits of tight small-team budgets.

But for many teams a CMS will always be prohibitive, whether in money or time or both, especially at the start of projects. So there is a significant part of the DITA user community for whom CMS systems are either an unaffordable luxury or something to be worked toward and justified once an initial DITA project proves itself.

In addition, an important aspect of DITA is that you can get started with DITA and be productive very quickly without having to first put a CMS in place. Even if you know you need to have a CMS, you can start small and work up to it. I have seen many documentation projects fail because too much emphasis was put on implementing the CMS first.

Many people get the idea, for whatever reason, that a CMS is the cost of entry for implementing DITA, and that is simply not the case for many DITA users.

The net of my current thinking is that this tool set:
  • git for source content management
  • DITA Open Toolkit for output processing
  • Jenkins for centralized and distributed process automation
  • oXygenXML for editing and local production
allows you to implement an almost-complete, low-cost DITA authoring, management, and production system. Of these four tools, only one, oXygenXML, is commercial. If you use Github to host private repositories, that has a cost, but it's minimal.

In particular, the combination of git, Jenkins, and the Open Toolkit enables easy implementation of centralized, automatic build processing of DITA content. Platform-as-a-service (PaaS) providers like CloudBees, OpenShift, and Amazon Web Services provide free and low-cost options for quickly setting up central servers for things like Jenkins, web sites, and so on, with varying degrees of privacy and easy-to-scale options.

The key here is low dollar cost and low labor investment to get something up and running quickly.

This doesn't include the effort needed to customize and extend the OT to meet your specific output needs--that's a separate cost dependent entirely on your specific requirements. But the community continues to improve its support for doing OT customization and the tools are continually improving, so that should get easier as time goes on (for example, Leigh White's DITA for Print book from XML Press makes doing PDF customization much easier than it was before--it's personally saved me many hours in my recent PDF customization projects).

For each of these tools there are of course suitable alternatives. I've singled out these particular tools because of their ubiquity, specific features, and ease of use. But the same approach and principles could be applied to other combinations of comparable tools.

OK, so on to the question of when must you have a CMS and when can you get by without one?

A key question is what services CMS systems provide, how critical those services really are, and what alternatives are available.

As in all such endeavors it's a question of understanding your requirements and matching those requirements to appropriate solutions. For small tech doc teams the immediate requirements tend to be:
  1. Centralized management of content with version control and appropriate access control
  2. Production of appropriate deliverables
  3. Increased reuse to reduce content redundancy
  4. Localization
Given that understanding of very basic small-team requirements, how do the available tools align to those requirements?

Since the choice is between a CMS and an ad-hoc system built from the components described above, the main question is "What do CMS systems do and how do the ad-hoc tools compare?"

CMS systems should or do provide the following services:

1. Centralized storage of content. For many groups just getting all their content into a single storage repository is a big step forward, moving things off of people's personal machines or out of departmental storage silos.

2. Version management. The management of content objects as versions in time.

3. Access control. Providing controls over who can do what with different objects under what conditions.

4. Metadata management for content objects. The ability to add custom metadata to objects in order to enable finding or support specific business processes. This includes things like classification metadata, ownership or rights metadata, and metadata specific to internal business processes or data processing.

5. Search and retrieval of content objects. The ability to search for and reliably find content objects based on their content, metadata, workflow status, etc.

6. Management of media assets. The ability to manage non-XML assets (images, videos, etc.) used by the main content objects. This typically includes support for media object metadata, format conversion, support for variants, streaming, and so on. Usually includes features to manage the large physical data storage required. Sometimes provided by dedicated Digital Asset Management (DAM) systems.

7. Link management. Includes maintaining "where used" information about content and media assets, management of addressing details, and so on.

8. Deliverable production. Managing the generation of deliverables from content stored in the CMS, e.g. running the Open Toolkit or equivalent processes.

These are all valuable features and as the volume of your content increases, as the scope of collaboration increases, and as the complexity of your re-use and linking increases, you will certainly need systems that provide these features. Implementing these services completely and well is a hard task and commercial systems are well worth the cost once you justify the need. You do not want to try to build your own system once you get to that point.

In any discussion like this you have to balance the cost of doing it yourself with the cost of buying a system. While it's easy to get started with free or low-cost tools, you can find yourself getting to a place where the time and labor cost of implementing and maintaining a do-it-yourself system is greater than the cost of licensing and integrating a commercial system. Scope creep is a looming danger in any effort of this kind. Applying agile methods and attitudes is highly recommended.

The nice thing about DITA is that, if you don't do anything too tool-specific, you should be able to transition from a DIY system to a commercial one with minimum abuse to your content. That's part of the point of XML in general and DITA in particular.

Also, keep in mind that DITA has been explicitly architected from day 1 to not require any sort of CMS system--everything you need to do with DITA can be done with files on the file system and normal tools. There is no "magic" in the DITA design (although it may feel like it to tool implementors sometimes).

So how far can you get without a dedicated CMS system?

I suggest you can get quite a long ways.

Services 1, 2, and 3: Basic Data Management

The first three services--centralized storage, version management, and access control--are provided by all modern source code management (SCM) tools, e.g. Subversion, git, etc. With the advent of free and low-cost services like Github, there is essentially zero barrier to using modern SCM systems for managing all your content. Git and Github in particular make it about as easy as it could be. You can use Github for free for public repositories and at a pretty low cost for private repositories, or you can easily implement internal centralized git repositories within an enterprise if you have a server machine available. There are lots of good user interfaces for git, including Github's own client as well as free tools like SourceTree.
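As a concrete illustration, the whole commit-and-push cycle can be scripted around the plain git command line; the sketch below (the repository path, commit message, and remote/branch names are placeholders, not anything git or DITA prescribes) is the kind of thing a writer or a nightly job could run to get local changes into the central repository:

    # Minimal sketch: stage, commit, and push everything in a local DITA docs
    # repository using the plain git command line. The repository path, commit
    # message, and remote/branch names below are illustrative placeholders.
    import subprocess

    DOCS_REPO = "/home/writer/docs-repo"   # hypothetical local clone

    def run_git(*args):
        """Run a git command inside the docs repository and fail loudly."""
        subprocess.check_call(["git"] + list(args), cwd=DOCS_REPO)

    run_git("add", "--all")                      # stage new and changed topics, maps, images
    run_git("commit", "-m", "Update user guide topics")   # fails if there is nothing to commit
    run_git("push", "origin", "master")          # central repository, e.g. on Github

The same three commands typed by hand are all a writer really needs day to day; the script form just makes the step repeatable from an automation job.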

Git in particular has several advantages:
  • It is optimized to minimize redundant storage and network bandwidth. That makes it suitable for managing binaries as well as XML content. Essentially you can just put everything in git and not worry about it.
  • It uses a distributed repository model, in which each user can have a full copy of the central repository to which they can commit changes locally before pushing them to the central repository. This means you can work offline and still do incremental commits of content. Because git is efficient with storage and bandwidth, it's practical to have everything you need locally, minimizing dependency on live connections to a central server.
  • Its branching model makes working with complex revision workflows about as easy as it can be (which is not very but it's an inherently challenging situation).

Service 4: Metadata Management

Here DITA provides its own solution in that DITA comes out of the box with a robust and fully extensible metadata model, namely the <data> element. You can put any metadata you need in your maps and topics, either by using <data> with the @name attribute or by creating simple specializations that add new metadata element types tailored to your needs. For media assets you can either create key definitions with metadata that point to media objects or use something like the DITA for Publishers <art> and <art-ph> elements to bind <data> elements to references to media objects (unfortunately, the <image> element does not allow <data> as a direct child through DITA 1.3).
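As a small illustration of how far you can get without a CMS, that <data> metadata is trivially harvestable with a script. The sketch below builds a simple index of name/value pairs across a directory of topics; the directory layout is hypothetical, and it ignores specializations, which would need to be matched on @class rather than on the literal element name:

    # Minimal sketch: collect <data name="..."> metadata from all .dita topics
    # under a docs directory into a simple index. The directory layout is a
    # placeholder, and specialized metadata elements are ignored.
    import os
    import xml.etree.ElementTree as ET

    DOCS_DIR = "docs"          # hypothetical root of the DITA source tree
    index = {}                 # (name, value) -> list of files that carry it

    for dirpath, _, filenames in os.walk(DOCS_DIR):
        for filename in filenames:
            if not filename.endswith(".dita"):
                continue
            path = os.path.join(dirpath, filename)
            try:
                tree = ET.parse(path)
            except ET.ParseError:
                continue                   # skip anything that isn't well-formed XML
            for data in tree.iter("data"):
                name = data.get("name")
                value = data.get("value") or (data.text or "").strip()
                if name:
                    index.setdefault((name, value), []).append(path)

    for (name, value), files in sorted(index.items()):
        print("%s=%s: %s" % (name, value, ", ".join(files)))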

In addition, you can use subject schemes and classification maps to impose metadata onto anything if you need to.

It is in the area of metadata management that CMS systems really start to demonstrate their value. If you need sophisticated metadata management then a CMS is probably indicated. But for many small teams, metadata management is not a critical requirement, at least not at first.

Service 5: Search and Retrieval

This is another area where CMS systems provide obvious value. If your content is in an SCM, the SCM itself probably doesn't provide any particular search features.

But you can use existing search facilities, including those built into modern operating systems and those provided by your authoring tools (e.g., Oxygen's search across files and search within maps features). Even a simple grep across files can get you a long way.
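For teams with nothing else in place, even that "simple grep" is easy to make repeatable; a minimal sketch (the source directory and the search term are placeholders):

    # Minimal sketch of a case-insensitive "grep across files" for DITA source.
    # The directory and the search term are illustrative placeholders.
    import os
    import re

    DOCS_DIR = "docs"                                   # hypothetical source tree
    PATTERN = re.compile(r"telemetry", re.IGNORECASE)   # hypothetical search term

    for dirpath, _, filenames in os.walk(DOCS_DIR):
        for filename in filenames:
            if filename.endswith((".dita", ".ditamap")):
                path = os.path.join(dirpath, filename)
                with open(path) as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if PATTERN.search(line):
                            print("%s:%d: %s" % (path, lineno, line.strip()))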

If you have more implementation resources you can also look at using open-source full-text systems or XML databases like eXist and MarkLogic to do searching. It takes a bit more effort to set up but it might still be cheaper than a dedicated CMS, at least in the short term.

If your body of content is large or you spend a lot of time trying to find things or simply determining if you do or don't have something, a commercial CMS system is likely to be of value. But if your content is well understood by your authors, created in a disciplined way, and organized in a way that makes sense, then you may be able to get by without dedicated search support for quite a long time.

In addition, you can do things with maps to provide catalogs of components and so on. Neatness always counts and this is an area where a little thought and planning can go a long way.

Service 6: Management of Media Objects

This depends a lot on your specific media management requirements, but SCM systems like git and Subversion can manage binaries just fine. You can use continuous integration systems like Jenkins and open-source tools like ImageMagick to automate format conversion, metadata extraction, and so on.
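As one concrete example, normalizing source images for the build is a few lines of scripting around ImageMagick's convert command, and a Jenkins job can run it on every commit; the sketch below assumes ImageMagick is on the PATH and uses placeholder directory names:

    # Minimal sketch: batch-convert source TIFFs to PNG with ImageMagick's
    # "convert" command so the DITA build only ever sees web-ready images.
    # Directory names are placeholders; ImageMagick must be on the PATH.
    import os
    import subprocess

    SOURCE_DIR = "media/source"     # hypothetical location of the originals
    OUTPUT_DIR = "media/png"        # hypothetical normalized output location

    os.makedirs(OUTPUT_DIR, exist_ok=True)

    for filename in os.listdir(SOURCE_DIR):
        if not filename.lower().endswith((".tif", ".tiff")):
            continue
        source = os.path.join(SOURCE_DIR, filename)
        target = os.path.join(OUTPUT_DIR, os.path.splitext(filename)[0] + ".png")
        # convert infers the output format from the target file's extension
        subprocess.check_call(["convert", source, target])
        print("converted %s -> %s" % (source, target))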

If you have huge volumes of media assets, requirements like rights management, complex development workflows, and so on, then a CMS with DAM features is probably indicated.

But if you're just managing images that support your manuals, you can probably get by with some well-thought-out naming and organizational conventions and use of keys to reference your media objects. 

Service 7: Link Management

This is the service where CMS systems really shine, because without a dedicated, central, DITA-aware repository that can maintain real-time knowledge of all links within your DITA content, it's difficult to answer the "where-used" question quickly. It's always possible to implement brute-force processing to do it, but SCM systems like Subversion or git are not going to do anything for you here out of the box. It's possible to implement commit-time processing to capture and update link information (which you could automate with Jenkins, for example), but that's not something a typical small team is going to implement on their own.

On the other hand, by using clear and consistent file, key, and ID naming conventions and using keys you can make manual link management easier--that's essentially what we did at IBM all those years ago when all we had were stone knives and bear skins. The same principles still apply today.

An interesting exercise would be to use a Jenkins job to maintain a simple where-used database that's updated on commit to the documentation SCM repository. It wouldn't be that hard to do.
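A minimal sketch of that idea: walk the DITA source, record every href, conref, and keyref, and invert the result so each target lists the files that reference it. The source directory and the plain-text report format below are illustrative only; a Jenkins job could run this on every commit and archive or publish the output:

    # Minimal sketch of a "where used" report: walk the DITA source, record
    # every href/conref/keyref, and invert the result so each target lists the
    # files that reference it. The directory and report format are placeholders.
    import os
    import xml.etree.ElementTree as ET
    from collections import defaultdict

    DOCS_DIR = "docs"                      # hypothetical root of the DITA source
    where_used = defaultdict(set)          # target -> set of referencing files

    for dirpath, _, filenames in os.walk(DOCS_DIR):
        for filename in filenames:
            if not filename.endswith((".dita", ".ditamap")):
                continue
            path = os.path.join(dirpath, filename)
            try:
                tree = ET.parse(path)
            except ET.ParseError:
                continue
            for element in tree.iter():
                for attr in ("href", "conref", "keyref"):
                    target = element.get(attr)
                    if target:
                        where_used[target].add(path)

    for target in sorted(where_used):
        print(target)
        for referencing_file in sorted(where_used[target]):
            print("    used by " + referencing_file)

It doesn't resolve keys or relative paths, so it's nowhere near what a CMS gives you, but it answers the basic where-used question from a plain checkout.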

Service 8: Deliverable Production

CMS systems can help with process automation and the value of that is significant. However, my recent experience with setting up Jenkins to automate build processes using the Open Toolkit makes it clear that it's now pretty easy to set up DITA process automation with available tools. It takes no special knowledge beyond knowing how to set up the job itself, which is not hard, just non-obvious.

Jenkins is a "continuous integration" (CI) server that provides general facilities for running arbitrary processes triggered by things like commits to source code repositories. Jenkins is optimized for Java-based projects and has built-in support for running Ant, connecting to Subversion and git repositories, and so on. This means you can have a Jenkins job triggered when you commit objects to your Subversion or git repository, run the Open Toolkit or any other command you can script, and either simply archive the result within the Jenkins server or transfer the result somewhere else. You can implement various success/failure and quality checks and have it notify you by email or other means when something breaks. Jenkins provides a nice dashboard for getting build status, investigating builds, and so on. Jenkins is an open-source tool that is widely used within the Java development community. It's easy to install and available in all the cloud service environments. If your company develops software it's likely you already use Jenkins or an equivalent CI system that you could use to implement build automation.
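The build step itself can be as small as a script that invokes the Toolkit's Ant build for one map. The sketch below assumes an Ant-based DITA-OT 1.x layout with its environment already set up; the Toolkit location, map path, transtype, and output directory are all placeholders:

    # Minimal sketch of the build step a Jenkins job could run after each
    # commit: invoke the DITA Open Toolkit's Ant build for one map and one
    # transtype. The Toolkit location, map path, and output directory are
    # placeholders, and an Ant-based DITA-OT 1.x layout is assumed.
    import os
    import subprocess

    DITA_OT_DIR = "/opt/dita-ot"                 # hypothetical Toolkit install
    MAP_FILE = "docs/user-guide.ditamap"         # hypothetical root map
    OUTPUT_DIR = "out/xhtml"

    subprocess.check_call([
        "ant",
        "-f", os.path.join(DITA_OT_DIR, "build.xml"),
        "-Dargs.input=" + os.path.abspath(MAP_FILE),
        "-Dtranstype=xhtml",
        "-Doutput.dir=" + os.path.abspath(OUTPUT_DIR),
    ])

In a Jenkins job this is just an execute-script build step; a failed transform fails the build, which is exactly the feedback loop you want.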

My experience using CloudBees to start setting up a test environment for DITA for Publishers was particularly positive. It literally took me minutes to set up a CloudBees account, provision a Jenkins server, and set up a job that would be able to run the OT. The only thing I needed to do to the Jenkins server was install the multiple-source-code-management plugin, which just means finding it in the Jenkins plugin manager and pushing the "install" button. I had to set up a github repository to hold my configured Open Toolkit but that also just took a few minutes. Barring somebody setting up a pre-configured service that does exactly this, it's hard to see how it could be much easier.

I think that Jenkins + git, coupled with low-cost cloud services like CloudBees, really changes the equation: things that would otherwise be difficult enough to put off indefinitely become easy enough that any small team should be able to do them within the resources and time it has.

This shouldn't worry CMS vendors--it can only help to grow the DITA market and foster more DITA projects that are quickly successful, setting the stage for those teams to upgrade to more robust commercial systems as their requirements and experience grow. Demonstrating success quickly is essential to any project, and especially to DITA projects undertaken by small Tech Doc teams, who always have to struggle for budget and staff due to the nature of Tech Doc as a cost center. But a demonstrated DITA success can help to make the value of high-quality information clearer to the enterprise, creating opportunities to get more support to do more, which requires more sophisticated tools.
Categories: DITA

[Office Humor] What If In-Person Meetings Ran Like Conference Calls?

The Content Wrangler - Fri, 2014-01-24 18:30

What if in-person meetings had the same characteristics as conference calls? The folks at Leadercast have created a hilarious comedy video illustrating what a world of in-person meetings would be like if conference-call-style challenges were introduced. A world of VOIP noises, software crashes, and unnatural delays. A world in which dogs are barking, children are crying, and dueling cappuccino machines are heard screeching in the background. Give the video a whirl. Laugh out loud. Then share the fun with others.

Categories: DITA

Marketing technology landscape explosion and CMS evolution

The most popular and pervasive meme at the recent Gilbane Conference on Content and the Digital Experience was certainly “marketing technologist”. There were many other topic streams but none quite …
Categories: DITA

Storing and querying RDF in Neo4j

bobdc.blog - Tue, 2014-01-07 13:56
Hands-on experience with another NoSQL database manager. Bob DuCharme http://www.snee.com/bobdc.blog
Categories: DITA