Deep Data Dive into Supporting Developers

JustWriteClick - 6 hours 50 min ago

Here at Rackspace, we watch several sources to determine if a developer is having trouble with Rackspace Cloud services, many of which are based on OpenStack. We even have a notification tool, aptly named peril, that offers an aggregation of the many sites where developers may seek help. I’ve been on the Rackspace developer experience team since the very early days when we started supporting developers at the code level a few years ago. We monitor stackoverflow.com, serverfault.com, and superuser.com for questions tagged with Rackspace, rackspace-cloud, fog, cloudfiles, jclouds, pyrax, or keystone. At Rackspace we have been supporting cross-cloud SDKs such as Apache jclouds, Node.js pkgcloud, Ruby Fog, Python Pyrax, .NET, and PHP. Let’s look at the data from these many places to find out the patterns for application development.

A developer is in peril!

We have an email address, sdk-support@rackspace.com, that we monitor for developer support requests. In the last year, about 6.25% of support requests came in through email. We saw nearly half of support requests on the jclouds bug list (JIRA tracker) and a community forum at community.rackspace.com. Tracking GitHub issues on our supported SDKs accounted for another 40%, with the remaining 14% of support requests coming from Stack Overflow, where we track certain tags on questions asked. Here’s a screenshot showing what our notifications look like in a Slack channel internally. I like how it cycles through various alert messages, such as “Heads up, incoming!” and, not shown, “BWEEEEEP BWEEEEP BWEEEEP”, which naturally makes us want to help!

Peril example
True story: sometimes our slurps catch and notify before our email server. We are on it!

Documentation comments

If you’ve read my blog for a while or my book, you know I appreciate documentation that offers back-and-forth discussion in comment threads. We use Disqus comments for developer documentation at Rackspace. These comments tend to uncover three categories of requests:

  • Request for help when something doesn’t work as expected
  • Request for a feature that doesn’t exist
  • Request for correction: pointing out typos or incorrect formatting, such as JSON examples that lose their indentation

We see about 20 comments a month on API docs spanning all our products, with about 27.5% on Cloud Files (Object Storage), about 20% Cloud Servers (Compute), and Identity coming in third at 12%. In OpenStack docs, we have a doc bug link that serves for the third type of comment — pointing out a doc bug — but not on the API reference, yet. One pattern we see that’s comparable to a review of the document is someone asking a bunch of questions at once to gain understanding.

Stack Exchange sites

Stack Exchange sites are question-and-answer sites with built-in features that motivate people to answer questions posted by others. Some example sites include Stack Overflow for developers and Server Fault for administrators.

We track these tags on Stack Overflow with our peril tool:

rackspace rackspace-cloud fog cloudfiles jclouds pyrax keystone

When people don’t get a satisfactory answer on Stack Overflow, one interesting pattern is that they come to ask.openstack.org, the OpenStack open source equivalent site.

The Stack Exchange API is a ton of fun for gathering data. I wrote some Python scripts that request data from these calls, using the tag “openstack”:

  • Top Answerers: Matt Joyce, Everett Toews, and Lorin Hochstein are great at answering questions on Stack Overflow, and since Everett’s on the developer support team at Rackspace that makes sense.
  • Related Tags: openstack-nova, cloud, python are the top three related tags for openstack on Stack Overflow.
  • Top Tags: python, ruby, csharp/.net, php, javascript(node.js). One interesting observation is that overarching concepts like security and networking were often tagged along with the language itself.
  • Frequently Asked Questions: It’s not surprising that authentication is the root of the most frequently asked questions. Networking is also particularly complex and it shows in the number of questions asked. What was interesting though is the number of questions about monitoring and metrics on the cloud consumption itself.
  • Unanswered Questions: For the openstack tag, the unanswered questions had fewer than 100 views, compared to over 1000 views for the answered questions.

While your browser automatically decompresses the results when you enter http://api.stackexchange.com/2.2/tags/openstack/top-answerers/all_time?site=stackoverflow in the address bar, I had to figure out how to have Python unzip the results and load them as JSON. Then I got stuck trying to automate converting the JSON to CSV, so I used konklone.io/json/ for the conversion.
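
Here is a minimal sketch of how the unzip-to-JSON-to-CSV steps could be automated, using only the Python standard library. It is not the script I actually ran. The function names are made up for illustration, the https API root is my assumption, and the column names (user.display_name, post_count, score) are what I would expect in a top-answerers response, so check them against the real output before relying on them.

```python
import csv
import gzip
import io
import json
from urllib.request import urlopen

# Assumed API root; the URL in the post uses http and version 2.2.
API_ROOT = "https://api.stackexchange.com/2.2"


def top_answerers_url(tag, period="all_time", site="stackoverflow"):
    """Build the top-answerers URL for one tag."""
    return f"{API_ROOT}/tags/{tag}/top-answerers/{period}?site={site}"


def decode_response(raw_bytes):
    """Gunzip the body if it is gzip-compressed, then parse it as JSON."""
    if raw_bytes[:2] == b"\x1f\x8b":  # gzip magic number
        raw_bytes = gzip.decompress(raw_bytes)
    return json.loads(raw_bytes.decode("utf-8"))


def items_to_csv(items, columns):
    """Write selected fields of each item to CSV text.

    Each column is a dotted path such as "user.display_name".
    Missing fields become empty cells.
    """
    def lookup(item, path):
        value = item
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                return ""
            value = value[key]
        return value

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for item in items:
        writer.writerow([lookup(item, col) for col in columns])
    return out.getvalue()


def fetch_top_answerers_csv(tag):
    """Fetch live data (needs network access) and return CSV text."""
    with urlopen(top_answerers_url(tag)) as response:
        payload = decode_response(response.read())
    # Assumed field names; adjust to match the actual response.
    columns = ["user.display_name", "post_count", "score"]
    return items_to_csv(payload.get("items", []), columns)
```

Calling fetch_top_answerers_csv("openstack") would return text you could save to a .csv file and open in Tableau. The decoding and CSV steps are separate functions so they can be tried on saved responses without hitting the API.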

Once I had the CSV files, I could use Tableau to get interesting data visualizations like a bubble grouping for related tags showing the frequency of the tags.

Tag bubbles

GitHub data

I found that GitHub issues were a great place to dive into the use cases for particular software development kits. I even discovered that our Austin-based real estate agent’s website was developed by Rackspace Cloud php-opencloud users!


One extremely helpful sandbox for trying OpenStack services through API use is TryStack, with over 15,000 users and growing every year. I find it very helpful to have my own sandbox besides the Rackspace Cloud to test the actual API calls and see what is returned from OpenStack. In looking at the logs, I found many queries about quotas, which also explains the many questions about policies. It’s a free, community-donated cloud, and support is entirely through a Facebook group, so it’s a bit unusual, but I still found interesting data around it.

Other findings

I found that many of the SDKs do not yet have full support for certain OpenStack services. For example, jclouds, the Java multi-cloud toolkit, has a lot of users but doesn’t yet support the newest OpenStack services like Orchestration (heat templates), the Metering module ceilometer, the new versions of APIs for Images and Identity, and the latest storage policies implementation for Object Storage.

I also surmise that the service catalog coming through the Identity service needs stricter documentation and expectations setting. So the OpenStack API Working Group is tackling that issue by first discovering all the common patterns for the service catalog. Feel free to join the OpenStack API Working Group and review the incoming suggestions for consistency going forward or review patches going into OpenStack services that affect APIs.


You can get the full presentation from slideshare.net, or watch me give it on YouTube.

Categories: DITA

Gilbane Conference Advisor 1.20.15

Don’t Try to Be a Publisher and a Platform at the Same Time Or at least think it through very carefully. Also, do you really want to be called a “platisher”? Making these hybrids work over the long term is difficult, because their incentives work against each other. Toward the end of last year, one […]

R (and SPARQL), part 2

bobdc.blog - Tue, 2015-01-20 13:32
Retrieve data from a SPARQL endpoint, graph it and more, then automate it. Bob DuCharme http://www.snee.com/bobdc.blog

R (and SPARQL), part 1

bobdc.blog - Tue, 2015-01-13 13:26
Or, R for RDF people. Bob DuCharme http://www.snee.com/bobdc.blog

API Archaeology: Complexity and sizing of an interface

JustWriteClick - Tue, 2015-01-13 02:06

For both OpenStack and Rackspace cloud APIs, we use WADL, Web Application Description Language, to build an API reference listing for all REST API calls. In a previous post I discuss how the reference pages at http://developer.openstack.org/api-ref.html are made with WADL source and a Maven plugin, clouddocs-maven-plugin. You can see the output for the Rackspace API reference page at http://api.rackspace.com, built with the same tool chain but different branding through CSS. I can discuss the tooling decision process in another post, but let’s talk about ongoing upkeep and maintenance of this type of API reference information.

In this post I want to dig beneath the surface to discover how complex these APIs are, how that complexity might translate into difficulty or time spent in documenting the interfaces, and discuss some of the ways you could assign the work for creating and maintaining reference information for APIs. In another post I said, start with a list. This post looks into what happens after you have a list and need more lists to know the shape and size of your API and its documentation needs.

Some of the complexity also lies in documenting the parameters and headers for each API. Just like unearthing the walls of an ancient structure, you can look at the various ways an API is put together by looking at the number of calls, the number of parameters on each call, whether there are headers on any given call, and how the calls are grouped and related. I’ve summarized some of that below for the comparison cultures, er, grouped APIs.



A call is defined as a GET, PUT, POST, or DELETE command sent to a resource. These commands are known as HTTP verbs.

A header is defined as an optional or required component of an HTTP request or response. There are plenty of standard headers; what I’m talking about here are the extra headers defined specifically by the API you document.

A parameter may be a query parameter or, for example, provide a way to filter the response. Parameters specify a varying part of the resource, and your users need to know what parameters are available and what they can do with them.
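
To make those three definitions concrete, here is a small sketch that assembles one call from a verb, a resource, a couple of query parameters, and one API-specific header, without sending anything over the network. The endpoint URL, the token value, and the build_call helper are all made up for illustration. Only the urllib calls come from the Python standard library.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_call(verb, resource_url, params=None, headers=None):
    """Combine an HTTP verb, a resource, query parameters, and headers."""
    url = resource_url
    if params:
        # Query parameters are appended to the resource URL.
        url = f"{url}?{urlencode(params)}"
    return Request(url, method=verb, headers=headers or {})


# Hypothetical endpoint, parameters, and token, for illustration only.
request = build_call(
    "GET",
    "https://storage.example.com/v1/account/container",
    params={"limit": 10, "format": "json"},
    headers={"X-Auth-Token": "example-token"},
)

print(request.get_method())    # the HTTP verb
print(request.full_url)        # the resource plus its query parameters
print(request.header_items())  # headers (urllib normalizes capitalization)
```

When you count calls, parameters, and headers for an API, these are the three pieces you are tallying for every entry in the reference.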

Running the numbers

To get these numbers, I first built each reference site so that the WADL files can be built into a single folder, which lets me do a grep for a count.

So I cloned each repo, ran mvn clean generate-sources within the api-ref directory, then ran this command from within api-site/api-ref/target/docbkx/html:

grep -c "rax:id" wadls/*.wadl | sort -n -k2 -t:

Then I imported the output from the command as a colon-delimited file to a spreadsheet to get these counts.

OpenStack API Reference Metrics
Number of Compute v2.0 calls: 290
Number of Networking v2.0 calls: 92
Number of Orchestration v1.0 calls: 41
Total documented calls: 755

Rackspace API Reference
Number of Cloud Files calls: 21
Number of Compute v2.0 calls: 70
Number of Networking v2.0 calls: 18
Total documented calls: 670
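
If you would rather skip the spreadsheet step, the colon-delimited output from grep -c can be totaled with a few lines of Python. This is a sketch rather than what I actually did, and the file names and counts in the sample are invented for illustration, not real output.

```python
def parse_grep_counts(text):
    """Parse `grep -c` output lines of the form path:count into a dict."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so paths that contain colons still work.
        name, _, count = line.rpartition(":")
        counts[name] = int(count)
    return counts


def summarize(counts):
    """Return (file, count) pairs sorted by count, plus the grand total."""
    ordered = sorted(counts.items(), key=lambda pair: pair[1])
    return ordered, sum(counts.values())


# Invented sample output, for illustration only.
sample_output = """\
wadls/example-images.wadl:4
wadls/example-compute.wadl:12
wadls/example-identity.wadl:7
"""

ordered, total = summarize(parse_grep_counts(sample_output))
for name, count in ordered:
    print(f"{name}: {count}")
print(f"Total documented calls: {total}")
```

You could save the grep output to a file and pass its contents to parse_grep_counts instead of the sample string.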

Here’s a breakdown of header and parameter counts for just a few of the OpenStack APIs.
Object Storage API: 12 parameters, 75 headers
Volume API: 23 parameters, 0 headers
Compute core API: 69 parameters, 1 header

Other metrics

We track doc bugs for the OpenStack API reference in Launchpad with the openstack-api-site project. There are nearly 200 doc bugs logged against the API Reference right now.

The three APIs with the most calls for Rackspace are Monitoring, Email & Apps, and Load Balancers, none of which are OpenStack APIs. A full two-thirds of Rackspace calls are not OpenStack-sourced, which means the remaining third of Rackspace calls are identical to OpenStack.

What are some of the differences between OpenStack and Rackspace?
Extensions are complete in OpenStack; Rackspace only implements a handful of extensions.

Internal (admin-only) and external (user) calls are documented in OpenStack; Rackspace API Ref only documents external calls.

Rackspace has paid API writers and accepts pull requests on GitHub; OpenStack docs are written by writers and developers in the community (often with corporate sponsors) using the OpenStack Gerrit process.


So that’s a lot of numbers, but what’s your point? My point is that making lists helps you determine the size and complexity of documenting multiple APIs. Not all companies or projects will have more than one API to document, but as we move towards more application interfaces for more business reasons, I believe that writers and developers need to get really accurate in their estimations of just how much time to allocate to document their APIs and do it well. Since these estimates are for API reference information only, don’t forget to also estimate time to write and maintain viable, tested example code. That’s a post for another day. Thanks for reading about the complexity and comparison of OpenStack and Rackspace cloud APIs.


Gilbane Conference 2014 resources

  Below are some posts about the Gilbane Conference 2014. You can also access conference presentations, video recordings, and speaker spotlights. If you see any we are missing please let us know.   ChiefDigitalOfficer.net 5 Questions With… Raimund Gross of SAP 5 Questions with… Frank Gilbane CMS Myth Gilbane 2014: Day One Best Bets Gilbane 2014: Day Two Best […]

How to get started writing API docs

JustWriteClick - Tue, 2014-12-30 19:17

I know a lot of people who want to consume awesome API docs. Let’s talk about what it takes to get started writing them. I’m not talking about completing your API docs. I’m talking about just getting started, what does it take?

For API documentation, especially for REST APIs, I’d advocate a reference-first approach. Like the couch to 5K program for running, let’s start API documentation on your couch. You look under your coffee table and find your shoes. You pull them on (make sure you have wrinkle-free socks!) and lace them up. You’re ready!


https://www.flickr.com/photos/jypsygen/3321559694/sizes/l/in/photolist-64vRYA-82CrNJ-86fGxc-9qLHZf-e8mSA1-acUfDf-kpF6vK-jKgd2h-jC5u26-ktM7vp-kpEjtn-kvG8qZ-cymDph-fMuVKS-7WuzPS-9V1uGW-8NfZuz/
Please write API docs! Photo courtesy of jypsygen on Flickr.

If you are working on REST APIs, I’d also recommend understanding how they’re designed. I like the book “A Practical Approach to API Design: From Principles to Practice,” by D Keith Casey Jr. and James Higginbotham. You can read a sample for free online, and then support their efforts by buying a copy at a price you set!

If your work is with another style of API, be sure you understand the underlying reasons for using that interface. It’ll help you understand your audience first.
Make a list. First, a list of all the calls. Then for each call, make a list of requirements. What must a user give to the interface to get back what they want? What are the requests, what are the responses? Then list the optional parameters the call can take. Are there any headers that can be sent or received? Be sure to write those on your list also.

Let’s not get tied up in tools yet, this initial writing work can be on a notebook or any text editor. You don’t run out and buy an expensive heart monitor or activity sensor when you’re just starting out as a runner. Figure out how far you can get with a pair of shoes (your notebook) before investing in cool tools and gear. Otherwise, you’ll get distracted by the cool tools and gear and not write stuff down!

With these lists, you’re building scaffolding. Just like a running program, you need to make a pattern to learn to pace yourself and spend your energy wisely. Once you have a list of the calls, you can write or diagram the users, the tasks they want to complete, and then see if your reference is filled in for all the users and tasks. I highly advocate the reference-first approach as it’ll help you test the completeness and helpfulness of your documentation.

I have some additional posts I’m writing for this year where I want to dive into the increasing complexity of APIs so that writers and developers can estimate the amount of time needed for good documentation. I’ll also analyze some tooling for REST API documentation and offer benefits and tradeoffs for different tools. Looking forward to digging into API documentation in 2015!


Hackathons of Late

JustWriteClick - Tue, 2014-12-23 01:09

I’ve participated in a few hackathons in the past year, since I’ve been working on a team of developer advocates at Rackspace. I wanted to chronicle some of my experiences with a few goals in mind: recording memories for myself, letting others get ideas for the wide range of hackathons, and chronicling the wide variety of types of hackathons and outcomes of hackathons.

For a bit of context, I’m female, have a family and weekend obligations that include sports practices, and I’m a generation-Xer if that gives you a sense of my age (with a range). I’ve definitely read and enjoyed a lot of articles about women and inclusive hackathons, such as “Running an Inclusive Hackathon: How to get better representation at your hackathon,” and recently about “grownups” and hackathons, such as “Why I don’t like hackathons, by Alex Bayley aged 39 1/2.” I appreciate a well-run, inclusive hackathon for my own participation, but I also see a need for well-run, inclusive college-age hackathons.

Internal Hackathon at Rackspace

It was a bunch of fun to dig into one of our SDKs (pyrax, for Python) and one of our services (Cloud Files) to make a gallery of photos from a balloon photography project we did for a girls in engineering day at the University of Texas. This project has had some re-use in showing school-age students what code for the cloud looks like, and it makes a nice demonstration since it’s visual. I sat next to another developer advocate who helped me break the idea into parts that I could manage more easily with my rudimentary coding skills. It was a ton of fun, and although I didn’t demo it for a group at the end of the day, I did show it to our vice president, who was walking around to see what we were working on. The hackathon was held in the office after another technical event during work hours, so it was easy to make it part of my day.

Austin Ladies Hackathon

This one involved a bigger time commitment, with a fun event on Friday night to form teams followed by hacking on both weekend days. I served as a mentor for teams in this hackathon rather than participating myself. It wasn’t the time investment that prevented me from participating; it was just my role to mentor rather than hack. I have a write-up on the Rackspace Developer Blog.

Slashathon at SXSW Interactive

This event was one of the highlights of my week at SXSW Interactive, as a bookend with a really neat experience interviewing a woman on the StartUp Bus, Nicole Dominguez.

After the last day of SXSW Interactive, we all arrived at Capital Factory in the morning for the hackathon. I was toting a Rackspace red bike and scheduled to be a developer mentor for the Rackspace Cloud APIs for the morning session.


Everett Toews and I both work in the developer relations group at Rackspace, and he gave a presentation about how to hack the hackathon by using our cloud services. The other presenters really piqued my interest in metadata about music, bands, albums, and artists; headphone hardware advances (hacking headphones are amazing!); and cool uses of location and proximity hardware devices. All of the technology really makes ideas fire off in your brain. Four of us came up with an idea collaboratively: Everett, me, and two web developers from Harvard Business Review who had to catch afternoon flights. We wanted to create a site for seeing on a map where your favorite band or musician is, with related Tweets and social media posts and the ability to buy tickets for their next gig. Everett presented to the panel of judges: Slash himself, Robert Scoble, and the guy who invented BitTorrent, Bram Cohen. It went over really well! The winner had some cool hardware that helped enhance a concert experience, and second place went to a Google Glass app that performers could wear to measure the decibels of audience noise. LoudWire covered the event in a post.

Types of Hackathons

I definitely see a difference between hackathons for college-age or students and for working adults or non-traditional students. There are also hardware hackathons versus software hackathons or hackathons with a specific theme, such as the Slashathon that featured music or audio hardware and music-based APIs. There should be a hackathon for many interests, then it’s a matter of determining your own goals for participating.

Outcomes of Hackathons

I think there are different types of outcomes: projects by participants, learning by participants, and then for sponsors and mentors, recruiting and learning more about participants or testing their products. The outcomes I don’t like to see are companies trying to get hard work for “free” or with little investment, or participants focusing too much on winning prize money.

What do you think about hackathons lately? What are your experiences? I’d love to hear more from others experiencing hackathons as a participant and as a sponsor and mentor.


[Slide Deck] How To Use Neuroscience To Create Memorable Presentations

The Content Wrangler - Thu, 2014-12-18 19:50

Did you know that audiences forget 90% of what you present? That is significant. To make matters worse, the 10% people remember differs between members of your audience.

So how can you control the 10% they remember?

In the past decade, brain imaging technology has dramatically increased our understanding of the brain. We now know more about the way our brains process information and ultimately remember it. Take a peek at this slide deck, part of a presentation by neuroscience maven Dr. Carmen Simon of RexiMedia to discover how you can begin to apply principles from cognitive neuroscience to create and deliver presentations with lasting impact. You will learn 3 brain science principles converted to guidelines you can apply to your own presentations immediately.

How to Use Neuroscience to Create Memorable Presentations from Information Development World



bobdc.blog - Sat, 2014-12-13 14:13
What it is and how people use it: my own summary. Bob DuCharme http://www.snee.com/bobdc.blog

[Slide Deck] Clear and Simple: Lower Your Content Costs with Global English

The Content Wrangler - Wed, 2014-12-10 14:00

If you missed The Content Wrangler Virtual Summit on Advanced Technical Communication Practices, you’re in luck. You can watch the recordings on-demand, whenever you like. This session was delivered by Matthew Kaul and Greg Adams of AdamsKaul.com.

In this webinar, Matt and Greg explain what Global English is and who it benefits. The duo will also introduce you to some Global English techniques that you can implement immediately. And, they examine several case studies of companies who have implemented Global English—and have experienced dramatic results.

Slide deck: Clear and Simple: Lower Your Content Costs with Global English


[Slide Deck] The ROI of Intelligent Content

The Content Wrangler - Tue, 2014-12-09 14:30

If you missed The Content Wrangler Virtual Summit on Advanced Technical Communication Practices, you’re in luck. You can watch the recordings on-demand, whenever you like. This session was delivered by Mark Lewis, Quark.

You CAN prove the savings possible from moving your unstructured content to intelligent content. The benefits are measurable. Intelligent content combined with a content management system can facilitate savings and improvements in content development, translation, regulations, governance, multi-channel publishing, and quality.

In this session, Mark Lewis discusses how the various processes benefit from intelligent content and discusses metrics that prove the benefit. If it hurts, then it’s time to calculate the pain — and the relief. This session draws from concepts in Mark’s book, DITA Metrics 101, The Business Case for XML and Intelligent Content.

Mark also discusses which metrics you should gather so you can align your plan with corporate strategy and become the “Executive Whisperer.”

Slide decks: The ROI of Intelligent Content


[Slide Deck] The Future of Technical Communication is Marketing

The Content Wrangler - Mon, 2014-12-08 14:00

If you missed The Content Wrangler Virtual Summit on Advanced Technical Communication Practices, you’re in luck. You can watch the recordings on-demand, whenever you like. This session was delivered by Scott Abel, The Content Wrangler.

Once a prospect buys a product or service, the content they interact with is no longer familiar. The instructions provided don’t look, feel, or sound anything like the marketing and sales materials that introduced them to your brand. Neither does the service contract, the warranty, the customer support website, the product documentation, nor the training materials.

The extensive variability in customer experience — and each customer touchpoint — creates a different and inconsistent version of the brand, some that bear little or no resemblance to the brand that executives believe they are building. There are often as many brands as there are touch points.

For no good reason, the content experience changes drastically — and not in a good way. That’s why organizations that recognize the importance of a unified customer experience have started rethinking what it means to be customer-centric.

Some forward-thinking organizations are reorganizing customer-facing content creators into teams under one roof. They’re breaking down the barriers — the silos — that prevent them from collaborating; from creating a unified customer content experience.

In this presentation, Scott Abel, The Content Wrangler, discusses the challenges of content inconsistency and incongruity, and why he thinks the future of technical communication is marketing.

Slide deck: The Future of Technical Communication is Marketing


[Slide Deck] Fandom Isn’t Random: How To Cultivate A Loyal Customer Base

The Content Wrangler - Fri, 2014-12-05 16:00

If you missed The Content Wrangler Virtual Summit on Advanced Technical Communication Practices, you’re in luck. You can watch the recordings on-demand, whenever you like. This session was delivered by Andrew Thomas, SDL.

Andrew Thomas delivers a fast-paced look at how to leverage content to cultivate a loyal customer base.

Andrew is Director of Product Marketing for Content Management Technologies at SDL, focusing on structured content technologies. Andrew has worked with XML for a wide variety of content, from marketing materials to printed manuals and web applications. He’s witnessed firsthand the diversity of structured content and how it can empower businesses and customer engagement. Before joining SDL, Andrew was a language intelligence solutions manager for Adobe Systems and oversaw the translation process for their DITA content.

Slide Deck: Fandom Isn’t Random: How To Cultivate A Loyal Customer Base


[Slide Deck] Intelligent Delivery Systems: Moving Beyond The Book Paradigm

The Content Wrangler - Fri, 2014-12-05 00:00

If you missed The Content Wrangler Virtual Summit on Advanced Technical Communication Practices, you’re in luck. You can watch the recordings on-demand, whenever you like. This session was delivered by Joe Jenkins, Oberon Technologies.

The world is constantly moving faster and faster. People are no longer willing to wait or read significant amounts of content in order to find out how to do something or get answers to their questions. They take the shortest route to the solution – if they cannot get the information quickly they tend to ignore the information and try to figure it out on their own.

However, most technical information is still delivered in archaic book-like formats that require the user to traverse and interpret more content than they need in order to get their answers. If you want your content to serve its purpose, you have to adapt it to this fast-paced world.

Companies have significant investments in the creation and maintenance of content that is never used. Not because it is not useful or relevant but because the content is not presented in a way that is useful to the end user. Companies are so locked into the paradigm of presenting content in a book format that they adapt the technology to this paradigm rather than adapting the content to technology.

Watch this recorded webinar and discover how you can leverage your investment in structured content to deliver content that is highly targeted, personalized, dynamic and interactive. Discover how to enable your content to be delivered via more channels, including cloud-based publishing, on-demand publishing, and personalized mobile applications, to name just a few.

Slides: Intelligent Delivery Systems: Moving Beyond The Book Paradigm


The Role of Content Inventory and Audit in Governance

The Content Wrangler - Wed, 2014-12-03 21:06


By Paula Ladenburg Land, special to The Content Wrangler

Note: This content appears in Content Audits and Inventories: A Handbook (2014, XML Press), by Paula Ladenburg Land, part of The Content Wrangler Content Strategy Series.

The Role of Content Inventory and Audit in Governance

Website governance covers a broad range of policies, standards, and structures for creating and maintaining data, content, and applications. In this book, I don’t cover all the complexities of site governance, but I would like to briefly address content governance and some ways that an inventory and audit can play a part.

Content governance is often expressed as lifecycle management – the rules and processes that underpin everything from content planning to creation to publication to ongoing optimization. The roles and tasks that accompany those steps include identifying who is responsible for creating and maintaining content, developing standards for content quality, and incorporating metrics and feedback into a process of continual improvement.

When governance policies are not in place or are not followed, website content can become disorganized, stale, and ineffective at meeting business and user needs. These problems can trigger a content strategy initiative when the business realizes that the site is failing. A time-consuming, expensive project gets kicked off, an inventory and audit are completed, and a strategy is developed. To avoid costly one-time improvement efforts like this, you need to create a “virtuous circle” – a feedback loop that enables your company to learn and improve over time. You need to update your style guides, your glossary, and your governance policies, and then feed all that back to your content creators so that new content is created to updated standards and you’re constantly improving rather than doing major overhauls.

The Rolling Inventory and Audit

How do you create that virtuous circle? Institute a rolling (ongoing, periodic) inventory and audit. A rolling inventory and audit allows you to assess content mix, quality, and effectiveness against ever-changing audience needs and business goals.

The inventory, done at regular intervals or after major content publishing initiatives, enables you to monitor the quantity and types of content on your site. The data you gather in an inventory, particularly if you are using an automated tool, can help you quickly identify trouble spots, such as missing metadata, unwieldy site structure, and problematic metrics. The inventory also gives you the structure to track information, such as the content owner and the age of the content, that helps when you audit.

Content planning, often the first step in a content lifecycle, can benefit from the inventory too, as you track what content exists, what’s effective, and what’s not, helping you plan to fill gaps or strengthen weak areas.

At the other end of the cycle, the data supports ongoing optimization of content as you analyze your metrics to see what should be pruned or revised.

The content audit can also be done on an ongoing basis. You probably don’t have the resources – nor is there a need – to audit every piece of content frequently. Instead, identify the content areas that are most likely to stray from your quality standards, either by becoming outdated or by no longer adhering to your brand guidelines.

For example, seasonal content must be reviewed at the end of each season. But there is no need to regularly revisit published press releases other than to consider archiving them after a certain number of months or years. Content that changes frequently should also be reviewed frequently – for example, content about products and services. Content that tends to be overlooked because it is considered static or not directly related to sales or other conversion metrics, such as company information and staff pages, should also be reviewed regularly.

Keeping track of your content’s age and setting a reminder to review any content older than, for example, a year is one way to trigger an audit exercise. You can also plan audits around your editorial calendar.
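The age-based trigger described above can be sketched in a few lines. This assumes each inventory record has a URL and a last-modified date; the field names and the 365-day default are illustrative assumptions, and the threshold should be set to whatever fits your own review schedule.

```python
from datetime import date, timedelta

def pages_due_for_review(inventory, today, max_age_days=365):
    """Return URLs of pages last modified more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [page["url"] for page in inventory if page["last_modified"] < cutoff]

# Hypothetical inventory rows.
aging_inventory = [
    {"url": "/about/staff", "last_modified": date(2012, 3, 1)},
    {"url": "/products/widget", "last_modified": date(2014, 5, 20)},
]

print(pages_due_for_review(aging_inventory, today=date(2014, 9, 1)))
```

Passing a different `max_age_days` per content area would let frequently changing content, such as product pages, come up for review sooner than static content.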

A rolling audit is also a great way to draw upon the larger content team. Just as you assembled a team to do the initial audit, dividing up responsibilities, you can assign team members ongoing audit duties, breaking up the audit by content area, for example. This not only distributes the workload but also helps ensure ongoing involvement with content quality and buy-in to the process across the organization.


Websites are living entities, constantly changing and adapting to new business strategies and new audiences. Organizational energy is often focused more on the creation of new content than on the governance and ongoing maintenance of existing content. The result can be sites that are overgrown and no longer effective at meeting goals. Rather than let your site get to the point where a major content repair project is required, adopt the rolling inventory and audit to keep the site in a state of constant review and improvement.

Copyright © 2014 Paula Land, used with permission.

About the Book

Successful content strategy projects start with a thorough assessment of the current state of all content assets: their quantity, type, and quality. Beginning with a data-rich content inventory and layering in a qualitative assessment, the audit process allows content owners and business stakeholders to make informed decisions. Content Audits and Inventories: A Handbook, by veteran content strategist Paula Ladenburg Land, shows you how to begin with an inventory, scope and plan an audit, evaluate content against business and user goals, and move forward with a set of useful, actionable insights. This practical, tactic-filled handbook walks you through setting up and running an inventory using an automated tool, setting the stage for a successful audit. Specific audit tactics addressed include auditing for content quality, performance, global considerations, and legal and regulatory issues. You will also learn how to do a competitive audit and incorporate personas into an audit. Tips on presenting audit results to stakeholders will help you deliver effective strategies. Buy the book!

Additional Resources

Free Trial of Content Analysis Tool  

If you would like to try the Content Analysis Tool for free, contact Paula and request a 5000-page block free trial. Then, let us know what you think.

[Sample Chapter] The Role of Content Inventory and Audit in Governance from Scott Abel

Categories: DITA

How to Manage Multichannel Content Marketing

Content is certainly the unifying element of a brand’s marketing across physical as well as digital channels. Once you have created your killer content, how do you maximize its reach? How do you push out your content beyond your own channels in ways that are manageable? This session includes presentations by two organizations that have […]
Categories: DITA

Marketing Technologists on Multichannel and Enterprise Integration

Marketing technologists are no longer rare birds, though they are often found in unfamiliar environments with less than obvious plumage. There are marketing technologists in many of our sessions this year, but we have selected a few to look at the two toughest challenges they, and their organizations, face in building modern digital strategies: support […]
Categories: DITA

Harvard Business Review and WGBH transforming digital engagement

Engaging customers and online audiences requires the right mix of technology, content, and tools, orchestrated in a way that leverages deep customer knowledge to deliver the right content at the right time in the right fashion. That’s a tall order, yet it is a “do or die” imperative for organizations that use content to make […]
Categories: DITA

Speaker Spotlight: It’s more than just making a website responsive

As we did last year, we’ve posed some of our attendees’ most frequently asked questions to speakers who will be at this year’s Gilbane Conference. Today we’re spotlighting Arjé Cahn, Co-founder and CTO of Hippo. You can see all Speaker Spotlights from our upcoming conference as well as last year’s event. Arjé Cahn Co-founder & CTO, Hippo Follow Arjé @arjecahn […]
Categories: DITA