Improve Documentation by Automating Spelling and Grammar Checks

Chris Ward, November 25th, 2016 (Last Updated: November 24th, 2016)

What’s one of the first things you look at when trying a new piece of software? Or after you’ve hit that tempting Download button, what’s your usual next step? I will take a bet that for at least 70 percent of you, it’s the documentation that you check out next.

However, writing documentation is typically something that developers would rather not handle. Or at least, they don’t enjoy it very much.

A couple of months ago someone said to me:

Documentation is like housework; we all know we need to do it, but don’t actually want to do it.

This is not an article about writing better documentation. For that, I recommend reading my recent popular post on Medium. Instead, this is the first in a series of posts on automating the boring and repetitive tasks in a documentation process. This article covers spelling and checking documentation for common grammatical and style issues, and in future articles, I’ll cover testing code, generating screenshots, and other crazy ideas.

As with programming, technical writing has a plethora of formats and options to choose from. I will present my favorite methods in this article, but this doesn’t mean there aren’t other methods that suit you better. I’ll also use Codeship as my CI tool of choice, but all these methods should work with any other CI tool with slight modification.

Spelling

Good spelling helps give your documentation polish. However, even though we have had spell checkers for years, mistakes still slip through. For this example, I am going to assume that your documentation is in a plain text format such as Markdown or reStructuredText and that you use a text editor. I personally use Atom, but similar principles should apply for any other editor.

The advantage of using plain text formats and more ‘open’ text editors is that you can often use the same tool chain on your desktop editor and local command line as on your CI server, including sharing custom dictionaries in version control.

Set Up Codeship

In the Configure Project screen of your project, select I want to create my own custom commands, then in the Setup Commands box, add the following:

npm install markdown-spellcheck -g

This installs the markdown-spellcheck npm module, which is specifically for checking markdown files. Other options to investigate are:

Coala: A cool new project that aims to consolidate many of the tools discussed in this article into one package that also works well with CI tools. Covering Coala in detail may be a future article.
Sphinxcontrib-spelling: For those of you using reStructuredText for documentation.
Creating your own bindings around one of the myriad spellchecker modules on npm.
Using aspell or Hunspell as a dictionary. These are far more fully featured and you can use them in desktop apps, but they require more configuration.

Next in the Test Commands section, add the following:

mdspell -r -n -a --en-us *.md

This runs the spellcheck over all Markdown files in the current directory in report mode (-r), ignoring numbers (-n) and acronyms (-a), with a US English dictionary (--en-us). Now save the settings, push to the repository, and watch the output.

When the build has run, click on the mdspell -r -n -a --en-us *.md command to see the output. At the moment, this output is extensive; a lot of words in our documents are not recognized as “real” words. You can create your own custom dictionary of words that mdspell will ignore in a .spelling file. I personally add this to a repository and use it as a submodule so that others can share and collaborate on it. Create your custom word list by adding each word on a new line in the file, like so:

Docker
Codeship
GitHub
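
When several people add words to a shared dictionary, it tends to collect duplicates. One possible way to keep it tidy, sketched here in plain shell, is to sort the file and drop repeated lines before committing. The function name normalize_dictionary is hypothetical, and the sketch assumes the file holds nothing but one word per line; if yours contains anything else, sorting would reorder it.

```shell
# normalize_dictionary FILE: sort a word list and drop duplicate lines in place.
# Hypothetical helper; assumes FILE is a flat list with one word per line.
normalize_dictionary() {
  dict="$1"
  tmp="$(mktemp)" || return 1
  # LC_ALL=C gives the same byte-wise sort order on every machine.
  # Writing back with cat keeps the original file's permissions.
  LC_ALL=C sort -u "$dict" > "$tmp" && cat "$tmp" > "$dict"
  rm -f "$tmp"
}

# Demo on a throwaway file so a real .spelling file is left untouched.
demo="$(mktemp)"
printf '%s\n' GitHub Docker Codeship Docker > "$demo"
normalize_dictionary "$demo"
cat "$demo"
```

On a real project you would call normalize_dictionary .spelling instead of using the demo file.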

Writing Better

English has grammar rules that, unless you break them intentionally, you should adhere to for clearer writing. There are also guides and best practices for better writing that you can choose to follow.

This brings us into the territory of linting. Linting is a process of providing guidelines for improvement to code or text, but I’ll return to that subject more in a later article. For now, we will use a linter solely for English.

There is a collection of linters that offer improvements to language, but one of my favorites that includes a suite of different checks is the write-good linter. Its checks include checking for “weasel” (or unnecessary) words, cliches, repeated words, and passive voice.

Set Up Codeship

Returning to the Configure Project screen, add the following in the Setup Commands box:

npm install write-good -g

Next in the Test Commands section, add the following:

write-good *.md

By default, this runs all checks across all files. You can limit the run to specific checks by naming them:

write-good *.md --weasel --so

Or by excluding them with no-:

write-good *.md --no-weasel --no-so

And now you will see the results of this check in the build output.

Passing or Failing Builds

CI tools such as Codeship generally rely on the exit codes of commands to determine whether a command succeeded, and thus whether to mark the build as successful. While I consider running the two commands mentioned in this article important, you may not want to mark a build as broken because of their output.

By default, no matter how many spelling errors it finds, mdspell outputs a 0 code, which Codeship treats as successful. However, write-good outputs the number of ‘errors’ as its exit code, which means the build breaks whenever it flags anything. To manually override this and make the build pass, you can append || true to each command:

mdspell -r -n -a --en-us *.md || true
write-good *.md || true

And now you can review the checks by clicking on the command name in the Codeship log output and reserve breaking builds for other commands.
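
To see why the override works, here is a small sketch that uses a hypothetical stand-in function, fake_linter, in place of a real checker. It is not part of either tool; it only imitates a command that reports problems through a non-zero exit code.

```shell
# Hypothetical stand-in for a checker that signals 3 problems via its exit code.
fake_linter() {
  echo "3 problems found"
  return 3
}

# Without the override: record the exit code a CI tool would see.
strict_status=0
fake_linter || strict_status=$?

# With the override: `true` runs only because the left side failed,
# and its exit code (0) becomes the result of the whole line.
fake_linter || true
lenient_status=$?

echo "strict: $strict_status, lenient: $lenient_status"
```

The checker's messages still appear in the log in both cases; only the exit code reported to the CI tool changes.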

Next Steps

If you’re a CI guru, you are probably seeing ways to improve these commands, such as piping commands to notification systems or generating dashboards of frequent mistakes. Experiment, let me know what you create, and in my next post, I will look at testing code examples in your documentation.
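
As one possible starting point for that kind of experiment, the sketch below wraps a check so that a failure is reported but never breaks the build. The function name run_advisory and the warning format are assumptions for illustration, and whether a shell function defined in one command box is visible in another depends on your CI tool, so you may prefer to keep this in a script file.

```shell
# run_advisory LABEL COMMAND...: run a check, report a failure, never fail the build.
# Hypothetical helper; the name and message format are illustrative only.
run_advisory() {
  label="$1"
  shift
  status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    # Swap this echo for a call to whatever notification system you use.
    echo "WARNING: $label exited with code $status" >&2
  fi
  return 0
}

# Possible usage with the commands from this article:
#   run_advisory spelling mdspell -r -n -a --en-us *.md
#   run_advisory style write-good *.md

# Demo with a command that always fails with exit code 2.
run_advisory demo sh -c 'exit 2'
```

The wrapper always returns 0, so the failure is visible only through the warning it prints.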

Reference:

Improve Documentation by Automating Spelling and Grammar Checks from our WCG partner Chris Ward at the Codeship blog.