xkcd.WTF!?

Invisible Formatting

To avoid errors like this, we render all text and pipe it through OCR before processing, fixing a handful of irregular bugs by burying them beneath a smooth, uniform layer of bugs.

Explanation

Most word processor programs allow the user to select sections of text, usually by clicking and dragging the cursor across the text, or by using common mouse shortcuts such as double-clicking to select a word and triple-clicking to select an entire line. The selection is usually indicated by highlighting the text's background, such as the bright blue highlight shown in the comic.

A common reason for selecting text is so that formatting can be applied to the selection (e.g. italics or bold formatting). Since space characters are part of the typography, such formatting gets applied to them too; however, as the character has no visible glyph, the formatting has no visible effect (a bold space looks exactly the same as an unformatted space). The formatting is nonetheless still there in the document's underlying markup; it just can't be seen.

This means that a user may accidentally introduce invisible formatting into a document without noticing. Such formatting has no effect on how the end user will read the document, but it could theoretically cause problems for programs that later parse the document, if those programs have not been written to expect formatting. Randall worries about this invisible threat.
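As a rough illustration of how a later program might trip over such formatting, here is a minimal sketch. It assumes a hypothetical HTML-like markup in which bold text is wrapped in `<b>` tags; the sample string and both function names are invented for the example and are not taken from any real word processor.

```python
import re

# Hypothetical markup for a sentence containing a single bold space.
# On screen it reads exactly like "You are not suitable".
markup = "You are not<b> </b>suitable"


def naive_words(text):
    """Split on spaces without expecting any formatting tags."""
    return text.split(" ")


def tag_aware_words(text):
    """Strip the (assumed) <b> and </b> tags first, then split on spaces."""
    plain = re.sub(r"</?b>", "", text)
    return plain.split(" ")


naive = naive_words(markup)      # tags leak into the neighbouring words
aware = tag_aware_words(markup)  # the four words a reader would see
```

The naive splitter returns tokens such as `not<b>` and `</b>suitable`, even though nothing looks wrong when the sentence is displayed.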

In the comic, Randall accidentally introduces the invisible formatting by selecting one more character than he needed to ("n", "o", "t", and an extra space character), applying bold formatting to those four characters, changing his mind, reselecting only the three characters "n", "o", "t", and removing the bold formatting. Because he failed to notice that a space character had been selected when applying the bold, he failed to remove the bold formatting from the space. As a result, the document now contains an invisible bold space that will likely go unchecked, as nobody can see it to fix it.
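The sequence of steps can be sketched in code. This is a toy model under stated assumptions, not how any particular word processor stores text: each character carries its own bold flag, and the `Char` class, the helper functions and the `<b>` tag output are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    bold: bool = False


def make_doc(s):
    """Build a toy document: one Char per character, all unformatted."""
    return [Char(c) for c in s]


def set_bold(doc, start, end, value):
    """Set the bold flag on characters in the half-open range [start, end)."""
    for ch in doc[start:end]:
        ch.bold = value


def to_markup(doc):
    """Serialize to an assumed HTML-like markup, wrapping bold runs in <b> tags."""
    out = []
    in_bold = False
    for ch in doc:
        if ch.bold != in_bold:
            out.append("<b>" if ch.bold else "</b>")
            in_bold = ch.bold
        out.append(ch.text)
    if in_bold:
        out.append("</b>")
    return "".join(out)


sentence = "You are not suitable"
doc = make_doc(sentence)
start = sentence.index("not")

set_bold(doc, start, start + 4, True)   # select "not" plus the trailing space, apply bold
set_bold(doc, start, start + 3, False)  # reselect only "not", remove bold

markup = to_markup(doc)
visible = "".join(ch.text for ch in doc)
```

After both steps the visible text is unchanged, but the serialized markup is `You are not<b> </b>suitable`: the only bold character left is the space.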

There are a couple of ways Randall could have avoided this problem. In many word processors, double-clicking a word selects all the characters in that word and nothing else; this is easier than dragging the cursor, which can be fiddly and inaccurate. This would have prevented Randall from accidentally selecting the space character. The same problem could still arise, however, if several words (and the spaces between them) were bolded together and the bold was then removed one word at a time, leaving the whitespace in between unreverted. Alternatively, if the program had an "undo" feature, Randall could simply have undone the bold formatting instead of removing it manually. This would have removed the bold from the space character as well, fixing the problem (and saving time, too), provided that no other changes worth keeping had been made in the interim.

Though Randall is likely thinking of computer-related problems caused by his invisible formatting, there is also another possible problem: it leaves trace evidence of Randall's formatting attempt. For example, if an editor later comes along and notices the bold space, they may figure out that Randall originally bolded the word "not" before changing his mind. Depending on the context, a bolded "not" could be enough to change the tone of the text from polite and formal to dismissive (e.g. "We believe you are not suitable for this position." with a plain "not" vs. the same sentence with "not" in bold).

In the title text, Randall says that he fixes such invisible formatting errors by running the text through OCR (optical character recognition), which turns images into text. Since OCR works only from the visual appearance of the rendered text, it would not be able to detect the invisible formatting and would therefore not reproduce it. Although this would "fix" the invisible formatting issue, it would likely introduce a bigger problem: OCR is not 100% reliable at recognizing characters or formatting, and often produces inaccurate results. However, Randall facetiously suggests that this is a preferable state of affairs, as OCR at least produces errors at a reasonably consistent rate, which Randall feels is better than irregular invisible formatting errors.

As the title text suggests, Randall finds it very important to control all the information he publishes. A real-world example is a government altering the impact of a report for political reasons; attempted tampering of this kind can be revealed by leftover bold spaces. Another example would be a short, casual one-sentence reply, e.g. to a romantic interest, that took an hour to formulate so that it sounds as natural as possible.

There are also other occasions where a hidden bold space may be a problem for later editors (see the Trivia section below). Randall's background in computer programming could also make him more attentive to these types of technical problems, which may be a further reason for his worries about invisible formatting.