Dr. Drake and the Information Equation

One of the most fascinating films of 1997 was Contact, starring Jodie Foster. Foster portrays Dr. Ellie Arroway, an astronomer who discovers a complex signal transmitted from another world. The signal contains a giant user manual describing how to build a machine that will magically transport a single occupant to a distant star system. The machine is massive and costs $500 billion to build.

The film explores the mad race to build the machine, the decision of who will ride in it, and what they will encounter, if they even survive the trip.

A SETI is Not a Couch
Although the film is pure science fiction, the organization to which Dr. Arroway’s character belongs, the SETI Institute, is quite real. SETI, an abbreviation of “Search for Extraterrestrial Intelligence”, is made up of scientists and researchers working to find intelligent life on other planets.

One of the real scientists working for SETI is Dr. Frank Drake. Drake is a leading American astronomer who in 1960, while working for the National Radio Astronomy Observatory, conducted the first radio search for extraterrestrial intelligence.

The Drake Equation
He developed a formula, called the Drake Equation, which tries to calculate the number of planets in our galaxy with beings who are intelligent enough to communicate with us.

The Drake Equation goes something like this:

N = R* x fp x ne x fl x fi x fc x L
which roughly translates into English as:
N = the number of civilizations within our galaxy that can communicate with us, where:
R* = the rate of formation of stars that are suitable for the development of intelligent life
fp = the fraction of these stars that have planets
ne = the fraction of these planets that have an environment suitable for life
fl = the fraction of these planets on which life actually appears
fi = the fraction of these planets where intelligent life develops
fc = the fraction of these planets that develop communication
L = the number of years that these civilizations exist and can communicate

Based on this formula, Drake estimates there are about 10,000 civilizations within our galaxy that could communicate with us. Of course, the big question is, why haven’t they?
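The arithmetic is simple enough to sketch in a few lines. The input values below are hypothetical round numbers chosen so that the product lands near the 10,000 figure; Drake’s own published estimates varied widely.

```python
# Illustrative back-of-envelope Drake Equation calculation.
# All input values are hypothetical, chosen only to show the mechanics.

R_star = 10      # rate of formation of suitable stars (per year)
f_p    = 0.5     # fraction of those stars with planets
n_e    = 2      # planets per star with a suitable environment
f_l    = 1      # fraction of those planets where life appears
f_i    = 1      # fraction where intelligent life develops
f_c    = 0.1     # fraction that develop communication
L      = 10_000  # years such civilizations can communicate

N = R_star * f_p * n_e * f_l * f_i * f_c * L
print(N)  # 10000.0
```

Changing any single factor scales N proportionally, which is why estimates of N differ by orders of magnitude.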

There is no known solution to this equation, because it is impossible to know what all these numbers are. However, the equation is still a useful tool for scientists. Although they disagree on the numbers, they mostly agree that the factors in this equation are the ones that determine what the final number (N) would be.

Don’t Chain Me Down
This formula is a mathematical way of saying that a chain is only as strong as its weakest link: if any of these numbers is 0, there is no possibility of us Earthlings finding someone else to talk to. That would be unfortunate, if you think of all the potential business opportunities for tech writers if we could ever produce documentation for other planets.
In our field, we don’t seek other planets, but we do seek to create the ultimate, perfectly complete and understandable document – this is our holy grail.

The Documentation Equation
Therefore, by applying the Drake Equation to information development, we can derive the Documentation Equation:
V = A x W x Aa x As x Tla x Tls x Tch x Ut x At x Tr x P

In this formula, V is the Value of an information set, or infoset, derived by multiplying the following values together:

  1. A = infoset Awareness
  2. W = Willingness to use infoset
  3. Aa = infoset Access ability
  4. As = infoset Access success
  5. Tla = Topic location ability
  6. Tls = Topic location success
  7. Tch = Topic comprehension
  8. Ut = User task completion
  9. At = Application task completion
  10. Tr = Task relevance
  11. P = Practicality of application

Let’s explore each of these values in turn.

V = Value of Infoset
This is the final value that we are after – a number on a scale of 0 to 100% that rates how valuable an infoset is. An infoset is defined as any grouping of information or documentation, for example, a single document or a related set of documents, an online help system, a website, and so on.

Factors of V
V is derived by multiplying eleven factors together. The following sections describe each of the factors in this equation. The factors involve probabilities of what average users will do in an average information search situation. However, average user and average search situation are difficult things to define.

For the purposes of this formula, we’ll just have to imagine that the factors describe what most users will be doing, or will encounter, most of the time, in most of the situations where they need information to help them complete a task or understand a concept.
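Before walking through the factors one by one, the whole equation can be sketched as a simple product. The factor names follow the list above; the 0.9 sample probabilities are hypothetical, chosen only to illustrate how quickly the product shrinks.

```python
# A minimal sketch of the Documentation Equation: V is the product
# of all eleven factor probabilities, so a single zero zeroes everything.
# The 0.9 sample values are hypothetical.

def infoset_value(factors):
    """Multiply all factor probabilities together to get V (0.0 to 1.0)."""
    v = 1.0
    for probability in factors.values():
        v *= probability
    return v

factors = {
    "A": 0.9, "W": 0.9, "Aa": 0.9, "As": 0.9,
    "Tla": 0.9, "Tls": 0.9, "Tch": 0.9, "Ut": 0.9,
    "At": 0.9, "Tr": 0.9, "P": 0.9,
}
print(round(infoset_value(factors), 2))  # 0.31
```

Note the weakest-link behavior: set any one factor to 0 and V collapses to 0, no matter how strong the other ten are.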

1. A = infoset Awareness
This is the probability that the average user knows that the infoset they require exists. In most cases, this should be close to 100%, but sometimes users aren’t even aware there is a manual or help system to refer to. If this is the case, the infoset has no value for that user. Fortunately, most users today have come to expect there will be some sort of information provided with their product, even though they may not always use it – silly users!

2. W = Willingness to use information set
This is the probability that the average user not only knows that the infoset exists, but is willing to use it. As we all know from experience, many users view documentation with the same affection as going to the dentist: it’s a pain, but it’s something you have to put up with once in a while. If a user does not want to use an infoset, it has no value.

3. Aa = infoset Access ability
This is the probability that the average user not only knows that an infoset exists and wants to use it, but knows where and how to access it. Sometimes a user won’t know the exact location of an infoset. For example, they may not know where to locate a particular PDF file or support website that they need. Again, for these users, the infoset would have no value.
4. As = infoset Access success
This is the probability that the average user not only knows how to access the infoset, but is successfully able to do so. Examples of not being able to access an infoset include trying to view a website that is down, or trying to read a PDF or other type of document that has become corrupt and is unreadable. If a user cannot successfully access an infoset, it has no value.

5. Tla = Topic location ability
This is the probability that the average user is not only able to successfully access the infoset, but knows how to locate a specific topic. For example, the user would have to know how to use the “contents”, “index” and “search” functions within an infoset to be able to even start looking for what they need. Inexperienced users may not have a clue where to start. If a user does not know how to locate a specific topic within an infoset, the set has little or no value.

6. Tls = Topic location success
This is the probability that the average user not only knows how to locate a topic in an infoset but successfully does so. This value will depend on the quality of the index, contents and search engine used in the infoset. It does not necessarily depend on the size of the infoset. Users should still be able to effectively search even the largest of infosets, provided the index, contents and search functions are well-developed. If a user cannot locate a specific topic within an infoset, the set has little or no value.

7. Tch = Topic comprehension
This is the probability that the average user can understand the specific topic they find. This factor is fairly straightforward: if a user can find the topic they were looking for but can’t understand it, then the topic has no value to the user. This, in turn, lowers the value of the infoset.

8. Ut = User task completion
This is the probability that the average user can complete the task using the specific topic they found. For the purposes of this formula, there are two basic types of tasks:

  • Procedural tasks: tasks that describe how to complete a specific procedure by following one or more steps.
  • Learning tasks: tasks in which the user simply wants to know more about something but does not actually complete a procedure. Learning tasks include reading an overview, learning high level concepts, and understanding descriptions of terms used in the application.

In many cases, users are searching for procedural tasks. Whether a user can successfully complete a procedural task depends on the clarity and completeness of the steps listed in the task. If the steps are unclear or cannot be easily followed, the topic has no value.
A user can successfully complete a learning task if they can accurately understand the concept, idea or definition they were after. Again, this will depend on the accuracy and clarity of the topic.

9. At = Application task completion
This is the probability that the application itself will correctly complete the requested task after receiving input from the user. This factor does not apply to learning tasks because these only involve understanding by the user, and not any application processing. Unless, of course, the user experiences the dreaded “brain malfunction”.

This is similar to the previous factor but relates to what happens after a user successfully completes all the steps outlined in a procedural task. One would assume, at that point, that the application would take over and do whatever processing is required to complete the task. However, if there is a defect in the application, and the processing fails, then the topic becomes useless. Or to put it another way: the operation was a success, but the patient died.

The fact that this failure is not the information developer’s fault is irrelevant. From the user’s perspective, if they cannot complete the task, then the topic describing that task has no value.

10. Tr = Task relevance
This is the probability that the task the average user found and successfully completed was actually relevant to the user. That is, that it was the correct and appropriate task for what they were trying to do in the first place.

Now, you may argue that if the user has successfully found the topic they were looking for, wouldn’t that automatically mean the topic is relevant to them? In most cases, yes, but sometimes users will find the topic they want, but not what they need.

Microsoft Word – Stop the Insanity!
For example, someone may want to send out a letter to a group of people using Microsoft Word. A novice user, completely unaware of the mail merge feature in Word, assumes they will have to copy and paste the text of the letter several times, and then just change the name on each letter. As the letter changes over time, they assume they will need to do a “search and replace” of the specific text. They therefore search for and successfully find information about copying, pasting, searching and replacing. Instead, what they really needed was information about mail merging.

Now, the big question is: does the information they successfully obtained have value? This question is really a litmus test of what you believe the purpose of information development is:

Is it to give users the information they:
a) want?
b) may not know they want, but in fact, need?
If you believe (a), then you will always assign a value to task relevance of 100%.
If you believe (b), then you may assign a value to task relevance of less than 100%.

Neither choice is “right”, but it is something to be aware of as you try to determine the value of information.

One of the greatest challenges in information development is the fact that users don’t know what they don’t know. Great documentation anticipates users’ needs and clearly presents information about how to do things in the most effective way possible, in ways that they may not have even anticipated. In other words, it reads their minds!

11. P = Practicality of application
This is the probability that for the average user, the entire application is practical and relevant to them. This is similar to the previous factor, but deals with the application as a whole rather than the tasks that comprise it.

In most cases, we would hope this value would be close to 100%, but again, users don’t know what they don’t know and sometimes work with the wrong tool. For example, a user could use WordPad to create basic documents. However, if they want to add elements such as borders, automatic page numbers, styles, tables, TOCs, indices, auto-numbering, and so on, they should use a full-featured word processor such as Word. Users who are using WordPad when they should use a word processor may be doing what they want, but not what is best for their needs in the long run. They may find what they need to do in the WordPad help, but the help has little value if they are using the wrong tool.

Again, it comes down to whether you believe information should only give users what they want, even if what they want is not the right thing. It is the ultimate catch-22: if we try to give users what they need, we are called arrogant, self-righteous, know-it-alls. If we try to give users what they want, we are called weak, unhelpful panderers who are too lazy to tell users what they really need to know. Select whichever insult bothers you less and run with it.

The V Files – “The Truth is out There”
By multiplying all these factors together, we obtain V – the value of the infoset. The highest possible value for V is 100%, which, although theoretically possible, is in practice impossible. This is because there are too many uncontrollable variables, not the least of which are the users themselves. It is impossible to predict if or how every user will try to search for information, if they will find and understand the information, if they will successfully complete the task, and so on.

In fact, because V is derived by multiplying eleven numbers together (rather than adding them), the value of V will usually be very small. For example, if an infoset scored 90% in every variable, the value of V would be 0.90^11, or a measly 31%. Even if the infoset scored a whopping 97% for every value, V would be just 71%.

Therefore, V, by itself, is a meaningless value. A score of 31% simply indicates that the particular infoset has a value of 31% of what is theoretically perfect. To give meaning to this value, you’d have to compare it to all other appropriate infosets. So, for example, if similar infosets for similar products scored an average of 23%, and the infoset you developed scored 31%, yours would be considered “above average”. The challenge is to define which infosets are similar enough to the one you are rating that you are making a meaningful comparison. That is, you want to compare apples to apples, and not apples to coffee mugs.
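The arithmetic above is easy to verify. The 23% peer average is the article’s hypothetical example, not a measured figure.

```python
# Reproducing the article's arithmetic: eleven factors multiplied
# together shrink quickly, so V is only meaningful relative to peers.
# The 23% peer average is a hypothetical figure.

v_good   = 0.90 ** 11  # every factor at 90%
v_great  = 0.97 ** 11  # every factor at 97%
peer_avg = 0.23        # hypothetical average of similar infosets

print(int(v_good * 100))   # 31 (percent)
print(int(v_great * 100))  # 71 (percent)
print(v_good > peer_avg)   # True: "above average" vs similar infosets
```

Even near-perfect scores on every factor produce a modest V, which is exactly why the raw number only means something in comparison.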

Summing Up the Numbers

The Drake Equation and the Information Equation have much in common.

In both, it is impossible to obtain completely accurate numbers for all the values. You might be able to get some of the values if you were to spend huge amounts of money on research; however, you would then have to justify your return on investment.

Therefore, for both equations, the purpose is not necessarily to find the true numbers, but rather to make people aware of the factors that lead to the final value.

Your assignment, therefore, should you choose to accept it, is to find other factors that could be incorporated into the Information Equation. In doing so, you will help solve one of the greatest mysteries of our time:

What makes information have value?