Low-cost evaluation

Comments on the paper Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier, by Jakob Nielsen, in Cost-Justifying Usability (edited by Randolph G. Bias and Deborah J. Mayhew), Academic Press, May 1994.

Good points

  1. Nielsen gives a (mostly) convincing account of the usefulness of "low"-cost evaluation. The low-cost user testing seems to me to be pragmatic and approachable. I am somewhat relieved to find out that statistical significance is less, er, significant, than presented by Landauer. Three to five users for empirical testing sounds much more achievable than 24 (see my note after this list).
  2. Nielsen's list of ten heuristics looks like good material for wanna-be experts like me :-) to start with. My favorites are Error Prevention, and Flexibility and Efficiency of Use. Despite my objections to his benefits analysis (below), I do believe that the technique has value, and here is the essence distilled. What is not clear is whether this list can be effectively used by developers themselves to avoid or reduce the number of problems in the first place.
  3. "Anything is better than nothing." I like his description of conquering from within...

Bad points

  1. I was a little surprised that heuristic evaluators evaluate alone. I would have thought that the synergy of multiple minds that works in situations such as software reviews might work here as well.
  2. Nielsen's benefits analysis for heuristic evaluation is hard to swallow: the analysis of the benefits gained from the experts' usability evaluation is based entirely on the experts' own evaluation of the improvements they expected when the problems they identified were fixed! Add to that some handy guesswork, and frankly, I am surprised that he could plonk down a number like a half-million dollars with such "conservative" conviction. He even cites this supposed result as "a case study" in the linked-to paper. (I'm not saying that he's not right, just that I don't buy his argument.)
  3. Nielsen gives exactly the same cost-benefits graph for the number of users in thinking-aloud tests and the number of evaluators in heuristic evaluation (in the linked-to page). I think that's rather fishy. Actually, I am a little bit skeptical of Jakob Nielsen since discovering that his page on the Sun Web site design (http://www.sun.com/sun-on-net/uidesign/index.html) fails the simple usability criterion of being findable again from the home page.


Comments on the paper Faster, Cheaper!! Are Usability Inspection Methods as Effective as Empirical Testing?, by Heather W. Desurvire, Ch. 7 of Usability Inspection Methods, edited by Jakob Nielsen & Robert L. Mack, John Wiley & Sons, New York, 1994, pp. 173-202.

Good points

  1. Great data on the actual effectiveness of heuristic evaluation!
  2. Well, now I know that software developers make poor heuristic evaluators. Still, the developers found nearly half of the problems that the experts did (who in turn found less than half of those found in the laboratory testing). (Table 7.3.)
  3. Aha, an answer to the group vs lone evaluator question. A small (15%) improvement.

Bad points

  1. Um... I'm afraid the complete lack of explanation of the meanings of R^2 and F left me unable to interpret the results in Tables 7.1 and 7.2. I gather p is significance. (I've written down my own understanding of these after this list.)
  2. I don't understand what the difference between the experts' heuristic evaluation and their "best guess" is. Tables 7.1 and 7.2 again.
  3. Table 7.8 appears to indicate that non-experts using PAVE are as effective as experts using heuristic evaluation, but the author didn't say so. Did I miss something?
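
My own understanding of those statistics, for future reference (this is the standard textbook reading, not anything taken from the chapter): R^2 is the proportion of the variance in the outcome that the model explains, i.e. R^2 = 1 - SS_residual/SS_total, so values near 1 mean a good fit. F is the ratio of the variance explained by the model to the variance left unexplained (each divided by its degrees of freedom), used to test whether the model does better than chance. And p is the probability of getting an F that large if there were really no effect at all, so a small p (conventionally below 0.05) is what "statistically significant" means.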

John Reekie, February 17th, 1998.