Thursday, February 4, 2010

8.4 Additional Documentation Sources










When looking for source code documentation, consider nontraditional sources such as comments, standards, publications, test cases, mailing lists, newsgroups, revision logs, issue-tracking databases, marketing material, and the source code itself. When investigating a large body of code, it is natural to miss documentation embedded in comments in the quest for more formal sources such as requirements and design documents. Yet source code comments are often better maintained than the corresponding formal documents and often hide gems of information, sometimes even including elaborate ASCII diagrams. As an example, the diagrams in Figure 8.3 and the formal mathematical proof in Figure 8.4[44] are all excerpts from actual source code comments. The ASCII diagrams depict the block structure of audio interface hardware[45] (top left), the logical channel reshuffling procedure in X.25 networking code[46] (top right), the m4 macro processor stack-based data structure[47] (bottom left), and the page format employed in the hashed db database implementation[48] (bottom right). However, keep in mind that elaborate diagrams such as these are rarely kept up to date when the code changes.

[44] netbsdsrc/sys/kern/kern_synch.c:102–135

[45] netbsdsrc/sys/dev/ic/cs4231reg.h:44–75

[46] netbsdsrc/sys/netccitt/pk_subr.c:567–591

[47] netbsdsrc/usr.bin/m4/mdef.h:155–174

[48] netbsdsrc/lib/libc/db/hash/page.h:48–59


Figure 8.3. ASCII drawings in source code comments.


Always view documentation with a critical mind. Since documentation is never executed and rarely tested or formally reviewed to the extent code is, it can often be misleading or outright wrong. As an example, consider some of the problems[49] of the comment in Figure 8.4.

[49] Contributed by Guy Steele.


  • It inexplicably uses the symbol ~= the first three times to indicate "approximately equal to" and thereafter uses =~.

  • It uses both loadavg and loadav to refer to the same quantity. The surrounding code, not shown in the figure, uses loadav and a structure field named ldavg!

  • The approximations are very sloppy and there is no justification that they are close enough. For example, the result of the division 5/2.3 is approximated as 2. This has the effect of replacing the constant .1 in the original equation with 0.08, a 20% error. The comment does not justify the correctness of this approximation.

    Figure 8.4. A mathematical proof in a source code comment.


    * We wish to prove that the system's computation of decay
    * will always fulfill the equation:
    * decay ** (5 * loadavg) ~= .1
    *
    * If we compute b as:
    * b = 2 * loadavg
    * then
    * decay = b / (b + 1)
    *
    * We now need to prove two things:
    * 1) Given factor ** (5 * loadavg) ~= .1, prove factor == b/(b+1)
    * 2) Given b/(b+1) ** power ~= .1, prove power == (5 * loadavg)
    *
    * Facts:
    * For x close to zero, exp(x) =~ 1+ x, since
    * exp(x) = 0! + x**1/1! + x**2/2! + ... .
    * therefore exp(-1/b) =~ 1 - (1/b) = (b-1)/b.
    * For x close to zero, ln(1+x) =~ x, since
    * ln(1+x) = x - x**2/2 + x**3/3 - ... -1<x<1
    * therefore ln(b/(b+1)) = ln(1 - 1/(b+1)) =~ -1/(b+1).
    * ln(.1) =~ -2.30
    *
    * Proof of (1):
    * Solve (factor)**(power) =~ .1 given power (5*loadav):
    * solving for factor,
    * ln(factor) =~ (-2.30/5*loadav), or
    * factor =~ exp(-1/((5/2.30)*loadav)) =~ exp(-1/(2*loadav)) =
    * exp(-1/b) =~ (b-1)/b =~ b/(b+1). QED
    *
    * Proof of (2):
    * Solve (factor)**(power) =~ .1 given factor == (b/(b+1)):
    * solving for power,
    * power*ln(b/(b+1)) =~ -2.30, or
    * power =~ 2.3 * (b + 1) = 4.6*loadav + 2.3 =~ 5*loadav. QED
  • The approximations used for exp(x) and ln(1+x) depend on x being "close to zero," but there is no explanation of how close is close enough and no evidence or explanation provided to justify the assumption that x will indeed be close enough to zero in the actual application at hand. (A little analysis shows that for x to be "close to zero," loadavg needs to be "large," but there is no discussion of this, either.)

  • Before the proofs, the comment methodically lays out two useful "facts" about approximations on which the proofs rely. This is good. But the proofs also rely on a third fact about approximations that is not laid out ahead of time: (b-1)/b=~b/(b+1) if b is "large enough." The QED line of the first proof simply pulls a fast one.

  • Finally, the comment is actually much more verbose than it needs to be because the two proofs are redundant! They prove essentially the same mathematical fact from two different directions.
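
The criticism of the approximations can be checked numerically. The following Python sketch (the function names are ours and do not appear in the kernel source) evaluates decay ** (5 * loadavg) using the comment's own definitions, b = 2 * loadavg and decay = b / (b + 1). Under these definitions the value lies noticeably above .1 for small load averages and tends toward exp(-2.5), roughly 0.08, as the load average grows.

```python
import math


def decay(loadavg: float) -> float:
    """Decay factor as defined in the comment: b / (b + 1), with b = 2 * loadavg."""
    b = 2.0 * loadavg
    return b / (b + 1.0)


def decayed_fraction(loadavg: float) -> float:
    """The quantity decay ** (5 * loadavg), which the comment claims is ~= .1."""
    return decay(loadavg) ** (5.0 * loadavg)


if __name__ == "__main__":
    # Sample load averages chosen for illustration only.
    for loadavg in (0.5, 1, 2, 5, 10, 100):
        value = decayed_fraction(loadavg)
        print(f"loadavg={loadavg:>5}: decay**(5*loadavg) = {value:.4f}")
    # Limiting value for large load averages.
    print(f"exp(-2.5) = {math.exp(-2.5):.4f}")
```

Running the sketch over a range of load averages makes the missing discussion of "how close is close enough" concrete: the claimed result holds only roughly, and in which direction it is off depends on the load.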


A nontraditional documentation element, intimately bound to the source code, is the revision control system. This repository contains a detailed history of the source code's evolution and, often, comments justifying each change. In Section 6.5 we examine how you can benefit from using such a system when reading code. An issue-tracking database is often associated with the revision control system. There you will find details of bug reports, change requests, and other maintenance documentation. When these originate from inside the development organization, they may provide background on design and implementation issues related to the code you are reading.


It may sound like a tautology, but the source code can sometimes be its own documentation. Apart from the obvious case of self-documenting code, you can sometimes read between the lines and treat the code as a specification of what it was meant to do, even when it does not actually implement that intention. Consider the following (trivial) shell-script excerpt.[50]

[50] netbsdsrc/usr.bin/lorder/lorder.sh:82



for file in $* ; do echo $file":" ; done

The code will display each one of the space-separated arguments appearing in the $* list on a separate line, followed by a colon. It is obvious, however, from the loop structure and the name of its variable that the intention of the code is to display each file name in the $* list on a separate line. Although the file variable should more appropriately have been named filename, and the code will not work correctly when file names contain whitespace, you can still read the code as the specification of what it should do, rather than as a description of what it actually does.
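
The gap between what the loop intends and what it does can be demonstrated by running it. The following Python sketch assumes a POSIX sh is available on the PATH; the helper run_loop, the quoted variant of the loop, and the sample file names are ours, added for illustration, and are not part of lorder.sh.

```python
import subprocess

# The loop as it appears in the excerpt.
ORIGINAL = 'for file in $* ; do echo $file":" ; done'
# A hypothetical variant that keeps each argument intact.
QUOTED = 'for file in "$@" ; do echo "$file:" ; done'


def run_loop(script: str, args: list[str]) -> list[str]:
    """Run a shell fragment with args as its positional parameters
    and return the lines it prints."""
    result = subprocess.run(
        ["sh", "-c", script, "sh", *args],  # the second "sh" becomes $0
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    names = ["my lib.o", "main.o"]  # one name contains a space
    print(run_loop(ORIGINAL, names))  # the first name is split in two
    print(run_loop(QUOTED, names))    # one output line per file name
```

With the unquoted $*, the shell splits "my lib.o" into two fields, so the loop reports three "files"; the quoted "$@" form matches the intention that one can read from the original.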


Finally, you can often find additional documentation on the periphery or outside the development organization. Standards documents can be treated as the functional specification for software that implements a specific standard (for example, an MP3 player). Similarly, you can often find a description of a given design, system, algorithm, or implementation in journal or conference publications. If verification and validation are handled by a separate test group, you can use their test cases as a substitute for or a supplement to a functional specification. When desperate, even marketing material can provide you with an (inflated) list of a system's features. And you can always search the Web for discussions, unofficial information, FAQ pages, and user experiences; archived newsgroups and mailing lists may sometimes reveal the rationale behind the design of the code you are reading. When the code is open-source, a particularly effective search strategy is to use three or four nonword identifiers from the part of the code you are reading (for example, bbp, indouble, addch) as search terms in a major search engine. Open-source code also provides you with the option to contact the code's original author. Try not to abuse this privilege, and be sure to give something back to the community for any help you receive in this way. Remember: most open-source projects are developed and maintained by (typically overworked) volunteers.
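
The search strategy based on nonword identifiers can be partly automated. The following Python sketch is one possible approach, not an established tool: the function name, the tiny stop list, and the sample code fragment are all hypothetical, and a realistic version would filter against a full dictionary and the language's complete keyword list.

```python
import re

# Hypothetical, deliberately tiny stop list of keywords and ordinary words.
COMMON_WORDS = {
    "if", "else", "for", "while", "return", "int", "char", "void",
    "count", "size", "line", "file", "name", "buffer", "index",
}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def nonword_identifiers(code: str, limit: int = 4) -> list[str]:
    """Return up to `limit` distinct identifiers, in order of appearance,
    that are at least three characters long and not in the stop list."""
    picked: list[str] = []
    for name in IDENTIFIER.findall(code):
        if len(name) < 3 or name.lower() in COMMON_WORDS:
            continue
        if name not in picked:
            picked.append(name)
    return picked[:limit]


if __name__ == "__main__":
    # Made-up fragment that merely reuses the identifiers named in the text.
    sample = "if (indouble) { addch(*bbp++); count = 0; }"
    print(" ".join(nonword_identifiers(sample)))
```

Identifiers that are not dictionary words make good search terms because they are unlikely to appear on pages unrelated to the code.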


Exercise 8.13
Create an annotated list of documentation sources for the Apache Web server or Perl. Categorize the sources based on the type of information they provide. Strive for wide coverage of areas (for example, specifications, design, user documentation) rather than completeness.




