Use Literate Programming For Tricky Code and Especially For Scripts

Literate programming was invented by Donald Knuth and used by him to develop his TeX system. Literate programming flows from the observations that software in textbooks is easier to understand than software in computers and that part of the reason for this is that software in textbooks doesn't stand alone. In fact, it is embedded within human-readable explanations.

With literate programming, you write a textbook explanation of your code and include some annotations to identify the embedded code. Knuth built two tools to help. One of these, weave, creates cross references and formats your text for printing. The other, tangle, produces code for your compiler/interpreter. You can read about Knuth's system and its use in his book on the subject.

The literate programming tool I use was written by Norman Ramsey and is called noweb. This language-independent tool helps you weave both LaTeX and HTML versions of your literate program as well as tangle it for use with your compiler/interpreter.

In the approximately twenty years since it was invented, literate programming has not become a significant tool in producing production software. The reason probably is that writing a textbook, or a chapter of a textbook, to explain and present a computer program is quite a bit more work than merely writing the program.

On the other hand, software that is embedded in a textbook is easier to understand than software in a computer. Tools to help understanding are important for the original programmer if the software is tricky. Tools to help understanding for maintenance programmers are important most of the time.

The Value Of Literate Programming

Ask yourself, "Where is the need for understanding so great that the extra work of literate programming is worthwhile?"

One answer is that the tricky part of a program -- the part I previously advocated you should isolate in its own module -- is a good candidate for writing in a literate style. After all, there is no reason why a literate program cannot describe a module rather than a complete program.

Another answer is that scripts written for adaptation by others should be literate. Almost every mass-distributed program worth its salt now has a scripting language. It is not uncommon for a programmer to need to deal with several scripting languages during a year. Many encounters are rather brief -- the programmer customizes a tool for a user and goes on to something else. Because of this, the programmer may not know the scripting language very well and may rely on an example script to see what needs to be done.

Relying on an example script can be frustrating. Why is it the way it is? Explaining why can also be frustrating. Sure, you can write comments, but comments are a poor substitute for the flexibility literate programming gives you in your presentation style.

Here's an example -- it's a subroutine to do lexical analysis, written in pseudocode that is based on the Perl scripting language:

 

<<lexical subsystem>>=
sub advance {
  return unless $Toke;
  $Toke = '';
  local($Where,$OldCur);
  for(;;) {
    if( <<a new Toke is found in Buffer>> ) {
       <<put Buffer up to Toke into Prev>>
       $StartChunk = 0;
       return;
    } else {
       <<put Buffer into Prev>>
       if( $Final ) {
          return;
       } else {
          &newBuffer;
       }
    }
  }
}

Pseudocode statements appear this way: << ... >>. The rest is Perl.

When this literate program is tangled, pseudocode phrases are replaced with Perl code. Although the example is in Perl, the noweb system can be used with any scripting language. When the literate program is woven for reading with a WWW browser, each pseudocode phrase becomes a hyperlink to the code it represents.

For simplicity in the above example, only one pseudocode instance has a hyperlink. It is the phrase "a new Toke is found in Buffer." The actual Perl code for this statement is quite short; it looks like this

 

<<a new Toke is found in Buffer>>=
($OldCur=$Cur) <= $LastCur && &search

but the explanation of why this code works is much longer. If it were placed in a comment, it would make the code unreadable. If it were explained externally, the link between explanation and code would be tenuous. If it were missing, the side effects would drive you crazy. Literate programming strikes the right balance.

Some of you are, of course, saying "don't write code with such side effects." Here, it's true that one visible side effect could be avoided. However, in scripting languages, side effects are often the only way things get done.
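The tangling step described earlier, in which each pseudocode phrase is replaced by the code it names, can be sketched in a few lines. What follows is a toy illustration and not noweb's implementation. It is written in Python rather than Perl purely for illustration, which is reasonable because the tool is language independent. The function names parse_chunks and tangle, the chunk name "demo", and the cut-down sample text are all hypothetical. The sketch assumes a simplified form of the noweb conventions: a line of the form <<name>>= opens a code chunk, and a line beginning with @ returns to documentation.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")  # a line that opens a named code chunk
CHUNK_REF = re.compile(r"<<(.+?)>>")        # a pseudocode phrase used inside code


def parse_chunks(source):
    """Collect code lines by chunk name; everything else is documentation."""
    chunks = {}
    current = None  # name of the chunk being read, or None while in documentation
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])  # repeated definitions are appended
        elif line == "@" or line.startswith("@ "):
            current = None  # back to documentation (text after "@ " is ignored here)
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, root, active=()):
    """Return the text of chunk `root` with every pseudocode phrase expanded."""
    if root not in chunks:
        raise KeyError(f"undefined chunk: {root}")
    if root in active:
        raise ValueError(f"chunk refers to itself: {root}")

    def expand(match):
        # Indent continuation lines to the column where the phrase appeared.
        text = match.string
        column = match.start() - (text.rfind("\n", 0, match.start()) + 1)
        body = tangle(chunks, match.group(1), active + (root,))
        return body.replace("\n", "\n" + " " * column)

    return CHUNK_REF.sub(expand, "\n".join(chunks[root]))


# Hypothetical sample: a cut-down root chunk plus the one chunk shown in the text.
SOURCE = """\
The test is short, but the explanation of why it works would go here.

<<demo>>=
if( <<a new Toke is found in Buffer>> ) {
   return;
}
@
<<a new Toke is found in Buffer>>=
($OldCur=$Cur) <= $LastCur && &search
@
"""

print(tangle(parse_chunks(SOURCE), "demo"))
```

Run as written, this prints the three lines of the "demo" chunk with the phrase replaced by the Perl test, so the first line reads: if( ($OldCur=$Cur) <= $LastCur && &search ) {. A real tool does considerably more, such as escaping literal << sequences and keeping track of source line numbers.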

If you are writing scripts for others to adapt, you can save time for those people by writing in a literate style. Even if they don't have a literate programming tool, they can read what you have written with a WWW browser. If they do have a tool, they will find adaptations of your script even easier. In short, literate programming adds value to your scripts.
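The browser-readable form comes from weaving. As a rough illustration of how a weave step might turn each pseudocode phrase into a hyperlink, here is a toy sketch in Python. It is not noweb's implementation, and the function names weave, link_references and anchor, the chunk name "demo", and the sample text are hypothetical. It assumes the same simplified conventions: <<name>>= opens a code chunk and a line beginning with @ closes it.

```python
import html
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")  # a line that opens a named code chunk
CHUNK_REF = re.compile(r"<<(.+?)>>")        # a pseudocode phrase used inside code


def anchor(name):
    """Turn a chunk name into a string usable as an HTML id."""
    return re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-")


def link_references(code_line):
    """Escape one line of code and make each pseudocode phrase a hyperlink."""
    pieces = []
    position = 0
    for match in CHUNK_REF.finditer(code_line):
        pieces.append(html.escape(code_line[position:match.start()]))
        name = match.group(1)
        pieces.append(
            f'<a href="#{anchor(name)}">&lt;&lt;{html.escape(name)}&gt;&gt;</a>'
        )
        position = match.end()
    pieces.append(html.escape(code_line[position:]))
    return "".join(pieces)


def weave(source):
    """Render documentation as paragraphs and code chunks as linked <pre> blocks."""
    out = []
    in_code = False
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            if in_code:
                out.append("</pre>")
            name = definition.group(1)
            # The id is the link target; a chunk defined twice would repeat it.
            out.append(f'<pre id="{anchor(name)}">&lt;&lt;{html.escape(name)}&gt;&gt;=')
            in_code = True
        elif line == "@" or line.startswith("@ "):
            if in_code:
                out.append("</pre>")
            in_code = False  # text after "@ " is ignored in this sketch
        elif in_code:
            out.append(link_references(line))
        elif line.strip():
            out.append(f"<p>{html.escape(line)}</p>")
    if in_code:
        out.append("</pre>")
    return "\n".join(out)


# Hypothetical sample: a cut-down root chunk plus the one chunk shown in the text.
SAMPLE = """\
The test is short, but the explanation of why it works would go here.
<<demo>>=
if( <<a new Toke is found in Buffer>> ) {
   return;
}
@
<<a new Toke is found in Buffer>>=
($OldCur=$Cur) <= $LastCur && &search
@
"""

print(weave(SAMPLE))
```

In the resulting page, the phrase inside the "demo" chunk links to the block that defines it, so a reader can jump from the pseudocode to the Perl and to whatever explanation surrounds it.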

Now, suppose you need to adapt a script for a spreadsheet. One spreadsheet publisher provides examples written in a literate style and another does not. Which is more useful to you?

Or, suppose you must get your example script from a third party. One source is free but not literate. The other costs a little but is literate. Which do you choose?

This programming tip is obviously not for everybody. But, if you are distributing scripts for others to adapt, you should strongly consider the investment required to make them literate.

Copyright and Permissions

Copyright, 1995 by J Adrian Zimmer

This tip is distributed to individuals free of charge from the Software Build and Fix web site. All other distribution (including but not limited to internal distribution within an organization and mirroring of any kind) is forbidden without written consent of the copyright holder.

Context: Some Tips for Programmers. Author: J Adrian Zimmer.
Dated: October 15, 1995. Revised: Dec 10, 1996; Oct 07, 1998.