Programming In the Large Versus Programming In the Small

This was the title of a paper written twenty years ago by DeRemer and Kron. Their title heralded their justification for a module interconnection language to handle programming in the large.

My intent is somewhat different. I want to suggest a way of distinguishing between programming in the small and programming in the large and then to suggest that the distinction can help to make decisions about a number of software engineering issues.

Programming in the small is programming by one person (or by a small closely knit group of people). It produces code a single human can understand.

Here's what I mean by "understand". You understand a piece of code when you can modify it without unexpected consequences and without referencing outside documentation. You may need the outside documentation to know what somebody else expects the code to do, but you don't need it to understand how the code works unless it implements a technique you haven't learned.

Programming in the large is programming by larger groups of people or by smaller groups over a longer time. It produces code that cannot be understood without a divide-and-conquer approach.

With programming in the small, the emphasis is on clean code that can be understood. This is one way to support change.

With programming in the large, the emphasis is on partitioning the work into modules whose interactions are precisely specified. This requires careful planning and careful documentation.
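To make "precisely specified interactions" concrete, here is a minimal sketch in Python. The names TaskQueue, ListTaskQueue and drain are hypothetical, invented only for illustration, and the three-line contract is an assumption about what such a specification might say. The point is that the code in drain is written against the contract alone.

```python
from abc import ABC, abstractmethod


class TaskQueue(ABC):
    """Hypothetical module interface: the only thing other modules may rely on.

    Contract:
      - put(task) accepts any non-empty string.
      - take() returns tasks in the order they were put (first in, first out).
      - take() raises LookupError when the queue is empty.
    """

    @abstractmethod
    def put(self, task: str) -> None: ...

    @abstractmethod
    def take(self) -> str: ...


class ListTaskQueue(TaskQueue):
    """One implementation; its internals are small enough to understand whole."""

    def __init__(self) -> None:
        self._tasks = []

    def put(self, task: str) -> None:
        if not task:
            raise ValueError("task must be a non-empty string")
        self._tasks.append(task)

    def take(self) -> str:
        if not self._tasks:
            raise LookupError("queue is empty")
        return self._tasks.pop(0)


def drain(queue: TaskQueue) -> list:
    """Code from another module: it uses only what the contract promises."""
    done = []
    while True:
        try:
            done.append(queue.take())
        except LookupError:
            return done
```

Replacing ListTaskQueue with a different implementation that honors the same contract would not require any change to drain, which is the kind of independence the partitioning is meant to buy.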

We all know the process: once the system of modules has been shown to be well defined and viable, work can begin on individual modules. This work may be programming in the large or programming in the small depending on the complexity of the individual module.

With programming in the large, change can be difficult. If the change crosses module boundaries, the work of many people may need to be redone. Because of this, a goal of programming in the large is to create modules that need not be altered when probable changes are made.

DeRemer and Kron had this to say about the difference between programming in the small and programming in the large:

"... structuring a large collection of modules to form a `system' is an essentially distinct and different intellectual activity from that of constructing the individual modules."
They were so right.

Programming in the small requires the skill of writing code that will do what it is supposed to, that others can read, and that isn't too hard to alter.

Programming in the large requires abstraction-creating skills. Until it's implemented that's all a module is, an abstraction. Taken together, the abstractions should create an architecture that is unlikely to need change. They should define interactions that are precise and demonstrably correct.

Programming in the large requires management skills. The point of the abstractions is not just to describe something that can work but to direct the efforts of people who will make it work.

Now, what does this distinction between types of programming mean for software engineering? I suggest four consequences.

The first consequence is that managers should be wary of promoting people to do programming in the large merely because they have been good at programming in the small. In a similar vein, programmers who are so promoted should consider getting extra training. Although it may seem heretical, organizations should consider creating career paths for those who would stay programmers in the small. It is not good for organizations to downplay the importance of the necessary skills.

The second consequence is that code documentation which describes the abstractions of programming in the large must be consulted and, if necessary, updated with every change to a software system.

In practice, this is an onerous responsibility and not to be taken lightly. Here are two ways to avoid it.

  1. Do not overspecify your abstractions. If it is possible to leave a detail to be decided with a single module's implementation, do so.

  2. Do not assume that, because you have written a "general system design", it represents programming in the large. A relevant question is whether your code could be cleaned up enough so that your successor could understand why it works without looking at your purported general system design.
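The first of these two suggestions can be illustrated with a small Python sketch. Everything in it is hypothetical: the contract for finding duplicate ids, the function names, and the sample data are assumptions made for the example. The contract states only what callers need and leaves the algorithm and any ordering to the implementer, so two quite different implementations both satisfy it.

```python
from collections import Counter
from typing import Callable, FrozenSet, Iterable

# Hypothetical contract, stated once for every implementer:
#   the function returns exactly the ids that occur more than once.
# Deliberately NOT specified: the algorithm, the cost, or any ordering.
# Returning a frozenset keeps callers from relying on an order by accident.
DuplicateFinder = Callable[[Iterable[str]], FrozenSet[str]]


def duplicates_by_counting(ids: Iterable[str]) -> FrozenSet[str]:
    """Count every id, then keep those seen more than once."""
    counts = Counter(ids)
    return frozenset(i for i, n in counts.items() if n > 1)


def duplicates_by_sorting(ids: Iterable[str]) -> FrozenSet[str]:
    """Sort the ids, then keep any id equal to its right-hand neighbor."""
    ordered = sorted(ids)
    return frozenset(a for a, b in zip(ordered, ordered[1:]) if a == b)


def satisfies_contract(find: DuplicateFinder) -> bool:
    """Check an implementation against the contract on a small sample."""
    sample = ["b", "a", "c", "a", "b", "a"]
    return find(sample) == frozenset({"a", "b"}) and find([]) == frozenset()
```

Had the contract also promised, say, the order of first appearance, the sorting version would be ruled out and the documentation would have one more detail to keep accurate with every change.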

It is not uncommon to see programming shops require general system designs without requiring that those designs describe the finished product. Those shops are probably not doing programming in the large. What they are doing is requiring two passes through the implementation, one somewhat vague and one for real. This isn't entirely bad. My own experience is that a first vague pass is often very helpful but only to me. Others need a clearer and more accurate representation than I am capable of in a first pass. Perhaps programming shops that accept inaccurate general system designs should require that the final code be clean and should throw the purported system design away once it has served its purpose.

If the code implements a module whose interaction with other application modules is part of the reason it works, then programming in the large is involved and there must be a requirement for a general system design that stays accurate as the software is finished and maintained.

Often general system designs are necessary but contain things that are not. Thus you shouldn't ask yourself "does this documentation belong to programming in the large?" but rather "what part, if any, of this documentation belongs to programming in the large?"

The third consequence is that your choice of tools should be influenced by your understanding of programming in the small versus programming in the large.

Be wary of tools that claim to support both kinds of programming. Intellectual and commercial purveyors of such tools will claim that the development process is simplified when there is only one paradigm to learn and that one paradigm makes the process seamless from start to finish. Whether that one paradigm be stepwise refinement, object-orientation, or anything else, it either doesn't work with both kinds of programming or it simply isn't one paradigm. Once two paradigms are involved, the development process isn't as seamless as it may seem on the surface.

On the border between the two kinds of programming you need language tools that support modularity and you need automated sanity checks (such as provided by strong type checking). These things help to implement the designs created by programming in the large. But are they necessary for programming in the small? The answer may be "only so far as the small interfaces with the large".
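As a rough illustration of a sanity check placed only at the border, here is a sketch in Python. The names AmountRenderer, DollarRenderer and register are hypothetical. Python does not enforce type annotations when a program runs, so this sketch substitutes a run-time check (typing.Protocol with runtime_checkable) for the strong type checking a compiled language would do; that check confirms only that the required method exists, not its signature.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AmountRenderer(Protocol):
    """Hypothetical boundary between the large design and a small module."""

    def render_amount(self, amount_cents: int) -> str: ...


class DollarRenderer:
    """A small module; inside it, no extra checking machinery is used."""

    def render_amount(self, amount_cents: int) -> str:
        # Assumes a non-negative amount.
        dollars, cents = divmod(amount_cents, 100)
        return f"${dollars}.{cents:02d}"


def register(plugin: object) -> AmountRenderer:
    """The sanity check lives at the border, where the small meets the large."""
    if not isinstance(plugin, AmountRenderer):
        raise TypeError(f"{type(plugin).__name__} lacks render_amount()")
    return plugin
```

A call such as register(DollarRenderer()).render_amount(1234) passes the check, while register(object()) is rejected before the rest of the system ever sees the faulty module.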

A question that generates some heat is "to design in advance and use a compiled language or to design iteratively and use an interpreted language?" Since programming in the large seems to go well with the first alternative and programming in the small with the second, you should consider answering the question with "why not both?"

One possible set of tools that might help you get the best of both worlds is Tcl/Tk with C/C++. If the project requires programming in the large you could do the top level in C++. Within modules that can be handled with programming in the small, you could drop into interpreted Tcl/Tk code.

If you find later that the interpreted part of the program uses too many resources, you could translate part of it into C. Since the interpreter is implemented with a toolkit of C functions, the translation isn't particularly difficult or error prone.

For more about Tcl/Tk, here's an introductory book, and here's a web page referencing existing applications in both industry and research.

The fourth consequence is that we can separate what is practical from what is not in the area of formal methods.

There is a body of research that has applied formal methods to programming in the small. This research has been aimed at formal specification of programming languages and proofs of correctness for small programs. Problems of scaling up have been addressed. Compare this kind of research with my description of programming in the small and you will not be surprised it hasn't been picked up in practice. Efforts to scale it up are essentially efforts to create a seamless development system and should be viewed with extreme scepticism.

Another approach to formal methods is to use them to specify the abstractions of programming in the large and to prove some desirable consequences of those abstractions. This approach fits the needs of programming in the large so well that it is surprising formal methods are not in greater use. Or rather, it would be surprising if we were not aware of the tendency to seek a seamless development process.
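To give a flavor of this second approach, here is a toy sketch in Python. It is not Z and it is not a proof: it is an exhaustive check over a deliberately tiny model, and the bounded-buffer abstraction, its capacity, and all the names are assumptions made for the example. Each operation is specified only by a precondition and its effect on the abstract state, and the desirable consequence checked is that the stored count never leaves its bounds.

```python
CAPACITY = 3  # hypothetical bound, kept tiny so every case can be enumerated

# The abstract state of the hypothetical buffer module is just the number
# of items stored. No implementation detail appears anywhere below.


def can_put(n: int) -> bool:
    """Precondition of put: there is room for one more item."""
    return n < CAPACITY


def put(n: int) -> int:
    """Effect of put: one more item is stored."""
    return n + 1


def can_take(n: int) -> bool:
    """Precondition of take: at least one item is stored."""
    return n > 0


def take(n: int) -> int:
    """Effect of take: one fewer item is stored."""
    return n - 1


def invariant(n: int) -> bool:
    """The desirable consequence: the count stays within its bounds."""
    return 0 <= n <= CAPACITY


def reachable_states() -> set:
    """Visit every state reachable from empty using only permitted operations."""
    seen = {0}
    frontier = [0]
    while frontier:
        n = frontier.pop()
        successors = []
        if can_put(n):
            successors.append(put(n))
        if can_take(n):
            successors.append(take(n))
        for s in successors:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return seen


def invariant_holds_everywhere() -> bool:
    return all(invariant(n) for n in reachable_states())
```

A formal method would establish the same consequence by reasoning about the specification for any capacity, rather than by enumerating one small case.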

There is a lot of information available about formal methods. Here's an excellent introductory book by a practitioner of my favorite technique, Z notation, and here's a comprehensive web site devoted to formal methods.

The overall point here has been that, if you keep the distinction between two kinds of programming in mind when you make your software engineering choices, then you will make better choices. If as a community we keep this distinction in mind, we will do research that is more relevant and make tools that are more useful.

My suggestions about particular languages and methods are not meant as a claim that these are your best choices. Rather, I mention these tools because they have caught my interest and seem to be underused by a software community that has tended to search for seamless development processes and ignore the distinctions between programming in the small and programming in the large.

Copyright 1996, J A Zimmer

These programming tips are distributed to individuals by the copyright holder, J Adrian Zimmer. All other distribution (including but not limited to internal distribution within an organization and mirroring of any kind) is forbidden without written consent of the copyright holder. The meaning of the word "distribution" here includes any kind of copying for use by a different person or organization than the one who obtained the programming tip from the copyright holder.

Context: Some Tips for Programmers
Author: J Adrian Zimmer
Dated: Nov 12, 1996 ; Revised: Oct 07 1998