Managing multi-file C/C++ projects


The key to better software engineering is to shift your focus away from developing
monolithic applications that do only one job, and toward developing libraries.
One way to think of a library is as a program with multiple entry
points. Every library you write becomes a legacy that you can pass on to
other developers. Just as in mathematics you prove little theorems and
use them to hide the complexity of proving bigger theorems,
in software engineering you develop libraries to take care of low-level details
once and for all, so that they are out of the way every time you write a
different implementation for a variation of the problem.

On a higher level you still don’t create just one application. You create
many little applications that work together. In my experience, the centralized
all-in-one approach is far less flexible than the decentralized approach
in which a set of applications work together as a team to accomplish the
goal. In fact this is the fundamental principle behind the design of the Unix
operating system. Of course, it is still important to glue together the various
components to do the job. You can do this either with scripting or by
building a suite of specialized monolithic applications derived from
the underlying tools.



The name of the game is this: break the program down into parts,
and the parts into smaller parts, until you get down to simple subproblems
that can be easily tested, and from which you can construct variations of the
original problem. Implement each one of these as a library, write test code
for each library and make sure that the library works. It is very important
for your library to have a complete test suite, a collection of programs that
are supposed to run silently and return normally (exit(0);) if they execute
successfully, and return abnormally (assert(false); exit(1);) if they fail. The
purpose of the test suite is to detect bugs in the library, and to convince
you, the developer, that the library works. The best time to write a test
program is as soon as it is possible! Don’t be lazy. Don’t just keep throwing
in code after code after code. The minute there is enough new code in there
to put together some kind of test program, just do it! I cannot emphasize
that enough. When you write new code you have the illusion that you are
producing work, only to find out tomorrow that you need an entire week to
debug it. As a rule, internalize the reality that you have produced
new work only when you have a working test program for the new features,
and not a minute before. Another time when you should definitely write a
test program is when you find a bug during ordinary use of the library. Then,
before you even fix the bug, write a test program that detects the bug.
Then go fix it. This way, as you add new features to your libraries you have
insurance that they won’t reawaken old bugs.
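
As a rough sketch of what one such test program might look like in C++: the `interp::lerp` routine below is a made-up stand-in for a library function, and `check` and `run_tests` are illustrative names, not part of any real library. The `check` helper follows the silent-on-success, `assert(false); exit(1);`-on-failure convention described above. The second check plays the role of a regression test, written for an imagined old bug at the endpoint t = 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>

// Hypothetical library routine under test: linear interpolation
// between a and b for t in [0, 1].
namespace interp {
inline double lerp(double a, double b, double t) {
    return (1.0 - t) * a + t * b;
}
}  // namespace interp

// Stay silent when the condition holds; otherwise report and fail.
static void check(bool ok, const char* what) {
    if (ok) return;
    std::fprintf(stderr, "FAILED: %s\n", what);
    assert(false);  // aborts here unless NDEBUG is defined
    std::exit(1);   // still exits abnormally when assert is compiled out
}

// Runs every check and returns 0 only if all of them passed.
int run_tests() {
    // Feature test: the midpoint of [2, 4] is 3 (up to rounding).
    check(std::fabs(interp::lerp(2.0, 4.0, 0.5) - 3.0) < 1e-12,
          "lerp midpoint");
    // Regression test for an imagined bug: t = 1 must give b exactly.
    check(interp::lerp(2.0, 4.0, 1.0) == 4.0, "lerp endpoint t = 1");
    return 0;
}
```

In a real test program the whole thing would end with `int main() { return run_tests(); }`, so that a Makefile target or a shell loop can run every test in the suite and stop at the first nonzero exit status.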

Please keep documentation up to date as you go. The best time to write
documentation is right after you get a few new test programs working. You
might feel that you are too busy to write documentation, but the truth of
the matter is that you will always be too busy. After long hours debugging
these seg faults, think of it as a celebration of triumph to fire up the editor
and document your brand-spanking new cool features.

Please make sure that computational code is completely separated from
I/O code so that someone else can reuse your computational code without
being forced to also follow your I/O model. Then write programs that
invoke your collection of libraries to solve various problems. By dividing
and conquering the problem library by library with a test suite for each
step along the way, you can write good and robust code. Also, if you are
developing numerical software, please don’t expect that other users of your
code will be getting a high while entering data for your input files. Instead
write an interactive utility that will allow users to configure input files in a
user-friendly way. Granted, this is too much work in Fortran. Then again,
you do know more powerful languages, don’t you?
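
To make the separation of computational code from I/O code a bit more concrete, here is a minimal sketch. The names `trapezoid` and `run_integration` are invented for illustration, and the "a b n" input format is just an assumption. The point is only that the first function knows nothing about streams or files, so someone else can reuse it under a completely different I/O model.

```cpp
#include <cassert>
#include <functional>
#include <sstream>  // std::istream/std::ostream, plus string streams for tests

// --- Computational code: no streams, no files, no printing. ---
// Composite trapezoid rule for the integral of f over [a, b] with n panels.
double trapezoid(const std::function<double(double)>& f,
                 double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) {
        sum += f(a + i * h);
    }
    return h * sum;
}

// --- I/O code: one possible front end, easy to throw away and replace. ---
// Reads "a b n" from `in`, integrates f(x) = x*x over [a, b], and writes
// the result to `out`. Returns false if the input could not be parsed.
bool run_integration(std::istream& in, std::ostream& out) {
    double a = 0.0, b = 0.0;
    int n = 0;
    if (!(in >> a >> b >> n) || n <= 0) {
        return false;
    }
    out << trapezoid([](double x) { return x * x; }, a, b, n) << '\n';
    return true;
}
```

Because `run_integration` takes its streams as parameters, an application can hand it `std::cin` and `std::cout`, while a test program can hand it a `std::istringstream` and a `std::ostringstream` instead. An interactive configuration utility would be yet another front end over the same `trapezoid`.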

Examples of useful libraries are things like linear algebra libraries, general
ODE solvers, interpolation algorithms, and so on. As a result you end
up with two packages: a package of libraries complete with a test suite, and
a package of applications that invoke the libraries. The package of libraries
is well-tested code that can be passed down to future developers. It is code
that won’t have to be rewritten if it’s treated with respect. The package
of applications is something that each developer will probably rewrite since
different people will probably want to solve different problems. The effect of
having a package of libraries is that C++ is elevated to a Very High Level
Language that’s closer to the problems you are solving. In fact a good rule
of thumb is to make the libraries sufficiently sophisticated so that each executable
that you produce can be expressed in one source file. All this may
sound like common sense, but you will be surprised at how many scientific
developers maintain just one does-everything program that they perpetually
hack until it becomes impossible to maintain. And then you will be even
more surprised when you find that some professors don’t understand why a
“simple mathematical modification” of someone else’s code is taking you so
long.
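
Here is a small sketch of what the split between libraries and applications buys you. The `ode` namespace and its `rk4` function are hypothetical, standing in for a general ODE solver library (it uses the classical fourth-order Runge-Kutta method with a fixed step count). The two functions after it stand in for two separate applications, each of which would fit comfortably in a single source file.

```cpp
#include <cassert>
#include <functional>

// ---- Would live in a library (a hypothetical "ode" library) ----
namespace ode {
using Rhs = std::function<double(double t, double y)>;

// Integrate y' = f(t, y) from t0 to t1 using n classical RK4 steps.
inline double rk4(const Rhs& f, double t0, double y0, double t1, int n) {
    assert(n > 0);
    const double h = (t1 - t0) / n;
    double t = t0, y = y0;
    for (int i = 0; i < n; ++i) {
        const double k1 = f(t, y);
        const double k2 = f(t + 0.5 * h, y + 0.5 * h * k1);
        const double k3 = f(t + 0.5 * h, y + 0.5 * h * k2);
        const double k4 = f(t + h, y + h * k3);
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0;
        t += h;
    }
    return y;
}
}  // namespace ode

// ---- Each application is then a few lines in its own source file ----

// Application 1: exponential decay, y' = -k y with y(0) = y0.
double decay_at(double k, double y0, double t) {
    return ode::rk4([k](double, double y) { return -k * y; },
                    0.0, y0, t, 100);
}

// Application 2: logistic growth, y' = r y (1 - y) with y(0) = y0.
double logistic_at(double r, double y0, double t) {
    return ode::rk4([r](double, double y) { return r * y * (1.0 - y); },
                    0.0, y0, t, 100);
}
```

The "simple mathematical modification" from decay to logistic growth is a one-line change to the right-hand side, because the stepping logic was written and tested once in the library rather than buried in a does-everything program.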

Every library must have its own directory and Makefile. So a library
package will have many subdirectories, each directory being one library.
If you have too many of them, you might want to group them
further into subdirectories of their own. Then there are the applications.
If you’ve done everything right, there should be enough stuff in your
libraries to enable you to have one source file per application, which means
that all the application source files can probably go in the same directory.

Very often you will come to a situation where there’s something that
your libraries to date can’t do, so you implement it and stick it in
your source file for the application. If you find yourself cutting and pasting that
implementation into other source files, then it is time to move
it into a library somewhere. And if it doesn’t belong in any library you’ve
written so far, maybe it belongs in a new library. When you are in a deadline crunch,
there’s a tendency not to do this since it’s easier to cut and paste. The
problem is that if you don’t take action right then, eventually your code will
degenerate to a hard-to-use mess. Keeping the entropy down is something
that must be done on a daily basis.
