Introduction

I've seen a lot of standards over the years. Some were simply a means of enforcing the particular style of their author, regardless of other considerations. Many standards are treated with near-religious reverence. By contrast, my programming standards have been evolving over time as I discover what truly makes for long-term maintainable code. Hence, my older code doesn't much resemble my current coding style. No doubt, my code now will not look like my code in the distant future - because I am always learning, adopting changes that make sense, and refining existing practices.

The purpose of these standards is to speed development and maintenance of software. These guidelines are mostly language and platform independent, but we will address unique situations as needed. Note that these are guidelines, not laws. There are times where it may make sense to purposely go contrary to these standards, but the purpose should always be to make the code easier to understand (and thus, to maintain). If the purpose of these standards could be distilled down to a single statement, it would be "assume that the next guy looking at your code is an idiot". And, yes, sometimes that idiot will be you.

Avoid the temptation to write "temporary" code. If the code is useful enough to address the current situation, it is likely that a more generalized implementation will have application to a wider array of solutions. An extra 10% investment now will yield significant savings in the future - and one should always strive to take the long view. Likewise, avoid the temptation to write "quick and dirty" code for supposedly temporary programs. Such programs have a way of becoming long-term. It is best to write clean code from the start than to go back and clean it up later. However, sometimes emergencies may require that the code be refactored later. Just be sure to remember to do so.

Consistency

There are numerous standards in use these days. Which one is used is largely unimportant, but adhering to it within a given source file is important. Mixing styles within a source file should be avoided. If the source file uses camel case for identifiers, that practice ought to be continued within that file.

Whitespace and Indentation

Horizontal whitespace (tabs and spaces) and vertical whitespace (blank lines) are visual cues to the person looking at the code. It is easier for humans to read and parse text that has whitespace than text that has little or none. Vertical whitespace is used to separate logical blocks of code. For instance, a series of statements at the start of a function that validate function parameters should be visually separated from the following code that operates on those parameters. Occasionally, a comment line of dashes (or such) can be used to separate blocks of code, but that should be rare - such as indicating some temporary code.
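For instance, here is a small sketch (the function and its checks are invented for illustration) where a blank line separates the parameter validation from the work that follows:

```c
#include <stddef.h>

// Hypothetical example: validation first, then a blank line, then the work.
int Average(const int *values, size_t count)
{
    if(values == NULL)
    {
        return 0;
    }
    if(count == 0)
    {
        return 0;
    }

    int total = 0;
    for(size_t index = 0; index < count; index++)
    {
        total += values[index];
    }
    return total / (int)count;
}
```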

Indentation indicates two different things: 1) continuation of a statement from the previous line, and 2) nested code. Nested code should be indented at least 4 characters, but no more than 8. 2-character indentation is commonly used for block indentation, but two spaces is not enough to visually line up the start and end of blocks when there are more than two or three levels. However, 2 spaces is sufficient for line continuation.

Line length ought to be limited to 80 characters. Lines longer than this usually indicate complicated constructs that probably should be broken out across several lines to make the sub-constructs visually separate. This practice is especially important when dealing with complicated logical operations. This also allows comments to be affixed to individual comparisons. For instance:


if(((date >= target) && (x || y)) || (amount < 0) || global_effects)

would be easier to understand if it were arranged as:


if(
    (
      (date >= target) && (x || y)
    )
    || (amount < 0) || global_effects
  )

or even:


if(
    (
      (date >= target) && (x || y) // Date is in range and either special condition
    )
    || 
    (amount < 0) // Refunds only
    || 
    global_effects // Apply in all cases
  )

If you find misleading indentation, this ought to be corrected where and when noticed.

Comments

Comments for a specific line should follow that line on the right. Comments that are flush with the current indent level apply to the following block of code. In some cases, there isn't enough room for a meaningful comment at the right side, in which case the comment can be placed above the line at the indent level. However, some visual indicator ought to be used to distinguish between line and block comments. Thus, block comments should end with "..." to distinguish them from line comments. Block comments should never be placed to the right of the block's outermost indent level.
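A small invented example of the convention: the block comment sits at the indent level and ends with "...", while the line comment follows its line on the right:

```c
#include <string.h>

// Hypothetical routine illustrating the block/line comment convention.
void Trim_trailing_newline(char *text)
{
    // Nothing to trim for an empty string...
    size_t length = strlen(text);
    if(length == 0)
    {
        return;
    }

    if(text[length - 1] == '\n')
    {
        text[length - 1] = '\0'; // Overwrite the newline in place
    }
}
```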

Of course, comments should be useful. Comments that simply restate the obvious are worse than useless, such as:



order += 1; // Add one to order

A better approach would be no comment at all, or something that indicates the purpose of the line, such as:


order += 1; // Move to the next order

Constants and Literals

Literals ought not to be used, except to assign them to named constants. The only exceptions are zero (0) and the bound of a loop that runs a specific number of times. For instance:



for(i=1;i<=10;i++)

Use of any other numeric literal is an indication of code that is not easy to alter. Chances are that the literal value is used in multiple places, meaning a change made in one place and missed in another will result in code that doesn't work after the change. When the constant is defined once, only a single change needs to be made. Just be sure that the constant being used is unique to some specific thing and used that way. Nothing useful will come of defining something like:



const one = 1; // Bad idea
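By contrast, a constant named for a specific concept pays off because the value is defined in exactly one place (the name and value here are invented for illustration):

```c
// Hypothetical: the constant names what the value means, not what it is,
// and every use refers back to this single definition.
const int Max_line_length = 80;

int Needs_wrapping(int length)
{
    return length > Max_line_length;
}
```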

Compound Statements

When more than one statement is included in a compound statement, the use of the symbols that delimit this situation can be positioned in many different ways. This standard defines that the starting symbol should be on the next line, indented to the same level as the statement it is part of. Likewise, the ending symbol should be at that same level. For instance:



if(amount==0)
{
    Log_null();
    break;
}

This makes it easy to verify the matching braces. This construct should also be used for a single statement, where a compound statement is not required. For instance:



if(count==0)
{
    return;
}

The then portion of an if statement should never follow the if on the same line. Otherwise, it requires more visual work to determine whether the opening brace is at the end of the if line and all the following code is part of the if, or whether only a single statement follows the if. Yes, indentation should also indicate nesting, but I've lost track of the number of times I've found something that was mis-indented because the last programmer in the code was sloppy or lazy.

Identifiers

Identifier naming conventions are nearly religiously-held opinions in the development community. The standard here is to provide identifiers that are easy to read and understand. Thus, a function whose purpose is to do some processing should be named as a verb. For instance: "Calculate_royalties". Not "Royalties" or even "Do_royalties". The recommended name indicates clearly what the function is doing. "Royalties" doesn't tell us what is happening and certainly gives us no indication that some processing is going to happen. "Do_royalties" tells us that processing happens, but not what is happening (are we reporting on royalties? or deleting them? or calculating them?)

Likewise, variables and value-returning functions ought to be named as nouns indicating what they are, or what they return. For instance, Amount is clearly the amount. Count() is clearly a count of something. If a function returns a value (noun) but also does other processing as a side effect, one must decide which is the main point of the function and name it appropriately.

Prefixing the name with a type indicator is generally contrary to the point of identifier names being easy for humans to read and parse. Worse, indicating that a variable is a pointer with a "p" prefix is problematic in two ways: 1) it doesn't hide implementation details. From the point of view of most code, it shouldn't matter whether we're dealing with a pointer or a string or an integer. Only the detailed implementation code (which ought to be in a class) should even care what the variable is. 2) if the type of the variable changes (say, from a pointer to an integer into a pointer to a float), the identifier name has to be changed globally. A minor change like this shouldn't result in such a large amount of busy work. That slows down maintenance. This isn't to say that prefixes should never be used, but their use should be limited.

Identifier case should try to match natural language for ease of reading. When multiple words are part of the name, they should be separated by white space. Of course, you can't actually use white space within a variable name, but you can simulate it with underscores (_). Since noun functions appear within a larger construct (such as an if), they should start with lowercase. Processing functions should start with uppercase, as at the start of a sentence. If an acronym is included, it should be uppercase. For instance:


last_date_processed
current_customer_ID
Calculate_royalties

Since identifiers are intended to be read as text, avoid the use of single-character variable names and abbreviations. However, for loop control variables and certain cases where a variable is required but not used, the variable name is not important. In these cases, names like "i", "loop", "dummy", and so forth are acceptable. However, one ought to consider whether the code would be easier to understand if the loop variable name indicated what it was (such as "String_index").
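As an invented illustration, a descriptive loop variable reads as text where "i" would not:

```c
#include <string.h>

// Hypothetical: "String_index" tells the reader what is being walked,
// where a bare "i" would not.
int Count_spaces(const char *text)
{
    int spaces = 0;
    size_t length = strlen(text);
    for(size_t String_index = 0; String_index < length; String_index++)
    {
        if(text[String_index] == ' ')
        {
            spaces++;
        }
    }
    return spaces;
}
```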

OOP

Most code should be object-oriented. However, that is not to say that there aren't some cases where such an approach is overkill. For instance, a function that simply alters the case within a string can be a utility function rather than encapsulating it within a class. Another way to put this: use objects to accomplish a task in a way that provides a benefit over not using an object in that case - not just for the sake of using OOP.

Avoid using multiple inheritance. There are few instances of problems that cannot be resolved in a more straightforward way by reorganizing the class hierarchy, or using embedded objects. Besides making the code simpler, there is the issue of migration to other languages - not all of them support multiple inheritance fully (or at all).

Programming is a matter of managing complexity, and objects can help a lot. Always consider how they can be used for this purpose. When it makes sense to inherit or encapsulate, use an object. But don't use classes as nothing more than a form of namespace - such as a collection of related functions that don't operate on instance data. Yes, there are some languages that require this, like C#. However, it should be avoided when possible. Just because some languages are badly designed doesn't mean you should carry that bad design into sources for other languages.

Nesting depth

Studies have shown that humans can hold about seven items in their heads at a time. Thus, don't nest if statements more than seven levels deep. In fact, try not to even approach that many levels. If you find a need for that many levels, it indicates that some of the inner levels ought to be extracted into a function or class method. Of course, nesting depth issues also apply to nested function calls, but in properly designed code, nested function calls are easier to understand than deeply nested blocks.
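A sketch of the extraction idea, with invented names: the inner validation levels move into their own function, keeping the caller shallow.

```c
// Hypothetical: the inner nesting levels become a named predicate.
int Is_valid_order(int amount, int in_stock, int approved)
{
    if(amount <= 0)
    {
        return 0;
    }
    if(!in_stock)
    {
        return 0;
    }
    return approved;
}

// The caller now reads as a single level of logic instead of three.
int Process_order(int amount, int in_stock, int approved)
{
    if(Is_valid_order(amount, in_stock, approved))
    {
        return amount; // Stand-in for the real processing
    }
    return 0;
}
```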

Cross-platform compatibility

Avoid using constructs that are unique to a particular language (or only common to a few languages), as these make code much harder to convert to another platform. For example: Pascal sets. They don't exist in most other languages. Use bitmasks instead, as those work nearly everywhere. However, this should not be construed as limiting oneself to the lowest common denominator. For instance, not every language supports OOP. But that doesn't mean you should avoid classes. They are essential to modern programming. Fortunately, most migration is away from older languages that don't support modern features to ones that do.
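For instance, a bitmask standing in for a Pascal set might look like this (the day names and function are invented for illustration):

```c
// Hypothetical: each member of the "set" gets its own bit.
enum
{
    Monday    = 1 << 0,
    Tuesday   = 1 << 1,
    Wednesday = 1 << 2
};

// Membership test: the bitwise AND replaces Pascal's "in" operator.
int Is_scheduled(unsigned days, unsigned day)
{
    return (days & day) != 0;
}
```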

C-style syntax is all the rage these days (used by C, C++, C#, PHP, Java, JavaScript, and so on). If you write code in another language, try to match the case of C syntax (such as lowercase if), where allowed. When supported, use other syntactic artifacts that match C (for instance, using parentheses around the condition of an if statement, even if not required).

Conditional Symbols

Conditional compilation is useful for generating code for different platforms or variations of an application. However, excessive use of conditionals makes the code difficult to maintain. Sometimes it is unavoidable, but when possible, all such variant code ought to be contained within functions or classes so that conditionals are restricted to a few places rather than scattered throughout the code.
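A minimal sketch of this isolation (the function and paths are invented): the conditional lives in one place, and the rest of the code simply calls the function with no #ifdef clutter.

```c
// Hypothetical: the only conditional compilation in the program lives
// inside this one function. Callers never see the #ifdef.
const char *Temp_path(void)
{
#ifdef _WIN32
    return "C:\\Temp";
#else
    return "/tmp";
#endif
}
```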

Quality Assurance

Pay attention to warnings and hints from your tools. Use source verifiers (such as lint). Attempt to compile your source under another IDE/tool set. Some tools will catch issues that other tools will not. Once all reported issues have been fixed or verified as non-issues, proceed to testing. Always assume that variables are uninitialized and always validate inputs. Always be sure to properly escape strings destined for use in such things as SQL statements. Many of the bugs and security flaws found in commercial software these days are due to sheer sloppiness.

There are four levels of testing that need to occur once you think the code is complete. These should be done in the following order. Do not progress to the next one until testing is clean on the current step.

  1. Functional testing: check that the basic operation of the code is what is expected. Be sure to test boundary situations. What if the inputs are null? What if the inputs are invalid? Be especially mindful of code that handles things in chunks of bytes of a specific size. For instance, source that encodes 256 bytes at a time - how does it work if passed 255 bytes, or 257 bytes? Completing functional testing is what I call level 1 certification. This is the absolute minimum any programmer should aim for.
  2. Coverage testing: In most code, it is impossible to test all possible paths through the code, since the number increases exponentially with the number of conditional situations. For instance, a program with a mere 32 if statements has about 4 billion (2^32) possible paths of execution. In practice, the number is much smaller since many of the ifs are nested. Nevertheless, the sheer number of paths makes an exhaustive test impractical. However, one means of accomplishing a test that is nearly as good as an exhaustive one is coverage testing. This involves tests that make sure each line of code is executed at least once. This is the most difficult test to write, since crafting inputs that reach certain lines may require some complex reverse engineering. There will always be a few odd cases that cannot be tested in situ because they require situations outside of the control of the tester - for instance, handling a disk fault. Although code could be added to allow simulation of such cases, the additional code would only be useful during tests and could introduce bugs of its own. Such lines should be commented as UNTESTED. Completing coverage testing is what I call level 2 certification.
  3. Stress testing: Not all code needs to be stress tested. This is more of an issue with code that operates over and over on the same data set. For instance, a heap. Because the input is so complex, in terms of the permutations of the cascading effects of how data from one operation feeds into another operation, this cannot always be adequately tested using the above two testing steps. Such testing requires many iterations (hundreds of millions or billions) of randomly-chosen operations, which must be verified for proper operation. In turn, this verification may require keeping local data that can be used to verify the operations. In the instance of a heap, a local list of allocated and deallocated addresses would need to be maintained and used to validate that allocations of the heap didn't reallocate over areas that were already allocated. Further, all operations must be logged so that when an error is found, the process can be run from the start to recreate the precise situation. But always check the verification code first to make sure that the error isn't with the validation instead of the code being validated. Completing this is what I call level 3 certification.
    If you don't have a coverage testing tool, you can do it simply within an IDE: set a breakpoint at the start of every if and every else block. When you run the coverage test, if you hit a breakpoint, you can remove it. When you are done, remaining breakpoints indicate lines that have not been executed and you can work on coming up with a test for that situation. In cases where there is no else block, set the breakpoint on the if itself. When encountered, step over it and if it jumps past the block, you can remove the breakpoint, since that indicates the else condition and you should have had a breakpoint within the if block for the other case. Also be sure to set a breakpoint within each loop and at the start of the loop to catch the situations where the loop executes at least once and when it doesn't execute at all (loops are essentially if statements).
  4. Performance testing: This must always be the last step in testing. As a fellow developer once told me: "it is easier to make working code fast than to make fast code work". Only after correct operation of the code has been verified is it time to address performance. Most programmers attack performance too early in the development process. It is easy to make code work fast if you don't need it to be accurate. Many times I've looked at some high performance code to realize that it had serious bugs that, once fixed, made the code much slower. If proper operation had been verified using the above steps, the poor performance would have been immediately obvious and an entirely different approach could have been chosen.
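The chunk-boundary cases called out under functional testing can be sketched like so (the routine and its chunk size are invented for illustration):

```c
#include <stddef.h>

// Hypothetical chunked routine: processes input in blocks of CHUNK bytes.
// Functional tests should exercise CHUNK - 1, CHUNK, and CHUNK + 1.
#define CHUNK 256

size_t Process_in_chunks(const unsigned char *data, size_t length)
{
    size_t processed = 0;
    while(length > 0)
    {
        size_t block = (length < CHUNK) ? length : CHUNK;
        processed += block; // Stand-in for the real per-chunk work
        data += block;
        length -= block;
    }
    return processed;
}
```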

This may sound strange, but part of being a successful developer lies in being lazy. One merely has to optimize for global laziness rather than local laziness. That is, consider how to reduce the amount of work needed to maintain the code over time, even though this may require more work at the start. All the work you do ought to be thought of as an investment in the future. A key example of this is the writing of regression tests. Think about it: you already spent the time to consider how to do functional and coverage testing, so why not write that down in code - a routine that calls the code with the specific inputs necessary to carry out those tests? Then, in the future, you can run that routine again to verify proper operation (this is called regression testing). Note, however, that changes to the control flow (such as the addition of another if) will require that you determine and implement the tests necessary to ensure complete coverage testing. Once a new conditional execution path has been added to the code, it potentially invalidates all previous coverage testing.
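A minimal sketch of such a regression routine (the function under test and its cases are invented):

```c
#include <assert.h>

// Hypothetical code under test.
int Next_order(int order)
{
    return order + 1;
}

// Regression test routine: re-run after every change to verify that the
// previously established behavior still holds.
void Test_next_order(void)
{
    assert(Next_order(0) == 1);   // Boundary: the first order
    assert(Next_order(41) == 42); // Typical case
}
```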

Migrations

Migrations - or other mass changes - to large code bases, or even small ones, can be accomplished in a straightforward fashion (if somewhat slowly) by adhering to the following guidelines. Not doing so will likely result in a complete failure of the effort. The biggest mistake people make in migrations is an attempt to do it all at once. Instead, it should be done in a stepwise/evolutionary fashion reminiscent of the agile programming method. One need not release the changes to the public at each step, but one of the important points is that one CAN do so, if necessary. Otherwise, other changes to the old code base will require forking the sources, with messy and error-prone reconciliations in the end (after all, the new code won't much resemble the old code, so merging changes is going to be an order of magnitude harder). Besides, if you can't develop a working program with stepwise refinement, how can you hope to develop a working program after making ALL of the changes in one big step?

Large codebases don't usually lend themselves to gradual migration to the new platform. For instance, converting Microsoft Word doesn't mean you migrate the UI, and then migrate the means of data access, and then migrate the help system, and so forth. You won't have a working product until all the pieces are migrated. It is better to change as much on the current platform as you can, then switch platforms with a constrained set of changes.

The precise steps will depend upon the nature of the migration, but the point is to make an incremental change, test and repair bugs, and then move to the next iteration. This limits introduced bugs to a constrained set of code that will be more quickly corrected. If the entire code base is changed before testing can even begin, the bugs will cascade between the various changes that were made, leaving a code base in such disrepair that it may never be fixed. There are some general kinds of changes that should happen independently of each other. For instance, code enhancements/bug fixes, translations to new programming languages, translation to new hardware/software platforms, code cleanup/refactoring, and UI changes. Here are some simple examples:

Migrating to new Operating System
Assume that you are migrating your code to run on a new operating system. You need to identify the touchpoints between your code and the underlying operating system (at least, those that require changes). Rather than attempting to convert all of the code to the new operating system, first make changes in the existing code base to isolate it from ANY operating system. This can be done via a class or unit that is O/S-specific and can be changed without touching any of the rest of your code. The first version of this class will be for the current operating system. Next change your existing code to use only this new class for all O/S-specific operations. Now test your program on your current O/S. In theory, no program behavior should have changed, but you've now limited all potential after-migration bugs to a single place. Now write a version of the class for the new O/S. If, after migrating to the new O/S, something breaks, it is probably in that one class rather than spread throughout the rest of your code.
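A minimal sketch of the idea (names invented; shown here as a C-style module rather than a class): only this one file is allowed to touch anything O/S-specific, and the rest of the program calls the neutral wrapper.

```c
// os_interface.c (hypothetical): the single place in the program that
// contains O/S-specific code. Everything else calls OS_path_separator()
// and never sees the conditional below.

#ifdef _WIN32
#define OS_PATH_SEPARATOR '\\'
#else
#define OS_PATH_SEPARATOR '/'
#endif

char OS_path_separator(void)
{
    return OS_PATH_SEPARATOR;
}
```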

What about incompatibilities between the two operating systems? First, you only have to worry about those which directly affect your code. Second, if your code relies on some feature that is not in the target O/S, you will have to come up with an alternative approach anyway - so see about implementing that alternative approach on your current O/S (although this qualifies as a whole different step in the migration process). When done with this step, you will have a program that still operates on the current platform, but is ready to work on the new one. Any other maintenance to the program can continue without the need for later source reconciliation. Note: to keep these dependencies from creeping back in again, remove all linkages to the current O/S platform (specific classes, includes, etc.) from everywhere except your new interface class. This will also help you make sure that you have caught all instances of that coupling by doing a build (without the linkages, you will get compile or link errors in those places where you still directly reference the O/S). Make thoroughly sure the program operates correctly after this change. When you are sure, you have completed an important step along the migration path and can proceed with the next one. At no point should you be trying to do more than one step at a time - and you should always make sure to test before starting the next step - yet another good reason for regression tests.

Not only is the decoupling and modularization a good idea in general, but it allows you to use the same source code for both platforms, prepares you for possible other platforms in the future, and it helps characterize your program better because you can clearly see exactly how and where it interacts with any platform. In fact, this approach ought to be used even if you aren't planning to migrate your code. But it is absolutely essential for a smooth migration.

Migrating to a new data source
Consider what you are addressing rather than the specifics of how you are addressing it. Then write a generalized interface that you can use and hide the implementation details inside it. Gather all such functionality into one place so that migration requires changes only to that one place. If you simply need to read information from a new place (such as from a database instead of from a file), create a data interface class that will provide the information you need. Initially, this should be a wrapper around the file access so you can verify operation in the current environment. Then convert the interface to use the database instead of a file. The interface to the code should not change at all, nor should the behavior, even if the implementation is completely different.
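A minimal sketch (names and data invented): the caller sees only the interface, so the file-backed body can later be swapped for database calls without changing the signature or the behavior.

```c
#include <string.h>

// Hypothetical data interface: callers ask for a customer name and never
// know whether it comes from a file or a database. This first version
// wraps the current (file-like) storage; a database version replaces
// only the body, never the signature.
static const char *File_records[] = { "Alice", "Bob", "Carol" };

int Fetch_customer_name(int customer_id, char *name, size_t size)
{
    if(customer_id < 0 || customer_id > 2 || size == 0)
    {
        return 0; // Not found, or no room for a result
    }
    strncpy(name, File_records[customer_id], size - 1);
    name[size - 1] = '\0';
    return 1;
}
```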

Other considerations

In this day and age, it should not be necessary to warn people away from the use of the goto statement. It hides structure, which hides bugs. This is not to say that it should never be used. It almost never should be used. In all of the code in my personal subroutine library, there is one single goto statement. And that is only there because it truly was the clearer way to code things than to go through code contortions to avoid its use. But the fact that there is but one goto in something like 100,000 lines of code should illustrate the extreme rareness of the use of such a construct. We ignore, of course, assembly code - which must use the assembler equivalents of goto.