HomeThoughts and MusingsThomas M. Tuerke on Design • Coding in Style
 

Coding in Style

design: /di·'zin/

   n a deliberate plan for the creation or development of an object. vt: to create something according to plan.
   good design: /'gud —/ the product of deliberate forethought and careful understanding of the purpose of a subject, resulting in a subject which significantly improves its utility, allowing it to integrate seamlessly and naturally into the role for which it is intended.
false synonyms: fashion, decor.


Table of Contents [show/hide]
 
Coding in Style

Code serves one purpose: to communicate intent. It does so to two audiences: the compiler that will reduce the code to machine instructions, and the hapless souls who will have to look at your code in the future... namely test developers, maintenence engineers, and the hiring manager for your next job. So good code is code that clearly communicates intent, both to the computer (so you don't introduce bugs) and to future humans (so they don't, either.)

The problem is, most coding standards don't attempt to improve the reliability of the code (except by the vague definition of "improving readability"—an attribute I call into question.) No, most coding standards are sorry things listing a bunch of superficial cosmetic issues like how deeply to indent, and where to place braces and parentheses. I almost never see a standard that says what to do, and follows it with an explanation why it is a Good Thing that such a practice is done.

In fact, I would argue that most coding standards—being purely cosmetic in nature—result in worse code rather than better, simply because eyes become lazy by seeing the same stuff all the time. And productivity often suffers, too. If all developers are ever exposed to is code written to one particular style, those developers will be completely out of their element when looking at a new piece of code, and let's face it: it will happen.

Imagine that you've managed to get all the developers on your team goose-stepping to one particular set of standards. Is that goodness?

Probably not.

  • Can you share code with other teams, "borrowing" something they've already written and tested? If you do, will you risk breaking that code (sending you back to square one testing-wise) by renaming all those identifiers and altering the whitespace and curly-brace positions?
  • What about Open Source? Third Party software? Will you rewrite that to "conform" to "standards" or will you leave it as it is? You know you're eventually going to have to look in that chunk of code. Have you budgeted for the additional testing effort, realizing that you've invalidated a significant chunk of the testing that has already been done on that code.

Unless and until the One True Style (which each developer assumes is the style they happen to use) is universally adopted, there will be a multitude of styles. Some folks will insist that identifiers follow lower_case_and_underscore convention, others ProperCase, others still camelCase, and this whole contingent over here will insist on piHungarianNotation. Some standards mix liberally, one convention for member variables, another for methods and functions, another still for class names. Microsoft has an elaborate naming scheme for its APIs, which it recommends. And then there's the bunch of developers who bristle at that: "yuck. it's too much like Microsoft's."

Let's face it: we're not going to agree on silly minutua, and quite frankly, I don't think we need to.

That said, I think there's a lot more to coding style than following a bunch of superficial regulations. I think that using a particular indentation style simply because founder Joe Codewonk said so is not a good enough reason, unless there is a clear material (read: not hazy, ambiguous, hand-wavy) benefit. Every single element of a shop coding standard should stand up to scrutiny. If it doesn't, it shouldn't be part of the standard.

Does each rule:

  • Catch specific kinds of coding errors in the compiler? (list which)
  • Catch specific kinds of logic errors at runtime? (list how)
  • Communicate intent to the subsequent reader? (describe how)
  • Make debugging and maintenance easier? (list how)
  • Make the code easier to reuse with a minimum of fuss? (list how)

Yes: note that each rule is required to justify its existence by listing how it contributes to the greater good of the code—and not just by allusions to "improved readability". And yes, note that by requiring to justify its existence, periodic review is sometimes necessary. Certain coding stipulations may be mandated by shortcomings of the environment, (I'm thinking "yoda notation" to overcome C-style languages' misuse of assignment for equality operators, or certain limitations in an editor or other development environment.)

As long as we're quibbling over capitalization, spaces, and curly braces, we're missing the point of coding standards. But once we get into styles of coding that transcend the cosmetic and materially result in better code, then we'll really be coding in style.


Replies: 6
One possible Coding Standard
- Thomas M. Tuerke

This is a list of items that would plausibly belong in a coding standard. Note the absence of cosmetic trivialities such as bracing style or indentation depth, conventions that seem to favor consistency over communicative options, and (to my mind) makes as much sense as dictating that everyone will henceforth speak and write in Iambic Pentameter, for the "consistent meter" that style imparts.

Remember that code exists to efficiently and effectively communicate intent to machine and human, and the coding standard should exist to facilitate that better code: code that effectively solves the problem at hand, while containing fewer defects, and less likely to have defects introduced into it.

The standard should cover improving the substance, not merely the form, of the code. As such, whether one uses Allman, K&R, KNF, Whitesmiths, or some other scheme, or the manner in which one indents, should be secondary, even irrelevant, to the extent that it continues to reinforce the intent that is being communicated.

As such, it should not merely serve the vanity of the standards writer, nor impose undue restrictions on the creative process. Coding is as much an artistic expression as it is a rigorous description of a problem's solution, and some of the cleverest (and effective) solutions I've seen came from moments of somebody's unconstrained inspiration: solutions that might not have been possible under some more oppressive coding styles.

Furthermore, allowing the developer some freedom of expression can result in code that is a joy to read (without necessarily compromising on the primary communicative intent,) making it is easier to work with all through that code's life cycle.

Personally speaking, I'm not a big fan of superficial coding "style" standards, as this seldom does anything more than make the reading eye lazy to the point that one is at a disadvantage in reading code not conforming the same arbitrary style.

That said, what might a coding standard contain? Here are a few thoughts, based on my own experience. It's not meant to be exhaustive (nor even uncontroversial,) and individual items may be re-examined as time goes on, and technologies develop.

The Golden Rules

  • Code to the Spirit of the Standard. Don't let the letter of the standard be an excuse for not making the best code possible. A good reason (one with tangible benefits) for doing something should trump any element of the standard (but be sure to document why it's being done.) A dogmatic approach to adherence, in the face of obvious benefit to clarity by some variation, is not to be admired.
  • Don't make coding standards—or the code they're meant to produce—an intelligence test. Instead, make them an exercise in common sense. Simplicity is golden. To the extent possible that its practice can be applied to more than one coding environment, goodness results.

In General

  • Code intent clearly. Given two (or more) ways to express something, pick the one that communicates intent most clearly, most obviously, most intuitively to the human reader... (unless the intent is obfuscation, which short of "security" applications, seldom is the case.)
    - For example, given int newCapacity = capacity << 1 or int newCapacity = capacity * 2, (that is to say, a numeric operation, not a bit-shifting one) the latter can generally be assumed to be clearer. Numerically, both operations produce the same result, and in fact an optimizing compiler may convert the multiplication to a shift if it deems the latter to be more efficient in terms of CPU cycles. There's no need to do the compiler's job for it at the expense of reader clarity, unless you're working with a very bad compiler.
  • Comment intent clearly. Make comments meaningful to a user that has never seen the code before.
  • Use the most semantically meaningful type possible.
    - for example, don't use int in place of bool, an enum, or pointer. The more interchangeable a type is, the more likely it is to be interchanged with other types... accidentally.
    - Even though NULL is defined to be 0, use NULL instead of 0 in the context of pointers, (Why: the semantic value.)
    - Similarly, I've seen functions where a return of 0 means OK or Success, while for others, it means failure... often within the same API, necessitating a jump to the documentation or function source to resolve. The meaning of the return code should be self-evident in the calling code, to not waste the reader's time, by using an enum return type, or a bool return type, and not a generic int.
  • Minimize "magic numbers". Any number other than 0 and 1 should be replaced by named constants that describe the semantic value of the number. Even for 0 and 1, if there's any special significance to the number, a constant should be used.
    - If the same (numeric) value is used in two different places, the relationship between those two places should be established or broken by using the same or distinct constants or functions: modifying such a value should use a single point of contact (the definition of the constant or function) and not several tenuously related instances of use.
  • Minimize the distance between "{" and "}".
    - Functions or other code blocks with too much code are candidates for factoring; they're as easy to read, understand, and maintain, as the old run-on-sentences in English, and are as undesirable, too, because the just go on and on and on without any clear sense of why everything is lumped together and really, really, really should be clarified. Brevity is the soul of much more than just wit.
  • Same Form implies Same Function. This common design tenet applies equally well to code as other things.
    - Overloads of functions should be semantically similar (they should do the same or similar things.)
    - Methods in different classes but with the same name should perform the same logical operation.
    - Similarly, overloaded operators should not stray too far from the basic, unoverloaded meaning. For example, operator + suggests addition, so it is not too much of a stretch to understand string + string is concatenation, an additive operation on strings; this understanding is further supported by its use in a variety of languages and class libraries, including BASIC (other than VB,) JavaScript, and a host of string classes. However, <<, understood to be a shift operator (shifting bits to the left) is somewhat obtusely implemented as a means of writing to a stream.
  • Don't over-use language-specific idioms, unless they materially improve the code (either for an essential performance consideration, or to prevent a kind of bug that code might otherwise be vulnerable to.)
    - Idioms just to be cool should be avoided.
    - If you do use them, then depending on how obscure a specific idiom is, document its use.
    - Again, the << operator in C++ is somewhat idiomatic. It seems to be frequently used in simple console applications and in textbooks, but I've seldom seen it used to any great extent in a production application. Experience suggests that, in spite of it being a clever idiom, it does not lend itself to globalization, since it codifies the order of individual elements being pushed to the receiving stream in the same way that the concatenation operator does, and that's a big internationalization faux pas.
  • Code should be as self-describing as possible. Various IDE-hosted accessories (such as an object browser) are nice, but aren't always available (using a colleague's editor, or a print out, for example.)
    - For this reason, Hungarian Notation is generally a Good Thing.
    - Code should be clean and easy for a newcomer to read (meaning anyone new to the team and unfamiliar with the codebase, be they novice or advanced). It should not "read like a murder mystery" requiring one to dig deeply to sleuth out intent.
  • Wrap long lines at logical places; most people are used to scrolling code up and down, but less so left and right. While the 80x24-line editors of yore are pretty much history—and editors can handily present longer lines—the capability is not ubiquitous. A hardcopy listing of source code may wrap awkwardly, and a colleague's editor may not be configured to handle such wide lines. The risk of missing something important because it's off-screen to the right is greater when lines are long, and that risks the introduction of bugs.
    - Method declarations: place each set of related args on their own line. For example, x,y pairs or x,y,z triplets, or buffer,length pairs that might logically be treated as one virtual parameter, and kept together on the same line; a one-line comment is useful to describe the parameter(s).
    - In code, wrap long actual-argument lists in similar fashion, grouping logically related parameters on the same line, if possible. (Ideally, subsequent parameters should line up with the function's open parens.)
    - "Paragraph Wrapping" is generally not a good thing, resulting in dense code that isn't necessarily easy to read, and should be avoided. It may be easy for a pretty-printer to generate, but the visual arrangemment of code doesn't necessarily help communicate much.
    - Illustrate parallelism by lining things up on consecutive lines. On more than one occasion, I've fixed a bug discovered by taking paragraph-style code and indenting it to line up corresponding elements: use the power of whitespace to draw attention to what should be the same and what should be different.
    - A useful notational idiom from Lisp is to not surround parentheses that open and close on the same line with blanks, but put a space before a closing paren that matches an open paren not on the same line as the previous closing paren's matching open.
  • Use whitespace to support communicating intended "affinity" or relationship. Desired operator precedence, for example, can be communicated using whitespace, as can related statements, methods, classes, etc. For inline use, no whitespace communicates high affinity, space communicates another (lower) level of affinity, and a line break communicates the lowest level of affinity.
  • "Tweet" your code! Consise but clear code is usually the most effective. If your algorithm is too ponderous, examine it to see if there's a simpler, cleaner way to to express it (without compromising the primary goals of effective communicating the algorithm to machine and human.)
    - For example, in coding a conditional expression, don't unnecessarily complicate an already boolean sub-expression with "== true"
    - An if/then/else expression that operates on the same variable in both cases might legitemately be "factored out" to a ternary operator. For example:
     if(condition)
      val = a;
     else
      val = b;
    could communicate more clearly the commonality of the alternatives by factoring out the assignment into
     val = (condition)? a : b;
  • Code the main-line case with a minimum of indentation when possible. That is to say, handle the trouble cases early and get them out of the way so that you have a point in your code beyond which is "smooth sailing."
    - Some antiquated standards hold that there should only be one return in a functional body, but this is just an artifact of developing an environment that doesn't support RAII, when it is too easy to forget to release resources. Modern day throw/catch scenarios similarly represent multiple points of exit; it would be impractical to suggest that your code should never throw exceptions. If you can neatly clean up resources, then multiple points of exit are not a bad thing.
    - Naturally, all this is made much easier if you have RAII-based tools (like a file class that closes in the destructor, etc.)
    - Deep indentation is an indicator of the need to factor code, since it's probably quite long to be so deeply indented; moreover it's too easy to have some important part of the logic be "off screen" and missed.
    - Similarly, long bodies of code are an indicator of the need to factor code.
  • Exceptions versus Result Codes. When is a result code better than an exception? When that code can be ignored without harm. Otherwise, an exception is better when what is being communicated out really should not be ignored.
  • Use a switch instead of an if when the conditional logic is, conceptually, a non-boolean scalar (such as an enum, an int, or a character) even if there are only two paths possible today unless there's no possibility of that ever changing.
    - For example, with an enum has two states, it is tempting to simply write an if condition to test for one of the enum values, and handle the other in the else statement. However, if the enum ever gets a third value, all those if statements handling the flow of logic will have to be rewritten, either to a cascading if-then-else-if flow, or to a switch statement. If it's already expressed as a switch, the code changes (and subsequent revalidation testing) are minimal, and the logic is easier to understand.
    - Of course, to the extent that you can have the vtable make appropriate decisions (derivations with virtual methods, instead of conditional logic in your code) that's even better, but under some circumstances, decision logic is unavoidable.
  • Code to compile cleanly (no warnings) but don't silently suppress problems just to get there. (Sweeping the problem under the rug doesn't solve the problem.) The intent is to write the best code possible by (a) taking the advice of the compiler when appropriate, and (b) get a minimum of noise from the build, so errors can be more easily detected.
    - For example, its usually quite easy to get rid of "unused references" types of warnings with minimal risk or impact to the code. Some kinds of warnings, such as signed/unsigned mismatches may require careful handling to resolve, but don't just hastily cast them away as you may be masking a problem by doing so.
  • Empty catch blocks are evil and must not exist in the code.
    - However, a comment describing in reasonable detail why the exception is expected and/or innocuous, placed within the block shall make it not empty.
    - On the other hand, empty finally blocks are merely pointless. If the IDE inserts a complete try/catch/finally block into the code, remove what you won't use (for clarity and brevity.)
    - Similarly, catch blocks that merely re-throw the exception without doing anything else accomplish nothing but obscure the exception's origin. Remove them, and if necessary (because they're the only catch, and there's no finally) remove the try. Remember: if you're not going to materially handle an exception at a given level, don't incur the overhead of a try/catch construct. It is not necessary to handle every catch at every level; there's nothing wrong with letting an outer context handle the exception.
  • Throw Meaningful Exceptions. The value of the throw/catch paradigm is that different types of exceptions can be handled differently by surrounding code (and even if not, the types themselves communicate meaningful information: ArgumentNullException, for example, tells you a lot, as does FileNotFoundException.) If all you ever throw is a generic Exception type, you prevent calling code from acting intelligently. It might, for example, want to catch a FileNotFoundException in order to try a secondary strategy. Relying on the contents of a message string is not safe coding, and is not globalization friendly, either.
  • Don't write "schizophrenic" code. Each function, method, etc. should have only one clear purpose, and one very clear means of determining the outcome.
    - That purpose should be made self-evident by its name.
    - For example, a method that does one of two or more distinctly unrelated things in-line in its logic flow is better served by two or more functions that each do its one thing and nothing else.
    - Similarly, a method should have exactly one self-evident means of indicating failure. For example, don't return a bool success-or-failure and an outbound parameter that will contain the error code (pick one or the other.) The contract for using the method should be unambiguous about what other outputs are reliably available or unavailable when the call ends either successfully or unsuccessfully. (It should not be necessary to examine the error condition and verify that other output parameters are intact before proceeding.)
  • Similarly, Don't create "hundred-handed" functions (a.k.a. swiss-army-knife functions). These are do-it-all functions with many arguments, each of which slightly influence the outcome of the code in perhaps bizarre and unexpected ways. It's better to factor these beasts into smaller single-purpose functions, allowing the caller to pick the intended behavior by function rather than by argument.
  • Codify your Units when naming identifiers. If something is measuring dollars, inches, kilobytes, or cubits, make that part of the name if there's even the remotest chance of abiguity. This is especially true for length (number of elements) vs size (the memory requirements) and doubly true for strings: a character is not always a byte; if you're indexing, you probably want to index by character, but if you're allocating memory, you probably want bytes. Also be aware of count (the actual number of a collection) versus capacity (the total count before overflowing.)
    - Similarly, be specific about what the identifier represents. For example, if it's the id for a Quux, then don't just call it a quux, but idToQuux or something like that, especially if later on you'll be exchanging the id for the thing itself (which is probably what you want to name quux. By the same token, it's useful distinguish file names from files themselves, and even their contents, and the name should reflect this: file = open(fileName), fileContents = read(file), and so on.

Process

  • All code must be reviewed prior to submission. It's been proven many times: ten minutes now can save ten or more hours later.
  • Identify code fixes with comments, when necessary.
    - This helps the future "archeologist" that will benefit from your reasoning and insight.
    - If it's a hack (say, an 11th-hour fix that should be revisted soon), say so, to ensure its (re)discovery.
  • Do not use the trunk as a sandbox. Submit experimental code into sandbox folders away from production code.
    - If your revision control supports branching, that's probably the way to go, otherwise
    - Use a code-overlay strategy if you don't want to duplicate large portions of the trunk (user checks out the trunk, then checks out the sandbox files on top of that.)

C/C++

  • Pointer declarations should be their own statement.
    - Don't mix pointer declarations with non-pointer declarations in the same statement, as in /int *i1,i2;/ (Why: those unfamiliar with C/C++ often confuse the fact that i2 is not a pointer to an int.)
  • Use const when possible
    - const args
    - const methods
  • Treat char * as suspect.
  • Use typedef liberally to document semantic "derivations" of scalar types. Even though this is not as strong as a class or struct, it does communicate the different intended roles these type definitions play; one is less likely to mistakenly use one typedef for another.
    - By the same token, it is often useful to use typedefs to do away with "decorated" variants of types. Having distinct types for const x, pointer to x, reference to x, etc. helps avoid the problem above, of mixing pointer and non-pointer declarations. (It's also proven useful in migrating classic C++ to C++/CLI, where the decoration changes.)
    - Also, use typedef to meaningfully describe (and do away with the prolix caused by) complicated templated types. Can the type map<string, map<int, string> > communicate anything more meaningful as to its purpose or use?
  • Minimize the number of #includes in your public include file. A public include file should only #include those headers absolutely necessary to compile the contents of that header file. If possible, have the implementation in the cpp file include those headers necessary for implementation. In this spirit, since the public header requires describing private elements as well, try to design the class to leverage forward declarations for these private members. In short, put an #include in your cpp file if you can, and only in your public header if you must.
  • When Hoisting Exit Conditions in for loops, if you're going to hoist, consider putting the hoisted value in the first (initialization) part of the for loop, along with the initialization (and preferably declaration) of the loop variable; this declares the scope of the hoisted value, and ensures it is the same type as the loop variable. To wit:
    for(int i=0, count=thingie.GetCount(); i<count; i++)
        DoSomethingWith(thingie[i]);

Design

  • Avoid argument types of boolean, unless the meaning of true or false is self-evident in the calling function's context; for example EnableWindow(true) is probably Okay, but DoSomething(...,true,...) doesn't really describe what the true is communicating. An enum type, even if it consists of only two items, is usually more descriptive.
    - This is particularly true if several boolean args are used, since a long list of true's and false's don't really self-document the calling code.
  • When the above isn't observed, the caller can compensate by inserting descriptive comments within the parameter list of the calling function, as in
    DoSomething(..., /* refresh = */true, ...)
  • Name things well. What you name something does make a big difference.
    - Methods should generally take the form "verb" or "verb adverb", along with whatever (linguistic) object is appropriate.
    - Properties should generally be adjectives (when representing state) or nouns (when representing composite members.)
    - In general, avoid "amphiboles"—in this case, words that have many different parts of speech—because of the ambiguity it produces. For example, "empty" is both an adjective and a noun, so it's not implicitly clear whether "empty()" is an inquiry as to whether something is empty, or an imperative that causes something to become empty (adjectives/predicates might reasonably be indicated by an "is" prefix.) Similarly, the term last is ambiguous: should it indicate previous or final?
    - Follow common naming conventions, and be aware of linguistic nuances. For example, Delete() typically implies that something is destroyed by the operation, while Remove() typically implies that the subject continues to exist, but is no longer associated with some other object (say, a collection.) Similarly, open typically implies a related close, and attach a related detach, etc.
    - In that vein, use well established antonyms for opposing operations: open vs close, enable vs disable, etc. On the matter of begin and start pick one and stick with it, as appropriate, though there's a strong argument for associating begin with end and start with stop.

Various Metrics of "Goodness"

Goodness is the thing that:

  • Achieves the desired objective more effectively (efficiency in implementation, and/or conciseness and clarity of expression of algorithm.)
  • Communicates intent more effectively (not just by rote conformity, but by active and passive communicative devices.)
  • Makes code easier to use, harder to abuse.
  • Makes maintenance easier, in that it is easier to introduce the intended (features) while harder to introduce the unintended (bugs).

Notes:

  • Consistency is reasonable, but not paramount. Where a "higher goodness" is achieved by inconsistency, let it be so.
  • Conformity for its own sake is not to be admired.

Replies: 2
Anti-Standards
- Thomas M. Tuerke

Here are some guidelines of things to not do:

Exceptions

  • Don't catch and ignore. Do something in every catch, or remove that catch. A comment that describes why it is acceptable to ignore the exception is doing something, but the catch should only be for a specific exception type, not a general catch-all by which unanticipated exceptions would be silently lost.
  • Don't catch and throw. Do eliminate all catch(Something ex) { throw ex; } unless you plan on doing something else in the catch handler other than just throwing. You don't need to clutter up code, or hamper performance or debugging with this useless construct. It is acceptable to delegate unhandled exceptions to some outer context that is prepared to handle it
    (discussion: documentation value....That's fallible; it's not a checked system)
  • Don't throw Exception (the base type.) Do throw the most specific type of exception.

Initialization

  • Don't Declare Far from Use. Do declare as near as possible to first use. C used to require declarations to be at the top of a scope. C++, C#, and many other languages allow in-situ declarations, which is much better. Do Declare in the smallest scope possible.
  • Don't Double-Initialize. Do initialize to null (for pointer or reference types) when you must declare a variable some distance before the first use (because the declaration happens at a different scope than first use.) Needless initialization of a dummy, throw-away type consumes time, memory, and maybe other resources during the constructor call, only to be thrown away without being used. It also masks inappropriate use before the first proper initialization, by silently allowing that inappropriate use to succeed when it should have failed, alerting the developer to the problem.


Coding the Main-line case—Untangling Logic Flow
- Thomas M. Tuerke

Coding the main-line case with a minimum of indentation takes a bit of discipline, but results in cleaner code. Indentation is frequently the result of nested blocks brought about by control structures (loops, conditions, etc.) and makes for more complicated code. Taking the time to minimize indentation often results in code that has a lower complexity, and that results in better (more readable, more reliable) code.

Here's a very simple example (taken from real code) illustrating Tangled Logic Flow. The lines of code alternate (needlessly) between normal path and exception path.

{
  FILE* fp = fopen(fileName, ...);
  if(fp != NULL)
  {
    // ...
    // Do a bunch of processing
    // ...

    fclose(fp);
  }
  else
    throw invalid_file_error();

  // Do More Processing
  // ...
  // ...
}

This can generally be untangled as follows:

{
  FILE* fp = fopen(fileName, ...);
  if(fp == NULL)
    throw invalid_file_error();

  // ...
  // Do a bunch of processing
  // ...

  fclose(fp);

  // Do More Processing
  // ...
  // ...
}

Given a piece of code's functional requirements, there is, of course, a minimum complexity possible—some times you just can't make the code any simpler—but all too often, it's easy to make the code needlessly more complex than necessary.



Improved Readability
- Thomas M. Tuerke

Much of what is accepted as "improved readability" is really just laziness.

I'm currently involved in a project where I've collaborated with three separate groups, each having its own coding style. In fact, in the past twelve months I've had to delve into areas of code that adhered to about five or six different coding styles.

In no instance did the names of things, or indentation, or bracing improve or detract from readability. Admittedly, they were different but one was not inherently more "readable" than any other. This gives lie to the claim that these cosmetics improve readability. They don't.

What did improve readability was the liberal use of comments, describing what the intent of the code was. (Some pretty sorry comments really just mimicked the code in human terms: that was a waste.) Then there were vast wastelands where there wasn't a single comment visible on the screen.

What significantly decreased readability:

  • The extensive use of C++ #defines for complex (as in multi-statement) operations. Debugging through these was tedious.
  • The extensive use of templated types (and STL is a particularly bad offender here, in spite of its performance claims.)
  • Long, meandering functions that tried to do too many things. Some should have been factored into named sub-functions.
  • Overloaded methods that weren't logically related. A fundamental design principle is "same form => same function" so you would think that three overloads with the same name would do the same thing, but with three different kinds of arguments. Not so.


Yoda Notation and Safe Switching
- Thomas M. Tuerke

Here are some non-cosmetic practices.

C and C++ have this construct where assignment is just another operator, and can be mixed in with other operations. This is cool, but the source of trouble, considering that the assignment operator is the equal sign (=), and the equality operator is the double-equal sign (==). As such, you can easily use one when you meant the other, as in

 if(theSky = "blue") // ...

which probably isn't what you meant. Therefore, some coding standards mandate that you use what I refer to as yoda notation ("backwards it is, yes!") where you reverse the operands so that the compiler won't let you do an assignment.

 if("blue" = theSky) // ...

will simply fail to compile, not letting you get by with = when you meant ==. Now, I'll say I really don't care for yoda notation, but the fact remains: you can't mis-use the assignment operator with it, and that results in better code, so if you're using C or C++, it's probably a good idea to use. (C# thankfully doesn't suffer from that problem, so yoda notation isn't as important there... but if you jump between C++ and C#, then it may be a good habit to maintain.)

Another code rule states that the break in a switch's case statement should always appear outside of any curly-brace block, as in

 case foo:
 {
   // ...
 }
 break;

as opposed to

 case foo:
 {
  // ...
  break;
 }

The reason for this is subtle but important. As the code between the braces grows (which happens in production environments) you might find it necessary to apply more logic, possibly as a condition (don't laugh or roll your eyes, this actually happened...) and somebody may (that is, did) quickly solve the problem like this:

 case foo:
  if(someCondition)
  {
    // ...

As you can see, if the block ends with the break inside, the break suddenly is conditional (and if someCondition is false, the next case, if any, is executed instead. Yes, the maintenance programmer should have been more careful. Yes, the body of the case should probably be factored. Yes, object-oriented design says that switches are evil. But in spite of all that, they persist. So it makes sense to write code that is hard for somebody else to jeopardize. The original author allowed the error to happen as much as the later developer made it happen.


Replies: 2
More on Yoda Notation
- Thomas M. Tuerke

I mentioned above the practice of reversing the operands of the equality operator, as in

 if("blue" == theSky) ...

for the very real benefit of preventing the accidental use of the assignment operator instead.

However, I've noticed some cases where this practice is done to other relational comparisons, the various inequalities, such as

 if(NULL != ptr) ...

or even

 if(1 < listOfThings.Count()) ...

Now, I've said I don't like Yoda notation. It's really just a band-aid over the poor design (albeit one that most developers in the C family of languages have gotten used to.) But I still concede, given the circumstances, that for the equality operator, it's a necessity. We make concessions to readability because it catches real bugs.

I'm less convinced, though, that coding standards benefit from applying Yoda notation to anything but equality operations. Yes, it's consistent—important for those that do battle against those little hobgoblins of inconsistency—but as I've said, unless it specifically prevents bugs, I don't see the value. I'm not willing to lose readability in exchange for ... nothing, really. Ungoodness.

There are degrees of ungoodness, too. Yoda not-equals is just on the rim of purgatory as far as things go, since not-equals, !=, is commutative so a != b is the same as b != a. But other operators deserve to be put in a deeper circle of ungoodness, precisely because they are not commutative: a < b is not the same as b < a, so in order to "Yodafy" these, you actually have to use a different operator. Forgetting to do this is as likely to cause a bug as mistakenly using the assignment operator when you in fact meant to compare for equality. So, in the name of consistency, you actually run the risk of introducing more bugs. This is coding style evil.

And it's just plain harder to read.

Now, to those for whom inconsistency is so vital, I say that the inconsistency is actually useful. I don't think we should get used to seeing Yoda notation. It should remain that mental speedbump that it is, to draw attention to its use. Remember, we're trying to catch bugs.



Yet More on Yoda Notation—Java
- Thomas M. Tuerke

By the way, different languages have different idioms that fall into the category of Yoda Notation. For example, in Java, the following practice is not uncommon:

 if("blue".equals(theSky)) ...

as opposed to

if(theSky.equals("blue")) ...

Functionally they're equivalent, in that they both determine whether the string theSky contains (is equal to) "blue".

The difference is that if the variable theSky is null, the second example will fail with a null pointer exception, but the former example will simply return false. A slightly different condition being avoided, but both are solved by transposing commutative operands.



Coding in Style—How rigid should it be?
- Thomas M. Tuerke

At least one coding standard I've seen has rigid stipulations on the form of a switch statement, stating that each case label shall be on its own line, indented some amount, with code further indented beneath it, yada-yada.

It's akin to saying that writers can't use similes or metaphors in their writing style. Possible to do, but severely limiting their communicative options.

The problem is, while making every switch statement conform to a rigid style makes for highly consistent code (and, in fact, reducing it to something a machine without any intelligence can generate) it may lose some of its communicative power in the process.

For example, imagine this switch statement:

 switch(obj->type()) {
   case ID_LESTYPE_ADV:
     retVal = ADV;
     break;

   case ID_LESTYPE_INTERM:
     retVal = INTERM;
     break;

   case ID_LESTYPE_BASIC:
     retVal = BASIC;
     break;

   case ID_LESTYPE_BEGIN:
     retVal = BASIC;
     break;

   default:
     retVal = UNK;
 }

Notice anything wrong with it? Syntactically, nothing is out of place. But if it were formatted this way, you might catch it:

 switch(obj->type()) {
   case ID_LESTYPE_ADV:     retVal = ADV;    break;
   case ID_LESTYPE_INTERM:  retVal = INTERM; break;
   case ID_LESTYPE_BASIC:   retVal = BASIC;  break;
   case ID_LESTYPE_BEGIN:   retVal = BASIC;  break;
   default:                 retVal = UNK;
 }

The third and fourth case both return BASIC. This may be correct, or it may be a bug. But with this style of indentation, you see the parallelism of the switch structure, thus noticing where things are the same, and where they're different.

Now, this format of switch isn't always useful, but where it is, the coding standard should allow it. The over-arching rule should always be: allow the notation that affords the clearest communication of intent. Clarity and communication should trump consistency. In this case, the fact that all the cases essentially assign something to the same variable is supported by visually illustrating that. The sameness is emphasized by the same elements being similarly indented. (And yes, it would be advisable to put a break on the default case; notice how it's absence is evident.)


Replies: 1
Using Whitespace Judiciously
- Thomas M. Tuerke

I'm really not a fan of "paragraph-style" code where text is just allowed to flow to the next line when it's reached some limit on the right margin, preferring to use whitespace to reinforce relationships between bits of code.

For example, just found a bug where another coder was initializing an array by repeated calls to a method that took several arguments. By lining up the arguments, it became clear that one of the arguments was incorrect: it was the result of a copy-and-paste, where all the other arguments were corrected, but this one wasn't. Once lined up, the fact that this value was the same as the line above it really made the problem stand out.

I've heard the counter argument that lining things up makes code "hard to read"—but my take on that (as I've said elsewhere) is that this is just the arguments of a "lazy" eye. Lining things up just caught a bug. Given the choice of molly-coddling laziness, or materially catching unintended code, my vote is for the latter.



Coding in Style—Justifying the Rules
- Thomas M. Tuerke

Above, I mention that each rule should justify its existence.

Here's an excellent document put out by Google, which illustrates the point. You or I may not necessarily agree with everything that this guide stipulates, but it clearly lays out the good, the bad, and the decision why things are the way they are:

http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

Another interesting document is:

http://www.chris-lott.org/resources/cstyle/Wildfire-C++Style.html (mirrored from the original http://www.wildfire.com/~ag/Engineering/Development/C++Style/ which appears to have been taken down.) Though not every rule has it, many are justified with explanations in italics.

Yet another interesting site is https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=637, which lists a collection of recommendations for C, C++, and Java. This would be a useful resource in defining (or refining) a coding standard.

Just to round things up, here's a fun little quote, (obviously meant tongue-in-cheek)...

Any coding standard which insists on syntactic clarity at the expense of algorithmic clarity should be rewritten. If your employer fires you for using this trick, tell them that repeatedly as the security staff drag you out of the building.
- Simon Tatham


References and Additional Reading
- Thomas M. Tuerke

Some good sites on the web:




Share:
Share this page with all your friends on Google. Share this page with all your friends on del.icio.us.
Share this page with all your friends on Digg. Share this page with all your friends on Technorati.
Share this page with all your friends on Reddit. Share this page with all your friends on furl.
Share this page with all your friends on BlinkList. Share this page with all your friends on Newsvine.
Share this page with all your friends on Facebook. Share this page with all your friends on MySpace.