Coding in Style

design: /di·'zin/

   n a deliberate plan for the creation or development of an object. vt: to create something according to plan.
   good design: /'gud —/ the product of deliberate forethought and careful understanding of the purpose of a subject, resulting in a subject which significantly improves its utility, allowing it to integrate seamlessly and naturally into the role for which it is intended.
false synonyms: fashion, decor.


Coding in Style

Code serves one purpose: to communicate intent. It does so to two audiences: the compiler that will reduce the code to machine instructions, and the hapless souls who will have to look at your code in the future... namely test developers, maintenance engineers, and the hiring manager for your next job. So good code is code that clearly communicates intent, both to the computer (so you don't introduce bugs) and to future humans (so they don't, either.)

The problem is, most coding standards don't attempt to improve the reliability of the code (except by the vague definition of "improving readability"—an attribute I call into question.) No, most coding standards are sorry things listing a bunch of superficial cosmetic issues like how deeply to indent, and where to place braces and parentheses. I almost never see a standard that says what to do, and follows it with an explanation why it is a Good Thing that such a practice is done.

In fact, I would argue that most coding standards—being purely cosmetic in nature—result in worse code rather than better, simply because eyes become lazy by seeing the same stuff all the time. And productivity often suffers, too. If all developers are ever exposed to is code written to one particular style, those developers will be completely out of their element when looking at a new piece of code, and let's face it: it will happen.

Imagine that you've managed to get all the developers on your team goose-stepping to one particular set of standards. Is that goodness?

Probably not.

  • Can you share code with other teams, "borrowing" something they've already written and tested? If you do, will you risk breaking that code (sending you back to square one testing-wise) by renaming all those identifiers and altering the whitespace and curly-brace positions?
  • What about Open Source? Third Party software? Will you rewrite that to "conform" to "standards" or will you leave it as it is? You know you're eventually going to have to look in that chunk of code. Have you budgeted for the additional testing effort, realizing that you've invalidated a significant chunk of the testing that has already been done on that code?

Unless and until the One True Style (which each developer assumes is the style they happen to use) is universally adopted, there will be a multitude of styles. Some folks will insist that identifiers follow lower_case_and_underscore convention, others ProperCase, others still camelCase, and this whole contingent over here will insist on piHungarianNotation. Some standards mix liberally, one convention for member variables, another for methods and functions, another still for class names. Microsoft has an elaborate naming scheme for its APIs, which it recommends. And then there's the bunch of developers who bristle at that: "yuck. it's too much like Microsoft's."

Let's face it: we're not going to agree on silly minutiae, and quite frankly, I don't think we need to.

That said, I think there's a lot more to coding style than following a bunch of superficial regulations. I think that using a particular indentation style simply because founder Joe Codewonk said so is not a good enough reason, unless there is a clear material (read: not hazy, ambiguous, hand-wavy) benefit. Every single element of a shop coding standard should stand up to scrutiny. If it doesn't, it shouldn't be part of the standard.

Does each rule:

  • Catch specific kinds of coding errors in the compiler? (list which)
  • Catch specific kinds of logic errors at runtime? (list how)
  • Communicate intent to the subsequent reader? (describe how)
  • Make debugging and maintenance easier? (list how)
  • Make the code easier to reuse with a minimum of fuss? (list how)

Yes: note that each rule is required to justify its existence by listing how it contributes to the greater good of the code, and not just by allusions to "improved readability". And yes, note that because each rule must justify its existence, periodic review is sometimes necessary. Certain coding stipulations may be mandated by shortcomings of the environment (I'm thinking of "yoda notation" to overcome C-style languages' easily confused assignment and equality operators, or certain limitations in an editor or other development environment), and should be revisited when that environment changes.

As long as we're quibbling over capitalization, spaces, and curly braces, we're missing the point of coding standards. But once we get into styles of coding that transcend the cosmetic and materially result in better code, then we'll really be coding in style.


One possible Coding Standard
- Thomas M. Tuerke

This is a list of items that would plausibly belong in a coding standard. Note the absence of cosmetic trivialities such as bracing style or indentation depth, conventions that seem to favor consistency over communicative options, and that (to my mind) make as much sense as dictating that everyone will henceforth speak and write in Iambic Pentameter, for the "consistent meter" that style imparts.

Remember that code exists to efficiently and effectively communicate intent to machine and human, and the coding standard should exist to facilitate that better code: code that effectively solves the problem at hand, while containing fewer defects, and less likely to have defects introduced into it.

The standard should cover improving the substance, not merely the form, of the code. As such, whether one uses Allman, K&R, KNF, Whitesmiths, or some other scheme, or the manner in which one indents, should be secondary, even irrelevant, to the extent that it continues to reinforce the intent that is being communicated.

As such, it should not merely serve the vanity of the standards writer, nor impose undue restrictions on the creative process. Coding is as much an artistic expression as it is a rigorous description of a problem's solution, and some of the cleverest (and most effective) solutions I've seen came from moments of somebody's unconstrained inspiration: solutions that might not have been possible under some more oppressive coding styles.

Furthermore, allowing the developer some freedom of expression can result in code that is a joy to read (without necessarily compromising on the primary communicative intent,) making it easier to work with all through that code's life cycle.

Personally speaking, I'm not a big fan of superficial coding "style" standards, as they seldom do anything more than make the reading eye lazy, to the point that one is at a disadvantage in reading code not conforming to the same arbitrary style.

That said, what might a coding standard contain? Here are a few thoughts, based on my own experience. It's not meant to be exhaustive (nor even uncontroversial,) and individual items may be re-examined as time goes on, and technologies develop.

The Golden Rules

In General

Process

  • All code must be reviewed prior to submission. It's been proven many times: ten minutes now can save ten or more hours later.
  • Identify code fixes with comments, when necessary.
    - This helps the future "archeologist" that will benefit from your reasoning and insight.
    - If it's a hack (say, an 11th-hour fix that should be revisited soon), say so, to ensure its (re)discovery.
  • Do not use the trunk as a sandbox. Submit experimental code into sandbox folders away from production code.
    - If your revision control supports branching, that's probably the way to go, otherwise
    - Use a code-overlay strategy if you don't want to duplicate large portions of the trunk (user checks out the trunk, then checks out the sandbox files on top of that.)

C/C++

  • Pointer declarations should be their own statement.
    - Don't mix pointer declarations with non-pointer declarations in the same statement, as in "int *i1, i2;" (Why: those unfamiliar with C/C++ often miss the fact that i2 is not a pointer to an int.)
  • Use const when possible
    - const args
    - const methods
  • Treat char * as suspect.
  • Use typedef liberally to document semantic "derivations" of scalar types. Even though this is not as strong as a class or struct, it does communicate the different intended roles these type definitions play; one is less likely to mistakenly use one typedef for another.
    - By the same token, it is often useful to use typedefs to do away with "decorated" variants of types. Having distinct types for const x, pointer to x, reference to x, etc. helps avoid the problem above, of mixing pointer and non-pointer declarations. (It's also proven useful in migrating classic C++ to C++/CLI, where the decoration changes.)
    - Also, use typedef to meaningfully describe (and do away with the prolixity caused by) complicated templated types. The bare type map<string, map<int, string> > communicates little about its purpose or use; a well-chosen typedef name can.
  • Minimize the number of #includes in your public include file. A public include file should only #include those headers absolutely necessary to compile the contents of that header file. If possible, have the implementation in the cpp file include those headers necessary for implementation. In this spirit, since the public header requires describing private elements as well, try to design the class to leverage forward declarations for these private members. In short, put an #include in your cpp file if you can, and only in your public header if you must.
  • When hoisting exit conditions out of for loops, consider putting the hoisted value in the first (initialization) part of the for loop, along with the initialization (and preferably declaration) of the loop variable; this limits the scope of the hoisted value to the loop, and ensures it is the same type as the loop variable. To wit:
    for(int i=0, count=thingie.GetCount(); i<count; i++)
        DoSomethingWith(thingie[i]);
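
A few of the C/C++ items above can be sketched together. This is only an illustration, not a prescription: every name in it (Celsius, Fahrenheit, NamesByRegionAndId, Thermostat, and the functions) is hypothetical and invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Semantic typedefs: both are plain doubles to the compiler, but the names
// document which role a value plays. A typedef is only an alias, so the
// compiler will not reject a mix-up; the benefit is to the human reader.
typedef double Celsius;
typedef double Fahrenheit;

// A typedef that names the purpose of a complicated templated type.
typedef std::map<std::string, std::map<int, std::string> > NamesByRegionAndId;

Fahrenheit toFahrenheit(Celsius c)
{
    return c * 9.0 / 5.0 + 32.0;
}

// const argument: the function promises not to modify the caller's map.
std::size_t countNames(const NamesByRegionAndId& names)
{
    std::size_t total = 0;
    for (NamesByRegionAndId::const_iterator it = names.begin();
         it != names.end(); ++it)
        total += it->second.size();
    return total;
}

class Thermostat
{
public:
    explicit Thermostat(Celsius target) : target_(target) {}

    // const method: reading the target does not change the object.
    Celsius target() const { return target_; }

private:
    Celsius target_;
};

// Pointer declarations get their own statements.
// Compare with "int *i1, i2;", where only i1 is a pointer.
int sumViaPointer(int value)
{
    int* pointerToValue = &value;   // a pointer, declared alone
    int  plainCopy      = value;    // a plain int, declared alone
    return *pointerToValue + plainCopy;
}
```

A signature such as countNames(const NamesByRegionAndId&) tells the reader both what the map is for and that the call will leave it untouched, which the raw nested map type does not.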

Design

  • Avoid argument types of boolean, unless the meaning of true or false is self-evident in the calling function's context; for example EnableWindow(true) is probably Okay, but DoSomething(...,true,...) doesn't really describe what the true is communicating. An enum type, even if it consists of only two items, is usually more descriptive.
    - This is particularly true if several boolean args are used, since a long list of true's and false's doesn't really self-document the calling code.
  • When the above isn't observed, the caller can compensate by inserting descriptive comments within the parameter list of the calling function, as in
    DoSomething(..., /* refresh = */true, ...)
  • Name things well. What you name something does make a big difference.
    - Methods should generally take the form "verb" or "verb adverb", along with whatever (linguistic) object is appropriate.
    - Properties should generally be adjectives (when representing state) or nouns (when representing composite members.)
    - In general, avoid "amphiboles" (in this case, words that can serve as several different parts of speech) because of the ambiguity they produce. For example, "empty" is both an adjective and a verb, so it's not implicitly clear whether "empty()" is an inquiry as to whether something is empty, or an imperative that causes something to become empty (adjectives/predicates might reasonably be indicated by an "is" prefix.) Similarly, the term last is ambiguous: should it indicate previous or final?
    - Follow common naming conventions, and be aware of linguistic nuances. For example, Delete() typically implies that something is destroyed by the operation, while Remove() typically implies that the subject continues to exist, but is no longer associated with some other object (say, a collection.) Similarly, open typically implies a related close, and attach a related detach, etc.
    - In that vein, use well established antonyms for opposing operations: open vs close, enable vs disable, etc. On the matter of begin and start, pick one and stick with it, as appropriate, though there's a strong argument for associating begin with end and start with stop.
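
As a rough sketch of the boolean-argument advice above, here is a function that takes two-valued enums where it might otherwise take two bools. The function and the enum names (decorate, Padding, Border) are hypothetical, made up for this illustration.

```cpp
#include <string>

// Two-valued enums standing in for what could have been two bool parameters.
enum class Padding { None, Spaces };
enum class Border  { None, Brackets };

// With bools, a call would read decorate("ok", true, false) and the reader
// would have to look up which flag is which.
std::string decorate(const std::string& text, Padding padding, Border border)
{
    std::string result = text;
    if (padding == Padding::Spaces)
        result = " " + result + " ";
    if (border == Border::Brackets)
        result = "[" + result + "]";
    return result;
}
```

A call such as decorate("ok", Padding::Spaces, Border::Brackets) documents itself at the call site. As a side benefit, because these are distinct scoped enum types, accidentally swapping the two arguments is a compile error rather than a silent bug.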

Various Metrics of "Goodness"

Goodness is the thing that:

  • Achieves the desired objective more effectively (efficiency in implementation, and/or conciseness and clarity of expression of algorithm.)
  • Communicates intent more effectively (not just by rote conformity, but by active and passive communicative devices.)
  • Makes code easier to use, harder to abuse.
  • Makes maintenance easier, in that it is easier to introduce the intended (features) while harder to introduce the unintended (bugs).

Notes:

  • Consistency is reasonable, but not paramount. Where a "higher goodness" is achieved by inconsistency, let it be so.
  • Conformity for its own sake is not to be admired.

Anti-Standards
- Thomas M. Tuerke

Here are some guidelines of things to not do:

Exceptions

  • Don't catch and ignore. Do something in every catch, or remove that catch. A comment that describes why it is acceptable to ignore the exception is doing something, but the catch should only be for a specific exception type, not a general catch-all by which unanticipated exceptions would be silently lost.
  • Don't catch and throw. Do eliminate all catch(Something ex) { throw ex; } unless you plan on doing something else in the catch handler other than just throwing. You don't need to clutter up code, or hamper performance or debugging with this useless construct. It is acceptable to delegate unhandled exceptions to some outer context that is prepared to handle them.
    (discussion: documentation value....That's fallible; it's not a checked system)
  • Don't log and throw when the exception class has stack tracing built in (i.e., Java and .NET). Do either logging or throwing at the error site (you either handle or abandon the situation) or do logging only at the catch site, but don't litter throw sites with logging that will duplicate another log entry at the catch site anyway.
  • Don't throw Exception (the base type.) Do throw the most specific type of exception.
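
The exception guidelines above are phrased with Java and .NET in mind, but a C++ rendering might look something like the following sketch. The functions parsePort and portOrDefault are hypothetical, invented for the example; only the standard exception types are real.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a port number from text.
// It throws the most specific standard exception type that fits,
// rather than a generic base exception.
int parsePort(const std::string& text)
{
    if (text.empty())
        throw std::invalid_argument("port is empty");

    int port = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const char c = text[i];
        if (c < '0' || c > '9')
            throw std::invalid_argument("port is not a number: " + text);
        port = port * 10 + (c - '0');
        if (port > 65535)
            throw std::out_of_range("port is above 65535: " + text);
    }
    return port;
}

// Catches only the one exception type it knows how to handle, and does
// something in the handler (substitutes a default). An out_of_range error
// is deliberately not caught here: it propagates to an outer context
// without any catch-and-rethrow clutter.
int portOrDefault(const std::string& text, int defaultPort)
{
    try
    {
        return parsePort(text);
    }
    catch (const std::invalid_argument&)
    {
        return defaultPort;
    }
}
```

Note that there is no catch-all handler anywhere, so an unanticipated exception cannot be silently lost.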

Initialization

  • Don't Declare Far from Use. Do declare as near as possible to first use. C used to require declarations to be at the top of a scope. C++, C#, and many other languages allow in-situ declarations, which is much better. Do Declare in the smallest scope possible.
  • Don't Double-Initialize. Do initialize to null (for pointer or reference types) when you must declare a variable some distance before the first use (because the declaration happens at a different scope than first use.) Needless initialization of a dummy, throw-away type consumes time, memory, and maybe other resources during the constructor call, only to be thrown away without being used. It also masks inappropriate use before the first proper initialization, by silently allowing that inappropriate use to succeed when it should have failed, alerting the developer to the problem.
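
Both initialization points can be seen in a small sketch. The function longestWord is hypothetical; it simply needs one variable that outlives a loop and one that does not.

```cpp
#include <string>
#include <vector>

// Hypothetical example: return the longest word in a list
// (the first one wins on a tie), or an empty string for an empty list.
std::string longestWord(const std::vector<std::string>& words)
{
    // Declared before the loop only because it is used after the loop.
    // It starts out null instead of pointing at a throw-away dummy object.
    const std::string* longest = nullptr;

    for (std::vector<std::string>::size_type i = 0; i < words.size(); ++i)
    {
        // Declared in the smallest scope possible, right at first use.
        const std::string& candidate = words[i];
        if (longest == nullptr || candidate.size() > longest->size())
            longest = &candidate;
    }

    // "No result yet" is an explicit null, so it cannot be mistaken
    // for a real answer.
    return (longest != nullptr) ? *longest : std::string();
}
```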


Coding the Main-line case—Untangling Logic Flow
- Thomas M. Tuerke

Coding the main-line case with a minimum of indentation takes a bit of discipline, but results in cleaner code. Indentation is frequently the result of nested blocks brought about by control structures (loops, conditions, etc.) and makes for more complicated code. Taking the time to minimize indentation often results in code that has a lower complexity, and that results in better (more readable, more reliable) code.

Here's a very simple example (taken from real code) illustrating Tangled Logic Flow. The lines of code alternate (needlessly) between normal path and exception path.

{
  FILE* fp = fopen(fileName, ...);
  if(fp != NULL)
  {
    // ...
    // Do a bunch of processing
    // ...

    fclose(fp);
  }
  else
    throw invalid_file_error();

  // Do More Processing
  // ...
  // ...
}

This can generally be untangled as follows:

{
  FILE* fp = fopen(fileName, ...);
  if(fp == NULL)
    throw invalid_file_error();

  // ...
  // Do a bunch of processing
  // ...

  fclose(fp);

  // Do More Processing
  // ...
  // ...
}

Given a piece of code's functional requirements, there is, of course, a minimum complexity possible (sometimes you just can't make the code any simpler), but all too often the code ends up more complex than it needs to be.
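
For a compilable take on the untangled shape, here is a minimal sketch. It goes one step beyond the snippets above by wrapping the FILE* in a std::unique_ptr, so that the early exit (or any later exception) cannot skip the fclose; that wrapper, the FileCloser type, and the countBytes function are assumptions of this example, not part of the original code.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Closes the file whenever the owning unique_ptr goes out of scope.
struct FileCloser
{
    void operator()(std::FILE* fp) const { std::fclose(fp); }
};

// Hypothetical example: count the bytes in a file.
long countBytes(const std::string& fileName)
{
    std::unique_ptr<std::FILE, FileCloser> fp(
        std::fopen(fileName.c_str(), "rb"));

    // Exceptional case first, then out of the way.
    if (fp == nullptr)
        throw std::runtime_error("cannot open " + fileName);

    // The main-line case continues at the outer indentation level.
    long count = 0;
    while (std::fgetc(fp.get()) != EOF)
        ++count;

    return count;
}
```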



Improved Readability
- Thomas M. Tuerke

Much of what is accepted as "improved readability" is really just laziness.

I'm currently involved in a project where I've collaborated with three separate groups, each having its own coding style. In fact, in the past twelve months I've had to delve into areas of code that adhered to about five or six different coding styles.

In no instance did the names of things, or indentation, or bracing improve or detract from readability. Admittedly, they were different, but one was not inherently more "readable" than any other. This gives the lie to the claim that these cosmetics improve readability. They don't.

What did improve readability was the liberal use of comments, describing what the intent of the code was. (Some pretty sorry comments really just mimicked the code in human terms: that was a waste.) Then there were vast wastelands where there wasn't a single comment visible on the screen.

What significantly decreased readability:

  • The extensive use of C++ #defines for complex (as in multi-statement) operations. Debugging through these was tedious.
  • The extensive use of templated types (and STL is a particularly bad offender here, in spite of its performance claims.)
  • Long, meandering functions that tried to do too many things. Some should have been factored into named sub-functions.
  • Overloaded methods that weren't logically related. A fundamental design principle is "same form => same function" so you would think that three overloads with the same name would do the same thing, but with three different kinds of arguments. Not so.
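
To make the first item concrete, here is a small made-up comparison of a multi-statement #define with the equivalent named function. Both CLAMP_IN_PLACE and clampToRange are hypothetical names invented for this sketch.

```cpp
// A multi-statement macro. A debugger typically treats the whole
// expansion as a single line, and each argument is evaluated
// more than once.
#define CLAMP_IN_PLACE(value, low, high)          \
    do {                                          \
        if ((value) < (low))  (value) = (low);    \
        if ((value) > (high)) (value) = (high);   \
    } while (0)

// The same operation as a named function: it can be stepped into,
// it has typed parameters, and each argument is evaluated exactly once.
inline int clampToRange(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}
```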


Yoda Notation and Safe Switching
- Thomas M. Tuerke

Here are some non-cosmetic practices.

C and C++ have this construct where assignment is just another operator, and can be mixed in with other operations. This is cool, but a source of trouble, considering that the assignment operator is the equal sign (=), and the equality operator is the double-equal sign (==). As such, you can easily use one when you meant the other, as in

 if(theSky = "blue") // ...

which probably isn't what you meant. Therefore, some coding standards mandate that you use what I refer to as yoda notation ("backwards it is, yes!") where you reverse the operands so that the compiler won't let you do an assignment.

 if("blue" = theSky) // ...

will simply fail to compile, not letting you get by with = when you meant ==. Now, I'll say I really don't care for yoda notation, but the fact remains: you can't mis-use the assignment operator with it, and that results in better code, so if you're using C or C++, it's probably a good idea to use. (C# thankfully doesn't suffer from that problem, so yoda notation isn't as important there... but if you jump between C++ and C#, then it may be a good habit to maintain.)
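
Here is a compilable sketch of the difference. It uses an integer color code rather than a string so that it builds as ordinary C++; the constant kBlue and both functions are hypothetical, and the first function contains the bug on purpose.

```cpp
const int kBlue = 2;   // hypothetical color code

// Deliberately buggy: "=" assigns kBlue to sky, and the nonzero result
// counts as true, so this returns true no matter what was passed in.
bool isBlueBuggy(int sky)
{
    if (sky = kBlue)
        return true;
    return false;
}

// Yoda notation: the constant is on the left. Typing "=" here by mistake
// ("kBlue = sky") would not compile, because kBlue cannot be assigned to.
bool isBlueYoda(int sky)
{
    if (kBlue == sky)
        return true;
    return false;
}
```

Calling isBlueBuggy(1) returns true even though 1 is not the blue code, while isBlueYoda(1) returns false as intended.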

Another code rule states that the break in a switch's case statement should always appear outside of any curly-brace block, as in

 case foo:
 {
   // ...
 }
 break;

as opposed to

 case foo:
 {
  // ...
  break;
 }

The reason for this is subtle but important. As the code between the braces grows (which happens in production environments) you might find it necessary to apply more logic, possibly as a condition (don't laugh or roll your eyes, this actually happened...) and somebody may (that is, did) quickly solve the problem like this:

 case foo:
  if(someCondition)
  {
    // ...
    break;  // was unconditional; now only reached when someCondition is true
  }

As you can see, if the block ends with the break inside, the break suddenly is conditional (and if someCondition is false, the next case, if any, is executed instead.) Yes, the maintenance programmer should have been more careful. Yes, the body of the case should probably be factored. Yes, object-oriented design says that switches are evil. But in spite of all that, they persist. So it makes sense to write code that is hard for somebody else to jeopardize. The original author allowed the error to happen as much as the later developer made it happen.
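
A complete, compilable version of the safer layout might look like the sketch below, in which a condition has been bolted onto the first case after the fact. The Command enum and the describe function are hypothetical, invented for the illustration.

```cpp
#include <string>

enum Command { START, STOP, PAUSE };   // hypothetical commands

std::string describe(Command command, bool verbose)
{
    std::string text;
    switch (command)
    {
    case START:
        // This condition was wrapped around the body later on...
        if (verbose)
        {
            text += "starting up; ";
        }
        break;   // ...but the break sits outside the block, so it stays unconditional.

    case STOP:
        {
            text += "stopping";
        }
        break;

    case PAUSE:
        {
            text += "pausing";
        }
        break;
    }
    return text;
}
```

With this layout, describe(START, false) yields an empty string. Had the break been inside the braces, the same call would have fallen through into the STOP case and returned "stopping".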


More on Yoda Notation
- Thomas M. Tuerke

I mentioned above the practice of reversing the operands of the equality operator, as in

 if("blue" == theSky) ...

for the very real benefit of preventing the accidental use of the assignment operator instead.

However, I've noticed some cases where this practice is done to other relational comparisons, the various inequalities, such as

 if(NULL != ptr) ...

or even

 if(1 < listOfThings.Count()) ...

Now, I've said I don't like Yoda notation. It's really just a band-aid over the poor design (albeit one that most developers in the C family of languages have gotten used to.) But I still concede, given the circumstances, that for the equality operator, it's a necessity. We make concessions to readability because it catches real bugs.

I'm less convinced, though, that coding standards benefit from applying Yoda notation to anything but equality operations. Yes, it's consistent—important for those that do battle against those little hobgoblins of inconsistency—but as I've said, unless it specifically prevents bugs, I don't see the value. I'm not willing to lose readability in exchange for ... nothing, really. Ungoodness.

There are degrees of ungoodness, too. Yoda not-equals is just on the rim of purgatory as far as things go, since not-equals, !=, is commutative so a != b is the same as b != a. But other operators deserve to be put in a deeper circle of ungoodness, precisely because they are not commutative: a < b is not the same as b < a, so in order to "Yodafy" these, you actually have to use a different operator. Forgetting to do this is as likely to cause a bug as mistakenly using the assignment operator when you in fact meant to compare for equality. So, in the name of consistency, you actually run the risk of introducing more bugs. This is coding style evil.

And it's just plain harder to read.
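
The non-commutative case can be spelled out in a few lines. These three functions are hypothetical and exist only to show what happens when the operands are swapped with and without flipping the operator.

```cpp
// Natural order: "count is greater than one".
bool hasSeveral(int count)        { return count > 1; }

// Correctly "Yodafied": swapping the operands also requires flipping > to <.
bool hasSeveralYoda(int count)    { return 1 < count; }

// Botched: the operands were swapped but the operator was not flipped,
// so this now asks "is count less than one?".
bool hasSeveralBotched(int count) { return 1 > count; }
```

For a count of 3, the first two agree (true) while the botched version quietly answers false, and nothing in the compiler objects.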

Now, to those for whom consistency is so vital, I say that the inconsistency is actually useful. I don't think we should get used to seeing Yoda notation. It should remain that mental speedbump that it is, to draw attention to its use. Remember, we're trying to catch bugs.



Yet More on Yoda Notation—Java
- Thomas M. Tuerke

By the way, different languages have different idioms that fall into the category of Yoda Notation. For example, in Java, the following practice is not uncommon:

 if("blue".equals(theSky)) ...

as opposed to

if(theSky.equals("blue")) ...

Functionally they're equivalent, in that they both determine whether the string theSky contains (is equal to) "blue".

The difference is that if the variable theSky is null, the second example will fail with a null pointer exception, but the former example will simply return false. A slightly different condition being avoided, but both are solved by transposing commutative operands.



Yoda Notation (aka Yoda Condition)—Origin of the term
- Thomas M. Tuerke

As an aside, I've written about the term Yoda Notation both here and elsewhere on this site. The term has—happily—taken off, and even transmogrified to Yoda Condition (that is to say, a condition written in Yoda notation) in some circles. As far as I can tell, the spate of references to it (on StackOverflow.com, CodingHorror.com, et al) seem to date from about two or three years ago (circa 2010, 2011) and some incorrectly cite StackOverflow user zneak (aka Félix Cloutier) as coining the term in May of 2010. The astute reader will note that the archive.org Wayback Machine has a copy of this site, captured two years before, showing the term already in existence, and clearly already influential as the propagated blue-sky example illustrates.

To set the record straight, the Yoda-based term describing reversed operands of the assignment/equality operators frequently used in C and derivative languages clearly predates zneak's ostensible "coining" (I'd been using the term for a number of years before even writing about it here on my site), though I do appreciate zneak's contributions in helping popularize it.



Coding in Style—How rigid should it be?
- Thomas M. Tuerke

At least one coding standard I've seen has rigid stipulations on the form of a switch statement, stating that each case label shall be on its own line, indented some amount, with code further indented beneath it, yada-yada.

It's akin to saying that writers can't use similes or metaphors in their writing style. Possible to do, but severely limiting their communicative options.

The problem is, while making every switch statement conform to a rigid style makes for highly consistent code (and, in fact, reducing it to something a machine without any intelligence can generate) it may lose some of its communicative power in the process.

For example, imagine this switch statement:

 switch(obj->type()) {
   case ID_LESTYPE_ADV:
     retVal = ADV;
     break;

   case ID_LESTYPE_INTERM:
     retVal = INTERM;
     break;

   case ID_LESTYPE_BASIC:
     retVal = BASIC;
     break;

   case ID_LESTYPE_BEGIN:
     retVal = BASIC;
     break;

   default:
     retVal = UNK;
 }

Notice anything wrong with it? Syntactically, nothing is out of place. But if it were formatted this way, you might catch it:

 switch(obj->type()) {
   case ID_LESTYPE_ADV:     retVal = ADV;    break;
   case ID_LESTYPE_INTERM:  retVal = INTERM; break;
   case ID_LESTYPE_BASIC:   retVal = BASIC;  break;
   case ID_LESTYPE_BEGIN:   retVal = BASIC;  break;
   default:                 retVal = UNK;
 }

The third and fourth case both return BASIC. This may be correct, or it may be a bug. But with this style of indentation, you see the parallelism of the switch structure, thus noticing where things are the same, and where they're different.

Now, this format of switch isn't always useful, but where it is, the coding standard should allow it. The over-arching rule should always be: allow the notation that affords the clearest communication of intent. Clarity and communication should trump consistency. In this case, the fact that all the cases essentially assign something to the same variable is supported by visually illustrating that. The sameness is emphasized by the same elements being similarly indented. (And yes, it would be advisable to put a break on the default case; notice how its absence is evident.)


Using Whitespace Judiciously
- Thomas M. Tuerke

I'm really not a fan of "paragraph-style" code where text is just allowed to flow to the next line when it's reached some limit on the right margin, preferring to use whitespace to reinforce relationships between bits of code.

For example, I just found a bug where another coder was initializing an array by repeated calls to a method that took several arguments. By lining up the arguments, it became clear that one of the arguments was incorrect: it was the result of a copy-and-paste, where all the other arguments were corrected, but this one wasn't. Once lined up, the fact that this value was the same as the line above it really made the problem stand out.
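
The actual code isn't reproduced here, but a made-up reconstruction of the idea might look like this. The Column struct, makeColumn, and the values are all hypothetical.

```cpp
#include <cstddef>

struct Column { const char* title; int offset; int width; };  // hypothetical

Column makeColumn(const char* title, int offset, int width)
{
    Column c = { title, offset, width };
    return c;
}

// Flowed paragraph-style, a stale copy-and-pasted value hides easily:
//   makeColumn("Id", 0, 6), makeColumn("Name", 6, 24), makeColumn("City", 6, 16), ...
//
// Lined up, each argument forms a visual column, so a repeated offset
// (the second 6 above) would sit directly under its twin.
const Column kColumns[] = {
    makeColumn("Id",     0,  6),
    makeColumn("Name",   6, 24),
    makeColumn("City",  30, 16),
    makeColumn("Zip",   46, 10),
};
const std::size_t kColumnCount = sizeof(kColumns) / sizeof(kColumns[0]);
```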

I've heard the counter argument that lining things up makes code "hard to read", but my take on that (as I've said elsewhere) is that this is just the argument of a "lazy" eye. Lining things up just caught a bug. Given the choice of molly-coddling laziness, or materially catching unintended code, my vote is for the latter.



Using Whitespace Judiciously (Part II)
- Thomas M. Tuerke

This came up in conversation recently.

I have—for some years now—had this habit of putting a space on both sides of the unary negation operator, !

Specifically, I tend to do this:

 if( ! some_condition) {
   ...

instead of

 if(!some_condition) {
   ...

This offended the sensibilities of a colleague who pointed out that it might be inconsistent with certain coding standards concerning where to put whitespace.

I granted him the point that it was inconsistent in that respect, but argued that given the power of that single character—namely, completely reversing the polarity of the condition—it warranted more than the one pixel's worth of real-estate. And I mean a pixel. Many typefaces, be they fixed or variable pitch, render the exclamation mark as a single-pixel glyph. Given the likely proximity to parentheses, braces, and other punctuation, it's really too easy to miss such a petite but profound operator.

This gets to my point about tangible benefits trumping consistency. I want this operator to scream its existence, to demand your attention. Why? How many times have you gotten it wrong? Forgotten to put a negation in, or put one in by accident? I'll warrant more than once. And the consequences are pretty serious, as in the complete opposite of what you meant.

The thing is, it's not an automatic space-bang-space for me. I don't want to commit it to motor memory. I make it my own little development shisa kanko, during which I stop and re-evaluate what I'm about to express: first, to see if I really want the negation or not; and if so, I see if there's a way of expressing it in the non-negated polarity; only if there is no alternative, I continue, with space-bang-space being the result. (Okay, so I don't go all overboard with point-and-call, but mentally, it's a rough equivalent.)

So: inconsistent? Yes. Addressing a reasonably risky bit of coding? Yes. Let's face it: the chances of it happening are not minuscule, and when it happens, the consequences are, well, exactly not what you really meant.



Coding in Style—Justifying the Rules
- Thomas M. Tuerke

Above, I mention that each rule should justify its existence.

Here's an excellent document put out by Google, which illustrates the point. You or I may not necessarily agree with everything that this guide stipulates, but it clearly lays out the good, the bad, and the decision why things are the way they are:

http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

Another interesting document is:

http://www.chris-lott.org/resources/cstyle/Wildfire-C++Style.html (mirrored from the original http://www.wildfire.com/~ag/Engineering/Development/C++Style/ which appears to have been taken down.) Though not every rule has it, many are justified with explanations in italics.

Yet another interesting site is https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=637, which lists a collection of recommendations for C, C++, and Java. This would be a useful resource in defining (or refining) a coding standard.

Just to round things up, here's a fun little quote, (obviously meant tongue-in-cheek)...

Any coding standard which insists on syntactic clarity at the expense of algorithmic clarity should be rewritten. If your employer fires you for using this trick, tell them that repeatedly as the security staff drag you out of the building.
- Simon Tatham


References and Additional Reading
- Thomas M. Tuerke

Some good sites on the web:


More References and Additional Reading
- Thomas M. Tuerke

An interesting site I've come across is JavaPractices.com, which goes into great depth on good things to do in Java (and is curiously in line with what's presented here.) In particular, I like Wisdom, not rules...



Yet Another Reference
- Thomas M. Tuerke

Here's an interesting site: PRQA's High Integrity C++ Coding Standard. The downloadable PDF document contains an extensive set of rules notable for these reasons:

  1. Each rule is followed by a justification. This is not just "well, because I say so" justification.
  2. The word "brace" appears only twice in the entire document, and neither reference suggests a bracing style.
  3. Some rules cite exceptions, recognizing that there is no absolute one-size-fits-all.
  4. There are abundant references to known authorities, and the Bibliography is a who's-who of the C++ world: Stroustrup, Meyers, Sutter, etc.

This is not a cosmetic coding standard intended to be a human-implemented code pretty-printer, but one meant to facilitate writing better code by catching subtle bugs as soon as possible. The PR site claims their standard "is currently used on more C++ projects than both MISRA C++ and JSF AV C++ combined..." and their intent is to "...help [their] customers in automotive, aerospace, medical and other industries to develop high quality C and C++ code..." where correctness is paramount: where the curly braces line up may have some small measure of usefulness, but you want that medical device's firmware (or an aircraft's avionics) to not crash at the worst possible moment.

The PDF is a free download (but you need to fill in a form) and well worth examining.



Another Reference Still—and some thoughts
- Thomas M. Tuerke

Here's another link worth a read, from Rochester Institute of Technology:

http://www.cs.rit.edu/~cs4/Documents/progstyle-cpp-2009.html

Passages of note:

These standards exist to
...
• Prohibit practices that greatly increase the chances of errors;

and

These standards strike a balance between discipline and flexibility. Forbidden things have been common sources of error, but special situations may arise in which a forbidden idiom seems to be the right thing. If you feel you have found such a situation, explain it to your instructor and ask permission to violate the standard for that specific case.

and

9. Nothing is Cast in Stone

You will by default lose points for not following these standards, but that does not have to be the case. If you can make a good, strong argument to your instructor explaining how and why you would like to stray from the standards, you are welcome to do so. Be prepared to demonstrate, by comparison, why a different approach would yield a more understandable and/or less error-prone piece of code. And apply the variation consistently.

This is excellent. I don't agree with everything they present there, but everything is justified, and that's the basis for an intelligent conversation about how to write better code. And possibly amend the standard to reflect improved understanding of how to do that.

For example, section 3, Routine Roles, suggests that functions should be void if they alter internal state (that is, have "side effects") and if they return values, they should be stateless and free of side effects. This seems to stem from the Pascal mindset—where one had the procedure keyword and a function keyword to codify that distinction—and a Functional Programming mindset, where everything is side-effect free. It is, without a doubt, an excellent discipline to employ when learning to code, as it is altogether too easy to form really bad habits (believe me, I've seen lots of that kind of code) but I find that as a dogma, it proves too impractical in the real world.

Imagine, for example, a set of functions (or methods) that interact with a resource. In the pure case, you would have to modify the resource with a mutator "routine" (procedure, void function, what have you) and then inspect the results with a side-effect-free function.

This can result in awkwardly-reading code. But that's a highly subjective claim, so I won't tout it.

More materially, however, it can result in accidental abuse of the resource, such as repeatedly modifying the resource and failing to observe the state change in between. After all, syntactically this is legal, even if logically it is not, so the designer made it possible to do, and that's just bad design. This frequently requires higher-level protocols (such as only-weakly-enforceable instructions to not do that) and other sorts of inefficiencies, such as internal guards against such abuse, if in fact the abuse is anticipated during implementation (as opposed to discovering it out in the world.)

Fortunately, the example suggests the conflation of the two is "Not Recommended" as opposed to being entirely prohibited. This is a good general guideline for training the unformed mind, but it is also an exemplar of how a dogmatic approach to coding guidelines can result in worse code by making it possible to write bad code.

Let's get concrete here, and maybe a bit Socratic.

Suppose we have an iterator, something that abstracts the traversal over some collection of things. According to the guideline, we would need both a mutator (to move to the next item) and a separate inspector to determine whether there is in fact a next item. This would require a paradigm of mutate-then-inspect, as in

 x.moveNext();
 if(x.valid()) { doSomething(x.value()); }
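To make the risk tangible, here is a minimal sketch of such a "pure" cursor over a std::vector<int>. The class PureCursor and the two helper functions are hypothetical names invented for this illustration (only moveNext(), valid() and value() echo the snippet above), and the way the cursor starts "before" the first element is an assumption, not something the guideline prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cursor in the "pure" style: moveNext() only mutates,
// valid() and value() only inspect. Not taken from any real library.
class PureCursor {
public:
    explicit PureCursor(const std::vector<int>& items)
        : items_(items), index_(0), started_(false) {}

    // Mutator: the first call selects the first element, later calls advance.
    void moveNext() {
        if (!started_) { started_ = true; }
        else if (index_ < items_.size()) { ++index_; }  // internal guard against overrun
    }

    // Inspector: true while the cursor rests on an element.
    bool valid() const { return started_ && index_ < items_.size(); }

    // Inspector: the current element; only meaningful while valid().
    int value() const { assert(valid()); return items_[index_]; }

private:
    const std::vector<int>& items_;
    std::size_t index_;
    bool started_;
};

// The intended protocol: mutate, then inspect, every single time.
int sumCarefully(const std::vector<int>& items) {
    int sum = 0;
    PureCursor x(items);
    x.moveNext();
    while (x.valid()) {
        sum += x.value();
        x.moveNext();
    }
    return sum;
}

// Syntactically legal, logically wrong: two mutations with no inspection
// in between, so every other element is silently skipped.
int sumCarelessly(const std::vector<int>& items) {
    int sum = 0;
    PureCursor x(items);
    x.moveNext();
    while (x.valid()) {
        sum += x.value();
        x.moveNext();
        x.moveNext();  // nothing in the interface discourages this
    }
    return sum;
}
```

Both functions compile without complaint; only the first visits every element. Note also the guard inside moveNext(), which exists purely to survive the misuse the interface invites.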

"But isn't that exactly what the standard C++ collection classes do?"

Yes, actually, it is. And yes, I'm asserting that it's not the best design. A reasonable design, given the prevailing circumstances at the time, but an unnecessarily complicated one. The STL collections (the precursors to the standard collection classes in C++ now) were designed to fit into the for loop, as in

 for(X::iterator it = x.begin(); it != x.end(); it++) { doSomethingWith(it); }

and when you do so, you're fairly safe. But it is really quite messy. The inspection isn't even a function, it's a comparison expression. Uncoupling the mutation of the iterator means you can do it++ independently of that inspection. Sure, checked or debug implementations carry code (that had to be written) to guard against walking past end(), but the standard itself leaves that undefined, so at best it leads to some mysterious iteration. Oh, and if you want to traverse the collection in reverse, it's an entirely different loop, using rbegin() and rend() instead of begin() and end(). That means it's very easy to express loops that spin over a collection in reverse order as

 for(X::iterator it = x.rbegin(); it != x.end(); it++) {doSomethingWith(it); }

There... did you see it? (Let me give you a second.)

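Yes, that's right, we used the wrong inspector. Sure, a higher-level protocol exists ("don't do that") and maybe you've got some static analysis tool to catch the mismatch. With the standard containers you even get some help from the compiler here, since rbegin() hands back a distinct reverse_iterator type that won't compare against end(); but mismatch two iterators of the same type (begin() of one collection against end() of another, say) and the compiler swallows it just fine. And if none of these safety measures kicked in, you just wrote bad code. If you're lucky, it didn't work from the outset. If you're unlucky it sort of worked at first, and bit you long after you wrote it... and then stole many hours of your life trying to find and fix it.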

"Okay, Thomas... point made: it's a sharp tool, and you can cut yourself with it. Got any better ideas?"

Well, this Socratic aside was posed in defense of hybrid functions, ones that side-effect and return a value. Imagine an iterator paradigm that allowed you to express traversal as

 X::iterator it(x);
 while(it.more()) { doSomethingWith(it); }

Here, the more() method both applies a side-effect, selecting the next element in the collection x, as well as immediately reporting the success of that selection.

This is far cleaner (that is, easier to read, and thus better at conveying the intent of the operation) and far safer from misuse.

In short, better code.
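For the curious, a hybrid iterator along these lines might look something like the following sketch. It is only one way to do it, under stated assumptions: Cursor and sumWithMore are hypothetical names, the collection is fixed to std::vector<int> for brevity, and value() is an accessor I've added so the example has something to sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hybrid cursor: more() advances AND reports in a single call.
class Cursor {
public:
    explicit Cursor(const std::vector<int>& items) : items_(items), next_(0) {}

    // Side effect: selects the next element, if any.
    // Return value: whether that selection succeeded.
    bool more() {
        if (next_ >= items_.size()) { return false; }
        ++next_;
        return true;
    }

    // The element selected by the most recent successful more().
    int value() const {
        assert(next_ > 0);  // only meaningful after more() has returned true
        return items_[next_ - 1];
    }

private:
    const std::vector<int>& items_;
    std::size_t next_;  // number of elements selected so far
};

// There is no separate "advance" step to forget, repeat, or mismatch.
int sumWithMore(const std::vector<int>& items) {
    int total = 0;
    Cursor it(items);
    while (it.more()) { total += it.value(); }
    return total;
}
```

Because advancing and testing are one operation, the loop cannot step twice without looking, and calling more() again after the end simply keeps returning false.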

"But you lose so much flexibility that way, too!"

Okay, so that's a bit off topic: we're justifying hybrid functions here, but let me touch on that and then crawl out of the rabbit hole.

My experience has shown that nearly every traversal loop is one flavor: spinning over the entire collection in order. Very, very seldom do you need to do otherwise. This is borne out by C++11 introducing the new range-based for, finally catching up with so many other languages that have the foreach variant that does exactly that.
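For reference, here is what that common case looks like with the C++11 range-based for. The function name sumRangeFor is just a placeholder for this illustration.

```cpp
#include <vector>

// The whole collection, in order, with no begin()/end() pair to mismatch.
int sumRangeFor(const std::vector<int>& items) {
    int total = 0;
    for (int item : items) {
        total += item;
    }
    return total;
}
```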

But in those few—those very few—instances when more precise iteration is required, once again imagine an iterator paradigm that allowed

 X::iterator_reverse it(x);
 while(it.more()) { doSomethingWith(it); }

or

 X::iterator_range it(x,start,end);
 while(it.more()) { doSomethingWith(it); }

A benefit is that the iterator itself localizes in one spot (its construction) the complete definition of the things it traverses, rather than distributing that—and quite possibly inconsistently—to disparate parts of a for loop. Another benefit: you now have a single object you can pass around that knows when it's done. This is something you can't do with standard iterators: you always need to pass it and the desired end condition.
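Here is a rough sketch of how those variants could be layered on top of the standard iterators. Everything in it is hypothetical: Traversal, inOrder, inReverse, inRange and collect are names made up for this example, and I'm assuming that start and end are element indices forming a half-open range.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical wrapper: both ends of the traversal are fixed once, at construction.
template <typename Iter>
class Traversal {
public:
    Traversal(Iter first, Iter last)
        : current_(first), next_(first), last_(last), started_(false) {}

    // Selects the next element and reports whether there was one.
    bool more() {
        if (next_ == last_) { return false; }
        current_ = next_;
        ++next_;
        started_ = true;
        return true;
    }

    // The element selected by the most recent successful more().
    typename std::iterator_traits<Iter>::reference value() const {
        assert(started_);
        return *current_;
    }

private:
    Iter current_;
    Iter next_;
    Iter last_;
    bool started_;
};

typedef std::vector<int> Ints;

// Each factory pairs the right begin with the right end, in exactly one place.
Traversal<Ints::const_iterator> inOrder(const Ints& x) {
    return Traversal<Ints::const_iterator>(x.begin(), x.end());
}

Traversal<Ints::const_reverse_iterator> inReverse(const Ints& x) {
    return Traversal<Ints::const_reverse_iterator>(x.rbegin(), x.rend());
}

// Assumption: start and end are indices, with end excluded.
Traversal<Ints::const_iterator> inRange(const Ints& x, std::size_t start, std::size_t end) {
    assert(start <= end && end <= x.size());
    typedef Ints::difference_type Diff;
    return Traversal<Ints::const_iterator>(x.begin() + static_cast<Diff>(start),
                                           x.begin() + static_cast<Diff>(end));
}

// A consumer receives one object that knows when it is done.
template <typename Iter>
Ints collect(Traversal<Iter> it) {
    Ints out;
    while (it.more()) { out.push_back(it.value()); }
    return out;
}
```

The consumer collect() never sees a begin or an end, so it has no opportunity to pair them up wrongly; the only place that can get it wrong is the factory, and that is written once.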

Anyway, to wrap up: the overarching goal is always to write better code.

Most of the time, the division of side-effecting routines/procedures from side-effect-free functions is a good one. But when the dogmatic application of such a policy imparts constraints that make it easy to write bad code (code that by its design facilitates syntactically-correct flawed code, or even just code that by virtue of prolixity is harder to fathom), then obviously, that dogma must yield to reasoned argument for another way.

Which, as I point out, the guideline does—at least tenuously—allow.



