Complexity vs. Usefulness—Regular Expressions
design: n. a deliberate plan for the creation or development of an object. vt. to create something according to plan.
good design: /'gud —/ the product of deliberate forethought and careful understanding of the purpose of a subject, resulting in a subject which significantly improves its utility, allowing it to integrate seamlessly and naturally into the role for which it is intended.
false synonyms: fashion, decor.
Tue May 16, 2006
In short, regular expressions are a means of pattern matching. They're like string comparison on steroids. In most computing languages you can say something like
if(SkyColor == "blue")
or some variation. You can even tweak it so case doesn't matter: "Blue" would also compare true. But most languages give you only "exact matches" to the string: it's either equal to the whole thing character-for-character, or it's not. It gets more complex if you want to compare against a bunch of strings, as in this pseudo-code example:
if(SkyColor == "blue" or SkyColor == "aqua" or SkyColor == "teal" or SkyColor == "cyan")
On the other hand, regular expressions make quick work of this (I'll use the Perl =~ operator in my pseudo-code examples to mean "compare against the regular expression"):
if(SkyColor =~ /blue|aqua|teal|cyan/i)
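For readers who want something they can actually run, here is a rough Python sketch of the same two approaches. The function names and the `SKY_COLOR` constant are made up for illustration; they aren't from any particular library. Note that `re.search`, like Perl's `=~`, looks for the pattern anywhere in the string.

```python
import re

# Case-insensitive alternation, mirroring /blue|aqua|teal|cyan/i.
# SKY_COLOR is a hypothetical name chosen for this sketch.
SKY_COLOR = re.compile(r"blue|aqua|teal|cyan", re.IGNORECASE)


def is_sky_color_plain(sky_color: str) -> bool:
    """Chain of exact comparisons, as in the first pseudo-code example."""
    c = sky_color.lower()  # the "tweak" that makes case not matter
    return c == "blue" or c == "aqua" or c == "teal" or c == "cyan"


def is_sky_color_regex(sky_color: str) -> bool:
    """One pattern instead of four comparisons."""
    return SKY_COLOR.search(sky_color) is not None


for candidate in ["blue", "Teal", "green"]:
    print(candidate, is_sky_color_plain(candidate), is_sky_color_regex(candidate))
```

The two functions agree on these three inputs. They would disagree on a longer string that merely contains one of the colors, since the regex version is a "contains" test.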
For even more flexibility, try this:
if(SkyColor =~ /((deep|dark|pale|light) )?(blue|aqua|teal|cyan)/i)
Here, we optionally allow an adjective like deep, dark, pale, or light to precede one of the colors, allowing strings like "deep cyan" or "pale teal" to match, as well as unadorned "aqua" or "blue". All in one line. Lots of power, very compact notation. Imagine having to do that without regular expressions, comparing SkyColor to each possible string value.
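To get a feel for what "each possible string value" means here, the following Python sketch enumerates the strings the one-line pattern stands in for. The names `ADJECTIVES`, `COLORS`, and `PATTERN` are my own for this illustration. I use `fullmatch` so that the adjective has to be consumed by the pattern rather than skipped over.

```python
import re
from itertools import product

# Hypothetical names for this sketch; the word lists come from the pattern above.
ADJECTIVES = ["deep", "dark", "pale", "light"]
COLORS = ["blue", "aqua", "teal", "cyan"]

PATTERN = re.compile(
    r"((deep|dark|pale|light) )?(blue|aqua|teal|cyan)", re.IGNORECASE
)

# Every string you would otherwise have to compare against by hand:
# the 4 bare colors plus 4 x 4 adjective/color pairs.
accepted = COLORS + [f"{adj} {color}" for adj, color in product(ADJECTIVES, COLORS)]

print(len(accepted))                                   # 20
print(all(PATTERN.fullmatch(s) for s in accepted))     # True
print(PATTERN.fullmatch("deep purple") is not None)    # False
```

Twenty comparisons (before even worrying about upper and lower case) collapse into one expression.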
Lots of power in that one (not so simple) line. But unless you know what you're looking at, it's just a bunch of strange characters. So, now that I've illustrated an example, let's consider its complexity...
As you can see, despite their incredible capabilities, regular expressions aren't exactly the friendliest kids on the block. I'm just speaking for myself when I say I love regular expressions. Other developers either love them (for their power) or hate them (for their arcane syntax). Here's what I mean; some of the rules for regular expressions are as follows:
So far, these regular expression critters don't seem so odd. But wait, there's more. Lots more (as in, lots more rules to know). In particular, several characters have special meaning:
See? All that, just to understand the one-line expression above: eight rules to know instead of just three for simple strings. Complexity.
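One way to see how much is packed into that one line is to spell it out piece by piece. The sketch below uses Python's `re.VERBOSE` mode, which allows whitespace and comments inside a pattern; it is only an annotated restatement of the expression above, and the name `SKY_COLOR` is hypothetical.

```python
import re

# The same expression as above, annotated. SKY_COLOR is a name invented here.
SKY_COLOR = re.compile(
    r"""
    (                          # parentheses group a sub-pattern
      (deep|dark|pale|light)   # | separates alternatives: any one adjective
      [ ]                      # a literal space; written as [ ] because
                               # re.VERBOSE ignores bare whitespace
    )?                         # ? makes the preceding group optional
    (blue|aqua|teal|cyan)      # the required color, again as alternatives
    """,
    re.VERBOSE | re.IGNORECASE,  # IGNORECASE plays the role of the trailing /i
)

match = SKY_COLOR.fullmatch("Pale Teal")
print(match.group(2), match.group(3))            # Pale Teal
print(SKY_COLOR.fullmatch("aqua").group(2))      # None (no adjective given)
```

The commented version is friendlier to read, but notice that it only helps someone who already knows what grouping, alternation, and optionality mean.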
So the question becomes: is that added complexity necessary? The answer isn't that straightforward. If you're in an environment where knowing regular expressions is a given (as in a Perl application, nearly anything *nix, and so on), you might be okay; those are just the rules of the road. But you probably don't want to surface that to the end user, unless you're targeting an audience for whom regular expressions are also a well-known thing (and that's not always synonymous with "all developers"!).
Summing up, a technology as powerful and widespread as regular expressions still has barriers to adoption, even in the technical community, so imagine what some "new" technology (some parochial construct you're considering for your application) will face. Is it worth it? You have some questions to answer, and you need to be objective in answering them:
So consider how important that new invention is before subjecting your users to it.
1 Almost the same. I should point out that, for simplicity, I've omitted the ^ and $ notations. The description above is neither rigorous nor complete. Really, in order for the examples of regular expressions above to be truly the same as the simple string comparison counterparts, they should start with ^ and end with $; these force the pattern to match the entire string, and not just any fragment within it. Without them, the semantics are "is the expression contained within the string?" ... but let's not further complicate an already complex discussion. To the purists I say with a wink and a nod: just assume that they are present for the purpose of this discussion, and let's be done with that.
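For the purists who won't settle for a wink and a nod, here is a small Python sketch of the difference the ^ and $ make. The names `contains_color`, `is_color`, and `describe` are invented for this example.

```python
import re

# Hypothetical names for this sketch.
# Without anchors: "is one of the colors contained within the string?"
contains_color = re.compile(r"blue|aqua|teal|cyan", re.IGNORECASE)
# With anchors: "is the whole string one of the colors?"
is_color = re.compile(r"^(blue|aqua|teal|cyan)$", re.IGNORECASE)


def describe(text: str) -> tuple[bool, bool]:
    """Return (contained somewhere, matches the entire string)."""
    return (
        contains_color.search(text) is not None,
        is_color.search(text) is not None,
    )


print(describe("blue"))              # (True, True)
print(describe("blue suede shoes"))  # (True, False)
print(describe("green"))             # (False, False)
```

Note the extra parentheses in the anchored version. Without them, `^blue|aqua|teal|cyan$` would anchor only the first and last alternatives, which is one more rule to know.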
This page and all constituent elements are copyright © Thomas M. Tuerke 2019
unless otherwise indicated. The TMT-Diamond Logo is a Servicemark of Thomas M. Tuerke
All Rights Reserved
Reproduction or distribution without prior written permission is strictly prohibited.