In short, regular expressions are a means of pattern matching. They're like string comparison on steroids. In most computing languages you can say something like
if(SkyColor == "blue")
or some variation. You can even tweak it so case doesn't matter: "Blue" would also compare true. But most languages give you only "exact matches" to the string: it's either equal to the whole thing character-for-character, or it's not. It gets more complex if you want to compare against a bunch of strings, as in this pseudo-code example:
if(SkyColor == "blue" or SkyColor == "aqua" or SkyColor == "teal" or SkyColor == "cyan")
On the other hand, regular expressions make quick work of this (I'll use the Perl =~ operator in my pseudo-code examples to mean "compare against the regular expression"):
if(SkyColor =~ /blue|aqua|teal|cyan/i)
For even more flexibility, try this:
if(SkyColor =~ /((deep|dark|pale|light) )?(blue|aqua|teal|cyan)/i)
Here, we optionally allow an adjective like deep, dark, pale, or light to precede one of the colors, allowing strings like "deep cyan" or "pale teal" to match, as well as unadorned "aqua" or "blue". All in one line. Lots of power, very compact notation. Imagine having to do that without regular expressions, comparing SkyColor to each possible string value.
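If you'd like to try this outside of pseudo-code, here is a minimal sketch in Python using its standard re module. The function name sky_color_matches is my own invention for illustration, and I've assumed re.fullmatch so that the whole string has to match, which is a little stricter than the Perl-style =~ above.

```python
import re

# An optional adjective (plus a space), followed by one of the colors.
# re.IGNORECASE plays the role of the trailing "i" in the pseudo-code.
SKY_COLOR = re.compile(
    r"((deep|dark|pale|light) )?(blue|aqua|teal|cyan)",
    re.IGNORECASE,
)

def sky_color_matches(sky_color: str) -> bool:
    """Return True if the entire string is one of the accepted sky colors."""
    return SKY_COLOR.fullmatch(sky_color) is not None

print(sky_color_matches("deep cyan"))    # True
print(sky_color_matches("Pale Teal"))    # True
print(sky_color_matches("aqua"))         # True
print(sky_color_matches("deep purple"))  # False
```

Doing the same with plain equality would take twenty separate comparisons (four bare colors plus four adjectives times four colors), before you even account for upper and lower case.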
That one (not so simple) regular expression packs in a lot. But unless you know what you're looking at, it's just a bunch of strange characters. So, now that I've illustrated an example, let's consider its complexity...
As you can see, despite their incredible capabilities, regular expressions aren't exactly the friendliest kids on the block. I'm just speaking for myself when I say I love regular expressions. Other developers either love them (for their power) or hate them (for their arcane syntax). Here's what I mean; some of the rules for regular expressions are as follows:
- Just like strings are enclosed in quotation marks, regular expressions are usually enclosed in pairs of slashes, as in /hi mom/.
- Most characters—letters, numbers, punctuation—stand for themselves. If you say x =~ /a/, that's essentially the same as saying x == "a". 1
- Strings of these normal characters must be in the order given, just like ordinary string comparisons. If you say x =~ /ab/, that's just like saying x == "ab".
So far, these regular expression critters don't seem so odd. But wait, there's more. Lots more (as in lots more rules to know). In particular, several characters have special meaning:
- The vertical bar means "either what's to the left of me, or to the right, but not both," so x =~ /a|b/ is the same as saying (x == "a" or x == "b"), but it's not the same as saying x == "ab" nor is it the same as x == "a|b".
- Parentheses enclose things to group them together, just like in mathematics. This is useful because...
- The question mark means that what is in front of it is optional. The thing that precedes it can appear once, or not at all in order for the expression to match. x =~ /be?ar/ is the same as (x == "bear" or x == "bar") in that the "e" is optional. A question mark after a parenthesized expression means the whole thing is optional.
- If you want to include one of these special characters (vertical bar, parentheses, question mark, or even a slash) in your regular expression, you need to put a backslash in front of it, so x =~ /what?/ is the same as (x == "what" or x == "wha"), but not x == "what?"; for that you would need x =~ /what\?/.
- To make the entire expression case-insensitive, just follow that trailing slash with the letter i: x =~ /ab/i is the same as (x == "ab" or x == "Ab" or x == "aB" or x == "AB").
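To check these rules for yourself, here is a rough sketch in Python. The helper name same is hypothetical, and it assumes whole-string matching via re.fullmatch so that each pattern really does line up with the == comparisons in the list above. The "(big )?bear" pattern is my own example of a question mark after a parenthesized group; it isn't from the rules above.

```python
import re

def same(pattern: str, text: str, flags: int = 0) -> bool:
    """True if the whole of text matches pattern (illustrative helper)."""
    return re.fullmatch(pattern, text, flags) is not None

# Vertical bar: either the left side or the right side.
print(same(r"a|b", "a"), same(r"a|b", "b"))        # True True
print(same(r"a|b", "ab"), same(r"a|b", "a|b"))     # False False

# Question mark: the thing in front of it is optional.
print(same(r"be?ar", "bear"), same(r"be?ar", "bar"))  # True True

# Parentheses plus question mark: the whole group is optional.
print(same(r"(big )?bear", "bear"), same(r"(big )?bear", "big bear"))  # True True

# Backslash: treat a special character as an ordinary one.
print(same(r"what?", "wha"), same(r"what?", "what?"))  # True False
print(same(r"what\?", "what?"))                         # True

# Case-insensitive flag, the counterpart of the trailing "i".
print(all(same(r"ab", s, re.IGNORECASE) for s in ("ab", "Ab", "aB", "AB")))  # True
```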
See? All that, just to understand the one-line expression above: eight rules to know instead of just three for simple strings. Complexity.
So the question becomes: is that added complexity necessary? The answer isn't that straightforward. If you're in an environment where knowing regular expressions is a given (a Perl application, nearly anything *nix, and so on), you might be okay; those are just the rules of the road. But you probably don't want to surface that to the end user, unless you're targeting an audience for whom regular expressions are also a well-known thing (and that's not always synonymous with "all developers"!).
Summing up, a technology as powerful and widespread as regular expressions still has barriers to adoption, even in the technical community, so imagine what some "new" technology (some parochial construct you're considering for your application) will face. Is it worth it? You have some questions to answer, and you need to be objective in answering them:
- Does the complexity of learning many rules, terms, or metaphors outweigh the benefits achieved? (This one is particularly difficult to keep in perspective, as many designers tend to over-estimate the importance of the thing they're designing.)
- Are the rules ones that the user will bring with them from previous experience, and use in other parts of their lives, or are they just acquired for this one little part of their lives?
- Is there another way—albeit with perhaps more effort on your part—to more simply solve their problem and allow them to accomplish their tasks?
So consider how important that new invention is before subjecting your users to it.
1 Almost the same. I should point out that, for simplicity, I've omitted the ^ and $ notations. The description above is neither rigorous nor complete. Really, in order for the examples of regular expressions above to be truly the same as the simple string comparison counterparts, they should start with ^ and end with $; these force the pattern to match the entire string, and not just any fragment within it. Without them, the semantics are "is the expression contained within the string?" ... but let's not further complicate an already complex discussion. To the purists I say with a wink and a nod: just assume that they are present for the purpose of this discussion, and let's be done with that.
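For the curious, the difference the footnote describes can be seen in a few lines of Python. This is only a sketch: I'm assuming re.search as the stand-in for the unanchored =~, and the sample strings are made up.

```python
import re

text = "not blue at all"

# Unanchored: "is the expression contained within the string?"
contained = re.search(r"blue", text) is not None

# Anchored with ^ and $: the pattern must account for the entire string.
whole_fragment = re.search(r"^blue$", text) is not None
whole_exact = re.search(r"^blue$", "blue") is not None

print(contained)       # True
print(whole_fragment)  # False
print(whole_exact)     # True
```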