change language:

Regular Expressionss, also called regexp, regex of RE, are a way to describe a pattern that a text string must match. Regular expressions were developed in the 1950's and are still supported by most programming languages. Regular expressions can also be used in VBA. Well-known variants of a regular expression are the wildcards (* en ?) that can be used to search for file names, for example. Regular expressions are among other things suitable for checking the correctness of data input on forms, for example. The great advantage of regular expressions is that simple to very complex input checks can be performed with just a few lines of code. The Like statement can also be used in VBA for simple syntax checks, but regular expressions are much more versatile. In this tutorial it will be explained how to make regular expressions and what can be done with them. This will mainly be done on the basis of a few examples. The intention here is not to give a complete overview of all the possibilities of regular expressions. It is intended more as a first introduction.

Dutch postal code check

Regular expressions have a special syntax, with a number of characters playing a special role. In this section, a simple example will first be presented for checking the entry of a Dutch postal code. This shows how regular expressions can be called in the VBA. The syntax will be discussed in more detail in the next section. The VBA code of the function to check the entry of a Dutch postal code reads as follows:

Function CheckDutchPostalCode(PostalCode As String) As Boolean
    With CreateObject("vbscript.regexp")
        .Pattern = "^[1-9][0-9][0-9][0-9] [A-Z][A-Z]$"
        CheckDutchPostalCode = .Test(CStr(PostalCode))
    End With
End Function

As can be seen, the code is very short. In order to use regular expressions in VBA, this object library must be made available. This can be done by checking this library in the Visual Basic Editor under the Tools → References tab or, as done in this example, by means of so-called late binding. That is, the object library is loaded at runtime. This is done on the first line of the function. Then the regular expression is set at Pattern,after which it is checked by means of the Test method whether the specified postal code satisfies the regular expression. If yes, the function returns TRUE, if not FALSE. The regular expression used in the above function with Pattern can be even shorter, but it is easier to discuss the more extensive simple variant first. The short version is discussed in the next section. The regular expression starts with a ^ and ends with a $. This Indicates that the expression applies to the whole word and not part of it. Then it is indicated that the postal code must first consist of 4 random numbers between 0 and 9, where the first number cannot be zero. Then a space must follow and then two arbitrary capital letters. Any postal code that matches this pattern will be TRUE and any other post code code will be FALSE.

Syntax

As shown in the previous example, regular expressions have their own syntax. Before a few more examples are discussed, it is good to first discuss this syntax. In this syntax, a number of characters have a special meaning. The most important are:

  • [ ] square brackets: are used to allow a list of characters. For example, [pqr] allows the letter p, q or r.A minus sign can refer to a series of characters, as in the example above [0-9] and [A-Z].
  • ( ) round brackets: characters in parentheses indicate a pattern. For example l(oo|ea)p recognizes loop and leap.
  • | pipe: separator for patterns
  • { } braces: the number in braces indicates how many times to repeat the preceding. 0{5 }thus means 00000. A minimum and maximum can also be indicated with {min,max}. So k {2,4} recognizes kk, kkk and kkkk
  • ^ caret: start of line and, if used within square brackets, it matches NOT. So [^ g] means all characters except the letter g, [^0-9] means anything but a number.
  • $ dollar sign: end of a line.
  • . dot: represents any character.
  • + plus sign: indicates that the preceding character must occur at least once. For example bo+m recognizes bom, boom, booom, etc..
  • ? question mark: indicates that the preceding character may appear at most once. For example br?eak recognizes break and beak.
  • * asterisk: indicates that the preceding character can be zero or more. For example 10*1 recognizes 11, 101, 1001, 10001 etc.
  • \ backslash: if one of the special characters is to be used as a normal character, it must be preceded by a backslash. In addition, the backslash is used for a number of short codes:
    • \d: matches any number and is the same as [0-9].
    • \w: matches any character. That corresponds to [a-zA-Z0-9_], so all letters and numbers including underscore.
    • \s: means whitespace. This can be, for example, a space, tab or line feed.

Based on this syntax, the regular expression for the Dutch postal code "^[1-9][0-9][0-9][0-9] [A-Z][A-Z]$" be shortened to "^[1-9]\d{3} [A-Z]{2}$". If lowercase letters are also allowed in the postal code, the expression can be changed to "^[1-9]\d{3} [a-zA-Z]{2}$". Another option is to add the following code line instead: .IgnoreCase = True. This line must be placed before the line with CheckDutchPostalCode.

Examples

Following are some examples of regular expressions. These expressions can easily be fitted into the VBA function below. All that has to be done is to insert the expression after .Pattern =. If several check functions are required, give each function its own name.

Function CheckSyntax(TextString As String) As Boolean
    With CreateObject("vbscript.regexp")
        .Pattern = ""
        CheckSyntax = .Test(CStr(TextString))
    End With
End Function

Initials: "^([A-Z][a-z]?\.){1,5}$"

This expression checks the entry of initials of a name. This may be a minimum of 1 to a maximum of 5 initials. Each letter must be followed by a period. Each initial must be uppercase. This capital letter can optionally be followed by a lowercase letter. So are allowed for example: W.A. and A.F.Th. Not allowed, for example A.F.TH, g.h. and GH.

International Bank Account Number (IBAN): "^NL\d{2} [A-Z]{4} 0\d{9}$"

This expression checks the syntax of a Dutch IBAN number. So first the letters NL followed by 2 random numbers. Then a space followed by 4 capital letters, then a space followed by a zero and 9 numbers. Of course this is only a simple syntax check. This does not guarantee that it is an existing valid IBAN number.

Year of birth: "^(19\d\d|20(0\d|1[0123]))$"

Here the entry of a year of birth is checked. This can be 19 followed by 2 random numbers, or 20 followed by either 0 and a number or 1 followed by 0,1,2, or 3. So a given date of birth is only valid in this case if it is between 1900 and 2013.

E-mail address: "^[A-Za-z0-9._%+-]+@([A-Za-z0-9+-]+\.)+[A-Za-z]{2,4}$"

One or more of the following characters may appear before the @: A-Z, a-z, 0-9, . _ % + -. This must be at least one character. A-Z, a-z, 0-9, +,- may occur after the @. This must be at least one character. Such a character series must be closed with a period. There may be several points, but each point must be preceeded with at least one of the characters mentioned. So This email address is being protected from spambots. You need JavaScript enabled to view it. and This email address is being protected from spambots. You need JavaScript enabled to view it. is sufficient, but name @ subdomain..domain.nl does not suffice (because of the 2 consecutive points).Next comes the main domain name, which must consist of a minimum of 2 and a maximum of 4 letters. This expression works for almost any email address, except for top-level domains with more than 4 letters. If desired, the expression can be extended with valid longer top level domains, such as travel and museum:

"^[A-Za-z0-9._%+-]+@([A-Za-z0-9+-]+\.)+([A-Za-z]{2,4}|museum|travel)$"

Since most domain names still have an extension of 2 letters, the following expression could also be chosen:

"^[A-Za-z0-9._%+-]+@([A-Za-z0-9+-]+\.)+([A-Za-z]{2}|com|info|net|org)$"

Here we have chosen to add the four most common main domains that consist of more than 2 letters. The vast majority of e-mail addresses are recognized with this. If desired, the other main domain names can also be added. The advantage of the latter expression is that there is less chance of approving non-existent main domain names. In the first expression, all main domain names of 2, 3 and 4 letters are approved. In the last expression, all 2 letter main domain names are approved plus four more named main domains consisting of more letters. You can also choose to explicitly name all main domains.

Conclusion

This tutorial has covered the basics of regular expressions. This allows very complex input checks to be made in a relatively simple manner. If that were to be programmed without regular expressions, then in many cases it would have to be elaborate complex code. With regular expressions, this is just a few lines of code. A lot more is possible with regular expressions, but it is too much to cover in this tutorial.

Questions / suggestions

Hopefully this article has helped you understand and apply regular expressions in VBA. If you have any questions about this topic or suggestions for improvement, please post a comment below.

arrow_up