Regular Expressions
Jeff Hendrickson
Purify's Blacklist, Whitelist, and Ignorelist all support Regular Expressions. Regular Expressions are a search shorthand that lets you match patterns in text rather than only exact words. They are very powerful.

A simple example of Regular Expression use: say you never want to receive an email that contains the word replica.
You can add the Regular Expression
\breplica\b
to your Purify Blacklist. Using the magic decoder below, you can figure out what this Regular Expression does: \b matches a word boundary, so the expression matches replica as a whole word, but not replicas or replicate.
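If you'd like to try this outside Purify, here is a minimal sketch in Python using its built-in `re` module. It assumes Purify's engine treats `\b` the standard way, as Python does; the sample strings are made up for illustration. Python matches case-sensitively by default, and this thread doesn't say whether the Purify Blacklist ignores case, so check that with Purify's own test feature.

```python
import re

# \b asserts a word boundary, so "replica" must stand alone as a whole word.
REPLICA = re.compile(r"\breplica\b")

samples = [
    "Cheap replica watches!",      # whole word: matches
    "replica.",                    # punctuation counts as a boundary: matches
    "Buy replicas today",          # "s" follows, so no boundary: no match
    "We will replicate the test",  # only part of a longer word: no match
    "Replica watches",             # capital R: no match (case-sensitive by default)
]

for text in samples:
    hit = REPLICA.search(text) is not None
    print(f"{hit!s:5}  {text}")
```

Compiling with `re.IGNORECASE` would make the last sample match too.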

A more complex example, contributed by user Enoch (Lance), catches the word viagra along with disguised spellings of it.
Add the Regular Expression
\b(V|v).{0,12}(G|g).{0,6}(R|r).{0,6}(A|a|@)\b
to your Purify Blacklist.
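Here is the same pattern in Python, with a comment for each piece, run against a few invented samples. This assumes Purify's engine handles `.`, `{m,n}`, and `\b` like Python's `re` does. One quirk worth knowing: `@` is not a word character, so the trailing `\b` after an `@` only succeeds when a letter, digit, or underscore follows it.

```python
import re

# Enoch's pattern, copied verbatim. Reading it left to right:
#   \b          word boundary before the V
#   (V|v)       a V in either case
#   .{0,12}     up to 12 filler characters (dots, dashes, "i", "1", ...)
#   (G|g)       a G in either case
#   .{0,6}      up to 6 filler characters
#   (R|r)       an R in either case
#   .{0,6}      up to 6 filler characters
#   (A|a|@)     an A, an a, or an @ standing in for it
#   \b          word boundary after the last character
VIAGRA = re.compile(r"\b(V|v).{0,12}(G|g).{0,6}(R|r).{0,6}(A|a|@)\b")

samples = [
    "Buy viagra now",
    "V.I.A.G.R.A",
    "v1agr@s for sale",   # "@" followed by a letter
    "try vi@gr@ today",   # "@" followed by a space
]

for text in samples:
    m = VIAGRA.search(text)
    print(repr(text), "->", repr(m.group()) if m else None)
```

Under Python's rules, "v1agr@s for sale" matches (as "v1agr@"), but "try vi@gr@ today" does not, because the final `@` is followed by a space and there is no word boundary there.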

Purify has a test feature for all filters that support Regular Expressions. We recommend using it to check every expression before you rely on it.
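If you want to keep a list of samples around and recheck them whenever you edit a filter, a small script can do the same kind of check. This is a hypothetical helper (`check_filter` is a made-up name, not part of Purify), written in Python and assuming Purify's regex engine behaves like Python's `re` for the patterns you test.

```python
import re

def check_filter(pattern, should_match, should_not_match):
    """Return (text, problem) pairs where the pattern misbehaves.

    Hypothetical helper, loosely modeled on the idea of Purify's test
    feature; it is not Purify's actual tester.
    """
    rx = re.compile(pattern)
    problems = []
    for text in should_match:
        if rx.search(text) is None:
            problems.append((text, "missed"))
    for text in should_not_match:
        m = rx.search(text)
        if m is not None:
            problems.append((text, f"false positive on {m.group()!r}"))
    return problems

report = check_filter(
    r"\breplica\b",
    should_match=["replica watches", "a replica."],
    should_not_match=["replicate", "replicas"],
)
print(report if report else "all samples behaved as expected")
```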

I'm going to keep this topic pinned and open so users can share expressions and ask questions about this powerful Purify feature.

. -Matches any character except newline.
[a-z0-9] -Matches any single character in the set.
[^a-z0-9] -Matches any single character not in set.
\d -Matches a digit. Same as [0-9].
\D -Matches a non-digit. Same as [^0-9].
\w -Matches a word character (letter, digit, or underscore). Same as [a-zA-Z0-9_].
\W -Matches a non-word character. Same as [^a-zA-Z0-9_].
\s -Matches a whitespace character (space, tab, newline, etc.).
\S -Matches a non-whitespace character.
\n -Matches a newline (line feed).
\r -Matches a carriage return.
\t -Matches a tab.
\f -Matches a formfeed.
\0 -Matches a null character.
\000 -Also matches a null character because of the following:
\nnn -Matches an ASCII character of that octal value.
\xnn -Matches an ASCII character of that hexadecimal value.
\cX -Matches the ASCII control character Ctrl-X (e.g., \cI matches a tab).
\metachar -Matches the meta-character itself (e.g., \\ matches \, \. matches ., \| matches |).
(abc) -Used to create subexpressions. Remembers the match for later backreferences. Referenced by replacement patterns that use \1, \2, etc.
\1, \2, ... -Matches whatever the first (second, and so on) parenthesized subexpression matched.
x? -Matches 0 or 1 x's, where x is any of the above.
x* -Matches 0 or more x's.
x+ -Matches 1 or more x's.
x{m,n} -Matches at least m x's, but no more than n.
abc -Matches all of a, b, and c in order.
a|b|c -Matches one of a, b, or c.
\b -Matches a word boundary (outside [] only, inside [] it matches backspace).
\B -Matches a non-word boundary.
^ -Anchors match to the beginning of a line or string.
$ -Anchors match to the end of a line or string.
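Most of the entries above can be tried directly in Python's `re` module. This sketch runs one small, made-up sample per entry and records what was matched. Two caveats: Python's `re` does not support `\cX` (it raises an error), so that entry is left out, and in Python `^` and `$` anchor to the whole string unless you pass `re.MULTILINE`; whether Purify applies them per line or per message is an assumption to check in Purify itself.

```python
import re

# One made-up sample per table entry: (entry, pattern, sample text).
DEMOS = [
    (r"\d",        r"\d+",     "Order 12345 shipped"),
    (r"\w",        r"\w+",     "hi_there, you!"),
    (r"\s",        r"a\sb",    "a\tb"),          # the sample contains a real tab
    (r"\xnn",      r"\x41",    "ABC"),           # hex 41 is "A"
    (r"\nnn",      r"\101",    "ABC"),           # octal 101 is also "A"
    (r"\metachar", r"\.com",   "example.com"),   # \. is a literal dot
    (r"[\b]",      r"[\b]",    "x\by"),          # inside [], \b is backspace
    (r"\1",        r"(\w)\1",  "free money"),    # a doubled word character
    (r"x?",        r"colou?r", "color"),
    (r"x{m,n}",    r"\d{2,3}", "7 42 1999"),
    (r"a|b|c",     r"cat|dog", "hotdog"),
    (r"\B",        r"\Bplic",  "replica"),       # "plic" sits inside a word
]

results = {}
for entry, pattern, text in DEMOS:
    m = re.search(pattern, text)          # first match anywhere in the text
    found = m.group() if m else None
    results[entry] = found
    print(f"{entry:10} {pattern!r:12} in {text!r:22} -> {found!r}")

# ^ and $ anchor to the whole string by default; re.MULTILINE makes them per line.
headers = "From: someone\nSubject: cheap watches"
print(re.findall(r"^Subject:.*$", headers))
print(re.findall(r"^Subject:.*$", headers, re.MULTILINE))
```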
paulbel
Thanks for adding this, Jeff. I know it will be very useful.

QUOTE (Jeff Hendrickson @ Aug 22 2008, 05:20 AM)
A more complex example, by user Enoch (Lance) to catch the word viagra.
Add the Regular Expression :
\b(V|v).{0,12}(G|g).{0,6}(R|r).{0,6}(A|a|@)\b
: to your Purify Blacklist.


Regular expressions can be tricky. For example, this extremely useful regex of Enoch's also happens to catch the phrase "VS GEORGIA", which you wouldn't think would be a big deal until fighting between Russia and Georgia dominated the news.

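You can see exactly which characters trip the filter by printing the match. A minimal Python sketch with a made-up headline, assuming Python's `re` behaves like Purify's engine here:

```python
import re

VIAGRA = re.compile(r"\b(V|v).{0,12}(G|g).{0,6}(R|r).{0,6}(A|a|@)\b")

headline = "RUSSIA VS GEORGIA: TALKS STALL"
m = VIAGRA.search(headline)
if m:
    # group() is the text that matched; groups() shows which letter each
    # (X|x) alternative picked up along the way.
    print("matched:", repr(m.group()), "letters:", m.groups())
```

The V comes from "VS", and the G, R, and A all come from "GEORGIA", with the short gaps absorbed by the `.{0,12}` and `.{0,6}` runs.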
Jeff Hendrickson
QUOTE
Regular expressions can be tricky.


Indeed. It's always best to test them with the Purify test utility.

Since the massive update of the GeoIP database, I've been relying less and less on the blacklist and Bayesian filtering, and more on the Country Filter in Purify.

If you turn on Enforce Country Filter For URLs and set up your Country Filter to filter email from Russia, China, Korea, India, etc., you will be shocked at how effective this ONE Purify feature is.
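This thread doesn't describe how Purify's Country Filter works internally, so the following is only a rough Python sketch of the general idea behind filtering on the country of URLs: pull the host names out of links with a regular expression, then look each one up. The lookup table, host names, country assignments, and the `blocked_hosts` function are all hypothetical; a real filter would resolve each host and query a GeoIP database instead.

```python
import re

# Hypothetical lookup table standing in for a GeoIP database. A real filter
# would resolve each host and look the address up; that step is omitted here.
HYPOTHETICAL_HOST_COUNTRY = {
    "cheap-watches.example": "CN",
    "pharmacy.example": "RU",
    "news.example": "US",
}

# Capture the host name from http:// and https:// links in a message body.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)

def blocked_hosts(body, blocked_countries):
    """Return hosts in `body` whose (hypothetical) country is blocked."""
    hits = []
    for host in URL_HOST.findall(body):
        country = HYPOTHETICAL_HOST_COUNTRY.get(host.lower())
        if country in blocked_countries:
            hits.append(host)
    return hits

body = "Great deals at http://cheap-watches.example/sale and https://news.example/today"
print(blocked_hosts(body, {"RU", "CN", "KR", "IN"}))
```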