7.7  Search and Replace

Tcl has one more command that deals with regular expressions. It is regsub, and its purpose is to perform search-and-replace operations. This command is invoked as follows:

regsub ?SWITCHES? PATTERN STRING REPLACE_PATTERN VAR_NAME
Finds the first occurrence of PATTERN in STRING, replaces it in the manner determined by REPLACE_PATTERN and assigns the resulting string to VAR_NAME.

There is an -all switch to force replacement of all occurrences of PATTERN. In any case, the return value is the number of replacements.

If there is no occurrence of PATTERN, VAR_NAME gets an unchanged copy of STRING.
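The behavior described so far can be checked with a short sketch. The sample text and the variable names First, All, None, and Count are only illustrations.

```tcl
# Hypothetical sample text; any string would do.
set Script "dog bites dog"

# Without -all, only the first occurrence is replaced.
set Count [regsub dog $Script cat First]
puts "count=$Count result=$First"    ;# count=1 result=cat bites dog

# With -all, every occurrence is replaced.
set Count [regsub -all dog $Script cat All]
puts "count=$Count result=$All"      ;# count=2 result=cat bites cat

# No match: the count is 0 and the variable gets an unchanged copy.
set Count [regsub bird $Script cat None]
puts "count=$Count result=$None"     ;# count=0 result=dog bites dog
```

Because the return value is a count, it can also be used directly as the condition of an if command to test whether anything was replaced.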

REPLACE_PATTERN may be a simple string you want substituted someplace inside STRING. For example,

regsub -all dog $Script cat Script
This command replaces all instances of "dog" in Script with "cat" and puts the resulting string back into Script.

As with glob and regular-expression patterns, REPLACE_PATTERN may contain special characters that alter its meaning. This is a third kind of pattern you must learn. Happily, it is much easier than the other two. There are two special characters, & and \.

If REPLACE_PATTERN contains the special character & then the & stands for the entire substring that was matched. So,

regsub -all cat $Str (&) Str
will put parentheses around all occurrences of "cat."
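Here is a small sketch of & at work. The sample strings are made up, and the second command uses a character-class pattern to show that & stands for whatever text happened to match, not for the pattern itself.

```tcl
# Hypothetical sample text.
set Str "the cat sat on the cat mat"
regsub -all cat $Str (&) Str
puts $Str    ;# the (cat) sat on the (cat) mat

# & is replaced by each matched substring in turn.
regsub -all {[aeiou]} banana <&> Out
puts $Out    ;# b<a>n<a>n<a>
```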

Exercise 7.7a

Write a regsub command line that doubles each occurrence of the character "&" in a string Str.

Solution

More complicated substitutions are possible by identifying substrings that match subpatterns of PATTERN and using those substrings to build the replacement pattern. Substrings are identified with parentheses in PATTERN the same way they are for the regexp command.

Suppose we have a line with a date in the form MONTH/DAY/YEAR and you want to put it into the form YEAR-MONTH-DAY. For example, you want "06/23/96" to become "96-06-23." I am simplifying this problem a little by assuming that the month, the day, and the year all contain exactly two digits.

Using the preassigned pattern, Digit_, a date has this pattern:

($Digit_+)/($Digit_+)/($Digit_+)
and the three sets of parentheses identify the numbers you need.

This use of regexp would extract the month, day, and year.

regexp "($Digit_+)/($Digit_+)/($Digit_+)" $Line \
       Junk Month Day Year
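To try this extraction, the sketch below assumes that Digit_ was preassigned as the character class [0-9]; that definition and the sample Line are assumptions made for this example.

```tcl
# Assumed definition of the preassigned pattern.
set Digit_ {[0-9]}

# Hypothetical input line.
set Line "Shipped on 06/23/96 by rail"

# Junk receives the whole match; Month, Day, and Year receive
# the substrings matched by the three sets of parentheses.
if {[regexp "($Digit_+)/($Digit_+)/($Digit_+)" $Line \
        Junk Month Day Year]} {
    puts "Junk=$Junk Month=$Month Day=$Day Year=$Year"
}
```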

The replacement string you want is

$Year-$Month-$Day
However, you cannot write it that way when using regsub. You do not get to name the variables that match the parentheses. Instead, the substrings that match subpatterns are represented in REPLACE_PATTERN with \1, \2, and so on. The substring matched by the leftmost set of parentheses is represented with \1, the next set with \2, and so on to the right.

Here is the complete regsub command for the date transforming example.

regsub "($Digit_+)/($Digit_+)/($Digit_+)" $Line \
       \\3-\\1-\\2 Line
Note that REPLACE_PATTERN is not a regular expression, and I do not have any style rules for writing it. As shown here, REPLACE_PATTERN is interpreted first by the Tcl interpreter and then by regsub. The first interpretation replaces each \\ with \. The second replaces each \i with the corresponding matched substring.
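The two stages of interpretation can be seen in a short sketch. It assumes Digit_ is the character class [0-9] and uses a made-up Line. The second command is an alternative, not the form used above: it encloses REPLACE_PATTERN in braces, which by ordinary Tcl quoting rules keep the interpreter from touching the backslashes, so single backslashes suffice.

```tcl
# Assumed definition of the preassigned pattern.
set Digit_ {[0-9]}

# Hypothetical input line.
set Line "Shipped on 06/23/96 by rail"

# Unquoted word: Tcl turns each \\ into \ before regsub sees it,
# so regsub receives \3-\1-\2.
regsub "($Digit_+)/($Digit_+)/($Digit_+)" $Line \
       \\3-\\1-\\2 Result1

# Braced word: Tcl passes the backslashes through unchanged,
# so regsub again receives \3-\1-\2.
regsub "($Digit_+)/($Digit_+)/($Digit_+)" $Line \
       {\3-\1-\2} Result2

puts $Result1    ;# Shipped on 96-06-23 by rail
puts $Result2    ;# Shipped on 96-06-23 by rail
```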

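Finally, you should know that regsub does its own backslash quoting so that you can have a literal & or \ in REPLACE_PATTERN: as regsub sees them, after the Tcl interpreter has done its own substitutions, \& stands for an ampersand and \\ stands for a backslash. Also, regsub will treat \0 just like &.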
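A brief sketch of these quoting rules follows. The sample strings are invented, and each REPLACE_PATTERN is written in braces so that regsub receives the backslashes exactly as typed.

```tcl
# Hypothetical sample text.
set Str "salt and pepper"

# \& is a literal ampersand, not the matched substring.
regsub and $Str {\&} Out1
puts $Out1    ;# salt & pepper

# \0 and & both stand for the entire matched substring.
regsub -all {[a-z]+} $Str {[\0]} Out2
regsub -all {[a-z]+} $Str {[&]} Out3
puts $Out2    ;# [salt] [and] [pepper]
puts $Out3    ;# [salt] [and] [pepper]

# \\ is a literal backslash.
regsub / a/b {\\} Out4
puts $Out4    ;# a\b
```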

Exercise 7.7b

Write a regsub command line to replace all occurrences of "cat" that are words with "dog" in the string Str. For this exercise, "word" means a string of letters which is bounded on the left and right by something that is not a letter.

Now adjust it so that "cats" is replaced with "dogs."

Solution

 

 
