Regular expressions come from a field called formal language theory. In
this theory, a language is a set of strings and one's efforts are divided
between defining languages, on one hand, and developing algorithms to
recognize which strings belong to a defined language, on the other hand.
Note -
The word "language" is used differently in this section
than in other sections. Here it simply means a set of strings, nothing
more.
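As a small illustration of the two activities just described, defining a language and recognizing its members, here is a sketch in Python. The choice of Python and its standard `re` module is an assumption made for this example only, and the names `finite_language`, `pattern`, and `belongs` are hypothetical.

```python
import re

# A finite language can be defined by listing its strings.
finite_language = {"cat", "dog"}

# An infinite language, "an a followed by zero or more b's",
# defined by a regular expression instead of a list.
pattern = re.compile(r"ab*")

def belongs(s: str) -> bool:
    """Recognizer: True only when the WHOLE string s is in the language."""
    return pattern.fullmatch(s) is not None

in_finite = "cat" in finite_language   # membership in the listed language
in_infinite = belongs("abbb")          # membership decided by the recognizer
not_member = belongs("xabbby")         # extra characters put it outside the language
```

Note that `belongs` asks the theoretical question: it accepts or rejects the entire string, with no notion of a matching substring.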
Although the presentation in the following sections is self-explanatory,
many readers will find it helpful to give a little thought to the differences
between the theory they have seen and the practice explained in
this chapter. Here are some differences between theory and practice
as they apply to regular expressions.
- The theory is concerned with recognizing whether a given string belongs
to a language that has been defined with a regular expression. The practice
is usually concerned not with whether the entire given string belongs to the
language, but with whether some substring of it does. If such a substring
exists, it is said to match the regular expression. Whether or not a match is
found, the regular expression is called a pattern that a substring may match.
- When several matching substrings can be found in the given string, it is
often important to know which substring the finite automaton will find. This
problem simply does not arise in the theoretical setting.
- The theory is presented with typographical tricks to distinguish between
symbols of the underlying language and symbols of the regular expressions that
define the language. The practice (so far) lives in a world where we are
restricted to what we see on a typewriter keyboard.
- The theory assumes the strings of the language are made from a set of
symbols about which nothing further need be said. In practice, it is necessary
to say something about the symbols. Partly this is because of the previous point,
partly this is because we need a way of representing characters that do not
appear on the keyboard or the screen, and partly this is because we need
a shorthand for representing important subsets of the symbols, for example, the
alphabetic letters.
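The four differences above can be seen in a few lines of code. The sketch below uses Python's standard `re` module purely as an illustration. The tools described in this chapter may differ in syntax and in matching rules, and the sample strings and variable names are made up for the example.

```python
import re

# 1. Theory asks whether the WHOLE string belongs to the language;
#    practice asks whether SOME substring matches the pattern.
whole = re.fullmatch(r"ab*", "xabbby")   # None: the whole string is not in the language
part = re.search(r"ab*", "xabbby")       # matches the substring "abbb"

# 2. When several substrings could match, the matcher must pick one.
#    Python's re.search reports the leftmost match.
first = re.search(r"ab*", "ab abbb")     # finds "ab" at positions 0-2, not "abbb"

# 3. With only keyboard characters available, * is both an operator and
#    an ordinary symbol; here a backslash marks the literal use.
literal_star = re.search(r"a\*", "a*b")  # matches the two characters "a*"

# 4. Practical notations can name characters that are hard to type
#    (\t is a tab) and abbreviate subsets such as the ASCII letters.
tab = re.search(r"\t", "a\tb")
letters = re.search(r"[A-Za-z]+", "42 apples")

print(whole, part.group(), first.span(), literal_star.group(), letters.group())
```

The choice in item 2 is a convention of the particular matcher, which is why the practice has to document it and the theory never mentions it.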
Remark -
Often, when we see theory in class and a related-but-different
practice in the real world, we decide that the theory is not very useful.
Sometimes that is true. This is not one of those times. Without formal
language theory, this chapter would have nothing to say.
What makes the theory seem irrelevant is that it must be presented in the
classroom as simply as possible. Imagine what studying the theory would have
been like if you had needed to deal with the complications listed below. Be
glad that others have dealt with them for you.