Class AwkCompiler

  • All Implemented Interfaces:
    PatternCompiler

    public final class AwkCompiler
    extends java.lang.Object
    implements PatternCompiler
    The AwkCompiler class is used to create compiled regular expressions conforming to the Awk regular expression syntax. It generates AwkPattern instances upon compilation to be used in conjunction with an AwkMatcher instance. AwkMatcher finds true leftmost-longest matches, so you must take care with how you formulate your regular expression to avoid matching more than you really want.
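The difference between leftmost-longest matching and the leftmost-first matching of backtracking engines can be sketched with the JDK alone. The example below uses java.util.regex only as a stand-in for contrast; the class and method names are hypothetical, and the brute-force search merely emulates leftmost-longest results. It is not how the DFA-based AwkMatcher works internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeftmostLongestDemo {
    /** First match as java.util.regex reports it: leftmost start, first alternative that succeeds. */
    static String leftmostFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Emulated leftmost-longest: earliest start, then the longest span that matches as a whole. */
    static String leftmostLongest(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        for (int start = 0; start <= input.length(); start++) {
            // Try the longest candidate span first and shrink it until one matches.
            for (int end = input.length(); end >= start; end--) {
                if (m.region(start, end).matches()) {
                    return input.substring(start, end);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // With the alternation "a|ab", a leftmost-first engine stops at "a",
        // while leftmost-longest semantics select "ab".
        System.out.println(leftmostFirst("a|ab", "xab"));
        System.out.println(leftmostLongest("a|ab", "xab"));
    }
}
```

Under leftmost-longest semantics the order of alternatives does not limit the match, so a pattern such as a|ab consumes as much input as any alternative allows.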

    The supported regular expression syntax is a superset of traditional AWK syntax and should not be confused with that of GNU AWK or other AWK variants. This AWK implementation is DFA-based and supports only 8-bit ASCII characters. Consequently, these classes can perform very fast pattern matches in most cases.

    This is the traditional Awk syntax that is supported:

    • Alternatives separated by |
    • Quantified atoms
      *
      Match 0 or more times.
      +
      Match 1 or more times.
      ?
      Match 0 or 1 times.
    • Atoms
      • regular expression within parentheses
      • a . matches everything including newline
      • a ^ is a null token matching the beginning of a string but has no relation to newlines (and is only valid at the beginning of a regex; this differs from traditional awk for the sake of efficiency in Java).
      • a $ is a null token matching the end of a string but has no relation to newlines (and is only valid at the end of a regex; this differs from traditional awk for the sake of efficiency in Java).
      • Character classes (e.g., [abcd]) and ranges (e.g., [a-z])
        • Special backslashed characters work within a character class
      • Special backslashed characters
        \b
        backspace
        \n
        newline
        \r
        carriage return
        \t
        tab
        \f
        formfeed
        \xnn
        hexadecimal representation of character
        \nn or \nnn
        octal representation of character
        Any other backslashed character matches itself
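The backslash escapes listed above can be read as a small decoding table. The following is a minimal sketch of that table in plain Java; AwkEscapeDemo and decode are hypothetical names for illustration and are not part of the library. It covers only the traditional escapes listed above and assumes that hexadecimal and octal digits denote the character code directly.

```java
final class AwkEscapeDemo {
    /** Returns true if every character of s is an octal digit. */
    private static boolean isOctal(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '7') {
                return false;
            }
        }
        return true;
    }

    /** Decodes a single escape such as "\\n", "\\x41" or "\\101" into the character it denotes. */
    static char decode(String escape) {
        if (escape.length() < 2 || escape.charAt(0) != '\\') {
            throw new IllegalArgumentException("not a backslash escape: " + escape);
        }
        String body = escape.substring(1);
        char first = body.charAt(0);
        switch (first) {
            case 'b': return '\b';   // backspace
            case 'n': return '\n';   // newline
            case 'r': return '\r';   // carriage return
            case 't': return '\t';   // tab
            case 'f': return '\f';   // formfeed
            case 'x': return (char) Integer.parseInt(body.substring(1), 16); // \xnn
            default:
                // \nn or \nnn: two or three octal digits
                if ((body.length() == 2 || body.length() == 3) && isOctal(body)) {
                    return (char) Integer.parseInt(body, 8);
                }
                return first;        // any other backslashed character matches itself
        }
    }

    public static void main(String[] args) {
        System.out.println((int) decode("\\t"));
        System.out.println(decode("\\x41"));
        System.out.println(decode("\\101"));
        System.out.println(decode("\\."));
    }
}
```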

    This is the extended syntax that is supported:

    • Quantified atoms
      {n,m}
      Match at least n but not more than m times.
      {n,}
      Match at least n times.
      {n}
      Match exactly n times.
    • Atoms
      • Special backslashed characters
        \d
        digit [0-9]
        \D
        non-digit [^0-9]
        \w
        word character [0-9a-z_A-Z]
        \W
        non-word character [^0-9a-z_A-Z]
        \s
        whitespace character [ \t\n\r\f]
        \S
        non-whitespace character [^ \t\n\r\f]
        \cD
        matches the corresponding control character
        \0
        matches null character
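To illustrate how the interval quantifiers combine with the bracket equivalents given in the table, here is a small sketch. It uses java.util.regex purely as a stand-in, since it shares the {n,m} notation; the class and constant names are hypothetical. The bracket expressions are copied from the table above rather than written as \d, \w or \s, because the shorthand classes of java.util.regex are not guaranteed to have the same definitions.

```java
import java.util.regex.Pattern;

final class ExtendedSyntaxDemo {
    // Bracket equivalents copied from the table above.
    static final String DIGIT = "[0-9]";            // \d
    static final String WORD = "[0-9a-z_A-Z]";      // \w
    static final String SPACE = "[ \\t\\n\\r\\f]";  // \s

    /** Returns true if the whole input matches the regular expression. */
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String fourDigits = DIGIT + "{4}";         // exactly 4 times
        String twoOrMoreWord = WORD + "{2,}";      // at least 2 times
        String oneToThreeSpaces = SPACE + "{1,3}"; // at least 1 but not more than 3 times

        System.out.println(fullMatch(fourDigits, "2024"));
        System.out.println(fullMatch(fourDigits, "202"));
        System.out.println(fullMatch(twoOrMoreWord, "ab_1"));
        System.out.println(fullMatch(oneToThreeSpaces, "    "));
    }
}
```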
    Since:
    1.0
    See Also:
    PatternCompiler, MalformedPatternException, AwkPattern, AwkMatcher
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int CASE_INSENSITIVE_MASK
      A mask passed as an option to the compile methods to indicate a compiled regular expression should be case insensitive.
      static int DEFAULT_MASK
      The default mask for the compile methods.
      static int MULTILINE_MASK
      A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as having multiple lines.
    • Constructor Summary

      Constructors 
      Constructor Description
      AwkCompiler()  
    • Method Summary

      Modifier and Type Method Description
      Pattern compile(char[] pattern)
      Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK).
      Pattern compile(char[] pattern, int options)
      Compiles an Awk regular expression into an AwkPattern instance that can be used by an AwkMatcher object to perform pattern matching.
      Pattern compile(java.lang.String pattern)
      Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK).
      Pattern compile(java.lang.String pattern, int options)
      Compiles an Awk regular expression into an AwkPattern instance that can be used by an AwkMatcher object to perform pattern matching.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DEFAULT_MASK

        public static final int DEFAULT_MASK
        The default mask for the compile methods. It is equal to 0 and indicates no special options are active.
        See Also:
        Constant Field Values
      • CASE_INSENSITIVE_MASK

        public static final int CASE_INSENSITIVE_MASK
        A mask passed as an option to the compile methods to indicate a compiled regular expression should be case insensitive.
        See Also:
        Constant Field Values
      • MULTILINE_MASK

        public static final int MULTILINE_MASK
        A mask passed as an option to the compile methods to indicate a compiled regular expression should treat input as having multiple lines. This option affects the interpretation of the . metacharacter. When this mask is used, the . metacharacter will not match newlines. The default behavior is for . to match newlines.
        See Also:
        Constant Field Values
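Because the options argument of the compile methods is a set of flags, masks are combined with bitwise OR and tested with bitwise AND. The sketch below shows the idiom with stand-in constants: only DEFAULT_MASK = 0 is stated above, while the other two values and the helper names are assumptions made for illustration. The actual values are listed under Constant Field Values.

```java
final class MaskDemo {
    // Stand-in constants for illustration only.
    static final int DEFAULT_MASK = 0;               // stated above: no special options
    static final int CASE_INSENSITIVE_MASK = 0x0001; // assumed value
    static final int MULTILINE_MASK = 0x0002;        // assumed value

    /** Returns true if the given mask bit is set in options. */
    static boolean isSet(int options, int mask) {
        return (options & mask) != 0;
    }

    public static void main(String[] args) {
        // Request both behaviours at once by OR-ing the masks together.
        int options = CASE_INSENSITIVE_MASK | MULTILINE_MASK;

        System.out.println(isSet(options, CASE_INSENSITIVE_MASK));
        System.out.println(isSet(options, MULTILINE_MASK));
        System.out.println(isSet(DEFAULT_MASK, CASE_INSENSITIVE_MASK));
    }
}
```

In real code the constants would be referenced through the class, for example AwkCompiler.CASE_INSENSITIVE_MASK | AwkCompiler.MULTILINE_MASK, rather than redefined.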
    • Constructor Detail

      • AwkCompiler

        public AwkCompiler()
    • Method Detail

      • compile

        public Pattern compile(char[] pattern,
                               int options)
                        throws MalformedPatternException
        Compiles an Awk regular expression into an AwkPattern instance that can be used by an AwkMatcher object to perform pattern matching.

        Specified by:
        compile in interface PatternCompiler
        Parameters:
        pattern - An Awk regular expression to compile.
        options - A set of flags giving the compiler instructions on how to treat the regular expression. The meaningful flags are AwkCompiler.CASE_INSENSITIVE_MASK and AwkCompiler.MULTILINE_MASK; pass AwkCompiler.DEFAULT_MASK for no special options.
        Returns:
        A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can reliably be cast to an AwkPattern.
        Throws:
        MalformedPatternException - If the expression to be compiled is not a valid Awk regular expression.
      • compile

        public Pattern compile(java.lang.String pattern,
                               int options)
                        throws MalformedPatternException
        Compiles an Awk regular expression into an AwkPattern instance that can be used by an AwkMatcher object to perform pattern matching.

        Specified by:
        compile in interface PatternCompiler
        Parameters:
        pattern - An Awk regular expression to compile.
        options - A set of flags giving the compiler instructions on how to treat the regular expression. The meaningful flags are AwkCompiler.CASE_INSENSITIVE_MASK and AwkCompiler.MULTILINE_MASK; pass AwkCompiler.DEFAULT_MASK for no special options.
        Returns:
        A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can reliably be cast to an AwkPattern.
        Throws:
        MalformedPatternException - If the expression to be compiled is not a valid Awk regular expression.
      • compile

        public Pattern compile(char[] pattern)
                        throws MalformedPatternException
        Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK).

        Specified by:
        compile in interface PatternCompiler
        Parameters:
        pattern - A regular expression to compile.
        Returns:
        A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can reliably be cast to an AwkPattern.
        Throws:
        MalformedPatternException - If the expression to be compiled is not a valid Awk regular expression.
      • compile

        public Pattern compile(java.lang.String pattern)
                        throws MalformedPatternException
        Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK).

        Specified by:
        compile in interface PatternCompiler
        Parameters:
        pattern - A regular expression to compile.
        Returns:
        A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can reliably be cast to an AwkPattern.
        Throws:
        MalformedPatternException - If the expression to be compiled is not a valid Awk regular expression.
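The compile methods above might be used along the following lines. This is a sketch under stated assumptions rather than a definitive usage pattern: it assumes the Jakarta ORO jar is on the classpath with the package names org.apache.oro.text.awk and org.apache.oro.text.regex, and that AwkMatcher offers contains(String, Pattern) and getMatch() as declared by the PatternMatcher interface. The class AwkCompileExample and its firstMatch helper are hypothetical names.

```java
import org.apache.oro.text.awk.AwkCompiler;
import org.apache.oro.text.awk.AwkMatcher;
import org.apache.oro.text.awk.AwkPattern;
import org.apache.oro.text.regex.MalformedPatternException;
import org.apache.oro.text.regex.Pattern;

final class AwkCompileExample {
    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input, int options) {
        AwkCompiler compiler = new AwkCompiler();
        Pattern compiled;
        try {
            compiled = compiler.compile(regex, options);
        } catch (MalformedPatternException e) {
            // The checked exception signals an invalid Awk regular expression.
            throw new IllegalArgumentException("Invalid Awk regex: " + regex, e);
        }

        // The returned Pattern is documented to always be an AwkPattern.
        AwkPattern awkPattern = (AwkPattern) compiled;

        AwkMatcher matcher = new AwkMatcher();
        if (matcher.contains(input, awkPattern)) {
            return matcher.getMatch().toString();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("ab+", "xabbb", AwkCompiler.DEFAULT_MASK));
        System.out.println(firstMatch("hello", "say HELLO", AwkCompiler.CASE_INSENSITIVE_MASK));
    }
}
```

Wrapping MalformedPatternException in an unchecked exception is a design choice of this sketch; callers that accept user-supplied patterns may prefer to propagate the checked exception instead.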