Package Torello.HTML

Class TagNode.AttrRegEx

  • Enclosing class:
    TagNode

    public static final class TagNode.AttrRegEx
    extends java.lang.Object
    TagNode Attribute Regular Expressions - Documentation

    All instances of class TagNode are simply wrapped Java String objects. The class has well over a dozen instance methods, but its internal data is nothing more than a String containing the exact text of the HTML Element. Since Java Strings are immutable, modifying the attributes of an element requires creating a new TagNode object. Information about the individual attribute key-value pairs (for example: HREF="http://some.url.com") is retrieved by parsing that String with standard Java regular-expressions, and is then returned via a standard Java Stream<String> or another common Java data structure.

    This inner, static class simply keeps the regular-expressions used by class TagNode together. It is generally not crucial to understand how they parse the attributes inside an HTML Element Tag, but if a deeper understanding of this HTML package is needed, the expressions are all here for review. They are fully documented, and in some cases links to their use inside class TagNode are provided.



    • Field Detail

      • KEY_VALUE_REGEX

        public static final java.util.regex.Pattern KEY_VALUE_REGEX
        NOTE: Knowledge and understanding of java.util.regex.* can be helpful for many of the search and update routines in this JAR library. However, neither class TagNode, nor any of the advanced Node-Search class routines, mandates using Java's Regular-Expression Package Library.

        This is the regular-expression used to match inner-tag key-value pairs inside an already instantiated HTML Element. It recognizes three styles of attribute-value: single-quoted, double-quoted and unquoted. The table below explains each sub-part of the expression. Sub-parts are written as they appear in the Java String literal, with back-slashes and double-quotes escaped.

        Regular-Expression Sub-Part    Explanation
        '[^']*?'                       Single-Quote Match: Matches an attribute-value that is surrounded by single-quotes.
        \"[^\"]*?\"                    Double-Quote Match: Matches an attribute-value that is surrounded by double-quotes.
        [^\"'>\\s]*                    No Quotes Used: Matches an attribute-value that does not use quotation marks. Note that white-space may not be used in the value-String.
        ([\\w-]+?)=                    Attribute-Key: The "Attribute Name" (also called the "Inner-Tag") of the key-value pair, followed by the equals-sign.
        \\s+?                          Mandatory Leading White-Space: Each key-value pair must be preceded by at least one white-space character.

        MATCH GROUPS: The table below shows what each of the "Regular-Expression Group-Matches" evaluates to. To retrieve a sub-part of a match, use the method java.util.regex.Matcher.group(int), where the integer parameter specifies a group number. A group is "created" by surrounding part of the Reg-Ex with opening and closing parentheses.

        Match Group Number    Group Return String
        matcher.group(1)      Returns the entire key-value pair (as a String), leaving out the leading white-space.
        matcher.group(2)      Returns the 'key' String of the key-value attribute.
        matcher.group(3)      Returns the 'value' String of the key-value attribute. Note that if there are surrounding quotes, they will be included in this return String.

        NOTE: The first, outermost pair of parentheses begins with the marker '?:'. This makes it a 'non-capturing group': it groups part of the expression, but it is not assigned a group number. The leading white-space that it matches therefore appears only in the complete match, matcher.group() (equivalently matcher.group(0)), and never in groups 1 through 3.
        See Also:
        TagNode.allAV(boolean, boolean)
        Code:
        Exact Field Declaration Expression:
        public static final Pattern KEY_VALUE_REGEX = Pattern.compile(
                    "(?:\\s+?" +                    // mandatory leading white-space
                        "(([\\w-]+?)=(" +           // inner-tag name (a.k.a. 'key' or 'attribute-name')
                            "'[^']*?'"     + "|" +  // inner-tag value using single-quotes ... 'OR'
                            "\"[^\"]*?\""   + "|" + // inner-tag value using double-quotes ... 'OR'
                            "[^\"'>\\s]*"   +       // inner-tag value without quotes
                    ")))",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL
                );
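
        The group numbering described above can be sketched as follows. This is only an illustration: the pattern is a local copy of the declaration shown above (so the snippet runs without the library on the class-path), and the class KeyValueRegexDemo, the helper attributes(String) and the sample tag are hypothetical names that are not part of this package.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueRegexDemo {
    // Local copy of the KEY_VALUE_REGEX declaration (assumption: identical to the field above).
    static final Pattern KEY_VALUE_REGEX = Pattern.compile(
        "(?:\\s+?(([\\w-]+?)=('[^']*?'|\"[^\"]*?\"|[^\"'>\\s]*)))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Hypothetical helper: maps each attribute key (group 2) to its raw value (group 3).
    // Surrounding quotes, when present, are kept in the value.
    static Map<String, String> attributes(String tag) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = KEY_VALUE_REGEX.matcher(tag);
        while (m.find()) result.put(m.group(2), m.group(3));
        return result;
    }

    public static void main(String[] args) {
        String tag = "<a href=\"http://some.url.com\" class='link' target=_blank>";
        attributes(tag).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

        Because group 3 keeps any surrounding quotes, a caller that wants the bare value would have to strip them separately.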
        
      • QUOTES_AND_VALUE_REGEX

        public static final java.util.regex.Pattern QUOTES_AND_VALUE_REGEX
        LEGACY-REGEX: Was used by the method TagNode.AV(String), among others. Note that this regular-expression will be deprecated, since it is redundant.

        CAPTURE GROUPS: Nearly the entire Reg-Ex is surrounded by parentheses. m.group(1) returns the same String as m.group(), minus the leading '=' (equals-sign).
        See Also:
        TagNode.AV(String)
        Code:
        Exact Field Declaration Expression:
        public static final Pattern QUOTES_AND_VALUE_REGEX = Pattern.compile(
                    // Matches, for example:  ='MyClass'   or    ="MyClass"   or   =MyClass
                    "=(" + 
                        "\"[^\"]*?\""   + "|" + // inner-tag value using double-quotes ... 'OR'
                        "'[^']*?'"      + "|" + // inner-tag value using single-quotes ... 'OR'
                        "[\\w-]+"       +       // inner-tag value without quotes
                    ")",
                    Pattern.DOTALL
                );
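
        The difference between m.group() and m.group(1) can be illustrated with a short sketch. The pattern is a local copy of the declaration above, and the class QuotesAndValueDemo and the helper firstMatch(String) are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotesAndValueDemo {
    // Local copy of the QUOTES_AND_VALUE_REGEX declaration (assumption: identical to the field above).
    static final Pattern QUOTES_AND_VALUE_REGEX = Pattern.compile(
        "=(\"[^\"]*?\"|'[^']*?'|[\\w-]+)",
        Pattern.DOTALL
    );

    // Hypothetical helper: returns { m.group(), m.group(1) } for the first match, or null if none.
    static String[] firstMatch(String text) {
        Matcher m = QUOTES_AND_VALUE_REGEX.matcher(text);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String[] samples = { "class='MyClass'", "class=\"MyClass\"", "class=MyClass" };
        for (String s : samples) {
            String[] groups = firstMatch(s);
            System.out.println(groups[0] + "   ->   " + groups[1]);
        }
    }
}
```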
        
      • ATTRIBUTE_KEY_REGEX

        public static final java.util.regex.Pattern ATTRIBUTE_KEY_REGEX
        This matches all valid attribute-keys (not values) of HTML Element key-value pairs.

        • PART-1: [A-Za-z_] The first character must be a letter or the underscore.
        • PART-2: [A-Za-z0-9_-] All other characters must be alpha-numeric, the dash '-', or the underscore '_'.
        See Also:
        InnerTagKeyException.check(String[]), TagNode.allKeyOnlyAttributes(boolean)
        Code:
        Exact Field Declaration Expression:
        public static final Pattern ATTRIBUTE_KEY_REGEX = 
                    Pattern.compile("^[A-Za-z_][A-Za-z0-9_-]*$");
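
        A minimal sketch of how the two parts behave on sample keys follows. The pattern is a local copy of the declaration above. The class AttributeKeyDemo and the helper isValidKey(String) are hypothetical, and the use of Matcher.matches() is an assumption about how such a check might be written, not a description of the library's own code.

```java
import java.util.regex.Pattern;

class AttributeKeyDemo {
    // Local copy of the ATTRIBUTE_KEY_REGEX declaration (assumption: identical to the field above).
    static final Pattern ATTRIBUTE_KEY_REGEX =
        Pattern.compile("^[A-Za-z_][A-Za-z0-9_-]*$");

    // Hypothetical helper: true when the whole String is a valid attribute-key.
    static boolean isValidKey(String key) {
        return ATTRIBUTE_KEY_REGEX.matcher(key).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "href", "data-src", "_private", "1st", "-x", "on click", "" };
        for (String key : samples)
            System.out.println("'" + key + "' valid? " + isValidKey(key));
    }
}
```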
        
      • DATA_ATTRIBUTE_REGEX

        public static final java.util.regex.Pattern DATA_ATTRIBUTE_REGEX
        This is used to match HTML "Data-Attributes". An HTML Data-Attribute is an attribute key-value pair whose attribute-name begins with the characters 'data-'.

        NOTE: Knowledge and understanding of java.util.regex.* can be helpful for many of the search and update routines in this JAR library. However, neither class TagNode, nor any of the advanced Node-Search class routines, mandates using Java's Regular-Expression Package Library.

        The table included below briefly explains each sub-part of this Regular-Expression for capturing HTML Data Inner-Tags. Sub-parts are written as they appear in the Java String literal, with back-slashes and double-quotes escaped.

        Regular-Expression Sub-Part    Explanation
        '[^']*?'                       Single-Quote Match: Matches an attribute-value that is surrounded by single-quotes.
        \"[^\"]*?\"                    Double-Quote Match: Matches an attribute-value that is surrounded by double-quotes.
        [^\"'>\\s]*                    No Quotes Used: Matches an attribute-value that does not use quotation marks. Note that white-space may not be used in the value-String.
        ([\\w-]+?)                     Attribute-Key: The "Attribute Name" (also called the "Inner-Tag") of the data-attribute key-value pair. Note that the characters 'data-' are required to match the attribute, but are not included in this capture group.

        MATCH GROUPS: The table below shows what each of the "Regular-Expression Group-Matches" evaluates to. To retrieve a sub-part of a match, use the method java.util.regex.Matcher.group(int), where the integer parameter specifies a group number. A group is "created" by surrounding part of the Reg-Ex with opening and closing parentheses.

        Match Group Number    Group Return String
        matcher.group(1)      Returns the entire data-attribute key-value pair (as a String), leaving out the leading white-space.
        matcher.group(2)      Returns the 'key' String of the data-attribute key-value pair. Note that the initial substring 'data-' is not included in the attribute-name returned for this capture-group, because it lies outside the capturing parentheses for this group.
        matcher.group(3)      Returns the 'value' String of the key-value attribute. Note that if there are surrounding quotes, they will be included in this return String.

        NOTE: The first, outermost pair of parentheses begins with the marker '?:'. This makes it a 'non-capturing group': it groups part of the expression, but it is not assigned a group number. The leading white-space that it matches therefore appears only in the complete match, matcher.group() (equivalently matcher.group(0)), and never in groups 1 through 3.
        See Also:
        TagNode.getDataAN(), TagNode.getDataAV()
        Code:
        Exact Field Declaration Expression:
        public static final Pattern DATA_ATTRIBUTE_REGEX = Pattern.compile(
                    // regex will match, for example:   data-src="https://cdn.imgur.com/MyImage.jpg"
                    "(?:\\s+?" +                            // mandatory leading white-space
                        "(data-([\\w-]+?)=" +               // data inner-tag name 
                            "(" +   "'[^']*?'"      + "|" + // inner-tag value using single-quotes ... 'OR'
                                    "\"[^\"]*?\""   + "|" + // inner-tag value using double-quotes ... 'OR'
                                    "[^\"'>\\s]*"   +       // inner-tag value without quotes
                        ")))",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL  
                );
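
        The following sketch shows how group 2 omits the 'data-' prefix while group 1 retains it. The pattern is a local copy of the declaration above. The class DataAttributeDemo, the helper dataAttributes(String) and the sample tag are hypothetical and are not part of this package.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DataAttributeDemo {
    // Local copy of the DATA_ATTRIBUTE_REGEX declaration (assumption: identical to the field above).
    static final Pattern DATA_ATTRIBUTE_REGEX = Pattern.compile(
        "(?:\\s+?(data-([\\w-]+?)=('[^']*?'|\"[^\"]*?\"|[^\"'>\\s]*)))",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL
    );

    // Hypothetical helper: maps the data-attribute name without 'data-' (group 2)
    // to its raw value (group 3, quotes included). Other attributes are skipped.
    static Map<String, String> dataAttributes(String tag) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = DATA_ATTRIBUTE_REGEX.matcher(tag);
        while (m.find()) result.put(m.group(2), m.group(3));
        return result;
    }

    public static void main(String[] args) {
        String tag = "<img data-src=\"https://cdn.imgur.com/MyImage.jpg\" data-id=42 alt='x'>";
        dataAttributes(tag).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

        In this sample the 'alt' attribute is not reported, since its name does not begin with 'data-'.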
        
      • CSS_INLINE_STYLE_REGEX

        public static final java.util.regex.Pattern CSS_INLINE_STYLE_REGEX
        NOTE: Knowledge and understanding of java.util.regex.* can be helpful for many of the search and update routines in this JAR library. However, neither class TagNode, nor any of the advanced Node-Search class routines, mandates using Java's Regular-Expression Package Library.

        This is a regular expression Pattern that matches CSS Style Definitions that are directly 'inlined' into HTML TagNode instances.

        Regular-Expression Sub-Part     Explanation
        [_\\-a-zA-Z]+[_\\-a-zA-Z0-9]*   The CSS property-name (CSS-Token). It must begin with one or more letters (a..z, A..Z), underscores or dashes. Afterwards, it may only contain the following: letters, numbers, dashes and/or the underscore '_'.
        :                               The property-name shall be followed by a colon (':'), optionally surrounded by white-space. The property-value that follows may contain any characters except the semi-colon (';').
        ;|$|[\\w]+$                     The declaration should be terminated by a semi-colon, or it may instead reach the end of the 'style' attribute-value. In regular-expressions, the end of the input is marked by a dollar-sign: '$'.

        MATCH GROUPS: The table below shows what each of the "Regular-Expression Group-Matches" evaluates to. To retrieve a sub-part of a match, use the method java.util.regex.Matcher.group(int), where the integer parameter specifies a group number. A group is "created" by surrounding part of the Reg-Ex with opening and closing parentheses.

        Match Group Number    Group Return String
        matcher.group(1)      Returns the CSS Style Property Name, for instance 'font-weight' or 'border'.
        matcher.group(2)      Returns the CSS Style Property Value, for instance bold or 1px 1px 1px 1px. White-space surrounding the value may be included in this return String.
        matcher.group(3)      Returns the text that ends the declaration: a semi-colon, the empty String at the end of the input, or a final run of word-characters that reaches the end of the input.
        See Also:
        TagNode.cssStyle()
        Code:
        Exact Field Declaration Expression:
        public static final Pattern CSS_INLINE_STYLE_REGEX = Pattern.compile(
                        // regex will match, for example:  font-weight: bold;
                        "([_\\-a-zA-Z]+" + "[_\\-a-zA-Z0-9]*)" +    // CSS Style Property Name - begins with a letter, underscore or dash
                        "\\s*?" + ":" + "\\s*?" +                   // The ":" symbol between property-name and property-value
                        "([^;]+?\\s*)" +                            // CSS Style Property Value
                        "(;|$|[\\w]+$)"                             // text after the "Name : Value" definition    
                );
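
        A minimal sketch of iterating over the declarations of a 'style' attribute-value follows. The pattern is a local copy of the declaration above. The class CssInlineStyleDemo and the helper declarations(String) are hypothetical. Trimming the value, and appending group 3 to it when group 3 is not a semi-colon, are assumptions made for this illustration and do not describe how TagNode.cssStyle() is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssInlineStyleDemo {
    // Local copy of the CSS_INLINE_STYLE_REGEX declaration (assumption: identical to the field above).
    static final Pattern CSS_INLINE_STYLE_REGEX = Pattern.compile(
        "([_\\-a-zA-Z]+[_\\-a-zA-Z0-9]*)\\s*?:\\s*?([^;]+?\\s*)(;|$|[\\w]+$)"
    );

    // Hypothetical helper: maps each property name (group 1) to its trimmed value.
    static Map<String, String> declarations(String style) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = CSS_INLINE_STYLE_REGEX.matcher(style);
        while (m.find()) {
            // When the last declaration has no ';', the end of its value is captured
            // by group 3 (the [\\w]+$ alternative), so it is appended back here.
            String tail = m.group(3).equals(";") ? "" : m.group(3);
            result.put(m.group(1), (m.group(2) + tail).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        declarations("font-weight: bold; color: red;")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

        Group 2 is matched lazily, so for a final declaration without a terminating semi-colon, such as "color: red", the value text ends up in group 3 rather than group 2. The sketch concatenates the two groups for that reason.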