Package Torello.HTML

Class Attributes


  • public class Attributes
    extends java.lang.Object
    Attributes - Documentation.

    This class is used to perform iteration-loops over HTML Element Vectors where each and every TagNode Attribute can be updated, modified, added or removed with just a single method invocation. This class can be used in conjunction with the 'AUM' enumerated-type class, which specifies the type of update to perform.

    It is important to note that these methods are really just for-loops that update an html-page with nodes whose attributes have changed. Generally, the methods in this class will not save a lot of typing, since the for-loop is not very long and replacing old HTML Elements with new ones in a Vector should be easy. However, with the error checking, exception reporting and String-concatenation provided by the 'AUM' (Attribute Update Mode) enumerated-type, the value of using this class over a simple for-loop becomes more apparent: less error-prone & simpler code.
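    As a rough illustration of the kind of for-loop described above, the following sketch replaces nodes in a Vector and records which indices changed. It does not use the library: 'SketchTag', 'ForLoopSketch' and 'addClass' are hypothetical stand-ins invented for this example, and the attribute handling is far cruder than what 'AUM' provides.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.stream.IntStream;

// Hypothetical stand-in for the library's TagNode: an immutable wrapper around the tag text.
final class SketchTag
{
    final String str;
    SketchTag(String str) { this.str = str; }
}

final class ForLoopSketch
{
    // Adds class='cssClass' to every opening tag that has no class attribute yet.
    // Returns the Vector indices whose nodes were replaced.
    static int[] addClass(Vector<Object> html, String cssClass)
    {
        IntStream.Builder changed = IntStream.builder();

        for (int i = 0; i < html.size(); i++)
        {
            Object o = html.elementAt(i);
            if (! (o instanceof SketchTag)) continue;   // text and comments carry no attributes

            String s = ((SketchTag) o).str;
            if (s.startsWith("</")) continue;           // closing tags cannot have attributes
            if (s.contains(" class=")) continue;        // attribute already present: no change

            // Keep a trailing "/>" or ">" intact, and splice the attribute in before it
            int     cut  = s.endsWith("/>") ? 2 : 1;
            String  head = s.substring(0, s.length() - cut);
            String  tail = s.substring(s.length() - cut);

            // Nodes are immutable, so the old node is replaced rather than edited
            html.setElementAt(new SketchTag(head + " class='" + cssClass + "'" + tail), i);
            changed.accept(i);
        }

        return changed.build().toArray();
    }

    public static void main(String[] args)
    {
        Vector<Object> html = new Vector<>();
        html.add(new SketchTag("<p>"));
        html.add("Some text");
        html.add(new SketchTag("</p>"));

        System.out.println(Arrays.toString(addClass(html, "MyClass")));
    }
}
```

    The loop itself is short; what it lacks is validation of the attribute name and of quotes inside the value, which is the part this class and 'AUM' are documented to handle.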

    "Re-Inventing the Wheel" is something that happens in American Computer-Programming circles pretty easily (C#, for instance). Before getting into complaints about software system engineering, however, it should be pointed out that this class (class Attributes) along with enum AUM - when working together - behave similarly to the pair: class ReplaceNodes and class ReplaceFunction. Both of these are generally used to replace HTML TagNode's in a vectorized-html web-page with ones that have updated attributes.

    Static (Functional) API: The methods in this class are all (100%) defined with the Java Key-Word / Key-Concept 'static'. Furthermore, there is no way to obtain an instance of this class, because there are no public (nor private) constructors. Java's Spring-Boot, MVC feature is *not* utilized because it flies directly in the face of the light-weight data-classes philosophy. This has many advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) - by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood class Vector

    Internal-State: A user may click on this class' source code (see link below) to view any and all internally defined fields. A cursory inspection of the code would prove that this class has precisely zero internally defined global fields (Spaghetti). All variables used by the methods in this class are local variables only, and therefore this class ought to be thought of as 'state-less'.
    See Also:
    AUM



    • Method Detail

      • update

        public static int[] update​(java.util.Vector<? super TagNode> html,
                                   AUM mode,
                                   int sPos,
                                   int ePos,
                                   java.lang.String innerTag,
                                   java.lang.String itValue,
                                   SD quote)
        Will update any HTML TagNode's present in the vector-parameter 'html' according to the passed 'AUM' mode and the 'innerTag' parameter.

        NOTE: This method restricts the update process to the specified subrange sPos ... ePos of the 'html' Vector.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        mode - Since the purpose of this class is to update, modify, or remove HTML Element Inner-Tag key-value pairs, the mechanism - or the desired behavior - of the update process needs to be specified. Use the enumerated type 'AUM' to choose which update type is needed.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        innerTag - This is the name of the HTML attribute that needs to be changed, added, or removed.
        itValue - This is the value that the attribute is set to, or the value that is removed from it, depending upon which of the AUM modes is selected. If AUM.RemoveSubString were chosen, then all HTML Elements within the specified Vector range would have the first copy of 'itValue' removed from any TagNode's containing an attribute with name 'innerTag'.
        quote - The programmer is expected to select either SD.Single-Quote or SD.Double-Quote. The updated Attribute / Inner-Tag key-value pairs will be surrounded by the selected quote. Always remember that the class 'TagNode' checks for quotes-within-quotes, and will throw an exception if a double-quoted attribute-value itself contains a double-quote, or vice-versa (a single-quote within a single-quoted attribute-value).
        Returns:
        This method shall return an integer-array index-list whose values identify which HTML Vector Elements were changed as a result of this method invocation.

        NOTE: One minor subtlety, there could be cases where a new HTML Element 'TagNode' reference / object were instantiated or 'created,' even though the actual String that comprised the HTMLNode itself were identical to the original HTMLNode.str String. In the 'AUM' enumerated-type, when AUM.Set is invoked, the original String data for an attribute is always clobbered, even in cases where an identical version of the String is replaced or substituted.
        Throws:
        QuotesException - If there are "quotes within quotes" problems when invoking the TagNode constructor, this exception will throw. The problem occurs when one or more of the attribute key-value pairs have a quotation-choice such that the chosen quotation-mark is also found within the attribute-value.

        QuotesException will also throw in the case that an attribute key-value pair has elected to use the "No Quotes" option, but the attribute-value contains white-space.
        InnerTagKeyException - This exception will throw if a non-standard String-value is passed to parameter String 'innerTag'. HTML expects that an attribute-name conform to a set of rules in order to be processed by a browser.
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        AUM.update(TagNode, String, String, SD), LV, TagNode.isTagNode(), TagNode.isClosing
        Code:
        Exact Method Body:
         InnerTagKeyException.check(innerTag);
        
         IntStream.Builder   b   = IntStream.builder();      // Use Java Stream to keep a list of Vector-Locations
                                                             // that were updated / modified.
         int                 MIN = 3 + innerTag.length();    // minimum possible length to have the specified attribute at all.
                                                             // '<', TOKEN, SPACE, ATTRIBUTE, '>'
         LV                  l = new LV(sPos, ePos, html);   // Loop Variable
         HTMLNode            n;                              // Temporary Variables
         TagNode             tn;
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = (HTMLNode) html.elementAt(i)).isTagNode())            // Only instances of TagNode have attributes, NOT TextNode or CommentNode
                 &&  (! (tn = (TagNode) n).isClosing)                            // TC.OpeningTags have attributes, Closing-Element Nodes cannot have them
                 &&  ((mode == AUM.Set) ||                                       // AUM.Set does not require the attribute to already exist
                         (tn.str.length() >= (MIN + tn.tok.length())))           // minimum possible length to have the specified attribute at all.
                 && ((tn = mode.update(tn, innerTag, itValue, quote)) != null) ) // If AUM.update returns a new TagNode, we must replace the old one.
             {
                 html.setElementAt(tn, i);                                       // Replace the old TagNode
                 b.accept(i);                                                    // Make sure to keep the index where it resides, to return to the user
             }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
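
        The method body above delegates the sPos / ePos handling to the library's 'LV' class, whose source is not shown on this page. The following is only a sketch of the range rules as they are listed in the "Throws" section for this method; 'RangeSketch' and 'resolve' are hypothetical names, and the real 'LV' class may differ in its details.

```java
import java.util.Arrays;

final class RangeSketch
{
    // Returns { start, end } where 'start' is inclusive and 'end' is exclusive.
    // Follows the documented rules only; this is NOT the library's LV class.
    static int[] resolve(int sPos, int ePos, int size)
    {
        if ((sPos < 0) || (sPos >= size))
            throw new IndexOutOfBoundsException("sPos=" + sPos + ", Vector size=" + size);

        // A negative ePos is reset to the Vector size before any further checks
        if (ePos < 0) ePos = size;

        if ((ePos == 0) || (ePos > size))
            throw new IndexOutOfBoundsException("ePos=" + ePos + ", Vector size=" + size);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos=" + sPos + " is larger than ePos=" + ePos);

        return new int[] { sPos, ePos };
    }

    public static void main(String[] args)
    {
        // Passing 0 and -1 selects the whole Vector
        System.out.println(Arrays.toString(resolve(0, -1, 5)));
    }
}
```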
        
      • update

        public static int[] update​(java.util.Vector<? super TagNode> html,
                                   AUM mode,
                                   int[] posArr,
                                   java.lang.String innerTag,
                                   java.lang.String itValue,
                                   SD quote)
        Will update any HTML TagNode's present in the vector-parameter 'html' according to a passed 'AUM' mode and the 'innerTag' parameter.

        NOTE: This method restricts the update process to only nodes specified by the Vector-index parameter 'posArr'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        mode - Since the purpose of this class is to update, modify, or remove HTML Element Inner-Tag key-value pairs, the mechanism - or the desired behavior - of the update process needs to be specified. Use the enumerated type enum 'AUM'. for choosing what update type is needed.
        posArr - This integer-array is expected to receive a "Pointer-Integer Array." These are usually generated by the NodeSearch 'Find' classes, and are simply lists of index-pointers into a Vectorized HTML Web-Page Vector. The int[] array passed to this parameter will specify the TagNode's in the Vector whose attributes will be updated according to parameter 'mode', and which will then be replaced.

        For Example:
         // This line will retrieve an array "index-pointer" to every HTML Paragraph Element.
         int[] posArr         = TagNodeFind.all(htmlPage, TC.OpeningTags, "p");
        
         // This line will ensure that every HTML Paragraph Element that was found on the HTML
         // page in the previous line of code - shall have a CSS class='MyClass' key-value
         // Inner-Tag.  The returned array will contain a list of pointers to HTML Paragraph 
         // Elements that were changed.
         int[] changedPosArr  = Attributes.update(htmlPage, AUM.Set, posArr, "class", "MyClass", SD.SingleQuote);
         
        
        innerTag - This is the name of the HTML attribute that needs to be changed, added, or removed.
        itValue - This is the value that the attribute is set to, or the value that is removed from it, depending upon which of the AUM modes is selected. If AUM.RemoveSubString were chosen, then all HTML Elements identified by 'posArr' would have the first copy of 'itValue' removed from any TagNode's containing an attribute with name 'innerTag'.
        quote - The programmer is expected to select either SD.Single-Quote or SD.Double-Quote. The updated Attribute / Inner-Tag key-value pairs will be surrounded by the selected quote. Always remember that the class 'TagNode' checks for quotes-within-quotes, and will throw an exception if a double-quoted attribute-value itself contains a double-quote, or vice-versa (a single-quote within a single-quoted attribute-value).
        Returns:
        This method shall return an integer-array index-list whose values identify which HTML Vector Elements were changed as a result of this method invocation.

        NOTE: One minor subtlety, there could be cases where a new HTML Element 'TagNode' reference / object were instantiated or 'created,' even though the actual String that comprised the HTMLNode itself were identical to the original HTMLNode.str String. In the 'AUM' enumerated-type, when AUM.Set is invoked, the original String data for an attribute is always clobbered, even in cases where an identical version of the String is replaced or substituted.
        Throws:
        QuotesException - If there are "quotes within quotes" problems when invoking the TagNode constructor, this exception will throw. The problem occurs when one or more of the attribute key-value pairs have a quotation-choice such that the chosen quotation-mark is also found within the attribute-value.

        QuotesException will also throw in the case that an attribute key-value pair has elected to use the "No Quotes" option, but the attribute-value contains white-space.
        InnerTagKeyException - This exception will throw if a non-standard String-value is passed to parameter String 'innerTag'. HTML expects that an attribute-name conform to a set of rules in order to be processed by a browser.
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' contain index-pointers that are out of range of Vector-parameter 'html', then Java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to have an instance of TagNode, but instead had some other HTMLNode instance. If one of the indices in an integer-position array (int[] posArr) does not point to a TagNode, then this exception's throw shall inform the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. When passing an int[] posArr integer-array of Vector-indices, and the code expects that each of the locations pointed to in the Vector to contain "Opening HTML Element Tags", then this exception's throw will inform the user.
        See Also:
        AUM.update(TagNode, String, String, SD), TagNode.isTagNode(), TagNode.isClosing
        Code:
        Exact Method Body:
         InnerTagKeyException.check(innerTag);
        
         IntStream.Builder   b   = IntStream.builder();          // Use Java Stream to keep a list of Vector-Locations
                                                                 // that were updated / modified.
         int                 MIN = 3 + innerTag.length();        // minimum possible length to have an attribute at all.
                                                                 // '<', TOKEN, SPACE, ATTRIBUTE, '>'
        
         for (int i : posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(i);          // Must be an HTML TagNode
             if (! n.isTagNode())  
                 throw new TagNodeExpectedException(i);
        
             TagNode tn = (TagNode) n;                           // Must be an "Opening" HTML TagNode
             if (tn.isClosing)
                 throw new OpeningTagNodeExpectedException(i);
        
             if ((mode != AUM.Set) &&                            // AUM.Set *DOES NOT* require the attribute to exist already (the other *DO*)
                 (tn.str.length() < (MIN + tn.tok.length())))    // Minimum length of this element before it even could have the named inner-tag
                 continue;                                       // '<', TOKEN, SPACE, ATTRIBUTE, '=', '>'
        
             tn = mode.update(tn, innerTag, itValue, quote);
        
             if (tn != null)                                     // non-null ==> an update WAS performed
             {
                 html.setElementAt(tn, i);                       // Replace the old TagNode
                 b.accept(i);                                    // Make sure to keep the index where it resides, to return to the user
             }
         }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
          return b.build().toArray();
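
        The "quotes within quotes" rules described for QuotesException above can be pictured with a small check like the one below. This is a sketch only: 'QuoteCheckSketch', its 'Quote' enum and 'isSafe' are hypothetical stand-ins for illustration, not the library's 'SD' enum or the checks that 'TagNode' actually performs.

```java
final class QuoteCheckSketch
{
    // Hypothetical stand-in for the quotation choices described in this documentation
    enum Quote { SINGLE, DOUBLE, NONE }

    // Returns true when 'value' can be wrapped in the chosen quotation style
    static boolean isSafe(String value, Quote q)
    {
        switch (q)
        {
            // The chosen quotation-mark must not also appear inside the value
            case SINGLE: return value.indexOf('\'') < 0;
            case DOUBLE: return value.indexOf('"') < 0;

            // The "No Quotes" option cannot hold a value that contains white-space
            default:     return value.chars().noneMatch(Character::isWhitespace);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(isSafe("MyClass", Quote.SINGLE));
        System.out.println(isSafe("My'Class", Quote.SINGLE));
        System.out.println(isSafe("two words", Quote.NONE));
    }
}
```

        A caller could run a check of this kind before invoking update(...), so that a bad value is reported before any node in the Vector has been replaced.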
        
      • removeAll

        public static int[] removeAll​(java.util.Vector<? super TagNode> html,
                                      int sPos,
                                      int ePos)
        The purpose of this method is to remove all attributes / Inner-Tag key-value pairs from each and every non-'TextNode' and non-'CommentNode' HTML Element found on the vectorized-html page parameter 'html'. The removal process is limited to the range specified by method-parameters sPos, ePos.

        Specifically: Each and every class=... id=... src=... alt=... href=... onclick=... etc... attribute from each and every instance of 'TagNode' HTML Element found in the vectorized-page parameter 'html' will be removed. All TagNode's shall contain empty attribute-lists.

        NOTE: This method restricts the removal process to the specified subrange sPos ... ePos of the HTML-Vector.

        Example:
         // Retrieve the contents from the foreign news source "https://www.gov.cn" - pick an article
         URL url = new URL("http://www.gov.cn/premier/2020-xx/xx/content_5526766.htm");
         Vector<HTMLNode> news = HTMLPage.getPageTokens(url, false);
         
         // Now retrieve the "article body"
         Vector<HTMLNode> body = InnerTagGetInclusive.first(news, "div", "class", TextComparitor.C, "article");
         
         // To view a "pared down" version - with all CSS class, id information removed - call this
         // method, and only the raw HTML tags will remain... <P>, <DIV>, <B>... etc.
         // Passing 0 and -1 means the 'entire-page' is processed.
         Attributes.removeAll(body, 0, -1);
         
         // Print the updated "article body" Vector using Util.pageToString.  It should be MUCH EASIER to read.
         // Again: all long-winded "class" "ID" and other common HTML clutter has been removed.
         System.out.println(Util.pageToString(body));
         
        
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        An integer array of 'Vector'-index positions of each and every HTML Element 'TagNode' whose attributes were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        TagNode.removeAllAV(), TagNode.isTagNode(), TagNode.isClosing, LV
        Code:
        Exact Method Body:
         IntStream.Builder   b = IntStream.builder();
         LV                  l = new LV(sPos, ePos, html);
        
         HTMLNode n;     TagNode tn;
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = (HTMLNode) html.elementAt(i)).isTagNode())        // Only instances of TagNode have attributes, NOT TextNode or CommentNode
                 &&  (! (tn = (TagNode) n).isClosing)                        // TC.OpeningTags have attributes, Closing-Element Nodes cannot have them
                 &&  (tn.str.length() > (tn.tok.length() + 2))   )           // If element-length = tok-length+2, there are no attributes!
                 {                                                           // '<', TOKEN, '>'
                     html.setElementAt(tn.removeAllAV(), i);                 // Replace the old TagNode
                     b.accept(i);                                            // Make sure to keep the index where it resides, to return to the user
                 }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
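
        The length comparisons used in the method bodies on this page are cheap pre-checks that avoid parsing tags which are too short to contain attributes at all. The arithmetic is sketched below on plain Strings; 'LengthCheckSketch' and its two method names are hypothetical, and only mirror the comparisons visible in the code above.

```java
final class LengthCheckSketch
{
    // '<', TOKEN, '>' is the shortest opening tag, so only a longer String
    // leaves any room for attributes.
    static boolean mayHaveAttributes(String str, String tok)
    { return str.length() > (tok.length() + 2); }

    // '<', TOKEN, SPACE, ATTRIBUTE, '>' adds three characters to the two names,
    // which is the MIN value computed in the update(...) method bodies.
    static boolean mayHaveAttribute(String str, String tok, String innerTag)
    { return str.length() >= (3 + innerTag.length() + tok.length()); }

    public static void main(String[] args)
    {
        System.out.println(mayHaveAttributes("<div>", "div"));
        System.out.println(mayHaveAttributes("<div id='a'>", "div"));
        System.out.println(mayHaveAttribute("<a>", "a", "href"));
    }
}
```

        Passing the pre-check does not prove that an attribute is present; it only means the more expensive work cannot be skipped.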
        
      • removeAll

        public static int[] removeAll​(java.util.Vector<? super TagNode> html,
                                      int[] posArr)
        The purpose of this method is to remove all attributes / Inner-Tag key-value pairs from each and every non-'TextNode' and non-'CommentNode' HTML Element found on the vectorized-html page parameter 'html'. The removal process is limited to the elements pointed to by the contents of passed-parameter 'posArr'.

        Specifically: Each and every class=... id=... src=... alt=... href=... onclick=... etc... attribute from each and every instance of 'TagNode' HTML Element found in the vectorized-page parameter 'html' will be removed. All TagNode's shall contain empty attribute-lists.

        NOTE: This method restricts the removal process to only nodes specified by the 'Vector'-index parameter 'posArr'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        posArr - This integer-array is expected to receive a "Pointer-Integer Array." These are usually generated by the NodeSearch 'Find' classes, and are simply lists of index-pointers into a Vectorized HTML Web-Page Vector. The int[] array passed to this parameter will specify the TagNode's in the Vector whose attributes will all be removed via a call to TagNode.removeAllAV(), and which will then be replaced.

        For Example:
         // This line will retrieve an array "index-pointer" to every HTML Paragraph Element.
         int[] posArr         = TagNodeFind.all(htmlPage, TC.OpeningTags, "p");
        
         // This line will remove every attribute key-value pair from every HTML Paragraph
         // Element on the vectorized-html page 'htmlPage'
         // The returned array will contain a list of pointers to HTML Paragraph Elements that
         // were changed.  Paragraph Elements that were already empty of Inner-Tag key-value pairs
         // will not have a pointer in this index-array.
         int[] changedPosArr  = Attributes.removeAll(htmlPage, posArr);
         
        
        Returns:
        An integer array of 'Vector'-index positions of each and every HTML Element 'TagNode' whose attributes were removed.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' contain index-pointers that are out of range of Vector-parameter 'html', then Java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to have an instance of TagNode, but instead had some other HTMLNode instance. If one of the indices in an integer-position array (int[] posArr) does not point to a TagNode, then this exception's throw shall inform the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. When passing an int[] posArr integer-array of Vector-indices, and the code expects that each of the locations pointed to in the Vector to contain "Opening HTML Element Tags", then this exception's throw will inform the user.
        See Also:
        TagNode.removeAllAV(), TagNode.isTagNode(), TagNode.isClosing
        Code:
        Exact Method Body:
         IntStream.Builder b = IntStream.builder();          // Use Java Stream to keep a list of Vector-Locations
                                                             // that were updated / modified.
        
         for (int i : posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(i);
             if (! n.isTagNode())                            // Must be an HTML TagNode
                 throw new TagNodeExpectedException(i);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                               // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(i);
        
             if (tn.str.length() > (tn.tok.length() + 2))    // If element-length = tok-length+2, there are no attributes!
             {                                                   
                 html.setElementAt(tn.removeAllAV(), i);     // Replace the old TagNode
                 b.accept(i);                                // Make sure to keep the index where it resides, to return to the user
             }
         }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
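
        Note the difference from the sPos / ePos variant: a range loop silently skips nodes that are not opening tags, while the 'posArr' loop above treats such an index as a programming error and throws. That stricter contract is sketched below on a Vector of plain Strings. 'PosArrSketch' and 'requireOpeningTags' are hypothetical names, and IllegalArgumentException merely stands in for the library's TagNodeExpectedException and OpeningTagNodeExpectedException.

```java
import java.util.Vector;

final class PosArrSketch
{
    // Every index in 'posArr' must point at an opening tag, otherwise an exception is thrown
    static void requireOpeningTags(Vector<String> html, int[] posArr)
    {
        for (int i : posArr)
        {
            String s = html.elementAt(i);   // out-of-range indices throw here

            if ((! s.startsWith("<")) || s.startsWith("<!--"))
                throw new IllegalArgumentException("Index " + i + " does not hold a tag");

            if (s.startsWith("</"))
                throw new IllegalArgumentException("Index " + i + " holds a closing tag");
        }
    }

    public static void main(String[] args)
    {
        Vector<String> html = new Vector<>();
        html.add("<p>");
        html.add("Some text");
        html.add("</p>");

        requireOpeningTags(html, new int[] { 0 });
        System.out.println("index 0 is an opening tag");
    }
}
```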
        
      • removeData

        public static int[] removeData​(java.util.Vector<? super TagNode> html,
                                       int sPos,
                                       int ePos)
        The purpose of this method is to remove all HTML data-attribute key-value pairs (attributes whose names begin with "data-") from each and every non-'TextNode' and non-'CommentNode' HTML Element found on the vectorized-html page parameter 'html'.

        NOTE: This method restricts the removal process to the specified subrange sPos ... ePos of the HTML-Vector.
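
        To make "data-attribute" concrete, the sketch below strips data-* attributes from a single tag String with a regular expression. It is an approximation for illustration only: 'DataAttributeSketch' is a hypothetical name, the library's TagNode.removeDataAttributes() is not implemented this way as far as this page shows, and the pattern would be fooled by the text " data-..." appearing inside another attribute's quoted value.

```java
import java.util.regex.Pattern;

final class DataAttributeSketch
{
    // White-space, then a name starting with "data-", then an optional value that is
    // double-quoted, single-quoted, or unquoted.
    private static final Pattern DATA_ATTRIBUTE = Pattern.compile(
        "\\s+data-[\\w-]+(?:\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s>]+))?"
    );

    // Returns the tag text with every data-* attribute removed
    static String removeData(String tag)
    { return DATA_ATTRIBUTE.matcher(tag).replaceAll(""); }

    public static void main(String[] args)
    {
        System.out.println(removeData("<div id='a' data-x='1' data-y=\"2\">"));
    }
}
```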
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        An integer array of 'Vector'-index positions of each and every HTML Element 'TagNode' which did contain HTML data-attributes that have since been removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
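        The three range rules above can be sketched as a small stand-alone check. This is an illustration written from the rules as documented here, not the source of the library's 'LV' class; the name 'RangeSketch.normalize' is hypothetical.

```java
final class RangeSketch
{
    // Returns {start, end} (end exclusive) after applying the documented rules.
    static int[] normalize(int sPos, int ePos, int size)
    {
        // A negative 'ePos' is first reset to the size of the Vector.
        if (ePos < 0) ePos = size;

        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos is larger than ePos");

        return new int[] { sPos, ePos };
    }

    public static void main(String[] args)
    {
        int[] r = normalize(2, -1, 10);   // negative ePos means "to the end"
        System.out.println(r[0] + " ... " + r[1]);
    }
}
```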
        See Also:
        TagNode.removeDataAttributes(), TagNode.isTagNode(), TagNode.isClosing, LV
        Code:
        Exact Method Body:
         IntStream.Builder   b   = IntStream.builder();      // Use Java Stream to keep a list of Vector-Locations
                                                             // that were updated / modified.
         int                 MIN = 9;                        // Minimum Length of TagNode.str to even have a "data-*=" attribute
                                                             // '<', TOKEN, SPACE, "data-*", '>';
         LV                  l   = new LV(sPos, ePos, html); // Loop Variable
        
         HTMLNode            n;                              // Temp Variables
         TagNode             tn, newTN;                      // Temp Variables
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = (HTMLNode) html.elementAt(i)).isTagNode())        // Only instances of TagNode have attributes, NOT TextNode or CommentNode
                 &&  (! (tn = (TagNode) n).isClosing)                        // TC.OpeningTags have attributes, Closing-Element Nodes cannot have them
                 &&  (tn.str.length() >= (tn.tok.length() + MIN))            // Minimum Length of TagNode.str to even have a "data-*=" attribute
                 &&  ((newTN = tn.removeDataAttributes()) != tn))            // A "new" TagNode is *only returned* by this method if the "data-attributes" were removed.
                 {                                                   
                     html.setElementAt(newTN, i);                            // Replace the old TagNode
                     b.accept(i);                                            // Make sure to keep the index where it resides, to return to the user
                 }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
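        The replace-and-record pattern used by the method body above can be tried without the library. The sketch below is an assumption-laden stand-in: it stores tags as plain String's in a Vector and strips data-attributes with a simplified regular expression, neither of which is how the library itself parses HTML. The class name 'ReplaceAndRecordSketch' is hypothetical.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

final class ReplaceAndRecordSketch
{
    // Simplified pattern for one data-* attribute (quoted, unquoted or value-less).
    // This is an assumption of the sketch, not the library's parser.
    private static final Pattern DATA_ATTR = Pattern.compile(
        "\\s+data-[\\w-]+(?:\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s>]+))?");

    static int[] removeData(Vector<String> page)
    {
        IntStream.Builder changed = IntStream.builder();

        for (int i = 0; i < page.size(); i++)
        {
            String s = page.elementAt(i);

            // Only opening tags carry attributes: skip text, closing tags and comments.
            if (!s.startsWith("<") || s.startsWith("</") || s.startsWith("<!")) continue;

            String stripped = DATA_ATTR.matcher(s).replaceAll("");

            if (!stripped.equals(s))
            {
                page.setElementAt(stripped, i);   // replace the old element in place
                changed.accept(i);                // remember where the change happened
            }
        }

        return changed.build().toArray();
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<div data-id=\"7\" class=\"a\">", "text", "</div>", "<img src=\"a.png\">"));

        System.out.println(Arrays.toString(removeData(page)));
        System.out.println(page.elementAt(0));
    }
}
```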
        
      • removeData

        public static int[] removeData​(java.util.Vector<? super TagNode> html,
                                       int[] posArr)
        This method removes all HTML data-attribute key-value pairs from each and every non-'TextNode' and non-'CommentNode' HTML Element found on the vectorized-html page parameter 'html'. A node is replaced only if it actually contained data-attributes.

        NOTE: This method restricts the removal process to only nodes specified by the 'Vector'-index parameter 'posArr'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        posArr - This integer-array is expected to receive a "Pointer-Integer Array." These are usually generated by the NodeSearch 'Find' classes, and are simply lists of index-pointers into a Vectorized HTML Web-Page Vector. The int[] array passed to this parameter specifies the TagNode's in the Vector whose data-attributes will be removed via a call to TagNode.removeDataAttributes(), and which will then be replaced.

        For Example:
         // This line will retrieve an array of index-pointers, one for every HTML Image Element.
         int[] posArr         = TagNodeFind.all(htmlPage, TC.OpeningTags, "img");
        
         // This line will remove every "data-attribute" key-value pair from every HTML Image
         // Element on the vectorized-html page 'htmlPage'.
         // The returned array will contain a list of pointers to HTML Image Elements that
         // were changed.  Image Elements that did not have "data-" HTML InnerTags
         // will not have a pointer in this index-array.
         int[] changedPosArr  = Attributes.removeData(htmlPage, posArr);
         
        
        Returns:
        An integer array of 'Vector'-index positions of each and every HTML Element 'TagNode' whose data-attributes were removed.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' contain index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to hold an instance of TagNode, but instead holds some other HTMLNode instance. If any index in the integer-position array (int[] posArr) does not point to a TagNode, this exception informs the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. This method expects every location listed in the int[] posArr array of Vector-indices to contain an "Opening HTML Element Tag"; this exception informs the user when one does not.
        See Also:
        TagNode.removeDataAttributes(), TagNode.isTagNode(), TagNode.isClosing
        Code:
        Exact Method Body:
         IntStream.Builder   b   = IntStream.builder();      // Use Java Stream to keep a list of Vector-Locations
                                                             // that were updated / modified.
         int                 MIN = 9;                        // Minimum Length of TagNode.str to even have a "data-*=" attribute
                                                             // '<', TOKEN, SPACE, "data-*", '>'
        
         for (int i: posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(i);
             if (! n.isTagNode())                            // Must be an HTML TagNode
                 throw new TagNodeExpectedException(i);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                               // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(i);
        
             if (tn.str.length() < (tn.tok.length() + MIN))  // Minimum Length of TagNode.str to even have a "data-*=" attribute
                 continue;                                   // '<', TOKEN, SPACE, "data-*", '>' 
        
             TagNode newTN = tn.removeDataAttributes();
        
             if (newTN != tn)                                // A "new" TagNode is *only returned* by this method 
                                                             // if the "data-attributes" were removed.
             {                                                   
                 html.setElementAt(newTN, i);                // Replace the old TagNode
                 b.accept(i);                                // Make sure to keep the index where it resides, to return to the user
             }
         }
        
         return b.build().toArray();                         // Build the IntStream, Convert the IntStream -> int[], Return it.
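        Unlike the sub-range variant, which silently skips nodes that are not opening tags, the position-array variant above throws when a listed position holds the wrong kind of node. A minimal sketch of that strictness follows, using plain String's as stand-ins for nodes and 'IllegalStateException' as a stand-in for the library's two exception classes; the name 'requireOpeningTag' is hypothetical.

```java
import java.util.Arrays;
import java.util.Vector;

final class StrictPositionSketch
{
    // Returns the element at 'pos', or throws when it is not an opening tag.
    static String requireOpeningTag(Vector<String> page, int pos)
    {
        // Vector.elementAt throws ArrayIndexOutOfBoundsException for a bad index.
        String s = page.elementAt(pos);

        if (!s.startsWith("<"))
            throw new IllegalStateException("Not a tag at index " + pos);

        if (s.startsWith("</"))
            throw new IllegalStateException("Closing tag at index " + pos);

        return s;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("<p>", "text", "</p>"));

        System.out.println(requireOpeningTag(page, 0));

        try { requireOpeningTag(page, 1); }
        catch (IllegalStateException e) { System.out.println(e.getMessage()); }
    }
}
```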
        
      • remove

        public static int[] remove​(java.util.Vector<? super TagNode> html,
                                   int sPos,
                                   int ePos,
                                   java.lang.String... innerTags)
        This will remove all copies of the attributes whose names are listed in the String[] array parameter 'innerTags' from the vectorized-html web-page parameter 'html'.

        NOTE: This method restricts the removal process to the specified subrange sPos ... ePos of the 'html' vector.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        innerTags - This String, or list of String's, should contain valid HTML Element inner-tag names. Any instances of these attributes which are found inside HTML TagNode's on the web-page Vector will be removed from the TagNode, and the old TagNode will be replaced in the vectorized-html web-page with a new, pared-down, TagNode.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        An integer-array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html.' The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were removed by this method.
        Throws:
        InnerTagKeyException - This exception will throw if a non-standard String-value is passed to parameter 'innerTags'. HTML expects that an attribute-name conform to a set of rules in order to be processed by a browser.
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        java.lang.IllegalArgumentException - If parameter 'innerTags' has zero elements.
        See Also:
        TagNode.removeAttributes(String[]), LV, TagNode.hasOR(boolean, String[]), TagNode.isTagNode(), TagNode.isClosing, InnerTagKeyException.check(String[])
        Code:
        Exact Method Body:
         InnerTagKeyException.check(innerTags);
        
         IntStream.Builder   b   = IntStream.builder();      // Use Java Stream to keep a list of Vector-Locations
                                                             // that were updated / modified.
         int                 MIN = 1000;                     // Compute the "minimum length" of a TagNode.str field
         LV                  l   = new LV(sPos, ePos, html); // Loop Variable
         HTMLNode            n;                              // Temp Variables
         TagNode             tn;     
        
         // Minimum-Length of TagNode.str would have to be 3 + smallest inner-tag passed
         for (String attrib : innerTags) if (attrib.length() < MIN) MIN = attrib.length();
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = (HTMLNode) html.elementAt(i)).isTagNode())        // Only instances of TagNode have attributes, NOT TextNode or CommentNode
                 &&  (! (tn = (TagNode) n).isClosing)                        // TC.OpeningTags have attributes, Closing-Element Nodes cannot have them
                 &&  (tn.str.length() >= (tn.tok.length() + MIN))            // If element-length is less than the MINIMUM length, it couldn't have *ANY* of the named attributes
                 &&  tn.hasOR(false, innerTags))                             // If this TagNode has the attributes that have been requested for removal, then...
                 {
                     // Build a new TagNode, and then replace the old one with the newly built one
                     // on the page or sub-page, and at the same location.
                     tn = tn.removeAttributes(innerTags);
                     html.setElementAt(tn, i);
        
                     // Java's IntStream-Builder is just a way to "build" a short list of integer's.  At the end of this
                     // method, the list will be built and returned to the user.  It shall contain all Vector locations
                     // where a "TagNode swap" (replaced TagNode, with attributes filtered) has occurred.
                     b.accept(i);
                 }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
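        The length pre-filter in the method body above is a cheap way to skip tags that are too short to hold any of the requested attributes. The sketch below illustrates the idea under one stated assumption: it adds the three-character overhead ('<', SPACE, '>') mentioned in the comment, which makes it slightly stricter than the comparison shown in the listing. The names 'shortest' and 'mightContain' are hypothetical.

```java
final class MinLengthSketch
{
    // Length of the shortest attribute name passed.
    static int shortest(String... names)
    {
        if (names.length == 0)
            throw new IllegalArgumentException("at least one attribute name is required");

        int min = Integer.MAX_VALUE;
        for (String n : names) if (n.length() < min) min = n.length();
        return min;
    }

    // True when 'tagStr' is long enough that it could contain one of the names:
    // '<' + tag-name + SPACE + shortest attribute-name + '>'
    static boolean mightContain(String tagStr, String tagName, String... names)
    {
        return tagStr.length() >= tagName.length() + 3 + shortest(names);
    }

    public static void main(String[] args)
    {
        System.out.println(mightContain("<p>", "p", "class", "id"));
        System.out.println(mightContain("<p id=x>", "p", "class", "id"));
    }
}
```

        A tag that passes this test may still lack the attribute, so a real lookup is needed afterwards; the filter only avoids that lookup for tags that cannot match.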
        
      • remove

        public static int[] remove​(java.util.Vector<? super TagNode> html,
                                   int[] posArr,
                                   java.lang.String... innerTags)
        This will remove all copies of the attributes whose names are listed in the String[] array parameter 'innerTags' from the vectorized-html web-page parameter 'html'.

        NOTE: This method restricts the removal process to only nodes specified by the 'Vector'-index parameter 'posArr'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        innerTags - This String, or list of String's, should contain valid HTML Element inner-tag names. Any instances of these attributes which are found inside HTML TagNode's on the web-page Vector will be removed from the TagNode, and the old TagNode will be replaced in the vectorized-html web-page with a new, pared-down, TagNode.

        AGAIN: This method shall only modify TagNode's if their Vector-index locations in 'html' are listed in 'posArr'.
        posArr - This integer-array is expected to receive a "Pointer-Integer Array." These are usually generated by the NodeSearch 'Find' classes, and are simply lists of index-pointers into a Vectorized HTML Web-Page Vector. The int[] array passed to this parameter specifies the TagNode's in the Vector whose named attributes will be removed via a call to TagNode.removeAttributes(...), and which will then be replaced.

        For Example:
         // This line will retrieve an array of index-pointers, one for every HTML Paragraph Element.
         int[] posArr         = TagNodeFind.all(htmlPage, TC.OpeningTags, "p");
        
         // This line will remove attribute key-value pairs for 'class' and 'id' from every HTML 
         // Paragraph Element on the vectorized-html page 'htmlPage.'  The returned array will 
         // contain a list of pointers to HTML Paragraph Elements that were changed.  Paragraph
         // Elements that did not contain a 'class' nor an 'id' inner-tag will not have a pointer 
         // in the returned index-array, and therefore will not have been modified.
         int[] changedPosArr  = Attributes.remove(htmlPage, posArr, "class", "id");
         
        
        Returns:
        An integer-array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html.' The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were removed by this method.
        Throws:
        InnerTagKeyException - This exception will throw if a non-standard String-value is passed to parameter 'innerTags'. HTML expects that an attribute-name conform to a set of rules in order to be processed by a browser.
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' contain index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to hold an instance of TagNode, but instead holds some other HTMLNode instance. If any index in the integer-position array (int[] posArr) does not point to a TagNode, this exception informs the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. This method expects every location listed in the int[] posArr array of Vector-indices to contain an "Opening HTML Element Tag"; this exception informs the user when one does not.
        java.lang.IllegalArgumentException - If parameter 'innerTags' has zero elements.
        See Also:
        TagNode.removeAttributes(String[]), TagNode.hasOR(boolean, String[]), TagNode.isTagNode(), TagNode.isClosing, InnerTagKeyException.check(String[])
        Code:
        Exact Method Body:
         InnerTagKeyException.check(innerTags);
        
         IntStream.Builder   b   = IntStream.builder();      // Use Java Stream to keep a list of Vector-Locations
                                                             // that were updated / modified.
         int                 MIN = 1000;                     // Compute the "minimum length" of a TagNode.str field
        
         // Minimum-Length of TagNode.str would have to be 3 + smallest inner-tag passed
         for (String attrib : innerTags) if (attrib.length() < MIN) MIN = attrib.length();
        
         for (int i : posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(i);
             if (! n.isTagNode())                            // Must be an HTML TagNode
                 throw new TagNodeExpectedException(i);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                               // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(i);
        
             if (tn.str.length() < (tn.tok.length() + MIN))  // If element-length <= MIN, none of the attributes could possibly be present.
                 continue;                                   // MINOR Optimization...
        
             if (tn.hasOR(false, innerTags))                 // If this TagNode has the attributes that have been requested for removal, then...
             {
                 // Build a new TagNode, and then replace the old one with the newly built one
                 // on the page or sub-page, and at the same location.
                 tn = tn.removeAttributes(innerTags);
                 html.setElementAt(tn, i);
        
                 // Java's IntStream-Builder is just a way to "build" a short list of integer's.  At the end of this
                 // method, the list will be built and returned to the user.  It shall contain all Vector locations
                 // where a "TagNode swap" (replaced TagNode, with attributes filtered) has occurred.
                 b.accept(i);
             }
         }
        
         return b.build().toArray();                        // Build the IntStream, Convert the IntStream -> int[], Return it.
        
      • retrieve

        public static Ret2<int[],​java.lang.String[]> retrieve​
                    (java.util.Vector<? extends HTMLNode> html,
                     java.lang.String attribute)
        
        Convenience Method. Invokes retrieve(Vector, int, int, String)
        Code:
        Exact Method Body:
         return retrieve(html, 0, -1, attribute);
        
      • retrieve

        public static Ret2<int[],​java.lang.String[]> retrieve​
                    (java.util.Vector<? extends HTMLNode> html,
                     DotPair dp,
                     java.lang.String attribute)
        
        Convenience Method. Invokes retrieve(Vector, int, int, String)
        Code:
        Exact Method Body:
         return retrieve(html, dp.start, dp.end + 1, attribute);
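        The '+ 1' in the call above suggests that a DotPair's 'end' field is inclusive, while 'ePos' is exclusive. The conversion can be sketched with a stand-in class; 'DotPairSketch' is a hypothetical two-field type written for this example, not the library's DotPair.

```java
final class DotPairSketch
{
    final int start;   // inclusive
    final int end;     // inclusive (assumed, based on the "+ 1" in the call above)

    DotPairSketch(int start, int end)
    {
        this.start = start;
        this.end   = end;
    }

    // The value to pass as an exclusive 'ePos'.
    int exclusiveEnd() { return end + 1; }

    // Number of positions covered by the pair.
    int size() { return end - start + 1; }

    public static void main(String[] args)
    {
        DotPairSketch dp = new DotPairSketch(3, 5);

        // Visits 3, 4 and 5: the same positions a half-open loop over
        // [start, exclusiveEnd) covers.
        for (int i = dp.start; i < dp.exclusiveEnd(); i++) System.out.println(i);
    }
}
```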
        
      • retrieve

        public static Ret2<int[],​java.lang.String[]> retrieve​
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos,
                     java.lang.String attribute)
        
        The purpose of this method is to retrieve the value of the named attribute from each TagNode in an HTML Vector (or sub-Vector) that contains that attribute.

        NOTE: This method restricts the retrieval process to the specified subrange sPos ... ePos of the HTML-Vector.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        attribute - This is the name of the HTML attribute whose value is being retrieved. Each instance of TagNode in the input 'html' parameter Vector shall be searched for this attribute-name. If this attribute is present in any of the TagNode's in 'html', then that TagNode's location (Vector-index) shall be returned in the int[] position array, and the value of that attribute shall be returned in the String[] array.
        Returns:
        An instance of Ret2<int[], String[]> where the two return-fields are as follows:

        • Ret2.a (int[])

          This is an integer-array int[] containing the indices of each instance of TagNode that contained a non-null attribute matching parameter 'attribute'.

        • Ret2.b (String[])

          This is a String-array String[] containing the values of the attributes in the TagNode's that contained the named 'attribute'.


        NOTE: These arrays should be considered parallel arrays.

        ALSO: This method shall never return null. If there are no matches, an instance of Ret2<int[], String[]> shall be returned, containing zero-length arrays.
        Throws:
        InnerTagKeyException - If the attribute name passed to this parameter does not contain the name of a valid HTML5 attribute, then this exception shall throw.
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        TagNode.AV(String), TagNode.isTagNode(), TagNode.isClosing, InnerTagKeyException.check(String[]), LV
        Code:
        Exact Method Body:
         InnerTagKeyException.check(attribute);
        
         LV                      l       = new LV(html, sPos, ePos); // Loop Variable
         IntStream.Builder       posB    = IntStream.builder();      // Save matches here (vector-position)
         Stream.Builder<String>  strB    = Stream.builder();         // Save attribute-values here
         int                     MIN_LEN = 4 + attribute.length();   // <, TOK, SPACE, attribute, =, >
         TagNode                 tn;                                 // Temp Variables
         HTMLNode                n;
         String                  attribValue;
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = html.elementAt(i)).isTagNode())           // Only visit TagNode's
                 &&  (! (tn = (TagNode) n).isClosing)                // Only Visit OpeningTags (Closing Tags cannot have attributes)
                 &&  (tn.str.length() >= (tn.tok.length()+MIN_LEN))  // Min-Length to even have the attribute
                 &&  ((attribValue = tn.AV(attribute)) != null)  )   // If the attribute-value is non-null, save it and return it
             { 
                 posB.accept(i);                                     // Save the vector-index position of the TagNode
                 strB.accept(attribValue);                           // Save the Attribute-Value of that TagNode
             }
        
         // Java Stream's shall build the arrays.  Put them into an instance of Ret2, and return them.
         return new Ret2<>(posB.build().toArray(), strB.build().toArray(String[]::new));
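        The two stream-builders in the method body above produce a pair of parallel arrays. That idea can be tried with the standard library alone, as sketched below. The nested class 'Pair' is a hypothetical stand-in for Ret2<int[], String[]>, and the input is a plain List of String's rather than a Vector of nodes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class ParallelBuilderSketch
{
    // Hypothetical two-field holder standing in for the library's Ret2<int[], String[]>.
    static final class Pair
    {
        final int[]    a;
        final String[] b;

        Pair(int[] a, String[] b) { this.a = a; this.b = b; }
    }

    // Collects the index and the value of every non-null entry, in parallel.
    static Pair collectNonNull(List<String> values)
    {
        IntStream.Builder       posB = IntStream.builder();
        Stream.Builder<String>  strB = Stream.builder();

        for (int i = 0; i < values.size(); i++)
        {
            String v = values.get(i);

            if (v != null)
            {
                posB.accept(i);   // position of the match
                strB.accept(v);   // value found at that position
            }
        }

        // With no matches, both arrays have length zero; null is never returned.
        return new Pair(posB.build().toArray(), strB.build().toArray(String[]::new));
    }

    public static void main(String[] args)
    {
        Pair p = collectNonNull(Arrays.asList("a.png", null, "b.png"));

        for (int i = 0; i < p.a.length; i++)
            System.out.println(p.a[i] + " -> " + p.b[i]);
    }
}
```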
        
      • retrieve

        public static java.lang.String[] retrieve​
                    (java.util.Vector<? extends HTMLNode> html,
                     int[] posArr,
                     java.lang.String attribute)
        
        This shall query each HTML Element listed in the position-array (parameter 'posArr') for the value of the attribute named by parameter 'attribute'. The value found at each position is placed in a String-array and returned. This String[] array shall be parallel to the input Vector-index 'posArr' parameter.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        posArr - This shall be a list of Vector-indices that contain opening TagNode elements. The value of the attribute provided by parameter 'attribute' will be returned in a parallel String[] array for each TagNode identified by 'posArr'.
        attribute - This is the name of the HTML attribute that is being retrieved. Each TagNode element at the locations specified by input parameter 'posArr' shall be searched for this attribute (name), and the value of that attribute shall be placed in the returned String[] array.

        If any of the TagNode instances listed by the Vector-index array do not have that attribute, then a 'null' shall be placed in the returned String[] array at the index-location parallel to its position in 'posArr'
        Returns:
        This returns a String[] array that shall be parallel to the input-parameter int[] posArr. Each location in this String-array shall correspond to the attribute-value returned by a call to TagNode.AV(attribute) on the TagNode that is located at the Vector-index identified by the value at 'posArr'.
        Throws:
        InnerTagKeyException - If the String provided to parameter 'attribute' is not a valid HTML-5 attribute-name, then this exception shall throw.
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' contain index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to hold an instance of TagNode, but instead holds some other HTMLNode instance. If any index in the integer-position array (int[] posArr) does not point to a TagNode, this exception informs the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. This method expects every location listed in the int[] posArr array of Vector-indices to contain an "Opening HTML Element Tag"; this exception informs the user when one does not.
        See Also:
        InnerTagKeyException.check(String[]), TagNode.AV(String), TagNode.isTagNode(), TagNode.isClosing
        Code:
        Exact Method Body:
         InnerTagKeyException.check(attribute);
        
         int         i   = 0;
         String[]    ret = new String[posArr.length];    // Return String-array
         int         MIN = 4 + attribute.length();       // Minimum length of the TagNode.str to even have the specified attribute
                                                         // '<', TOKEN, SPACE, INNERTAG, '=', '>'
        
         for (int pos: posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(pos);
             if (! n.isTagNode())                        // Must be an HTML TagNode
                 throw new TagNodeExpectedException(pos);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                           // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(pos);
        
             ret[i++] = (tn.str.length() < (tn.tok.length() + MIN))
                 ? null                                  // TagNode.str is too short to even have the attribute
                 : tn.AV(attribute);                     // Possibly has the attribute... Save the result of TagNode.AV(attribute)
         }
        
         return ret;
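
        As an illustration of the parallel-array contract described above, here is a minimal sketch that uses only the JDK. The class 'ParallelLookupSketch' and its nested 'Tag' type are hypothetical stand-ins for a page of opening TagNode instances; they are not part of this library, and the attribute lookup is assumed to behave like a plain Properties lookup.

```java
import java.util.List;
import java.util.Properties;

final class ParallelLookupSketch
{
    // Hypothetical stand-in for an opening TagNode: token, raw text, and attributes.
    static final class Tag
    {
        final String tok, str;
        final Properties attrs = new Properties();

        Tag(String tok, String str, String... keyValuePairs)
        {
            this.tok = tok;
            this.str = str;
            for (int i = 0; i + 1 < keyValuePairs.length; i += 2)
                attrs.setProperty(keyValuePairs[i], keyValuePairs[i + 1]);
        }
    }

    // Returns an array parallel to 'posArr'; a null marks a tag without the attribute.
    static String[] lookup(List<Tag> page, int[] posArr, String attribute)
    {
        String[] ret = new String[posArr.length];
        int MIN = 4 + attribute.length();   // '<', SPACE, '=', '>' plus the attribute name
        int i = 0;

        for (int pos : posArr)
        {
            Tag t = page.get(pos);
            ret[i++] = (t.str.length() < (t.tok.length() + MIN))
                ? null                                  // too short to hold the attribute
                : t.attrs.getProperty(attribute);       // null when the attribute is absent
        }

        return ret;
    }
}
```

        The result at index k always describes the tag at posArr[k], so the caller can walk both arrays with a single loop counter.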
        
      • filter

        public static int[] filter​(java.util.Vector<? super TagNode> html,
                                   int sPos,
                                   int ePos,
                                   Attributes.Filter f)
        Filters the contents of each instance of a 'TC.OpeningTags' element found in the input Vector. The type of filter performed is defined by the parameter Filter 'f'. Each time a TagNode found in the input vectorized-html web-page, or html sub-list, is changed or modified, the original TagNode will be removed and replaced by a new, updated or modified TagNode instance.

        NOTE: This method restricts the filter process to the specified subrange sPos ... ePos of the HTML-Vector.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        f - This is a 'functional-interface' instance. It may be implemented by a lambda expression, or by a method reference (Java's counterpart to a C-style function pointer). This interface is defined here in class 'Attributes'. It needs to implement a method that receives a String and a java.util.Properties, and removes all attribute key-value pairs that need to be removed from those Properties. If any changes have been made to the Properties, this must be indicated by returning TRUE as the result of this method.

        By implementing an instance of 'Filter', a programmer may selectively choose which attributes to remove or change in each and every TagNode of a web-page, or sub-list, using a single lambda-expression.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will throw.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        An int[] array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html.' The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were removed or updated by this method.
        Throws:
        InnerTagKeyException - The TagNode constructor that is used here when replacing TagNode instances will automatically check each attribute that is being inserted into the TagNode. If the user has added inner-tags whose names do not meet the requirements of the inner-tag naming conventions, this exception will throw.
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        QuotesException - If there are "quotes within quotes" problems when invoking the TagNode constructor, this exception will throw. The problem occurs when one or more of the attribute key-value pairs have a quotation-choice such that the chosen quotation-mark is also found within the attribute-value.

        QuotesException will also throw in the case that an attribute key-value pair has elected to use the "No Quotes" option, but the attribute-value contains white-space.
        See Also:
        TagNode.allAV(boolean, boolean), TagNode.isTagNode(), TagNode.isClosing, LV
        Code:
        Exact Method Body:
         IntStream.Builder   b   = IntStream.builder();                      // Save Modified node-locations in a java stream
         LV                  l   = new LV(sPos, ePos, html);                 // Loop Variable
        
         Properties p;       HTMLNode n;     TagNode tn;
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = (HTMLNode) html.elementAt(i)).isTagNode())        // Only instances of TagNode have attributes, NOT TextNode or CommentNode
                 &&  (! (tn = (TagNode) n).isClosing)                        // TC.OpeningTags have attributes, Closing-Element Nodes cannot have them
                 &&  (tn.str.length() >= (tn.tok.length() + 5))              // '<', TOKEN, SPACE, ATTRIBUTE<MIN-1>, '=', '>'
                 &&  ((p = tn.allAV(true, true)).size() > 0)                 // Retrieve all Attribute Key-Value Pairs.  Take note of surrounding quotes.
                 &&  f.filter(tn.tok, p) )                                   // Run the provided filter logic, if it returns TRUE, then build new TagNode
             {
                 // This makes sure not to leave out any possible "boolean" (a.k.a "Key Only") 
                 // attributes when we rebuild the new TagNode.   An example of a "boolean" attribute
                 // in HTML is "HIDDEN" which is a key that does not require any value to convey its
                 // purpose or function.  Sometimes web-page designers might type "HIDDEN=TRUE", but
                 // it is not necessary.  In any case, the "allAV(boolean, boolean)" method only returns
                 // attributes that have BOTH a 'key' AND a 'value'.
                 List<String> keyOnly = tn.allKeyOnlyAttributes(true).collect(Collectors.toList());
        
                 // Build a new TagNode, then replace the old one
                 tn = new TagNode(tn.tok, p, keyOnly, null, tn.str.endsWith("/>")); 
                 html.setElementAt(tn, i);
        
                 // Save the vector-index where a replacement has occurred.  The user will be
                 // provided a list of all locations where an old TagNode was replaced with a new one.
                 b.accept(i);
             }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
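
        The 'sPos' and 'ePos' rules listed for this method can be sketched in a few lines. The class 'RangeSketch' below is a hypothetical stand-in written only from the rules stated in this documentation; it is not the library's 'LV' class, whose internals are not shown here.

```java
final class RangeSketch
{
    // Returns {start, end}: 'start' is inclusive, 'end' is exclusive.
    static int[] resolve(int sPos, int ePos, int size)
    {
        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);

        if (ePos < 0) ePos = size;      // a negative 'ePos' means "up to the end of the Vector"

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos is larger than ePos");

        return new int[] { sPos, ePos };
    }
}
```

        With a Vector of size 10, the call resolve(2, -1, 10) yields the range 2 (inclusive) through 10 (exclusive), while resolve(5, 3, 10) throws.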
        
      • filter

        public static int[] filter​(java.util.Vector<? super TagNode> html,
                                   int[] posArr,
                                   Attributes.Filter f)
        Filters the contents of each instance of a 'TC.OpeningTags' element in the input Vector. The type of filter performed is defined by the parameter Filter 'f'. Each time a TagNode in the input vectorized-html web-page, or html sub-list, is changed or modified, the original TagNode will be removed and replaced by a new, updated or modified TagNode instance.

        NOTE: This method restricts the removal process to only nodes specified by the 'Vector'-index parameter 'posArr'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        f - This is a 'functional-interface' instance. It may be implemented by a lambda expression, or by a method reference (Java's counterpart to a C-style function pointer). This interface is defined here in class 'Attributes'. It needs to implement a method that receives a String and a java.util.Properties, and removes all attribute key-value pairs that need to be removed from those Properties. If any changes have been made to the Properties, this must be indicated by returning TRUE as the result of this method.

        By implementing an instance of 'Filter', a programmer may selectively choose which attributes to remove or change in each and every TagNode of a web-page, or sub-list, using a single lambda-expression.

        AGAIN: This method shall only modify TagNode's if their Vector-index locations in 'html' are listed in 'posArr'.
        posArr - This integer-array is expected to receive a "Pointer-Integer Array." These are usually generated by the NodeSearch 'Find' classes, and are simply lists of index-pointers into a Vectorized HTML Web-Page Vector. The int[] array passed to this parameter will specify the TagNode's in the Vector whose attributes will be partially removed via a call to TagNode.removeAV(...) and replaced.

        For Example:
         // This line will retrieve an array "index-pointer" to every HTML Section Element.
         int[] posArr         = TagNodeFind.all(htmlPage, TC.OpeningTags, "section");
        
         // This line uses a lambda-expression to implement a simple Attributes Filter.  This filter
         // removes any 'class' information found in the element, and then adds a 'title' attribute
         // if the TagNode does not already have a 'title' inner-tag.
         // NOTE:    This filter operation will only be applied to the TagNode's that were identified
         //          by the search operation in the previous line.  Specifically, only TagNode's whose 
         //          indices are in the integer-array 'posArr' will be checked against this filter lambda
         //          expression.
         // ALSO:    This 'Counter' class simply 'counts' and returns successive integers, beginning at one.
         // RETURNS: The returned array will contain a list of pointers to HTML SECTION Elements that
         //          were changed.  SECTION Elements that were not updated by the Attributes.Filter
         //          lambda-expression will not have a pointer in this index-array.
         Counter c = new Counter(1);
         int[] changedPosArr  = Attributes.filter(htmlPage, posArr, (String htmlTag, Properties av) ->
         {
             boolean ret = false;
             if (av.containsKey("class"))    { ret=true; av.remove("class"); }
             if (! av.containsKey("title"))  { ret=true; av.put("title", "Article Section Page #" + c.next()); }
             return ret;
         });
        
         
        
        Returns:
        An integer-array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html.' The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were removed or updated by this method.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        InnerTagKeyException - The TagNode constructor that is used here when replacing TagNode instances will automatically check each attribute that is being inserted into the TagNode. If the user has added inner-tags whose names do not meet the requirements of the inner-tag naming conventions, this exception will throw.
        QuotesException - If there are "quotes within quotes" problems when invoking the TagNode constructor, this exception will throw. The problem occurs when one or more of the attribute key-value pairs have a quotation-choice such that the chosen quotation-mark is also found within the attribute-value.

        QuotesException will also throw in the case that an attribute key-value pair has elected to use the "No Quotes" option, but the attribute-value contains white-space.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to have an instance of TagNode, but instead had some other HTMLNode instance. If one of the indices in an integer-position array (int[] posArr) does not point to a TagNode, then this exception's throw shall inform the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. When an int[] posArr integer-array of Vector-indices is passed, the code expects each of the locations pointed to in the Vector to contain "Opening HTML Element Tags"; this exception's throw will inform the user when one of them does not.
        See Also:
        TagNode.allAV(boolean, boolean), TagNode.isTagNode(), TagNode.isClosing
        Code:
        Exact Method Body:
         IntStream.Builder b = IntStream.builder();          // Use Java Stream to keep a list of Vector-Locations
                                                             // that were updated / modified.
        
         for (int i: posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(i);
             if (! n.isTagNode())                            // Must be an HTML TagNode
                 throw new TagNodeExpectedException(i);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                               // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(i);
        
             if (tn.str.length() < (tn.tok.length() + 5))    // If element-length < tok-length+5, there are no attributes!
                 continue;                                   // '<', TOKEN, SPACE, ATTRIBUTE<MIN-1>, '=', '>'
        
             Properties p =  tn.allAV(true, true);           // Retrieve all Attribute Key-Value Pairs.
        
             // This makes sure not to leave out any possible "boolean" (a.k.a "Key Only") 
             // attributes when we rebuild the new TagNode.   An example of a "boolean" attribute
             // in HTML is "HIDDEN" which is a key that does not require any value to convey its
             // purpose or function.  Sometimes web-page designers might type "HIDDEN=TRUE", but
             // it is not necessary.  In any case, the "allAV(boolean, boolean)" method only returns
             // attributes that have BOTH a 'key' AND a 'value'.
             List<String> keyOnly = tn.allKeyOnlyAttributes(true).collect(Collectors.toList());
        
             if ((p.size() > 0) && f.filter(tn.tok, p))      // Run the provided filter logic, if it returns TRUE, then build new TagNode
             {
                 // Build a new TagNode, and replace the old one.
                 tn = new TagNode(tn.tok, p, keyOnly, null, tn.str.endsWith("/>"));
                 html.setElementAt(tn, i);
        
                 // Save the vector-index where a replacement has occurred.  The user will be
                 // provided a list of all locations where an old TagNode was replaced with a new one.
                 b.accept(i);
             }
         }
        
         return b.build().toArray();                         // Build the IntStream, Convert the IntStream -> int[], Return it.
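
        The contract expected of a filter (mutate the Properties, and return TRUE only when something changed) can be tried out without the rest of the library. In the sketch below, 'AttrFilter' is a hypothetical interface that only mirrors the shape described for 'Attributes.Filter', and the title text is invented for the illustration. Note that java.util.Properties inherits both contains(Object), which tests values, and containsKey(Object), which tests keys; a filter that looks for attribute names needs the latter.

```java
import java.util.Properties;

final class FilterContractSketch
{
    // Hypothetical stand-in with the shape described for 'Attributes.Filter'.
    @FunctionalInterface
    interface AttrFilter { boolean filter(String htmlTag, Properties av); }

    // Removes 'class', adds a 'title' when missing; returns TRUE only if 'av' changed.
    static final AttrFilter DROP_CLASS_ADD_TITLE = (String htmlTag, Properties av) ->
    {
        boolean changed = false;

        if (av.containsKey("class"))
        { changed = true; av.remove("class"); }

        if (! av.containsKey("title"))
        { changed = true; av.setProperty("title", "Untitled " + htmlTag); }

        return changed;
    };
}
```

        Running the same filter a second time on the same Properties returns FALSE, which is the signal that no new TagNode needs to be built.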
        
      • filter

        public static int[] filter​(java.util.Vector<? super TagNode> html,
                                   int sPos,
                                   int ePos,
                                   java.lang.String... innerTagWhiteList)
        Filters the contents of each instance of a 'TC.OpeningTags' element in the input Vector using an attribute 'white-list'. Any attribute whose name is not a member of the inner-tag white-list will be removed: each input-Vector TagNode that carries such attributes is replaced by a new TagNode whose only attributes are members of the innerTag white-list.

        NOTE: This method restricts the filter process to the specified subrange sPos ... ePos of the HTML Vector.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        innerTagWhiteList - This should be a list of attribute names that 'white-list' the attributes in each TagNode inside the vectorized-html parameter 'html'. The concept of 'white-list' means that any attribute inside any TagNode within this input Vector whose name is not in the white-list will be removed from the TagNode, and a new TagNode will be created and replace the old one in the Vector.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will throw.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        An integer-array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html.' The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were removed by this method.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        TagNode.allAN(boolean, boolean), TagNode.isTagNode(), TagNode.removeAttributes(String[]), TagNode.isClosing, LV
        Code:
        Exact Method Body:
         TreeSet<String>         whiteList   = new TreeSet<>();          // java.util.TreeSet: a staple in this library.  Think of it as an optimized, sorted array.
         IntStream.Builder       b           = IntStream.builder();      // Java Streams keep a list of which TagNode's were changed
         LV                      l           = new LV(sPos, ePos, html); // Loop-Variable
        
         // Build the tree-set with the contents of the list.  Trim them, convert to lower-case
         // REMEMBER: Internally, the attribute key-value pairs are returned in a java.util.Properties
         //           instance.  This Properties instance always has keys in lower case format.
         for (String attribute: innerTagWhiteList) whiteList.add(attribute.trim().toLowerCase());
        
         HTMLNode n;     TagNode tn;                                     // Temp Variables
         Vector<String> attributesToRemove = new Vector<>();
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = (HTMLNode) html.elementAt(i)).isTagNode())            // Only instances of TagNode have attributes, NOT TextNode or CommentNode
                 &&  (! (tn = (TagNode) n).isClosing)                            // TC.OpeningTags have attributes, Closing-Element Nodes cannot have them
                 &&  (tn.str.length() > (tn.tok.length() + 3))   )               // If element-length <= tok-length+3, there are no attributes!
             {
                 attributesToRemove.clear();                                     // List of attributes that didn't pass the white-list
        
                 String[] allAN = tn.allAN(true, true).toArray(String[]::new);   // List of all attributes in the TagNode
        
                 for (String attribute : allAN)
                     if (! whiteList.contains(attribute))                        // If attribute is not on the pass-list
                         attributesToRemove.addElement(attribute);               // put it on the "chopping block"
        
                 if (attributesToRemove.size() > 0)                              // if there were attributes that didn't pass...
                 {
                     // This generated list of attributes that need removal is currently stored in a
                     // Vector<String>.  Unfortunately the 'TagNode.removeAttributes(String...)' is
                     // expecting a list of String-literals, or a String[] (String-Array), so here we
                     // simply convert the Vector<String> to a String[].
                     String[] atr = new String[attributesToRemove.size()];
                     atr = attributesToRemove.toArray(atr);
        
                     // Build a new TagNode, and then replace the old one with the newly built one
                     // on the page or sub-page, and at the same location.
                     tn = tn.removeAttributes(atr);
                     html.setElementAt(tn, i);
        
                     // Java's IntStream-Builder is just a way to "build" a short list of integer's.  At the end of this
                     // method, the list will be built and returned to the user.  It shall contain all Vector locations
                     // where a "TagNode swap" (replaced TagNode, with attributes filtered) has occurred.
                     b.accept(i);
                 }
             }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
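
        The white-list logic above can be exercised in isolation with JDK types only. In this sketch the 'page' is a hypothetical stand-in (one mutable list of lower-case attribute names per tag) rather than a Vector of TagNode instances, and 'WhiteListSketch' is not a class of this library. It shows the two ideas the method relies on: normalizing the white-list into a TreeSet, and reporting only the positions that actually changed.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.IntStream;

final class WhiteListSketch
{
    // Removes every name that is not white-listed; returns the indices that were changed.
    static int[] prune(List<List<String>> page, String... innerTagWhiteList)
    {
        TreeSet<String> whiteList = new TreeSet<>();

        // Trim and lower-case, so that " SRC " matches the lower-case name "src"
        for (String attribute : innerTagWhiteList)
            whiteList.add(attribute.trim().toLowerCase());

        IntStream.Builder b = IntStream.builder();

        for (int i = 0; i < page.size(); i++)
        {
            // removeIf returns true only when at least one name was removed
            if (page.get(i).removeIf(name -> ! whiteList.contains(name)))
                b.accept(i);
        }

        return b.build().toArray();
    }
}
```

        A tag whose attributes all pass the white-list is left untouched and does not appear in the returned array.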
        
      • filter

        public static int[] filter​(java.util.Vector<? super TagNode> html,
                                   int[] posArr,
                                   java.lang.String... innerTagWhiteList)
        Filters the contents of each instance of a 'TC.OpeningTags' element in the input Vector using an attribute 'white-list'. Any attribute whose name is not a member of the inner-tag white-list will be removed: each input-Vector TagNode that carries such attributes is replaced by a new TagNode whose only attributes are members of the innerTag white-list.

        NOTE: This method restricts the removal process to only nodes specified by the 'Vector'-index parameter 'posArr'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        innerTagWhiteList - This should be a list of attribute names that 'white-list' the attributes in each TagNode inside the vectorized-html parameter 'html'. The concept of 'white-list' means that any attribute inside any TagNode within this input Vector whose name is not in the white-list will be removed from the TagNode, and a new TagNode will be created and replace the old one in the Vector.

        AGAIN: This method shall only modify TagNode's if their Vector-index locations in 'html' are listed in 'posArr'.
        posArr - This integer-array is expected to receive a "Pointer-Integer Array." These are usually generated by the NodeSearch 'Find' classes, and are simply lists of index-pointers into a Vectorized HTML Web-Page Vector. The int[] array passed to this parameter will specify the TagNode's in the Vector whose attributes will be partially removed via a call to TagNode.removeAV(...) and replaced.

        For Example:
         // This line will retrieve an array "index-pointer" to every HTML Image Element.
         int[] posArr         = TagNodeFind.all(htmlPage, TC.OpeningTags, "img");
        
         // This line will "clean up" any HTML "<IMG>" elements.  Even if these elements are
         // 'cluttered', after the filter operation only the 'src' and 'alt' attributes will remain.
         Attributes.filter(htmlPage, posArr, "src", "alt");
        
         
        
        Returns:
        An integer-array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html'. The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were removed by this method.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' are index-pointers that are out of range of Vector-parameter 'html', then java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to have an instance of TagNode, but instead had some other HTMLNode instance. If one of the indices in an integer-position array (int[] posArr) does not point to a TagNode, then this exception's throw shall inform the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. When an int[] posArr integer-array of Vector-indices is passed, the code expects each of the locations pointed to in the Vector to contain "Opening HTML Element Tags"; this exception's throw will inform the user when one of them does not.
        See Also:
        TagNode.allAN(boolean, boolean), TagNode.removeAttributes(String[]), TagNode.isTagNode(), TagNode.isClosing
        Code:
        Exact Method Body:
         TreeSet<String>     whiteList   = new TreeSet<>();      // java.util.TreeSet: a staple in this library.  Think of it as an optimized, sorted array.
         IntStream.Builder   b           = IntStream.builder();  // Java Streams to keep a list of vector-indices that were updated.
        
         // Build the tree-set with the contents of the list.  Trim them, convert to lower-case
         // REMEMBER: Internally, the attribute key-value pairs are returned in a java.util.Properties
         //           instance.  This Properties instance always has keys in lower case format.
         for (String attribute: innerTagWhiteList) whiteList.add(attribute.trim().toLowerCase());
        
         for (int i: posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(i);
             if (! n.isTagNode())                                            // Must be an HTML TagNode
                 throw new TagNodeExpectedException(i);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                                               // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(i);
        
             if (tn.str.length() <= (tn.tok.length() + 3))                   // If element-length <= tok-length+3, there are no attributes!
                 continue;
        
             String[] allAN = tn.allAN(true, true).toArray(String[]::new);   // List of all attributes in the TagNode
        
             Vector<String> attributesToRemove = new Vector<>();             // List of attributes that didn't pass the white-list
        
             for (String attribute : allAN)
                 if (! whiteList.contains(attribute))                        // If attribute is not on the pass-list
                     attributesToRemove.addElement(attribute);               // put it on the "chopping block"
        
             if (attributesToRemove.size() > 0)                              // if there were attributes that didn't pass...
             {
                 // This generated list of attributes that need removal is currently stored in a
                 // Vector<String>.  Unfortunately the 'TagNode.removeAttributes(String...)' is
                 // expecting a list of String-literals, or a String[] (String-Array), so here we
                 // simply convert the Vector<String> to a String[].
                 allAN = attributesToRemove.toArray(new String[0]);
        
                 // Build a new TagNode, and then replace the old one with the newly built one
                 // on the page or sub-page, and at the same location.
                 tn = tn.removeAttributes(allAN);
                 html.setElementAt(tn, i);
        
                 // Java's IntStream-Builder is just a way to "build" a short list of integers.  At the end of this
                 // method, the list will be built and returned to the user.  It shall contain all Vector locations
                 // where a "TagNode swap" (replaced TagNode, with attributes filtered) has occurred.
                 b.accept(i);
             }
         }
        
         return b.build().toArray();                                         // Build the IntStream, Convert the IntStream -> int[], Return it.
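
        The white-list loop above can be illustrated without the library. The following is only a minimal sketch under stated assumptions: the class 'WhiteListSketch', its method 'retainOnly', and the use of a Map<String, String> as a stand-in for a TagNode's attributes are all hypothetical and are not part of the Java-HTML API. It shows the same three steps: normalizing the white-list into a TreeSet, collecting the attribute names that are not on it, and recording the Vector-indices that were changed in an IntStream.Builder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.Vector;
import java.util.stream.IntStream;

final class WhiteListSketch
{
    // HYPOTHETICAL stand-in: each "tag" is modeled as a map of attribute-name -> value.
    // This is NOT the library's TagNode; it only mirrors the control flow shown above.
    static int[] retainOnly
        (Vector<Map<String, String>> page, int[] posArr, Iterable<String> allowed)
    {
        // Trim and lower-case the white-list entries before storing them
        TreeSet<String> whiteList = new TreeSet<>();
        for (String a : allowed) whiteList.add(a.trim().toLowerCase());

        IntStream.Builder b = IntStream.builder();

        for (int i : posArr)
        {
            Map<String, String> attrs = page.elementAt(i);

            // Collect every attribute name that is not on the white-list
            Vector<String> toRemove = new Vector<>();
            for (String name : attrs.keySet())
                if (! whiteList.contains(name.toLowerCase())) toRemove.addElement(name);

            if (toRemove.isEmpty()) continue;

            // toArray(new String[0]) returns a genuine String[].  The no-argument
            // toArray() returns Object[], which cannot be cast to String[].
            String[] names = toRemove.toArray(new String[0]);

            // Build a replacement and swap it in at the same Vector location
            Map<String, String> replacement = new LinkedHashMap<>(attrs);
            for (String name : names) replacement.remove(name);
            page.setElementAt(replacement, i);

            b.accept(i);    // remember that index 'i' was modified
        }

        return b.build().toArray();
    }
}
```

        As with the method above, the returned int[] lists only the positions where a replacement actually took place; positions whose attributes were all on the white-list are left out.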
        
      • filter

        public static int[] filter​(java.util.Vector<? super TagNode> html,
                                   int sPos,
                                   int ePos,
                                   StrFilter filter)
        Filters the contents of each instance of a 'TC.OpeningTags' element in the input Vector using a StrFilter. All input-Vector TagNode's which have attributes will have the list of attribute-names tested against the provided StrFilter.test(attribute) predicate.

        Any attribute whose name fails the Predicate test will be removed. After all of a TagNode's inner-tags have been tested, if any of them failed the StrFilter.test(...) method, a new TagNode is constructed that leaves those attributes out. Finally, the old TagNode is removed from the input HTML Vector and replaced with the new one.

        NOTE: This method restricts the filter process to the specified subrange sPos ... ePos of the HTML Vector.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        filter - The interface StrFilter offers a plethora of static-member methods that build ready-made String filters suitable for use here. One may also write a Java lambda-expression here to implement the java.util.function.Predicate.

        IMPORTANT: The StrFilter functional-interface extends Predicate<Object>. It may seem counter-intuitive that it does not extend Predicate<String>. However, StrFilter is a general-purpose Predicate (used in numerous locations in this JAR-File 'Java-HTML' library distribution), and this choice allows non-String objects (like the myriad classes which implement the interface CharSequence) to be used as input to the filter-test by simply invoking Java's Object.toString() method.

        Sadly, this means that in writing a custom-made lambda-expression for this Predicate, it is mandatory to call Java's (Object class) 'toString()' method on the input 'innerTag' - even though the input parameter 'innerTag' is already a String.

        The rationale for this inconvenience is that interface 'StrFilter' has quite a few general-purpose, statically-invoked, factory-builder routines. Reusing those methods outweighs the benefit of having these methods, here, accept a 'Predicate<String>' instead of a 'Predicate<Object>' as an input parameter. (Again, note that StrFilter extends 'Predicate<Object>'.)
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        An integer-array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html'. The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were filtered by this method.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
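
        The three bounds rules listed above can be sketched in a few lines. This is only an illustration of the rules as documented here, not the implementation of class LV; the names 'RangeSketch' and 'normalize' are hypothetical.

```java
final class RangeSketch
{
    // HYPOTHETICAL helper: returns { start, end } where 'end' is exclusive.
    // It applies the documented rules in the order they are listed above.
    static int[] normalize(int sPos, int ePos, int size)
    {
        if (sPos < 0 || sPos >= size) throw new IndexOutOfBoundsException
            ("sPos [" + sPos + "] is negative, or not less than the Vector size [" + size + "]");

        if (ePos == 0 || ePos > size) throw new IndexOutOfBoundsException
            ("ePos [" + ePos + "] is zero, or greater than the Vector size [" + size + "]");

        // A negative 'ePos' is first reset to the size of the Vector ...
        int end = (ePos < 0) ? size : ePos;

        // ... and only then compared against 'sPos'
        if (sPos > end) throw new IndexOutOfBoundsException
            ("sPos [" + sPos + "] is larger than ePos [" + end + "]");

        return new int[] { sPos, end };
    }
}
```

        Under this sketch, normalize(0, -1, 5) yields the full range { 0, 5 }, while normalize(5, -1, 5) is rejected because index 5 lies past the last element.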
        See Also:
        TagNode.allAN(), TagNode.isTagNode(), TagNode.isClosing, TagNode.removeAttributes(String[]), LV
        Code:
        Exact Method Body:
         IntStream.Builder   b   = IntStream.builder();                      // Save the list of modified TagNode's in a Java Stream
         LV                  l   = new LV(sPos, ePos, html);                 // Loop Variables
        
         HTMLNode n;     TagNode tn;                                         // Temp Variables
        
         for (int i=l.start; i < l.end; i++)
             if (    ((n = (HTMLNode) html.elementAt(i)).isTagNode())        // Only instances of TagNode have attributes, NOT TextNode or CommentNode
                 &&  (! (tn = (TagNode) n).isClosing)                        // TC.OpeningTags have attributes, Closing-Element Nodes cannot have them
                 &&  (tn.str.length() > (tn.tok.length() + 3))   )           // '<', TOKEN, SPACE, '>'
             {
                 String[] innerTagsToRemove = tn                             // Temp-variable will hold a list of all "inner-tags that must be removed"
                         .allAN(true, true)                                  // generates a 'Stream<String>' of every attribute-name in the TagNode
                         .filter(innerTag -> ! filter.test(innerTag))        // reduces the stream to contain (ONLY) the inner-tags that fail the test, and need to be removed from the TagNode
                         .toArray(String[]::new);                            // converts this 'Stream<String>' to a 'String[]' (String-Array)
        
                 if (innerTagsToRemove.length > 0)                           // now... update only if there are attributes to be removed.
                 {
                     // Build a new TagNode, and then replace the old one with the newly built one
                     // on the page or sub-page, and at the same location.
                     tn = tn.removeAttributes(innerTagsToRemove);
                     html.setElementAt(tn, i);
        
                     // Java's IntStream-Builder is just a way to "build" a short list of integers.  At the end of this
                     // method, the list will be built and returned to the user.  It shall contain all Vector locations
                     // where a "TagNode swap" (replaced TagNode, with attributes filtered) has occurred.
                     b.accept(i);
                 }
             }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
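
        The note about Predicate<Object> and 'toString()' is easier to see in code. The following is a minimal sketch that uses only the JDK: the class 'PredicateSketch', the constant 'KEEP' and the method 'namesToRemove' are hypothetical names, and a plain java.util.function.Predicate<Object> stands in for StrFilter. It follows the description given for this method, in which an attribute name that fails the test is the one that gets removed.

```java
import java.util.function.Predicate;
import java.util.stream.Stream;

final class PredicateSketch
{
    // HYPOTHETICAL keep-rule.  Because the predicate is typed over Object, the lambda
    // parameter is an Object, and toString() must be called before any String method
    // can be used on it, even when the caller really did pass a String.
    static final Predicate<Object> KEEP = o ->
    {
        String name = o.toString().toLowerCase();
        return name.equals("src") || name.equals("alt");
    };

    // Reduces a stream of attribute names to those that FAIL the keep-test.
    // These are the names that would be handed to a "remove attributes" call.
    static String[] namesToRemove(Stream<String> attributeNames, Predicate<Object> keep)
    {
        return attributeNames
            .filter(name -> ! keep.test(name))
            .toArray(String[]::new);
    }
}
```

        Passing the names "src", "onclick", "ALT" and "style" with the 'KEEP' rule leaves "onclick" and "style" in the returned array, since those two are the ones the rule rejects.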
        
      • filter

        public static int[] filter​(java.util.Vector<? super TagNode> html,
                                   int[] posArr,
                                   StrFilter filter)
        Filters the contents of each instance of a 'TC.OpeningTags' element in the input Vector using a StrFilter. All input-Vector TagNode's which have attributes will have the list of attribute-names tested against the provided StrFilter.test(attribute) predicate.

        Any attribute whose name fails the Predicate test will be removed. After all of a TagNode's inner-tags have been tested, if any of them failed the StrFilter.test(...) method, a new TagNode is constructed that leaves those attributes out. Finally, the old TagNode is removed from the input HTML Vector and replaced with the new one.

        NOTE: This method restricts the removal process to only nodes specified by the 'Vector'-index parameter 'posArr'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        filter - The interface StrFilter offers a plethora of static-member methods that build ready-made String filters suitable for use here. One may also write a Java lambda-expression here to implement the java.util.function.Predicate.

        IMPORTANT: The StrFilter functional-interface extends Predicate<Object>. It may seem counter-intuitive that it does not extend Predicate<String>. However, StrFilter is a general-purpose Predicate (used in numerous locations in this JAR-File 'Java-HTML' library distribution), and this choice allows non-String objects (like the myriad classes which implement the interface CharSequence) to be used as input to the filter-test by simply invoking Java's Object.toString() method.

        Sadly, this means that in writing a custom-made lambda-expression for this Predicate, it is mandatory to call Java's (Object class) 'toString()' method on the input 'innerTag' - even though the input parameter 'innerTag' is already a String.

        The rationale for this inconvenience is that interface 'StrFilter' has quite a few general-purpose, statically-invoked, factory-builder routines. Reusing those methods outweighs the benefit of having these methods, here, accept a 'Predicate<String>' instead of a 'Predicate<Object>' as an input parameter. (Again, note that StrFilter extends 'Predicate<Object>'.)

        AGAIN: This method shall only modify TagNode's if their Vector-index locations in 'html' are listed in 'posArr'.
        posArr - This parameter expects a "Pointer-Integer Array." These are usually generated by the NodeSearch 'Find' classes, and are simply lists of index-pointers into a Vectorized HTML Web-Page Vector. The int[] array passed to this parameter specifies the TagNode's in the Vector whose attributes will be filtered via a call to TagNode.removeAttributes(...), with each resulting TagNode replacing the original.

        For Example:
         // This line will retrieve an array "index-pointer" to every HTML Image Element.
         int[] posArr = TagNodeFind.all(htmlPage, TC.OpeningTags, "img");
         
         // Build an instance of StrFilter
         // NOTE: The 'true' parameter indicates that attribute names should be compared
         //       in a CASE INSENSITIVE fashion.
         StrFilter f = StrFilter.strListKEEP(true, "src");
        
         // This line will "clean up" every HTML "<IMG>" element.  However 'cluttered' these
         // elements were, only the 'src' attribute will remain after the filter operation.
         Attributes.filter(htmlPage, posArr, f);
        
         
        
        Returns:
        An integer-array whose elements function as 'Vector'-index pointers into the original vectorized-html web-page parameter 'html'. The nodes/references pointed to by the pointers in this array are the nodes/elements that were changed, and now contain new TagNode elements whose attribute key-value pairs were filtered by this method.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - If any of the elements in 'posArr' are index-pointers that are out of range of Vector-parameter 'html', then Java will, naturally, throw this exception.
        TagNodeExpectedException - This exception shall throw if an identified Vector index is supposed to hold an instance of TagNode, but instead holds some other HTMLNode instance. If any index in an integer-position array (int[] posArr) does not point to a TagNode, this exception's throw shall inform the programmer.
        OpeningTagNodeExpectedException - When a Vector position-index holds an instance of TagNode, but this TagNode has its boolean isClosing field set to TRUE, then this exception shall throw. When an int[] posArr integer-array of Vector-indices is passed, and the code expects each of the locations pointed to in the Vector to contain "Opening HTML Element Tags", this exception's throw will inform the user.
        See Also:
        TagNode.allAN(), TagNode.isTagNode(), TagNode.isClosing, TagNode.removeAttributes(String[])
        Code:
        Exact Method Body:
         IntStream.Builder b = IntStream.builder();                  // Use Java Stream to keep a list of Vector-Locations
                                                                     // that were updated / modified.
        
         for (int i: posArr)
         {
             HTMLNode n = (HTMLNode) html.elementAt(i);
             if (! n.isTagNode())                                    // Must be an HTML TagNode
                 throw new TagNodeExpectedException(i);
        
             TagNode tn = (TagNode) n;
             if (tn.isClosing)                                       // Must be an "Opening" HTML TagNode
                 throw new OpeningTagNodeExpectedException(i);
        
             if (tn.str.length() <= (tn.tok.length() + 3))           // '<', TOKEN, SPACE, '>'
                 continue;
        
             String[] innerTagsToRemove = tn                         // Temp-variable will hold a list of all "inner-tags that must be removed"
                     .allAN(true, true)                              // generates a 'Stream<String>' of every attribute-name in the TagNode
                     .filter(innerTag -> ! filter.test(innerTag))    // reduces the stream to contain (ONLY) the inner-tags that fail the test, and need to be removed from the TagNode
                     .toArray(String[]::new);                        // converts this 'Stream<String>' to a 'String[]' (String-Array)
        
             if (innerTagsToRemove.length > 0)                       // now... update only if there are attributes to be removed.
             {
                 // Build a new TagNode, and then replace the old one with the newly built one
                 // on the page or sub-page, and at the same location.
                 tn = tn.removeAttributes(innerTagsToRemove);
                 html.setElementAt(tn, i);
        
                 // Java's IntStream-Builder is just a way to "build" a short list of integers.  At the end of this
                 // method, the list will be built and returned to the user.  It shall contain all Vector locations
                 // where a "TagNode swap" (replaced TagNode, with attributes filtered) has occurred.
                 b.accept(i);
             }
         }
        
         // Build the IntStream, Convert the IntStream -> int[], Return it.
         return b.build().toArray();
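
        The description of the 'html' parameter mentions the wild-card '? super TagNode'. The following minimal sketch, using only the JDK, shows what such a lower-bounded wild-card permits. The nested classes 'Node' and 'Tag' are hypothetical stand-ins for HTMLNode and TagNode, and the 'instanceof' test is a choice made for this illustration; it is not a claim about how the methods above inspect their elements.

```java
import java.util.Vector;

final class WildcardSketch
{
    static class Node { }                               // stand-in for HTMLNode

    static class Tag extends Node                       // stand-in for TagNode
    {
        final boolean closing;
        Tag(boolean closing) { this.closing = closing; }
    }

    // Accepts Vector<Tag>, Vector<Node> and Vector<Object> alike
    static int countOpeningTags(Vector<? super Tag> html)
    {
        int count = 0;

        for (int i = 0; i < html.size(); i++)
        {
            // Under '? super Tag', elements can only be read back as Object
            Object o = html.elementAt(i);
            if ((o instanceof Tag) && (! ((Tag) o).closing)) count++;
        }

        return count;
    }

    // Writing a Tag into the Vector is always type-safe under '? super Tag'
    static void replace(Vector<? super Tag> html, int index, Tag newTag)
    { html.setElementAt(newTag, index); }
}
```

        The lower bound is what makes the in-place replacement legal: whatever the Vector's actual element type is, it is guaranteed to be a super-type of the tag class, so storing a newly built tag at an existing index compiles without a cast.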