Package Torello.HTML

Class Util


  • public class Util
    extends java.lang.Object
    Util - Documentation.

    This is a list of some of the common "helper routines" that I occasionally need. They are not in any particular order. Almost all of these routines are used internally, either in the NodeSearch search-loops and iterators, or in parts of package "Tools." The possibilities for expanding a class like this are probably "boundless"; however, keep in mind that public class 'SubSection', and also public class 'NodeIndex' and both of its sub-classes 'TagNodeIndex' and 'TextNodeIndex', make some of these short, for-loop-driven helper routines seem a little superfluous.

    The most complicated and error-prone code is the for-loops & iterators of the node-search package. Those have been solidly tested for over a year, and the helper routines that build those for-loops are included in this class. Adding more utility and modification tools for vectorized-html pages may be the subject of future development work, but the most complicated parts, search and iterate, have already been handled. The methods here may be useful, but deciding what makes a class usable is not a "precise science." Please remember that the methods ending in "OPT" (meaning optimized) simply omit a few of the exception-throwing checks. When the for-loop bounds are specified in the method signature, those checks only need to be performed once, and do not need to be repeated on each iteration of a node-search for-loop.
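The "OPT" convention can be sketched as follows. This is a minimal illustration under stated assumptions, not code from this library: the class RangeOps and its methods countMatches and countMatchesOPT are hypothetical names, a Vector<String> stands in for a vectorized page, and the range check is simplified rather than a copy of the documented rules. The public method validates its range once; the OPT variant trusts its caller and skips the checks.

```java
import java.util.Vector;

// Hypothetical example class; the names are illustrative, not part of Torello.HTML.
final class RangeOps {
    private RangeOps() { }   // static-only class: no instances

    // Public entry point: validates the range once, then delegates to the unchecked helper.
    static int countMatches(Vector<String> v, int sPos, int ePos, String target) {
        if (ePos < 0) ePos = v.size();   // negative ePos means "to the end"
        if (sPos < 0 || sPos > ePos || ePos > v.size())
            throw new IndexOutOfBoundsException("sPos=" + sPos + ", ePos=" + ePos);
        return countMatchesOPT(v, sPos, ePos, target);
    }

    // "OPT" variant: performs no bounds checks, so it is cheap to call from a loop
    // whose bounds were already validated by the caller.
    static int countMatchesOPT(Vector<String> v, int sPos, int ePos, String target) {
        int count = 0;
        for (int i = sPos; i < ePos; i++)
            if (v.elementAt(i).equals(target)) count++;
        return count;
    }
}
```

Validating once in the public method and looping with the unchecked variant is what keeps the per-iteration cost down when a search loop calls a helper many times.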

    Static (Functional) API: The methods in this class are all (100%) declared with the Java keyword 'static'. Furthermore, there is no way to obtain an instance of this class, because it declares no public constructor. The Spring Framework's (Spring Boot / Spring MVC) component model is *not* used, because it runs directly counter to the light-weight data-class philosophy. This approach has several advantages over the rather ornate Component Annotation syntax (@Component, @Service, @Autowired, etc... 'Java Beans'):

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood class Vector.

    Internal-State: A user may click on this class' source code (see link below) to view all fields defined in this class. A cursory inspection of the code would show that this class has precisely zero internally defined global fields (Spaghetti). All variables used by the methods in this class are local variables only, and therefore this class ought to be thought of as 'state-less'.



    • Method Detail

      • trimTextNodes

        public static int trimTextNodes(java.util.Vector<HTMLNode> page,
                                        int sPos,
                                        int ePos,
                                        boolean deleteZeroLengthStrings)
        This will iterate through the TextNodes of the Vector<HTMLNode> that lie between 'sPos' (inclusive) and 'ePos' (exclusive), and invoke java.lang.String.trim() on each one. If this invocation reduces String.length(), then a new TextNode is instantiated whose TextNode.str field is set to the result of old_node.str.trim(), and it replaces the old node in the Vector.
        Parameters:
        deleteZeroLengthStrings - When this parameter is TRUE, any TextNode whose length is zero after trim() is called (including one that was already empty) is removed from the Vector.
        Returns:
        The number of TextNodes that were trimmed or deleted.
        Code:
        Exact Method Body:
         int                 counter = 0;
         IntStream.Builder   b       = deleteZeroLengthStrings ? IntStream.builder() : null;
         HTMLNode            n       = null;
         LV                  l       = new LV(page, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++)
             if ((n = page.elementAt(i)).isTextNode())
             {
                 String  trimmed         = n.str.trim();
                 int     trimmedLength   = trimmed.length();
        
                 if ((trimmedLength == 0) && deleteZeroLengthStrings)
                     { b.add(i); counter++; }
                 else if (trimmedLength < n.str.length())
                     { page.setElementAt(new TextNode(trimmed), i); counter++; }
             }
        
         if (deleteZeroLengthStrings) removeNodesOPT(page, b.build().toArray());
        
         return counter;
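The trim-and-delete loop above can be sketched in a self-contained way. This is a minimal sketch, not the library's code: Node is a hypothetical two-kind stand-in for HTMLNode / TextNode, the range is assumed to be already validated, and instead of collecting indices and calling removeNodesOPT, this version walks the range right-to-left and deletes in place. That is a swapped-in technique which keeps not-yet-visited indices stable.

```java
import java.util.Vector;

// Hypothetical minimal node model (stand-in for HTMLNode / TextNode / TagNode).
final class Node {
    enum Kind { TEXT, TAG }
    final Kind kind;
    final String str;
    Node(Kind kind, String str) { this.kind = kind; this.str = str; }
    static Node text(String s) { return new Node(Kind.TEXT, s); }
    static Node tag(String s)  { return new Node(Kind.TAG, s); }
}

final class TrimSketch {
    // Trims each text node in [sPos, ePos); when deleteEmpty is true, text nodes that
    // trim to "" are removed. Returns the number of nodes trimmed or deleted.
    // Assumes 0 <= sPos <= ePos <= page.size() was already checked by the caller.
    static int trimTextNodes(Vector<Node> page, int sPos, int ePos, boolean deleteEmpty) {
        int counter = 0;
        // Right-to-left: removing index i never shifts an index that is still to be visited.
        for (int i = ePos - 1; i >= sPos; i--) {
            Node n = page.elementAt(i);
            if (n.kind != Node.Kind.TEXT) continue;
            String trimmed = n.str.trim();
            if (trimmed.isEmpty() && deleteEmpty) {
                page.removeElementAt(i);
                counter++;
            } else if (trimmed.length() < n.str.length()) {
                page.setElementAt(Node.text(trimmed), i);   // nodes are replaced, not mutated
                counter++;
            }
        }
        return counter;
    }
}
```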
        
      • removeInclusiveEmpty

        public static int removeInclusiveEmpty(java.util.Vector<HTMLNode> page,
                                               int sPos,
                                               int ePos,
                                               java.lang.String... htmlTags)
        This will do an "Inclusive Search" using the standard class TagNodeInclusiveIterator in the package NodeSearch, and then inspect the contents of each matched sub-section. Any sub-section that contains no HTMLNode between its opening and closing tags, or that contains only "blank-text" (white-space) between them, is removed.

        IMPORTANT: This method repeats its search recursively. For instance, if the user requested that all empty HTML divider (<DIV>) elements be removed, and removing one set of dividers leaves further empty ones behind (nested <DIV> elements), then another removal pass is performed. This recursion continues until no empty HTML elements of the types listed in 'htmlTags' remain.
        Parameters:
        page - Any vectorized-html page or sub-page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        htmlTags - The list of inclusive (non-singleton) html elements to search for possibly being empty container tags.
        Returns:
        The number of HTMLNodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
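The index rules listed above (which apply to every sPos / ePos pair in this class) can be restated as a small check. This is a hypothetical helper written from the documented rules only; it is not the library's internal LV class, whose actual code is not shown here.

```java
import java.util.Vector;

// Hypothetical helper mirroring the documented sPos/ePos rules; not the library's LV class.
final class RangeCheck {
    // Returns {start, end} for the half-open range [sPos, ePos), or throws.
    static int[] resolve(Vector<?> v, int sPos, int ePos) {
        int size = v.size();
        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos + ", size=" + size);
        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos + ", size=" + size);
        if (ePos < 0) ePos = size;   // negative ePos means "to the end of the Vector"
        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos (" + sPos + ") > ePos (" + ePos + ")");
        return new int[] { sPos, ePos };
    }
}
```

Note the order: the zero test on 'ePos' happens before a negative 'ePos' is reset to Vector.size(), exactly as the third bullet describes.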
        Code:
        Exact Method Body:
         DotPair         subList;
         int             removed = 0;
         HNLIInclusive   iter    = TagNodeInclusiveIterator.iter(page, htmlTags);
         LV              l       = new LV(page, sPos, ePos);
        
         iter.restrictCursor(l);
        
         TOP:
         while (iter.hasNext())
        
             // If there is only the opening & closing pair, with nothing in between,
             // then the pair must be removed because it is "Empty" (Inclusive Empty)
             if ((subList = iter.nextDotPair()).size() == 2)
             { iter.remove();    ePos -= subList.size();     removed += subList.size(); }
        
             else
             {
                 // If there is any TagNode in between the start-end pair, then this is NOT EMPTY
                 // In this case, skip to the next start-end opening-closing pair.
                 for (int i=(subList.start + 1); i < subList.end; i++)
                     if (! page.elementAt(i).isTextNode())
                         continue TOP;
        
                 // If there were only TextNodes between an opening-closing TagNode Pair....
                 // **AND** those TextNodes are only white-space, then this is also considered
                 // Inclusively Empty.  (Get all TextNodes, and if .trim() reduces the length()
                 // to zero, then it was only white-space.)
                 if (Util.textNodesString(page, subList).trim().length() == 0)
                 { iter.remove();    ePos -= subList.size();     removed += subList.size(); }
             }
        
         // This process must be continued recursively, because if any inner, for instance,
         // <DIV> ... </DIV> was removed, then the outer list must be re-checked...
         if (removed > 0)
             return removed + removeInclusiveEmpty(page, sPos, ePos, htmlTags);
         else
             return 0;
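The repeat-until-nothing-changes behaviour described for this method can be sketched without the NodeSearch iterators. This is a minimal sketch under stated assumptions: Node is a hypothetical stand-in that records each tag's name and whether it is a closing tag, the whole Vector is scanned rather than an sPos / ePos range, and a loop that repeats until a pass removes nothing replaces the recursion used above. An element counts as empty when its opening tag is followed only by white-space text and then its matching closing tag.

```java
import java.util.Set;
import java.util.Vector;

// Hypothetical minimal node model: a text node, or an opening/closing tag with a name.
final class Node {
    final boolean isTag, closing;
    final String tok, str;   // tok is the lower-case tag name (null for text)
    private Node(boolean isTag, boolean closing, String tok, String str) {
        this.isTag = isTag; this.closing = closing; this.tok = tok; this.str = str;
    }
    static Node text(String s)    { return new Node(false, false, null, s); }
    static Node open(String tok)  { return new Node(true, false, tok, "<" + tok + ">"); }
    static Node close(String tok) { return new Node(true, true, tok, "</" + tok + ">"); }
}

final class EmptyRemover {
    // Removes <tag>[white-space text]*</tag> runs for any tag in 'tags', repeating until
    // a full pass removes nothing (so newly-emptied outer elements are caught too).
    // Returns the total number of nodes removed.
    static int removeInclusiveEmpty(Vector<Node> page, Set<String> tags) {
        int removed = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < page.size(); i++) {
                Node open = page.elementAt(i);
                if (!open.isTag || open.closing || !tags.contains(open.tok)) continue;
                int j = i + 1;
                // Skip white-space-only text nodes that follow the opening tag.
                while (j < page.size() && !page.elementAt(j).isTag
                        && page.elementAt(j).str.trim().isEmpty()) j++;
                if (j < page.size()) {
                    Node close = page.elementAt(j);
                    if (close.isTag && close.closing && close.tok.equals(open.tok)) {
                        removed += j - i + 1;
                        page.subList(i, j + 1).clear();   // opener, blank text, closer
                        changed = true;
                    }
                }
            }
        }
        return removed;
    }
}
```

With nested empty <div> elements, the first pass removes the inner pair and the second pass removes the now-empty outer pair, which mirrors the recursive re-check in the method body above.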
        
      • rangeToString

        public static java.lang.String rangeToString
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        The purpose of this method is to convert a portion of an HTML page, represented as a Vector of HTMLNodes, into a String. Two 'int' parameters in this method's signature define the sub-list of the page to be converted into a java.lang.String.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. The wildcard type '? extends HTMLNode' simply means that this method can receive a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> without throwing an exception or producing erroneous results. These 'sub-type' Vectors are very often returned as search results by the classes in the 'NodeSearch' package. The most common Vector type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The str fields of every HTMLNode in the requested range, concatenated into a single String.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        pageToString(Vector), rangeToString(Vector, DotPair)
        Code:
        Exact Method Body:
         StringBuilder   ret = new StringBuilder();
         LV              l   = new LV(html, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++) ret.append(html.elementAt(i).str);
        
         return ret.toString();
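As the method body shows, rangeToString simply concatenates the str field of every node in the range. A self-contained sketch, assuming a hypothetical three-class hierarchy (HNode, TextN, TagN) in place of HTMLNode, TextNode and TagNode, also shows why the '? extends' wildcard lets a Vector of any node sub-type be passed in. The range is assumed to be already validated.

```java
import java.util.Vector;

// Hypothetical stand-ins for HTMLNode, TextNode and TagNode.
abstract class HNode { final String str; HNode(String str) { this.str = str; } }
final class TextN extends HNode { TextN(String s) { super(s); } }
final class TagN  extends HNode { TagN(String s)  { super(s); } }

final class RangeString {
    // Concatenates the str field of every node in [sPos, ePos).
    // '? extends HNode' accepts Vector<HNode>, Vector<TagN>, Vector<TextN>, ...
    static String rangeToString(Vector<? extends HNode> html, int sPos, int ePos) {
        StringBuilder sb = new StringBuilder();
        for (int i = sPos; i < ePos; i++) sb.append(html.elementAt(i).str);
        return sb.toString();
    }
}
```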
        
      • textNodesString

        public static java.lang.String textNodesString
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        This will return a String composed of ONLY the TextNodes contained within the input Vector, and furthermore only the nodes situated between index 'sPos' (inclusive) and index 'ePos' (exclusive).

        The for-loop that iterates the input parameter simply skips every 'TagNode' and 'CommentNode' when building the returned String.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. The wildcard type '? extends HTMLNode' simply means that this method can receive a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> without throwing an exception or producing erroneous results. These 'sub-type' Vectors are very often returned as search results by the classes in the 'NodeSearch' package. The most common Vector type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        A String composed of the text-only elements of the web-page or sub-page. Only text between the requested Vector-indices is included.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        textNodesString(Vector, DotPair), textNodesString(Vector)
        Code:
        Exact Method Body:
         StringBuilder   sb  = new StringBuilder();
         LV              l   = new LV(html, sPos, ePos);
         HTMLNode        n;
        
         for (int i=l.start; i < l.end; i++)
             if ((n = html.elementAt(i)).isTextNode())
                 sb.append(n.str);
        
         return sb.toString();
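The same filtering can be expressed with Java Streams over a subList view; this is an alternative formulation, not the method's actual body. The classes HNode, TextN and TagN are hypothetical stand-ins for HTMLNode, TextNode and TagNode, and the range is assumed valid.

```java
import java.util.Vector;
import java.util.stream.Collectors;

// Hypothetical stand-ins for HTMLNode, TextNode and TagNode.
abstract class HNode { final String str; HNode(String str) { this.str = str; } }
final class TextN extends HNode { TextN(String s) { super(s); } }
final class TagN  extends HNode { TagN(String s)  { super(s); } }

final class TextOnly {
    // Returns the concatenated text of the text nodes in [sPos, ePos); other nodes are skipped.
    static String textNodesString(Vector<? extends HNode> html, int sPos, int ePos) {
        return html.subList(sPos, ePos).stream()
                   .filter(n -> n instanceof TextN)
                   .map(n -> n.str)
                   .collect(Collectors.joining());
    }
}
```

The explicit for-loop in the method body avoids the stream machinery; the stream form trades a little overhead for a declarative filter step.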
        
      • removeAllTextNodes

        public static int removeAllTextNodes(java.util.Vector<HTMLNode> page,
                                             int sPos,
                                             int ePos)
        Takes a sub-section of an HTML Vector and removes every TextNode present in it.
        Parameters:
        page - Any HTML page
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of HTML TextNodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        TextNode, removeNodesOPT(Vector, int[])
        Code:
        Exact Method Body:
         IntStream.Builder   b       = IntStream.builder();
         LV                  l       = new LV(page, sPos, ePos);
        
         // Use Java-Streams to build the list of nodes that are valid text-nodes.
         for (int i=l.start; i < l.end; i++) if (page.elementAt(i).isTextNode()) b.add(i);
        
         // Build the stream and convert it to an int[] (integer-array)
         int[]               posArr  = b.build().toArray();
        
         // The integer array is guaranteed to be sorted, and contain valid vector-indices.
         removeNodesOPT(page, posArr);
        
         return posArr.length;
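The collect-indices-then-remove pattern used by the removeAll... methods can be sketched as follows. The body of removeNodesOPT is not shown in this section, so the removeNodes helper below is only one plausible way such a helper could work: it deletes from the highest index down so earlier indices stay valid. Both method names are hypothetical, as is the simplification that a node is a String and that anything not starting with '<' counts as text.

```java
import java.util.Vector;
import java.util.stream.IntStream;

final class RemoveSketch {
    // Hypothetical stand-in for removeNodesOPT: 'sortedIdx' must be ascending and in bounds.
    // Removing from the highest index down keeps the not-yet-removed indices valid.
    static void removeNodes(Vector<?> page, int[] sortedIdx) {
        for (int k = sortedIdx.length - 1; k >= 0; k--) page.removeElementAt(sortedIdx[k]);
    }

    // Assumption for the sketch: a String not starting with '<' is a text node.
    static int removeAllText(Vector<String> page, int sPos, int ePos) {
        IntStream.Builder b = IntStream.builder();
        for (int i = sPos; i < ePos; i++)
            if (!page.elementAt(i).startsWith("<")) b.add(i);   // indices come out ascending
        int[] posArr = b.build().toArray();
        removeNodes(page, posArr);
        return posArr.length;
    }
}
```

Each removeElementAt shifts the tail of the Vector, so k removals cost O(n·k); a single compaction pass would be O(n) if that ever mattered.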
        
      • removeAllTagNodes

        public static int removeAllTagNodes(java.util.Vector<HTMLNode> page,
                                            int sPos,
                                            int ePos)
        Takes a sub-section of an HTML Vector and removes every TagNode present in it.
        Parameters:
        page - Any HTML page
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of HTML TagNodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        TagNode, removeNodesOPT(Vector, int[])
        Code:
        Exact Method Body:
         IntStream.Builder   b       = IntStream.builder();
         LV                  l       = new LV(page, sPos, ePos);
        
         // Use Java-Streams to build the list of nodes that are valid tag-nodes.
         for (int i=l.start; i < l.end; i++) if (page.elementAt(i).isTagNode()) b.add(i);
        
         // Build the stream and convert it to an int[] (integer-array)
         int[]               posArr  = b.build().toArray();
        
         // The integer array is guaranteed to be sorted, and contain valid vector-indices.
         removeNodesOPT(page, posArr);
        
         return posArr.length;
        
      • removeAllCommentNodes

        public static int removeAllCommentNodes(java.util.Vector<HTMLNode> page,
                                                int sPos,
                                                int ePos)
        Takes a sub-section of an HTML Vector and removes every CommentNode present in it.
        Parameters:
        page - Any HTML page
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of HTML CommentNodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        CommentNode, removeNodesOPT(Vector, int[])
        Code:
        Exact Method Body:
         IntStream.Builder   b       = IntStream.builder();
         LV                  l       = new LV(page, sPos, ePos);
        
         // Use Java-Streams to build the list of nodes that are valid comment-nodes.
         for (int i=l.start; i < l.end; i++)
             if (page.elementAt(i).isCommentNode())
                 b.add(i);
        
         // Build the stream and convert it to an int[] (integer-array)
         int[]               posArr  = b.build().toArray();
        
         // The integer array is guaranteed to be sorted, and contain valid vector-indices.
         removeNodesOPT(page, posArr);
        
         return posArr.length;
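Because removeAllTextNodes, removeAllTagNodes and removeAllCommentNodes differ only in the node test, one generic alternative (not the library's code) takes a Predicate and calls removeIf on a subList view of the Vector. In this sketch the class RemoveIfSketch and its method removeAll are hypothetical, and plain Strings stand in for nodes.

```java
import java.util.List;
import java.util.Vector;
import java.util.function.Predicate;

final class RemoveIfSketch {
    // Removes every element of [sPos, ePos) that matches 'which'; returns how many were removed.
    static <T> int removeAll(Vector<T> page, int sPos, int ePos, Predicate<? super T> which) {
        List<T> view = page.subList(sPos, ePos);   // live view: removals write through to 'page'
        int before = view.size();
        view.removeIf(which);
        return before - view.size();
    }
}
```

The subList view writes through to the original Vector, so no index bookkeeping is needed; the count is simply the view's size before minus its size after.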
        
      • escapeTextNodes

        public static int escapeTextNodes(java.util.Vector<HTMLNode> html,
                                          int sPos,
                                          int ePos)
        Calls Escape.replace(String) on the str field of each TextNode in the range sPos ... ePos, and replaces any TextNode whose text changes with a new TextNode.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. Because this method writes new TextNode instances back into the Vector, the parameter type is Vector<HTMLNode> rather than a '? extends HTMLNode' wildcard type.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of TextNodes that changed as a result of calling Escape.replace(n.str).
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        Escape.replaceAll(String)
        Code:
        Exact Method Body:
         LV          l       = new LV(html, sPos, ePos);
         HTMLNode    n       = null;
         String      s       = null;
         int         counter = 0;
        
         for (int i=l.start; i < l.end; i++)
             if ((n = html.elementAt(i)).isTextNode())
                 if (! (s = Escape.replace(n.str)).equals(n.str))
                 {
                     html.setElementAt(new TextNode(s), i);
                     counter++;
                 }
         return counter;
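The replace-only-if-changed loop above generalises to any String transform. In this sketch the transform is passed as a UnaryOperator<String>. The method decodeBasic is a hypothetical, deliberately tiny transform that decodes three entities; it is not a claim about what Escape.replace does. Node is likewise a hypothetical stand-in for HTMLNode / TextNode.

```java
import java.util.Vector;
import java.util.function.UnaryOperator;

// Hypothetical minimal node: either text or a tag, with its raw string.
final class Node {
    final boolean isText;
    final String str;
    Node(boolean isText, String str) { this.isText = isText; this.str = str; }
}

final class ReplaceChanged {
    // Applies f to each text node in [sPos, ePos); only nodes whose text actually changes
    // are replaced (with new Node instances). Returns the number replaced.
    static int replaceTextNodes(Vector<Node> html, int sPos, int ePos, UnaryOperator<String> f) {
        int counter = 0;
        for (int i = sPos; i < ePos; i++) {
            Node n = html.elementAt(i);
            if (!n.isText) continue;
            String s = f.apply(n.str);
            if (!s.equals(n.str)) {
                html.setElementAt(new Node(true, s), i);
                counter++;
            }
        }
        return counter;
    }

    // Hypothetical transform: decodes &lt; &gt; &amp; only. "&amp;" is decoded last so that
    // "&amp;lt;" becomes the literal text "&lt;" rather than "<".
    static String decodeBasic(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }
}
```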
        
      • cloneRange

        public static java.util.Vector<HTMLNode> cloneRange
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
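        Copies a sub-range of the HTML page into a new Vector, and returns it. This is a shallow copy: the new Vector holds references to the same HTMLNode instances as the original, and the nodes themselves are not cloned.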
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. The wildcard type '? extends HTMLNode' simply means that this method can receive a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> without throwing an exception or producing erroneous results. These 'sub-type' Vectors are very often returned as search results by the classes in the 'NodeSearch' package. The most common Vector type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The "cloned" (copied) sub-range specified by 'sPos' and 'ePos'.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        cloneRange(Vector, DotPair)
        Code:
        Exact Method Body:
         LV                  l   = new LV(html, sPos, ePos);
         Vector<HTMLNode>    ret = new Vector<>(l.end - l.start);
        
         // Copy the range specified into the return vector
         for (int i = l.start; i < l.end; i++) ret.addElement(html.elementAt(i));
        
         return ret;
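Because the loop above copies references rather than cloning nodes, the same shallow copy can also be made with the Vector(Collection) constructor over a subList view. This is a minimal generic sketch; the class CopyRange is hypothetical, and StringBuilder elements are used in the check only because they are mutable, which makes the shared references visible.

```java
import java.util.Vector;

final class CopyRange {
    // Shallow copy of [sPos, ePos): a new Vector holding the same element references.
    static <T> Vector<T> cloneRange(Vector<? extends T> html, int sPos, int ePos) {
        return new Vector<T>(html.subList(sPos, ePos));
    }
}
```

Adding or removing elements of the copy leaves the original Vector untouched, but any change made to a shared element is visible through both Vectors.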
        
      • removeAllInnerTags

        public static int removeAllInnerTags
                    (java.util.Vector<? super TagNode> html,
                     int sPos,
                     int ePos)
        
        This method removes all inner-tags (all attributes) from every TagNode inside of an HTML page. It does this by replacing each such TagNode in the Vector with the pre-instantiated, publicly-available TagNode that can be obtained by calling HTMLTags.hasTag(token, TC).

        NOTE: This method decides whether to insert a fresh TagNode by measuring the length of the internal TagNode.str (a String) field. If TagNode.str.length() is not equal to the length of the HTML token TagNode.tok plus 2, then the node is replaced with a fresh, pre-instantiated one. The '+2' figure comes from the additional characters '<' and '>' that start and end every HTML TagNode.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. The wildcard type '? super TagNode' means that this method can receive a Vector<TagNode>, a Vector<HTMLNode>, or even a Vector<Object>: any Vector into which a TagNode may be written. The most common Vector type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of TagNode elements that were replaced with zero-attribute HTML Element Tags.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
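        The bounds rules listed above are shared by every ranged method in this class. A minimal sketch of those checks follows; 'RangeSketch' is a hypothetical helper written only from the documented rules, not the library's own loop-variable class, whose exception messages and internals may differ.

```java
// Hypothetical helper mirroring the documented 'sPos' / 'ePos' rules.
final class RangeSketch
{
    final int start, end;   // 'start' is inclusive, 'end' is exclusive

    RangeSketch(int size, int sPos, int ePos)
    {
        // A negative 'ePos' means "to the end of the Vector"
        if (ePos < 0) ePos = size;

        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);

        if (ePos == 0 || ePos > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        if (sPos > ePos)
            throw new IndexOutOfBoundsException("sPos > ePos: " + sPos + " > " + ePos);

        this.start = sPos;
        this.end   = ePos;
    }
}
```

For example, new RangeSketch(10, 2, -1) yields the range [2, 10), while new RangeSketch(10, 5, 3) throws.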
        Code:
        Exact Method Body:
         Object o;   TagNode tn;
        
         int ret = 0;
         LV  l   = new LV(sPos, ePos, html);
        
         for (int i = (l.end-1); i >= l.start; i--)                  // Iterate the Loop-Variable
             if ((o = html.elementAt(i)) instanceof TagNode)         // Only TagNode's have Inner-Tags
                 if (! (tn = (TagNode) o).isClosing)                 // Only "Opening TagNodes" have attributes
                     if (tn.str.length() > (tn.tok.length() + 2))    // <TOK> *CANNOT* have Inner-Tags...
                     {
                         ret++;
                         html.setElementAt(HTMLTags.hasTag(tn.tok, TC.OpeningTags), i);
                         // HTMLTags.hasTag(tok, TC) gets an empty and pre-instantiated TagNode,
                         // where TagNode.tok == 'tn.tok' and TagNode.isClosing = false
                     }
        
         return ret;
        
      • textStrLength

        public static int textStrLength​(java.util.Vector<? extends HTMLNode> html,
                                        int sPos,
                                        int ePos)
        This method returns the combined length of the strings contained by the TextNode instances (and only those) among the nodes of the input HTML-Vector. It behaves identically to the overloaded method of the same name, but restricts the count to the bounds 'sPos' & 'ePos'.
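        A minimal sketch of the summing loop, assuming a simplified node model: 'NodeSketch' is a hypothetical stand-in for HTMLNode, where the 'isText' flag plays the role of isTextNode(). The documented bounds checks are omitted for brevity; only the negative-'ePos' reset is kept.

```java
import java.util.List;

// Hypothetical stand-in for HTMLNode.
final class NodeSketch
{
    final String str; final boolean isText;
    NodeSketch(String str, boolean isText) { this.str = str; this.isText = isText; }

    // Sums str.length() over the text nodes in [sPos, ePos); a negative ePos means size().
    static int textStrLength(List<NodeSketch> html, int sPos, int ePos)
    {
        if (ePos < 0) ePos = html.size();
        int sum = 0;
        for (int i = sPos; i < ePos; i++)
            if (html.get(i).isText) sum += html.get(i).str.length();
        return sum;
    }
}
```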
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The sum of the lengths of the text contained by text-nodes in the Vector between 'sPos' and 'ePos'.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         HTMLNode    n;
         int         sum = 0;
         LV          l   = new LV(html, sPos, ePos);
        
         // Counts the length of each "String" in a "TextNode" between sPos and ePos
         for (int i=l.start; i < l.end; i++)
             if ((n = html.elementAt(i)).isTextNode())
                 sum += n.str.length();
        
         return sum;
        
      • compactTextNodes

        public static int compactTextNodes​(java.util.Vector<HTMLNode> html,
                                           int sPos,
                                           int ePos)
        Occasionally, when instances of TagNode are removed from a vectorized-html page, instances of TextNode that were not adjacent (neighbours) in the Vector suddenly become adjacent. Contiguous TextNodes cause no major problems from the search algorithm's perspective, but they can befuddle a programmer: text that clearly appears in the output of Util.pageToString(html) may still not be found by a search, because that text is split across multiple adjacent TextNodes.

        This method combines each run of adjacent TextNode instances in the Vector into a single instance of class TextNode.
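        The merge can be sketched with a simplified list-of-nodes model. The 'Node' type and 'compact' method below are hypothetical, not the library's classes. The sketch stops every run at the (shrinking) end of the range, so TextNodes outside [sPos, ePos) are never merged in, and it returns the number of nodes removed.

```java
import java.util.List;

final class CompactSketch
{
    // Hypothetical stand-in for HTMLNode
    static final class Node
    {
        final String str; final boolean isText;
        Node(String str, boolean isText) { this.str = str; this.isText = isText; }
    }

    // Merges each run of adjacent text nodes inside [sPos, ePos) into one node.
    // Returns the number of nodes removed. A negative ePos means "to the end".
    static int compact(List<Node> html, int sPos, int ePos)
    {
        if (ePos < 0) ePos = html.size();
        int removed = 0;
        int i = sPos;

        // 'ePos - removed' is the end of the range after earlier merges shrank the list
        while (i < ePos - removed)
        {
            if (! html.get(i).isText) { i++; continue; }

            // Find the (exclusive) end of this run of text nodes, staying inside the range
            int runEnd = i + 1;
            while (runEnd < ePos - removed && html.get(runEnd).isText) runEnd++;

            if (runEnd - i > 1)
            {
                StringBuilder sb = new StringBuilder();
                for (int j = i; j < runEnd; j++) sb.append(html.get(j).str);
                html.set(i, new Node(sb.toString(), true));
                html.subList(i + 1, runEnd).clear();   // drop the nodes just merged
                removed += runEnd - i - 1;
            }
            i++;
        }
        return removed;
    }
}
```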
        Parameters:
        html - Any vectorized-html web-page. If this page contains any contiguously placed TextNodes, the extras will be eliminated, and the internal Strings inside those nodes (TextNode.str) will be combined. This action reduces the size of the actual html-Vector.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of nodes that were eliminated after being combined, or 0 if there were no text-nodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        HTMLNode.str, TextNode
        Code:
        Exact Method Body:
         LV      l           = new LV(html, sPos, ePos);
         boolean compacting  = false;
         int     firstPos    = -1;
         int     delta       = 0;
        
         for (int i=l.start; i < (l.end - delta); i++)
             if (html.elementAt(i).isTextNode())         // Is a TextNode
             {
                 if (compacting) continue;               // Not in "Compacting Mode"
                 compacting = true;  firstPos = i;       // Start "Compacting Mode" - this is a TextNode
             }
             else if (compacting && (firstPos < (i-1)))  // Else - Must be a TagNode or CommentNode
             {
                 // Save compacted TextNode String's into this StringBuilder
                 StringBuilder compacted = new StringBuilder();
        
                 // Iterate all TextNodes that were adjacent, put them together into StringBuilder
                 for (int j=firstPos; j < i; j++) compacted.append(html.elementAt(j).str);
        
                 // Place this new "aggregate TextNode" at location of the first TextNode that
                 // was compacted into this StringBuilder
                 html.setElementAt(new TextNode(compacted.toString()), firstPos);
        
                 // Remove the rest of the positions in the Vector that had TextNode's.  These have
                 // all been put together into the "Aggregate TextNode" at position "firstPos"
                 Util.removeRange(html, firstPos + 1, i);
        
                 // The change in the size of the Vector needs to be accounted for.
                 delta += (i - firstPos - 1);
        
                 // Change the loop-counter variable, too, since the size of the Vector has changed.
                 i = firstPos + 1;
        
                 // Since we just hit a CommentNode, or TagNode, exit "Compacting Mode."
                 compacting = false;
        
             } else compacting = false;
                 // NOTE: This, ALSO, MUST BE a TagNode or CommentNode (just like the previous
                 //       if-else branch !)
                 // TRICKY: Don't forget this 'else' !
        
         // Added - Don't forget the case where the range ends with a series of TextNodes
         // TRICKY TOO! (Same as the HTML Parser... The ending or 'trailing' nodes must be parsed.)
         // The range now ends at (l.end - delta), not at html.size(), because 'delta' nodes
         // were removed inside the loop.  'compacting' is still true only if the last node in
         // the range was a TextNode.
         int lastNodePos = (l.end - delta) - 1;
         if (compacting && (firstPos < lastNodePos))
         {
             StringBuilder compacted = new StringBuilder();
        
             // Compact the TextNodes that were identified at the end of the Vector range.
             for (int j=firstPos; j <= lastNodePos; j++) compacted.append(html.elementAt(j).str);
        
             // Replace the group of trailing TextNodes with a single, aggregate TextNode
             html.setElementAt(new TextNode(compacted.toString()), firstPos);
             Util.removeRange(html, firstPos + 1, lastNodePos + 1);
        
             // The trailing nodes that were removed must be counted, too.
             delta += (lastNodePos - firstPos);
         }
        
         return delta;
        
        
      • countNewLines

        public static int countNewLines​(java.util.Vector<? extends HTMLNode> html,
                                        int sPos,
                                        int ePos)
        This counts the new-line symbols present in the partial HTML page. Every occurrence of a standard new-line sequence (\r\n, \r, or \n) inside a TextNode.str is counted once, so that UNIX (\n), Windows (\r\n) and older Apple (\r) line-endings are all treated equally.
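        The counting step for a single String can be sketched with the same regular-expression that this page quotes for NEWLINEP. The class and method names below are hypothetical. Because "\r\n" is the first alternative, a Windows line-ending is matched once rather than as a separate '\r' and '\n'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NewLineSketch
{
    // Same expression as the NEWLINEP pattern quoted in this documentation
    private static final Pattern NEWLINE = Pattern.compile("\\r\\n|\\r|\\n");

    // Counts each new-line sequence in 'text' exactly once
    static int countNewLines(String text)
    {
        int count = 0;
        for (Matcher m = NEWLINE.matcher(text); m.find(); ) count++;
        return count;
    }
}
```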
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of new-line sequences in all of the TextNodes that occur between vectorized-page positions 'sPos' and 'ePos'. A \r\n pair counts as a single new-line.

        NOTE: The regular-expression used here 'NEWLINEP' is as follows:
         private static final Pattern NEWLINEP = Pattern.compile("\\r\\n|\\r|\\n");
         
        
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        StringParse.NEWLINEP
        Code:
        Exact Method Body:
         int newLineCount    = 0;
         LV  l               = new LV(html, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++)
        
             if (html.elementAt(i).isTextNode())
        
                 // Uses the Torello.Java.StringParse "New Line RegEx"
                 for (   Matcher m = StringParse.NEWLINEP.matcher(html.elementAt(i).str);
                         m.find();
                         newLineCount++);
        
         return newLineCount;
        
      • countTextNodes

        public static int countTextNodes​(java.util.Vector<HTMLNode> page,
                                         int sPos,
                                         int ePos)
        Counts the number of TextNodes in a Vector<HTMLNode> between the demarcated array / Vector positions, 'sPos' and 'ePos'.
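        countTextNodes, countCommentNodes and countTagNodes share one loop shape; only the node test differs. That shared shape can be sketched as a generic helper taking a Predicate. The 'CountSketch' class and its 'Kind' marker are hypothetical illustrations, not part of the library.

```java
import java.util.List;
import java.util.function.Predicate;

final class CountSketch
{
    enum Kind { TAG, TEXT, COMMENT }   // hypothetical node-type marker

    // Counts the elements in [sPos, ePos) that satisfy 'test'; a negative ePos means size().
    static <T> int count(List<T> page, int sPos, int ePos, Predicate<? super T> test)
    {
        if (ePos < 0) ePos = page.size();
        int counter = 0;
        for (int i = sPos; i < ePos; i++) if (test.test(page.get(i))) counter++;
        return counter;
    }
}
```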
        Parameters:
        page - Any HTML page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of TextNode's in the Vector between the demarcated indices.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         int counter = 0;
         LV  l       = new LV(page, sPos, ePos);
        
         // Iterates the entire page between sPos and ePos, incrementing the count for every
         // instance of text-node.
         for (int i=l.start; i < l.end; i++) if (page.elementAt(i).isTextNode()) counter++;
        
         return counter;
        
      • countCommentNodes

        public static int countCommentNodes​(java.util.Vector<HTMLNode> page,
                                            int sPos,
                                            int ePos)
        Counts the number of CommentNodes in a Vector<HTMLNode> between the demarcated array / Vector positions.
        Parameters:
        page - Any HTML page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of CommentNode's in the Vector between the demarcated indices.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         int counter = 0;
         LV  l       = new LV(page, sPos, ePos);
        
         // Iterates the entire page between sPos and ePos, incrementing the count for every
         // instance of comment-node.
         for (int i=l.start; i < l.end; i++)
             if (page.elementAt(i).isCommentNode())
                 counter++;
        
         return counter;
        
      • countTagNodes

        public static int countTagNodes​(java.util.Vector<HTMLNode> page,
                                        int sPos,
                                        int ePos)
        Counts the number of TagNode's in a Vector<HTMLNode> between the demarcated array / Vector positions.
        Parameters:
        page - Any HTML page.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The number of TagNodes in the Vector between the demarcated indices.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        Code:
        Exact Method Body:
         int counter = 0;
         LV  l       = new LV(page, sPos, ePos);
        
         // Iterates the entire page between sPos and ePos, incrementing the count for every
         // instance of TagNode.
         for (int i=l.start; i < l.end; i++) if (page.elementAt(i).isTagNode()) counter++;
        
         return counter;
        
      • strLength

        public static int strLength​(java.util.Vector<? extends HTMLNode> html,
                                    int sPos,
                                    int ePos)
        This method simply adds / sums the String-length of every HTMLNode.str field in the passed page-Vector. It only counts nodes between parameters sPos (inclusive) and ePos (exclusive).
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        The total length - in characters - of the sub-page of HTML between 'sPos' and 'ePos'
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        strLength(Vector)
        Code:
        Exact Method Body:
         int ret = 0;
         LV  l   = new LV(html, sPos, ePos);
        
         for (int i=l.start; i < l.end; i++) ret += html.elementAt(i).str.length();
        
         return ret;
        
      • hashCode

        public static int hashCode​(java.util.Vector<? extends HTMLNode> html,
                                   DotPair dp)
        Convenience Method. Receives a DotPair and invokes hashCode(Vector, int, int). Because DotPair.end is inclusive, 1 is added to it to form the exclusive 'ePos'.
        Code:
        Exact Method Body:
         return hashCode(html, dp.start, dp.end + 1);
        
      • hashCode

        public static int hashCode​(java.util.Vector<? extends HTMLNode> html,
                                   int sPos,
                                   int ePos)
        Generates a hash-code for a vectorized html page-Vector.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater-than-or-equal-to the size of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        Returns the String.hashCode() of the partial HTML-page as if it were not being stored as a Vector, but rather as HTML inside of a Java-String.
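        The claim that the result equals the hash of the page rendered as one String follows from the shape of String.hashCode(): it folds h = 31*h + c over the characters starting from 0, so continuing that fold across several Strings gives the same value as hashing their concatenation. A minimal sketch over a hypothetical List<String> standing in for the node strings:

```java
import java.util.List;

final class HashSketch
{
    // Continues String.hashCode()'s recurrence (h = 31*h + c) across every
    // string in [sPos, ePos), so the result equals the hash of their concatenation.
    static int hashCode(List<String> strs, int sPos, int ePos)
    {
        int h = 0;
        for (int j = sPos; j < ePos; j++)
        {
            String s = strs.get(j);
            for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

For instance, the nodes "<p>", "Hello", "</p>" hash to the same value as "<p>Hello</p>".hashCode().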
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        hashCode(Vector)
        Code:
        Exact Method Body:
         int h   = 0;
         LV  lv  = new LV(html, sPos, ePos);
        
         for (int j=lv.start; j < lv.end; j++)
         {
             String  s = html.elementAt(j).str;
             int     l = s.length();
        
             // The inner loop is copied from the jdk8 "String.hashCode()" method.  The difference
             // is that 'h' is carried across the String of every node in the range.
             for (int i=0; i < l; i++) h = 31 * h + s.charAt(i);
         }
        
         return h;
        
      • removeStyleNodeBlocks

        public static int removeStyleNodeBlocks​
                    (java.util.Vector<? extends HTMLNode> html)
        
        Removes all HTML 'style' Node blocks.
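        The method body repeats a "remove the first block" call until that call reports that nothing was removed. The sketch below shows that loop shape over a hypothetical list of raw node strings. 'removeFirst' is an illustrative stand-in for TagNodeRemoveInclusive.first(html, tag), assumed (as the '> 0' test in the body implies) to return the number of nodes removed, or 0 when no block is found; its simple prefix match is not the library's matching logic.

```java
import java.util.List;

final class BlockRemovalSketch
{
    // Hypothetical stand-in: removes the first <tag> ... </tag> block (inclusive)
    // and returns how many nodes were removed, or 0 if no complete block exists.
    static int removeFirst(List<String> html, String tag)
    {
        String open = "<" + tag, close = "</" + tag + ">";
        for (int i = 0; i < html.size(); i++)
        {
            if (! html.get(i).toLowerCase().startsWith(open)) continue;
            for (int j = i + 1; j < html.size(); j++)
                if (html.get(j).equalsIgnoreCase(close))
                {
                    html.subList(i, j + 1).clear();
                    return j - i + 1;
                }
            return 0;   // opening tag without a matching close
        }
        return 0;
    }

    // Same loop shape as removeStyleNodeBlocks: repeat until nothing is removed.
    static int removeAllBlocks(List<String> html, String tag)
    {
        int removeCount = 0;
        while (removeFirst(html, tag) > 0) removeCount++;
        return removeCount;
    }
}
```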
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        Returns:
        The number of <STYLE>-Node Blocks that were removed
        See Also:
        TagNodeRemoveInclusive.first(Vector, String[])
        Code:
        Exact Method Body:
         int removeCount = 0;
         while (TagNodeRemoveInclusive.first(html, "style") > 0) removeCount++;
         return removeCount;
        
      • removeScriptNodeBlocks

        public static int removeScriptNodeBlocks​
                    (java.util.Vector<? extends HTMLNode> html)
        
        Removes all 'script' Node blocks.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        Returns:
        The number of <SCRIPT>-Node Blocks that were removed
        See Also:
        TagNodeRemoveInclusive.first(Vector, String[])
        Code:
        Exact Method Body:
         int removeCount = 0;
         while (TagNodeRemoveInclusive.first(html, "script") > 0) removeCount++;
         return removeCount;
        
      • getJSONScriptBlocks

        public static java.util.stream.Stream<java.lang.String> getJSONScriptBlocks​
                    (java.util.Vector<HTMLNode> html,
                     int sPos,
                     int ePos)
        
        This method searches for any and all <SCRIPT TYPE="json"> JSON TEXT </SCRIPT> blocks present in a range of Vectorized HTML. The search simply looks for the token "JSON" in the TYPE attribute of each <SCRIPT> TagNode found in that range. The JSON found within such blocks is not checked for validity, nor is it even guaranteed to be JSON data!
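        The TYPE-attribute test can be approximated with a case-insensitive, whole-word regular-expression. This is a swapped-in technique: the library itself uses StrTokCmpr.containsIgnoreCase with a Predicate, whose exact token-delimiter rules may differ from Java's '\b' word boundary. 'JsonTypeSketch' is a hypothetical name.

```java
import java.util.regex.Pattern;

final class JsonTypeSketch
{
    // "json" must appear as a whole word (case-insensitive) in the TYPE value,
    // not as part of a longer word such as "jsonx".
    private static final Pattern JSON_TOKEN =
        Pattern.compile("\\bjson\\b", Pattern.CASE_INSENSITIVE);

    static boolean isJsonType(String typeAttribute)
    { return typeAttribute != null && JSON_TOKEN.matcher(typeAttribute).find(); }
}
```

With this sketch, TYPE values such as "application/ld+json" and "JSON" pass, while "text/javascript" and "application/jsonx" do not.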
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. Because this parameter is declared as Vector<HTMLNode> (without a wild-card), a 'sub-type' Vector such as Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> cannot be passed directly. Vector<HTMLNode> is the most common vector-type used.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        This will return a java.util.stream.Stream<String> of each of the JSON elements present in the specified range of the Vectorized HTML passed to parameter 'html'.

        Conversion-Target    Stream-Method Invocation
        String[]             Stream.toArray(String[]::new);
        List<String>         Stream.collect(Collectors.toList());
        Vector<String>       Stream.collect(Collectors.toCollection(Vector::new));
        TreeSet<String>      Stream.collect(Collectors.toCollection(TreeSet::new));
        Iterator<String>     Stream.iterator();
        See Also:
        StrTokCmpr.containsIgnoreCase(String, Predicate, String), rangeToString(Vector, int, int)
        Code:
        Exact Method Body:
         // Whenever building lists, it is usually easiest to use a Stream.Builder
         Stream.Builder<String> b = Stream.builder();
        
         // This Predicate tests that, if the substring "json" (CASE INSENSITIVE) is found in the
         // TYPE attribute of a <SCRIPT TYPE=...> node, the token-string is, indeed, a whole word,
         // not a substring of some other word.  For instance: TYPE="json" would PASS, but
         // TYPE="rajsong" would FAIL, because there the token-string is not bounded on both
         // sides by non-alphanumeric characters (or by the start / end of the attribute value).
        
         final Predicate<String> tester = (String s) ->
             StrTokCmpr.containsIgnoreCase(s, (Character c) -> ! Character.isLetterOrDigit(c), "json");
        
         // Find all <SCRIPT> node-blocks whose "TYPE" attribute abides by the tester String-predicate
         // named above.
         Vector<DotPair> jsonDPList = InnerTagFindInclusive.all
             (html, sPos, ePos, "script", "type", tester);
        
         // Convert each of these DotPair elements into a java.lang.String, and add that String
         // to the Stream.Builder<String>.  A DotPair of size 2 (an empty <SCRIPT></SCRIPT>) is skipped.
         for (DotPair jsonDP : jsonDPList)
             if (jsonDP.size() > 2)
                 b.accept(Util.rangeToString(html, jsonDP.start + 1, jsonDP.end));
        
         // Build the Stream, and return it.
         return b.build();
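        The whole-word test performed by the 'tester' predicate above can be illustrated with a small, self-contained sketch. The class and method names ('JsonTypeTokenSketch', 'containsToken') are hypothetical and are NOT part of Torello's StrTokCmpr; this only approximates the behavior described in the comments, assuming that the start and end of the String count as word boundaries.

```java
import java.util.Locale;

// Hypothetical sketch, NOT Torello's StrTokCmpr: approximates a case-insensitive
// "whole token" test, where the token must be bounded by non-letter/digit characters
// (or by the start / end of the String).
final class JsonTypeTokenSketch
{
    static boolean containsToken(String s, String token)
    {
        String hay    = s.toLowerCase(Locale.ROOT);
        String needle = token.toLowerCase(Locale.ROOT);

        // An empty token would match everywhere (and never advance the search)
        if (needle.isEmpty()) return false;

        // Check every occurrence of the token, not just the first one
        for (int idx = hay.indexOf(needle); idx != -1; idx = hay.indexOf(needle, idx + 1))
        {
            int     after   = idx + needle.length();
            boolean leftOK  = (idx == 0) || ! Character.isLetterOrDigit(hay.charAt(idx - 1));
            boolean rightOK = (after == hay.length())
                || ! Character.isLetterOrDigit(hay.charAt(after));

            if (leftOK && rightOK) return true;
        }

        return false;
    }
}
```

For instance, the common TYPE value 'application/ld+json' passes, because '+' is neither a letter nor a digit, while 'rajsong' fails on both sides.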
        
      • insertNodes

        public static void insertNodes(java.util.Vector<HTMLNode> html,
                                       int pos,
                                       HTMLNode... nodes)
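        Inserts the var-args 'nodes' into 'html', starting at Vector-position 'pos'. The nodes originally located at 'pos' and beyond are shifted right by nodes.length.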
        Parameters:
        html - Any HTML Page
        pos - The position in the original Vector where the nodes shall be inserted.
        nodes - A list of nodes to insert.
        Code:
        Exact Method Body:
         Vector<HTMLNode> nodesVec = new Vector<>(nodes.length);
         for (HTMLNode node : nodes) nodesVec.addElement(node);
         html.addAll(pos, nodesVec);
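        The same insertion can be expressed without the temporary Vector, because Arrays.asList wraps the var-args array as a List, which Vector.addAll(int, Collection) accepts directly. This is a hedged, generic sketch (the class name 'InsertSketch' is hypothetical), using String elements so that it runs without the Torello library.

```java
import java.util.Arrays;
import java.util.Vector;

// Hypothetical generic variant of insertNodes(Vector, int, HTMLNode...); not the library's code.
final class InsertSketch
{
    @SafeVarargs
    static <T> void insertNodes(Vector<T> v, int pos, T... nodes)
    {
        // Elements at 'pos' and beyond are shifted right by nodes.length.
        // Vector.addAll throws ArrayIndexOutOfBoundsException if pos < 0 or pos > v.size()
        v.addAll(pos, Arrays.asList(nodes));
    }
}
```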
        
      • removeNodesOPT

        public static <T extends HTMLNode> void removeNodesOPT
                    (java.util.Vector<T> page,
                     int... posArr)
        
        OPT: Optimized

        This method does the same thing as the 'removeNodes(Vector, int[])' method, but all error checking is skipped, and the input integer array is presumed to already be sorted. There are no guarantees about the behavior if the input int[] posArr contains values that are not sorted (least-to-greatest), or if there are duplicates or negative values in this array.

        NOTE: If the var-args input integer-array parameter is empty, this method shall exit gracefully, and immediately.
        Parameters:
        page - Any HTML-Page, usually one generated by HTMLPage.getPageTokens(...), but pages may be obtained or created in any fashion necessary.
        posArr - An array of integers which list/identify the nodes in the page to be removed. Because this implementation has been optimized, no error checking will be performed on this input. It is presumed to be sorted, least-to-greatest, and all values in the array are presumed to be valid indices into the vectorized-html parameter 'page'.
        Code:
        Exact Method Body:
         if (posArr.length == 0) return;
        
         int endingInsertPos = page.size() - posArr.length;
         int posArrIndex     = 0;
         int insertPos       = posArr[0];
         int retrievePos     = posArr[0];
        
         // Two-pointer compaction: 'insertPos' walks over the slots that will hold surviving
         // nodes, while 'retrievePos' runs ahead, skipping every index listed in 'posArr'.
         // Each surviving node is copied down from 'retrievePos' to 'insertPos'.
        
         while (insertPos < endingInsertPos)
         {
             // This inner-loop is necessary for when the posArr has consecutive-elements that
             // are *ALSO* consecutive-pointers.
             //
             // For instance, in this invocation:
             // Util.removeNodesOPT(page, 4, 5, 6);
             //      4, 5, and 6 are consecutive, so the inner while-loop skips all three at once.
             //
             // For this invocation:
             // Util.removeNodesOPT(page, 2, 4, 6);
             //      the inner-loop never skips more than one index at a time.
        
             while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex]))
             { retrievePos++; posArrIndex++; }
        
             page.setElementAt(page.elementAt(retrievePos++), insertPos++);
         }
        
         // Remove all remaining elements in the tail of the array.
         page.setSize(page.size() - posArr.length);
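        The two nested loops above implement an in-place, two-pointer compaction. The sketch below restates the same idea with a single 'for' loop. It is hypothetical (the names 'CompactSketch' and 'removeSorted' are not part of Torello), and, like removeNodesOPT, it assumes 'pos' is sorted, duplicate-free, and entirely in range.

```java
import java.util.Vector;

// Hypothetical restatement of the compaction idea; not the library's code.
// Assumes 'pos' is sorted least-to-greatest, has no duplicates, and every value is a valid index.
final class CompactSketch
{
    static <T> void removeSorted(Vector<T> v, int... pos)
    {
        if (pos.length == 0) return;

        int write = pos[0];     // next slot that receives a surviving element
        int p     = 0;          // next entry of 'pos' that has not been skipped yet

        for (int read = pos[0]; read < v.size(); read++)
        {
            if ((p < pos.length) && (read == pos[p])) { p++; continue; }  // removed index
            v.setElementAt(v.elementAt(read), write++);                   // survivor moves down
        }

        v.setSize(write);       // truncate the now-unused tail (write == size - pos.length)
    }
}
```

Each element is read at most once and written at most once, which is why this approach (like the method body above) avoids the repeated shifting that calling Vector.remove once per index would cause.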
        
      • removeNodes

        public static <T extends HTMLNode> void removeNodes
                    (boolean preserveInputArray,
                     java.util.Vector<T> page,
                     int... nodeList)
        
        This method removes from the passed-parameter 'page' each HTMLNode listed/identified by the input array 'nodeList'.

        NOTE: If the var-args input integer-array parameter is empty, this method shall exit gracefully, and immediately.
        Parameters:
        preserveInputArray - This is a convenience input parameter that allows a programmer to "preserve" the original input-parameter integer-array that is passed to this method. It could be argued this parameter is "superfluous"; however, keep in mind that the passed parameter 'nodeList' must be sorted before this method is able to function properly, and that sort is performed in place within the body of this method. Just in case the original order of the integer-array input-parameter must be preserved, it is possible to request that the sort operate on "a clone" of the input-parameter integer-array, instead of on the original integer-array 'nodeList' itself.
        page - Any HTML-Page, usually one generated by HTMLPage.getPageTokens(...), but pages may be obtained or created in any fashion necessary.
        nodeList - An array of integers which list/identify the nodes in the page to be removed.
        Throws:
        java.lang.IllegalArgumentException - If the 'nodeList' contains duplicate entries. Obviously, no HTMLNode may be removed from the Vector<HTMLNode> more than once.
        java.lang.IndexOutOfBoundsException - If the nodeList contains index-pointers / items that are not within the bounds of the passed HTML-Page Vector.
        Code:
        Exact Method Body:
         if (nodeList.length == 0) return;
        
         // @Safe Var Args
         int[]   posArr  = preserveInputArray ? nodeList.clone() : nodeList;
         int     len     = posArr.length;
        
         Arrays.sort(posArr);
        
         // Check for duplicates in the nodeList, no HTMLNode may be removed twice!
         for (int i=0; i < (len - 1); i++)
             if (posArr[i] == posArr[i+1]) throw new IllegalArgumentException(
                 "The input array contains duplicate items, this is not allowed.\n" +
                 "This is because each array-entry is intended to be a pointer/index for items to " +
                 "be removed.\nNo item can possibly be removed twice."
             );
        
         // Make sure all nodes are within the bounds of the original Vector.  (no negative indexes,
         // no indexes greater than the size of the Vector)
         if ((posArr[0] < 0) || (posArr[len - 1] >= page.size()))
             throw new IndexOutOfBoundsException (
                 "The input array contains entries which are not within the bounds of the " +
                 "original-passed Vector.\nHTMLPage Vector has: " + page.size() + " elements.\n" +
                 "Maximum element in the nodeList is [" + posArr[len - 1] + "], and the minimum " +
                 "element is: [" + posArr[0] + "]"
             );
        
         int endingInsertPos = page.size() - posArr.length;
         int posArrIndex     = 0;
         int insertPos       = posArr[0];
         int retrievePos     = posArr[0];
        
         // Two-pointer compaction: 'insertPos' walks over the slots that will hold surviving
         // nodes, while 'retrievePos' runs ahead, skipping every index listed in 'posArr'.
         // Each surviving node is copied down from 'retrievePos' to 'insertPos'.
        
         while (insertPos < endingInsertPos)
         {
             // This inner-loop is necessary for when the posArr has consecutive-elements that
             // are *ALSO* consecutive-pointers.
             //
             // For instance, in this invocation:
             // Util.removeNodes(false, page, 4, 5, 6);
             //      4, 5, and 6 are consecutive, so the inner while-loop skips all three at once.
             //
             // For this invocation:
             // Util.removeNodes(false, page, 2, 4, 6);
             //      the inner-loop never skips more than one index at a time.
             while ((posArrIndex < posArr.length) && (retrievePos == posArr[posArrIndex])) 
             { retrievePos++; posArrIndex++; }
        
             page.setElementAt(page.elementAt(retrievePos++), insertPos++);
         }
        
         // Remove all remaining elements in the tail of the array.
         page.setSize(page.size() - posArr.length);
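        The reason 'preserveInputArray' exists is that Arrays.sort sorts its argument in place, and a var-args parameter receives the caller's own array whenever an int[] is passed to it. A minimal sketch of the preparation steps performed above (clone-or-not, sort, reject duplicates, bounds-check); the class 'RemovalIndexPrep' and method 'prepare' are hypothetical, and 'size' stands in for page.size().

```java
import java.util.Arrays;

// Hypothetical sketch of the validation steps in removeNodes(boolean, Vector, int...).
final class RemovalIndexPrep
{
    static int[] prepare(boolean preserveInputArray, int size, int... nodeList)
    {
        int[] posArr = preserveInputArray ? nodeList.clone() : nodeList;

        Arrays.sort(posArr);    // sorts in place: without a clone, the caller's array changes

        // After sorting, any duplicate entries are adjacent
        for (int i = 0; i < posArr.length - 1; i++)
            if (posArr[i] == posArr[i + 1])
                throw new IllegalArgumentException("Duplicate index: " + posArr[i]);

        // Only the smallest and largest entries need to be range-checked
        if ((posArr.length > 0) && ((posArr[0] < 0) || (posArr[posArr.length - 1] >= size)))
            throw new IndexOutOfBoundsException("Index out of range for size " + size);

        return posArr;
    }
}
```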
        
      • replaceRange

        public static void replaceRange(java.util.Vector<HTMLNode> page,
                                        int sPos,
                                        int ePos,
                                        java.util.Vector<HTMLNode> newNodes)
        Replaces any and all HTMLNodes located between the Vector locations 'sPos' (inclusive) and 'ePos' (exclusive). Exclusive means that the HTMLNode located at position 'ePos' will not be replaced, but the one at 'sPos' is replaced.

        The size of the Vector will change by newNodes.size() - (ePos - sPos). The contents situated between Vector location sPos and sPos + newNodes.size() will, indeed, be the contents of the 'newNodes' parameter.
        Parameters:
        page - Any Java HTML page, constructed of HTMLNode (TagNode & TextNode)
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        newNodes - Any Java HTML page-Vector of HTMLNode.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        pollRange(Vector, int, int), removeRange(Vector, int, int), replaceRange(Vector, DotPair, Vector)
        Code:
        Exact Method Body:
         // Torello.Java.LV
         LV l = new LV(sPos, ePos, page);
        
         // Use the validated bounds from 'l', so that a negative 'ePos' is reset to page.size()
         int oldSize     = l.end - l.start;
         int newSize     = newNodes.size();
         int insertPos   = l.start;
         int i           = 0;
        
         while ((i < newSize) && (i < oldSize))
             page.setElementAt(newNodes.elementAt(i++), insertPos++);
        
         if (newSize == oldSize) return;
        
         if (newSize < oldSize)  // The new Vector is SMALLER than the old sub-range
                                 // The rest of the nodes just need to be trashed
             Util.removeRange(page, insertPos, l.end);
         else                    // The new Vector is BIGGER than the old sub-range
                                 // There are still more nodes to insert.
             page.addAll(l.end, newNodes.subList(i, newSize));
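        The size change stated above, newNodes.size() - (ePos - sPos), can be checked with a short generic sketch. Unlike the method body above, which overwrites nodes in place, this hypothetical version ('ReplaceSketch') simply clears the old sub-range through a subList view and then inserts the replacement. It only approximates the documented bounds checks, relying on subList's own exceptions instead.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical sketch; not Torello's replaceRange implementation.
final class ReplaceSketch
{
    static <T> void replaceRange(Vector<T> page, int sPos, int ePos, List<? extends T> newNodes)
    {
        if (ePos < 0) ePos = page.size();   // documented convention: negative ePos means "to the end"

        page.subList(sPos, ePos).clear();   // remove the old sub-range [sPos, ePos)
        page.addAll(sPos, newNodes);        // insert the replacement, starting at sPos
    }
}
```

Replacing the 2-element range [1, 3) of a 5-element Vector with 3 new elements grows it by 3 - (3 - 1) = 1, to 6 elements.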
        
      • removeRange

        public static <T extends HTMLNode> int removeRange
                    (java.util.Vector<T> page,
                     int sPos,
                     int ePos)
        
        Java's java.util.Vector class does not allow public access to its removeRange(start, end) method, which is declared 'protected' in the Vector class. This method performs exactly that operation, and nothing else.
        Parameters:
        page - Any Java HTML page, constructed of HTMLNode (TagNode & TextNode)
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        the number of nodes removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        pollRange(Vector, int, int), removeRange(Vector, DotPair)
        Code:
        Exact Method Body:
         // Torello.Java.LV
         LV  l   = new LV(sPos, ePos, page);
        
         // Shift the nodes in position Vector[l.end through page.size()] to vector-position
         // Vector[l.start]
         int end = page.size() - l.end - 1;
        
         for (int i=0; i <= end; i++) page.setElementAt(page.elementAt(l.end + i), l.start + i);
        
         // Number of nodes to remove
         int numToRemove = l.end - l.start;
        
         // Remove the tail - all nodes starting at:
         // vector-position[page.size() - (l.end - l.start)]
         page.setSize(page.size() - numToRemove);
        
         return numToRemove;
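        Outside of this class, the standard public idiom for the protected Vector.removeRange is to clear a subList view, as described in the java.util.List documentation. A minimal sketch (the class name 'RangeRemovalSketch' is hypothetical). Note that subList's checks are slightly looser than the ones listed above: for example, sPos == page.size() is accepted when ePos is also page.size(), and an out-of-order pair throws IllegalArgumentException rather than IndexOutOfBoundsException.

```java
import java.util.Vector;

// Hypothetical sketch; not Torello's Util.removeRange implementation.
final class RangeRemovalSketch
{
    static <T> int removeRange(Vector<T> page, int sPos, int ePos)
    {
        if (ePos < 0) ePos = page.size();   // negative ePos means "to the end of the Vector"

        // subList throws IndexOutOfBoundsException if sPos < 0 or ePos > size,
        // and IllegalArgumentException if sPos > ePos
        page.subList(sPos, ePos).clear();

        return ePos - sPos;                 // number of elements removed
    }
}
```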
        
      • pollRange

        public static java.util.Vector<HTMLNode> pollRange
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        Java's java.util.Vector class does not allow public access to its removeRange(start, end) method, which is declared 'protected' in the Vector class. This method goes one step further and performs a 'Poll' operation: the nodes in the range are first copied into a separate return Vector, then removed from the input Vector, and finally returned as the result of this method.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or larger than the length of the input-Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        A complete list (Vector<HTMLNode>) of the nodes that were removed.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        removeRange(Vector, int, int), removeRange(Vector, DotPair), pollRange(Vector, DotPair)
        Code:
        Exact Method Body:
         LV                  l   = new LV(html, sPos, ePos);
         Vector<HTMLNode>    ret = new Vector<HTMLNode>(l.end - l.start);
        
         // Copy the elements from the input vector into the return vector
         for (int i=l.start; i < l.end; i++) ret.add(html.elementAt(i));
        
         // Remove the range from the input vector (this is the meaning of 'poll')
         Util.removeRange(html, sPos, ePos);
        
         return ret;
        
      • split

        public static java.util.Vector<HTMLNode> split
                    (java.util.Vector<? extends HTMLNode> html,
                     int pos)
        
        This removes every element from the Vector beginning at position 0, all the way to position 'pos' (exclusive). The elementAt(pos) remains in the original page input-Vector. This is the definition of 'exclusive'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        pos - Any position within the range of the input Vector.
        Returns:
        The elements in the Vector from position 0 ('zero') up to, but not including, position 'pos'.
        Code:
        Exact Method Body:
         return pollRange(html, 0, pos);
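        Both 'poll' methods above follow a copy-then-remove pattern, and split is simply a poll of the range [0, pos). A generic sketch using a subList view (the class 'PollSketch' is hypothetical and uses Strings in place of HTMLNode; it relies on subList's own bounds checks rather than the documented ones):

```java
import java.util.List;
import java.util.Vector;

// Hypothetical sketches of the 'poll' idea; not Torello's implementation.
final class PollSketch
{
    static <T> Vector<T> pollRange(Vector<T> v, int sPos, int ePos)
    {
        if (ePos < 0) ePos = v.size();          // negative ePos means "to the end of the Vector"

        List<T>   view = v.subList(sPos, ePos);
        Vector<T> ret  = new Vector<>(view);    // 1) copy the range into a new Vector
        view.clear();                           // 2) then remove it from the input Vector
        return ret;
    }

    // split(v, pos) removes and returns the elements in [0, pos)
    static <T> Vector<T> split(Vector<T> v, int pos)
    {
        return pollRange(v, 0, pos);
    }
}
```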
        
      • removeFirstLast

        public static void removeFirstLast
                    (java.util.Vector<? extends HTMLNode> html)
        
        Removes the first and last element of a vectorized-HTML web-page, or sub-page. Generally, this could be used to remove the surrounding tags '<DIV>' ... '</DIV>', or something similar.

        IMPORTANT: This method WILL NOT CHECK whether there are matching HTML open-and-close tags at the beginning and end of this sub-section. Generally, though, that is how this method is used.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        Throws:
        java.lang.IllegalArgumentException - If the Vector has fewer than two elements.
        Code:
        Exact Method Body:
         int size = html.size();
        
         if (size < 2) throw new IllegalArgumentException(
             "You have requested that the first and last elements of the input 'html' parameter (a Vector) be removed.  " +
             "However, the vector size is only [" + size  + "], so this cannot be performed."
         );
        
         // NOTE: *** This removes elementAt(0) and elementAt(size-1)
         //       *** NOT ALL ELEMENTS BETWEEN 0 and (size-1)
         Util.removeNodesOPT(html, 0, size-1);
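        Removing just the two end-points can also be written with two Vector.removeElementAt calls. The sketch is hypothetical ('FirstLastSketch') and uses Strings in place of HTMLNode. Each removeElementAt call shifts the remaining elements, whereas the body above performs a single compaction pass via removeNodesOPT.

```java
import java.util.Vector;

// Hypothetical sketch of removeFirstLast; not Torello's implementation.
final class FirstLastSketch
{
    static <T> void removeFirstLast(Vector<T> v)
    {
        int size = v.size();
        if (size < 2) throw new IllegalArgumentException("Vector has only " + size + " element(s)");

        // Removing the last element first keeps 'size - 1' a valid index;
        // removing index 0 first would shift the last element to 'size - 2'.
        v.removeElementAt(size - 1);
        v.removeElementAt(0);
    }
}
```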