Package Torello.HTML

Class Util.Inclusive

  • Enclosing class:
    Util

    public static class Util.Inclusive
    extends java.lang.Object
    Util.Inclusive Documentation

    The methods in this class search for the inclusive match of an opening TagNode supplied as input. The user must provide the HTML-Vector containing the opening TagNode. The six search variants (Count, Find, Get, Peek, Poll, and Remove) each have a method in this class for retrieving the type requested.

    Static (Functional) API: Every method in this class is declared with the Java key-word 'static'. Furthermore, there is no way to obtain an instance of this class, because there are no public (nor private) constructors. The Spring-Boot / Spring-MVC style is *not* used, because it runs directly against the light-weight data-classes philosophy. A static API has several advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) by declaring all of its methods with the key-word static, and by sticking to Java's well-understood class Vector.

    Internal-State: A user may click on this class' source code (see link below) to view any and all internally defined fields. A cursory inspection of the code shows that this class has precisely zero internally defined global fields (Spaghetti). All variables used by the methods in this class are local variables only, and therefore this class ought to be thought of as 'state-less'.



    • Method Detail

      • find

        public static int find​(java.util.Vector<? extends HTMLNode> html,
                               int nodeIndex)
        This finds the closing HTML 'TagNode' match for a given opening 'TagNode' in a given-input html page or sub-section.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        nodeIndex - An index into that Vector. This index must point to an HTMLNode element that is:

        1. An instance of TagNode
        2. A TagNode whose 'isClosing' field is FALSE
        3. Not a 'singleton' HTML element-token (e.g. <IMG>, <BR>, <HR>, etc...)
        Returns:
        An "inclusive search" finds OpeningTag and ClosingTag pairs, and returns all the elements between them as the contents of a return-Vector, or as a Vector DotPair-end-point value. This method takes a particular node of a Vector and (as long as it has a match) finds its closing TagNode match. The integer returned is the index into this page of the closing, matching TagNode, or -1 if no match is found.
        Throws:
        TagNodeExpectedException - If the node in the Vector-parameter 'html' contained at index 'nodeIndex' is not an instance of TagNode, then this exception is thrown.
        OpeningTagNodeExpectedException - If the node in the Vector-parameter 'html' at index 'nodeIndex' is a closing version of the HTML element, then this exception is thrown.
        InclusiveException - If the node in Vector-parameter 'html', pointed-to by index 'nodeIndex' is an HTML 'Singleton' / Self-Closing Tag, then this exception will be thrown.
        See Also:
        TagNode, TagNode.tok, TagNode.isClosing, HTMLNode
        Code:
        Exact Method Body:
         TagNode     tn          = null;
         HTMLNode    n           = null;
         String      tok         = null;
        
         if (! html.elementAt(nodeIndex).isTagNode())
             throw new TagNodeExpectedException (
                 "You have attempted to find a closing tag to match an opening one, " +
                 "but the 'nodeIndex' (" + nodeIndex + ") you have passed doesn't contain " +
                 "an instance of TagNode."
             );
         else tn = (TagNode) html.elementAt(nodeIndex);
        
         if (tn.isClosing) throw new OpeningTagNodeExpectedException(
             "The TagNode indicated by 'nodeIndex' = " + nodeIndex + " has its 'isClosing' " +
             "boolean as TRUE - this is not an opening TagNode, but it must be to continue."
         );
        
         // Checks to ensure this token is not a 'self-closing' or 'singleton' tag.
         // If it is an exception shall throw.
         tok = tn.tok;
         InclusiveException.check(tok);
        
         int         end         = html.size();
         int         openCount   = 1;
        
         // Start just past the opening tag: 'openCount' already accounts for it.
         for (int pos = nodeIndex + 1; pos < end; pos++)
             if ((n = html.elementAt(pos)).isTagNode())
                 if ((tn = ((TagNode) n)).tok.equals(tok))
                 {
                     openCount += tn.isClosing ? -1 : 1;
                     if (openCount == 0) return pos;
                 }
        
         return -1;
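
        The depth-counting search described above can be illustrated with a minimal, self-contained sketch. The nested class Node is a hypothetical stand-in for the library's HTMLNode / TagNode (it only mimics the 'tok' and 'isClosing' fields), IllegalArgumentException stands in for the library's own exception classes, and the singleton-tag check is omitted. It is not the library's implementation.

```java
import java.util.Vector;

class InclusiveFindSketch {
    // Hypothetical stand-in for HTMLNode / TagNode; NOT the library's class.
    // A text node is modelled as tok == null.
    static final class Node {
        final String tok;
        final boolean isClosing;
        Node(String tok, boolean isClosing) { this.tok = tok; this.isClosing = isClosing; }
        boolean isTagNode() { return tok != null; }
    }

    // Returns the index of the closing tag that matches the opening tag at
    // 'nodeIndex', or -1 if the tag is never closed.
    static int find(Vector<Node> html, int nodeIndex) {
        Node first = html.elementAt(nodeIndex);
        if (!first.isTagNode())
            throw new IllegalArgumentException("node " + nodeIndex + " is not a tag");
        if (first.isClosing)
            throw new IllegalArgumentException("node " + nodeIndex + " is a closing tag");

        int openCount = 1; // the opening tag at 'nodeIndex' is already counted
        for (int pos = nodeIndex + 1; pos < html.size(); pos++) {
            Node n = html.elementAt(pos);
            if (n.isTagNode() && n.tok.equals(first.tok)) {
                openCount += n.isClosing ? -1 : 1;
                if (openCount == 0) return pos;
            }
        }
        return -1;
    }

    // Models: <div> text <div> text </div> </div> <p>
    static Vector<Node> samplePage() {
        Vector<Node> v = new Vector<>();
        v.add(new Node("div", false)); // 0
        v.add(new Node(null, false));  // 1 (text)
        v.add(new Node("div", false)); // 2
        v.add(new Node(null, false));  // 3 (text)
        v.add(new Node("div", true));  // 4
        v.add(new Node("div", true));  // 5
        v.add(new Node("p", false));   // 6 (never closed)
        return v;
    }

    public static void main(String[] args) {
        Vector<Node> page = samplePage();
        System.out.println(find(page, 0)); // outer <div>
        System.out.println(find(page, 2)); // inner <div>
        System.out.println(find(page, 6)); // unclosed <p>
    }
}
```

        With the sample page, the outer <div> at index 0 pairs with the closing tag at index 5, the nested <div> at index 2 pairs with index 4, and the unclosed <p> yields -1.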
        
      • get

        public static java.util.Vector<HTMLNode> get
                    (java.util.Vector<? extends HTMLNode> html,
                     int nodeIndex)
        
        Convenience Method. Invokes find(Vector, int).

        Converts output to 'GET' format (Vector-sublist), using Util.cloneRange(Vector, int, int)
        Code:
        Exact Method Body:
         int endPos = find(html, nodeIndex);
         return (endPos == -1) ? null : cloneRange(html, nodeIndex, endPos + 1);
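
        The 'endPos + 1' argument in the method body suggests that the clone takes an exclusive end index, while find reports the inclusive index of the closing tag. The sketch below illustrates that conversion on a Vector<String>. Its cloneRange is a hypothetical stand-in built on the JDK's subList, not the library's Util.cloneRange.

```java
import java.util.Arrays;
import java.util.Vector;

class GetRangeSketch {
    // Hypothetical stand-in for Util.cloneRange: copies [start, endExclusive).
    static <T> Vector<T> cloneRange(Vector<? extends T> v, int start, int endExclusive) {
        return new Vector<T>(v.subList(start, endExclusive));
    }

    public static void main(String[] args) {
        Vector<String> page = new Vector<>(Arrays.asList("<p>", "hello", "</p>", "<br>"));

        int nodeIndex = 0; // the opening <p>
        int endPos = 2;    // inclusive index of </p>, as a find-style search would report

        // '+ 1' turns the inclusive closing-tag index into an exclusive bound.
        Vector<String> copy = cloneRange(page, nodeIndex, endPos + 1);

        System.out.println(copy);        // [<p>, hello, </p>]
        System.out.println(page.size()); // 4 (the source Vector is not modified)
    }
}
```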
        
      • peek

        public static SubSection peek​(java.util.Vector<? extends HTMLNode> html,
                                      int nodeIndex)
        Convenience Method. Invokes find(Vector, int).

        Converts output to 'PEEK' format (SubSection), using Util.cloneRange(Vector, int, int)
        Code:
        Exact Method Body:
         int endPos = find(html, nodeIndex);
        
         return (endPos == -1) ? null : new SubSection(
             new DotPair(nodeIndex, endPos),
             cloneRange(html, nodeIndex, endPos + 1)
         );
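
        A 'PEEK' result pairs the location of the match with a copy of the matched nodes. The following sketch shows that shape, assuming hypothetical stand-ins Range (for DotPair) and Slice (for SubSection); the real classes may carry more than these two fields.

```java
import java.util.Arrays;
import java.util.Vector;

class PeekSketch {
    // Hypothetical stand-in for DotPair: both end-points are inclusive.
    static final class Range {
        final int start, end;
        Range(int start, int end) { this.start = start; this.end = end; }
    }

    // Hypothetical stand-in for SubSection: a location plus a copy of the nodes.
    static final class Slice {
        final Range location;
        final Vector<String> html;
        Slice(Range location, Vector<String> html) { this.location = location; this.html = html; }
    }

    // Builds the 'PEEK' result for a match spanning [start, endInclusive].
    static Slice peek(Vector<String> page, int start, int endInclusive) {
        Vector<String> copy = new Vector<>(page.subList(start, endInclusive + 1));
        return new Slice(new Range(start, endInclusive), copy);
    }

    static Vector<String> samplePage() {
        return new Vector<>(Arrays.asList("<ul>", "<li>", "item", "</li>", "</ul>"));
    }

    public static void main(String[] args) {
        Vector<String> page = samplePage();
        Slice s = peek(page, 1, 3);
        System.out.println(s.location.start + ".." + s.location.end); // 1..3
        System.out.println(s.html);                                   // [<li>, item, </li>]

        // The copy is independent: editing it leaves the page untouched.
        s.html.set(1, "changed");
        System.out.println(page.get(2)); // item
    }
}
```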
        
      • poll

        public static java.util.Vector<HTMLNode> poll
                    (java.util.Vector<? extends HTMLNode> html,
                     int nodeIndex)
        
        Convenience Method. Invokes find(Vector, int).

        Converts output to 'POLL' format (Vector-sublist), using Util.pollRange(Vector, int, int). Removes Sub-List.
        Code:
        Exact Method Body:
         int endPos = find(html, nodeIndex);
         return (endPos == -1) ? null : pollRange(html, nodeIndex, endPos + 1);
        
      • remove

        public static int remove​(java.util.Vector<? extends HTMLNode> html,
                                 int nodeIndex)
        Convenience Method. Invokes find(Vector, int).

        Converts output to 'REMOVE' format (int - number of nodes removed), using Util.removeRange(Vector, int, int). Removes Sub-List.
        Code:
        Exact Method Body:
         int endPos = find(html, nodeIndex);
         return (endPos == -1) ? 0 : removeRange(html, nodeIndex, endPos + 1);
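
        Unlike 'GET' and 'PEEK', the 'POLL' and 'REMOVE' variants modify the input Vector. The sketch below shows the difference in what each returns. Both helpers are hypothetical stand-ins written against the JDK's subList view; they are not the library's Util.pollRange or Util.removeRange, and their generic signatures are a simplification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

class PollRemoveSketch {
    // Stand-in for a poll-style range operation: removes [start, endExclusive)
    // from 'v' and returns the removed elements.
    static <T> Vector<T> pollRange(Vector<T> v, int start, int endExclusive) {
        List<T> view = v.subList(start, endExclusive);
        Vector<T> removed = new Vector<>(view); // copy first ...
        view.clear();                           // ... then delete from the backing Vector
        return removed;
    }

    // Stand-in for a remove-style range operation: removes the same range,
    // but only reports how many elements were removed.
    static <T> int removeRange(Vector<T> v, int start, int endExclusive) {
        v.subList(start, endExclusive).clear();
        return endExclusive - start;
    }

    static Vector<String> samplePage() {
        return new Vector<>(Arrays.asList("<h1>", "Title", "</h1>", "<p>", "body", "</p>"));
    }

    public static void main(String[] args) {
        Vector<String> page = samplePage();

        Vector<String> heading = pollRange(page, 0, 3);
        System.out.println(heading); // [<h1>, Title, </h1>]
        System.out.println(page);    // [<p>, body, </p>]

        int count = removeRange(page, 0, 3);
        System.out.println(count);          // 3
        System.out.println(page.isEmpty()); // true
    }
}
```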
        
      • vectorOPT

        public static java.util.Vector<HTMLNode> vectorOPT
                    (java.util.Vector<? extends HTMLNode> html,
                     int tagPos)
        
        Convenience Method. Invokes dotPairOPT(Vector, int).

        Converts output to Vector<HTMLNode>.
        Code:
        Exact Method Body:
         DotPair dp = dotPairOPT(html, tagPos);
         if (dp == null) return null;
         else            return Util.cloneRange(html, dp.start, dp.end + 1);
        
      • subSectionOPT

        public static SubSection subSectionOPT​
                    (java.util.Vector<? extends HTMLNode> html,
                     int tagPos)
        
        Convenience Method. Invokes dotPairOPT(Vector, int).

        Converts output to SubSection.
        Code:
        Exact Method Body:
         DotPair dp = dotPairOPT(html, tagPos);
         if (dp == null) return null;
         else            return new SubSection(dp, Util.cloneRange(html, dp.start, dp.end + 1));
        
      • dotPairOPT

        public static DotPair dotPairOPT​
                    (java.util.Vector<? extends HTMLNode> html,
                     int tagPos)
        
        OPT: Optimized, which means this method expects that all parameter error-checking has already been performed.

        There are no error-checks, nor validity-checks performed on the input to this method. This is a heavily-used, internally-used method for this package. Originally, this was included in the internal-helper set of classes for the Node-Search package.

        PURPOSE AND USE: This method expects to receive a vectorized-html page, or sub-page, along with a valid index into that page pointing to an instance of a TagNode. The TagNode instance is expected to be BOTH an OpeningTag, and a non-singleton (non-self-closing) HTML Element. This method finds the corresponding "closing, matching, paired" TagNode HTML Element. For instance, a "<DIV ...>" HTML element is matched to its corresponding "</DIV>" element, and an "<A ...>" element to its closing "</A>" element.

        This method is heavily used by any class in the Node-Search Package that contains or uses the word 'inclusive.' This is because 'inclusive' is closely similar to the JavaScript property '.innerHTML'. All three of the following optimization methods perform identical tasks, but have different return types (of similar / identical data):

        • public static DotPair dotPairOPT(Vector, int, int) - which returns the matching 'innerHTML' as an index-pointer pair.
        • public static Vector<HTMLNode> vectorOPT(Vector, int, int) - which returns the matching 'innerHTML' as a cloned copy of the Vector-sublist, in a new instance of 'Vector<HTMLNode>'.
        • public static SubSection subSectionOPT(Vector, int, int) - which returns the matching 'innerHTML' as a cloned copy of the Vector-sublist combined with its DotPair (both the 'Vector' clone and the 'DotPair' index-pointers are returned, together, as an instance of SubSection).
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        tagPos - This may be any valid position within this html-Vector; it must be non-negative, and less than the size of the Vector. It must also point to a valid Object-reference to an instance of class TagNode.
        Returns:
        A 'DotPair' version of an inclusive, end-to-end HTML tag-element.

        Again, there is a strong similarity between the term "inclusive-match" and the JavaScript Object-field 'innerHTML.' Both of these terms essentially refer to a block of HTML code that begins with a non-singleton HTML element (like a <DIV> - divider) that has an opening-tag: <DIV> and a closing-tag </DIV> - and includes all HTMLNodes between these.
        See Also:
        TagNode, TagNode.isClosing, TagNode.tok, DotPair
        Code:
        Exact Method Body:
         // Temp Variables
         HTMLNode n;		TagNode tn;		int openCount = 1;
        
         int len = html.size();
        
         // This is the name (token) of the "Opening HTML Element", we are searching for
         // the matching, closing element
         String tok = ((TagNode) html.elementAt(tagPos)).tok;
        
         for (int i = (tagPos+1); i < len; i++)
             if ((n = html.elementAt(i)).isTagNode())
                 if ((tn = (TagNode) n).tok.equals(tok))
                 {
                     // This keeps a "Depth Count" - where "depth" is just the number of 
                     // opened tags, for which a matching, closing tag hasn't been found yet.
                     openCount += (tn.isClosing ? -1 : 1);
        
                     // When all open-tags of the specified HTML Element 'tok' have been
                     // found, search has finished.
                     if (openCount == 0) return new DotPair(tagPos, i);
                 }
        
         // Was not found
         return null;
        
      • vectorOPT

        public static java.util.Vector<HTMLNode> vectorOPT
                    (java.util.Vector<? extends HTMLNode> html,
                     int tagPos,
                     int end)
        
        Convenience Method. Invokes dotPairOPT(Vector, int, int).

        Converts output to Vector<HTMLNode>.
        Code:
        Exact Method Body:
         DotPair dp = dotPairOPT(html, tagPos, end);
         if (dp == null) return null;
         else            return Util.cloneRange(html, dp.start, dp.end + 1);
        
      • subSectionOPT

        public static SubSection subSectionOPT​
                    (java.util.Vector<? extends HTMLNode> html,
                     int tagPos,
                     int end)
        
        Convenience Method. Invokes dotPairOPT(Vector, int, int).

        Converts output to SubSection.
        Code:
        Exact Method Body:
         DotPair dp = dotPairOPT(html, tagPos, end);
         if (dp == null) return null;
         else            return new SubSection(dp, Util.cloneRange(html, dp.start, dp.end + 1));
        
      • dotPairOPT

        public static DotPair dotPairOPT​
                    (java.util.Vector<? extends HTMLNode> html,
                     int tagPos,
                     int end)
        
        OPT: Optimized, which means this method expects that all parameter error-checking has already been performed.

        There are no error-checks, nor validity-checks performed on the input to this method. This is a heavily-used, internally-used method for this package. Originally, this was included in the internal-helper set of classes for the Node-Search package.

        PURPOSE AND USE: This method expects to receive a vectorized-html page, or sub-page, along with a valid index into that page pointing to an instance of a TagNode. The TagNode instance is expected to be BOTH an OpeningTag, and a non-singleton (non-self-closing) HTML Element. This method finds the corresponding "closing, matching, paired" TagNode HTML Element. For instance, a "<DIV ...>" HTML element is matched to its corresponding "</DIV>" element, and an "<A ...>" element to its closing "</A>" element.

        This method is heavily used by any class in the Node-Search Package that contains or uses the word 'inclusive.' This is because 'inclusive' is closely similar to the JavaScript property '.innerHTML'. All three of the following optimization methods perform identical tasks, but have different return types (of similar / identical data):

        • public static DotPair dotPairOPT(Vector, int, int) - which returns the matching 'innerHTML' as an index-pointer pair.
        • public static Vector<HTMLNode> vectorOPT(Vector, int, int) - which returns the matching 'innerHTML' as a cloned copy of the Vector-sublist, in a new instance of 'Vector<HTMLNode>'.
        • public static SubSection subSectionOPT(Vector, int, int) - which returns the matching 'innerHTML' as a cloned copy of the Vector-sublist combined with its DotPair (both the 'Vector' clone and the 'DotPair' index-pointers are returned, together, as an instance of SubSection).
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        tagPos - This may be any valid position within this html-Vector; it must be non-negative, and less than the size of the Vector. It must also point to a valid Object-reference to an instance of class TagNode.
        end - This is a "loop-variable" bound that establishes an ending perimeter around the search location for finding an inclusive-match. The bound is exclusive: the node at index 'end' itself is not examined. (As an aside, it essentially maps to 'int ePos' in all of the node-search methods). If a complete end-to-end open-and-close "inclusive-match" is not found within the perimeter of 'tagPos' and 'end', then 'null' shall be returned.
        Returns:
        A 'DotPair' version of an inclusive, end-to-end HTML tag-element.

        Again, there is a strong similarity between the term "inclusive-match" and the JavaScript Object-field 'innerHTML.' Both of these terms essentially refer to a block of HTML code that begins with a non-singleton HTML element (like a <DIV> - divider) that has an opening-tag: <DIV> and a closing-tag </DIV> - and includes all HTMLNodes between these.
        See Also:
        TagNode, TagNode.isClosing, TagNode.tok, DotPair
        Code:
        Exact Method Body:
         // Temp Variables
         HTMLNode n;		TagNode tn;		int openCount = 1;		int endPos;
        
         // This is the name (token) of the "Opening HTML Element", we are searching for
         // the matching, closing element
         String tok = ((TagNode) html.elementAt(tagPos)).tok;
        
         for (endPos = (tagPos+1); endPos < end; endPos++)
             if ((n = html.elementAt(endPos)).isTagNode())
                 if ((tn = (TagNode) n).tok.equals(tok))
                 {
                     // This keeps a "Depth Count" - where "depth" is just the number of
                     // opened tags, for which a matching, closing tag hasn't been found yet.
                     openCount += (tn.isClosing ? -1 : 1);
        
                     // System.out.print(".");
        
                     // When all open-tags of the specified HTML Element 'tok' have been
                     // found, search has finished.
                     if (openCount == 0) return new DotPair(tagPos, endPos);
                 }
        
         // The search bound 'end' was reached, but the matching-closing element
         // was not found.
         return null; // at this point, endPos >= end
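
        The effect of the 'end' bound can be seen in a small, self-contained sketch. Everything in it is an assumption made for illustration: Node is a hypothetical stand-in for TagNode, the result is returned as an int[] pair rather than a DotPair, and the two-argument form is written as a delegation to the three-argument form (the listings above show the library using two separate loops). Like the OPT methods, the sketch performs no validation of its arguments.

```java
import java.util.Vector;

class BoundedMatchSketch {
    // Hypothetical stand-in for TagNode; a text node is modelled as tok == null.
    static final class Node {
        final String tok;
        final boolean isClosing;
        Node(String tok, boolean isClosing) { this.tok = tok; this.isClosing = isClosing; }
    }

    // Returns {tagPos, closingPos} (both inclusive), or null when no matching
    // closing tag lies before 'end' (exclusive). No argument checks are made.
    static int[] dotPair(Vector<Node> html, int tagPos, int end) {
        String tok = html.elementAt(tagPos).tok;
        int openCount = 1;
        for (int i = tagPos + 1; i < end; i++) {
            Node n = html.elementAt(i);
            if (tok.equals(n.tok)) { // false for text nodes, whose tok is null
                openCount += n.isClosing ? -1 : 1;
                if (openCount == 0) return new int[] { tagPos, i };
            }
        }
        return null;
    }

    // Unbounded form: searches to the end of the Vector.
    static int[] dotPair(Vector<Node> html, int tagPos) {
        return dotPair(html, tagPos, html.size());
    }

    // Models: <div> <a> text </a> </div>
    static Vector<Node> samplePage() {
        Vector<Node> v = new Vector<>();
        v.add(new Node("div", false)); // 0
        v.add(new Node("a", false));   // 1
        v.add(new Node(null, false));  // 2 (text)
        v.add(new Node("a", true));    // 3
        v.add(new Node("div", true));  // 4
        return v;
    }

    public static void main(String[] args) {
        Vector<Node> page = samplePage();
        int[] whole = dotPair(page, 0);
        System.out.println(whole[0] + ".." + whole[1]); // 0..4
        System.out.println(dotPair(page, 0, 4));        // null: </div> sits at the bound
        int[] anchor = dotPair(page, 1, 4);
        System.out.println(anchor[0] + ".." + anchor[1]); // 1..3
    }
}
```

        Because the bound is exclusive, a closing tag located exactly at index 'end' is not considered, which is why the second call reports no match.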