Package Torello.HTML

Class Balance


  • public class Balance
    extends java.lang.Object
    Balance - Documentation.

    This class allows a programmer to check some aspects of HTML validity. Because HTML features do not all behave the same way in every browser, there is no single definition of validity on which everyone would agree. Strictly speaking, it would be reasonable to check the attributes of each and every TagNode to see if they conform to the attributes expected for that HTML Element, but none of the browsers would fault a page with an 'HREF' attribute in an 'IMG' (Image) Element. When it comes to parsing HTML Strings, and converting them into vectorized-pages, the easiest and most powerful check to perform is whether or not opening tag-elements are closed.

    It is not mandatory in browsers to close all HTML Tags; a page with unclosed tags will still render its content regardless of how well the HTML is formed. Most of the large websites contain HTML that is generated by a computer, which generally does not fail to close its HTML tags. The methods in this class produce an "Open and Closed Count" for each Tag that the user has requested be counted. For instance, an HTML page, or sub-section of a page, that contained 5 open HTML 'DIV' (divider) Elements, but only 4 closing dividers, would return a Hashtable with a 'DIV' entry equal to '1'. If an HTML Element has an entry in a Hashtable with a value of '0', it means that the HTML Element was 'balanced' on that particular page. A 'balanced' HTML Element / Tag has an equal number of opening and closing elements on the page.
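
    The counting rule described above can be illustrated with a minimal, self-contained sketch. It does not use this library: the class 'BalanceCountSketch' and its nested 'Tag' type are hypothetical stand-ins (the real methods operate on TagNode instances inside a vectorized page), and the 5-open / 4-close 'div' input simply mirrors the example in the text.

```java
import java.util.Hashtable;
import java.util.Vector;

class BalanceCountSketch
{
    // Hypothetical stand-in for the library's TagNode: a tag name plus a closing flag.
    static class Tag
    {
        final String tok;
        final boolean isClosing;

        Tag(String tok, boolean isClosing)
        { this.tok = tok; this.isClosing = isClosing; }
    }

    // Adds 1 for each opening tag and subtracts 1 for each closing tag, keyed by tag name.
    static Hashtable<String, Integer> count(Vector<Tag> tags)
    {
        Hashtable<String, Integer> ht = new Hashtable<>();
        for (Tag t : tags) ht.merge(t.tok, t.isClosing ? -1 : 1, Integer::sum);
        return ht;
    }

    public static void main(String[] args)
    {
        Vector<Tag> tags = new Vector<>();

        // Five opening 'div' tags, but only four closing ones
        for (int i = 0; i < 5; i++) tags.add(new Tag("div", false));
        for (int i = 0; i < 4; i++) tags.add(new Tag("div", true));

        // One balanced 'span' pair
        tags.add(new Tag("span", false));
        tags.add(new Tag("span", true));

        Hashtable<String, Integer> counts = count(tags);

        System.out.println("div:  " + counts.get("div"));   // prints "div:  1"
        System.out.println("span: " + counts.get("span"));  // prints "span: 0"
    }
}
```

    In this sketch the 'span' entry of zero is what the text calls 'balanced', while the 'div' entry of 1 records one more opening tag than closing tags.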

    Static (Functional) API: The methods in this class are all (100%) defined with the Java Key-Word / Key-Concept 'static'. Furthermore, there is no way to obtain an instance of this class, because there are no public (nor private) constructors. Java's Spring-Boot, MVC feature is *not* utilized because it flies directly in the face of the light-weight data-classes philosophy. This has many advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) - by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood class Vector.

    Internal-State: A user may click on this class' source code (see link below) to view any and all fields defined internally by this class. A cursory inspection of the code would prove that this class has precisely zero internally defined global fields (Spaghetti). All variables used by the methods in this class are local fields only, and therefore this class ought to be thought of as 'state-less'.



    • Method Summary

      Modifier and Type                    Method
      static Hashtable<String,Integer>     check(Vector<? super TagNode> html)
      static int[]                         check(Vector<? super TagNode> html, String... htmlTags)
      static Hashtable<String,Integer>     checkNonZero(Hashtable<String,Integer> ht)
      static int                           checkTag(Vector<? super TagNode> html, String htmlTag)
      static Hashtable<String,int[]>       depth(Vector<? super TagNode> html)
      static Hashtable<String,int[]>       depth(Vector<? super TagNode> html, String... htmlTags)
      static Hashtable<String,int[]>       depthGreaterThanOne(Hashtable<String,int[]> ht)
      static Hashtable<String,int[]>       depthInvalid(Hashtable<String,int[]> ht)
      static int[]                         depthTag(Vector<? super TagNode> html, String htmlTag)
      static Ret2<int[],int[]>             locationsAndDepth(Vector<? super TagNode> html, String htmlTag)
      static int[]                         nonNestedCheck(Vector<? super TagNode> html, String htmlTag)
      static String                        toStringBalance(int[] balanceCheckReport, String... htmlTags)
      static String                        toStringBalance(Hashtable<String,Integer> balanceCheckReport)
      static String                        toStringDepth(Hashtable<String,int[]> depthReport)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • check

        public static java.util.Hashtable<java.lang.String,​java.lang.Integer> check​
                    (java.util.Vector<? super TagNode> html)
        
        Creates a Hashtable that has a count of all open and closed HTML tags found on the page.

        This Hashtable may be regarded as maintaining "counts" on each-and-every HTML 5 element to identify whether there is a one-to-one balance mapping between opening and closing tags for each element. When the Hashtable generated by this method produces a number for a particular HTML Element that is not zero, that means there are an unequal number of opening and closing elements for that HTML Element.

        FOR INSTANCE: If this method produced a Hashtable, and it were queried for the count of HTML "divider" elements (<DIV ...> ... </DIV>), if and when the count returned a non-zero positive number, that would mean that the vectorized-html had more opening divider elements ('<DIV ...>') than closing divider elements ('</DIV>').

        VALIDITY NOTE: There are some browser-parse advocates who would state that not all elements must be closed. For instance, there are pages on the internet that will not include closing '</LI>' elements in Ordered-Lists, or Unordered-Lists. This hints at the commonly-heard-phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.

        The following example will help explain the use of this method. If an HTML page needs to be checked to see that all elements are properly opened and closed, this method can be used to return a list of any HTML element tag that does not have an equal number of opening and closing tags. It is important to recognize that the only "validity" that is actually checked is that the number of, say, opening '<I>' (italics) elements is exactly equal to the number of closing '</I>' elements. Nothing of the semantics of their use is computed or calculated. In this example, the generated java-doc html page for class TagNode is checked. This example found an "unclosed italics element." To find WHERE, use method nonNestedCheck(Vector, String).

        Example:
         String                      html    = FileRW.loadFileToString(htmlFileName);
         Vector<HTMLNode>            v       = HTMLPage.getPageTokens(html, false);
         Hashtable<String, Integer>  b       = Balance.check(v);
         StringBuffer                sb      = new StringBuffer();
        
         // This part just prints a text-output to a string buffer, which is printed to the screen.
         for (String key : b.keySet())
         {
             Integer i = b.get(key);
             // Only print keys that had a "non-zero count"
             // A Non-Zero-Count implies Opening-Tag-Count and Closing-Tag-Count are not equal!
             if (i.intValue() != 0) sb.append(key + "\t" + i.intValue() + "\n");
         }
         System.out.print(sb);
         
         // This example output was: "i   -1", because of an unclosed italics element.
         // NOTE: To find where this unclosed element is, use method: nonNestedCheck(Vector, String)
         
        
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        Returns:
        A Hashtable map of the count of each html-element present in this Vector. For instance, if this Vector had 5 '<A ...>' (Anchor-Link) elements, and 6 '</A>' then this Hashtable would have a String-key 'a' with an integer value of -1.
        See Also:
        FileRW.loadFileToString(String), HTMLPage.getPageTokens(CharSequence, boolean)
        Code:
        Exact Method Body:
         Hashtable<String, Integer> ht = new Hashtable<>();
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
         for (Object o : html) if (o instanceof TagNode)
         {
             TagNode tn      = (TagNode) o;
        
             if (HTMLTags.isSingleton(tn.tok))   continue;
                 // Singleton tags are also known as 'self-closing' tags.  BR, HR, IMG, etc...
        
             Integer I   = ht.get(tn.tok);
             int     i   = (I != null) ? I.intValue() : 0;
             i           += tn.isClosing ? -1 : 1;
                 // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                 // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
             ht.put(tn.tok, Integer.valueOf(i));
                 // Update the return result Hashtable for this particular HTML-Element (tn.tok)
         }
         return ht;
        
      • check

        public static int[] check​(java.util.Vector<? super TagNode> html,
                                  java.lang.String... htmlTags)
        Creates an array that includes an open-and-close 'count' for each html-tag / html-element that was requested via the passed input-parameter String[] htmlTags.

        VALIDITY NOTE: There are some browser-parse advocates who would state that not all elements must be closed. For instance, there are pages on the internet that will not include closing '</LI>' elements in Ordered-Lists, or Unordered-Lists. This hints at the commonly-heard-phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.
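
        How the returned array lines up with the 'htmlTags' argument can be sketched without the library. In the following minimal illustration, the class 'ParallelCountSketch', its nested 'Tag' type and its 'count' method are hypothetical stand-ins for TagNode and for this method; the tag-name validation and the singleton check performed by the real method are deliberately left out.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.Vector;

class ParallelCountSketch
{
    // Hypothetical stand-in for the library's TagNode
    static class Tag
    {
        final String tok;
        final boolean isClosing;

        Tag(String tok, boolean isClosing)
        { this.tok = tok; this.isClosing = isClosing; }
    }

    // Returns one open-close count per requested tag, in the same order as 'htmlTags'
    static int[] count(Vector<Tag> html, String... htmlTags)
    {
        Hashtable<String, Integer> ht = new Hashtable<>();
        for (String s : htmlTags) ht.put(s, 0);

        for (Tag t : html)
        {
            Integer cur = ht.get(t.tok);
            if (cur == null) continue;   // this tag was not requested, so it is ignored
            ht.put(t.tok, cur + (t.isClosing ? -1 : 1));
        }

        // Position i of the result belongs to htmlTags[i]
        int[] ret = new int[htmlTags.length];
        for (int i = 0; i < ret.length; i++) ret[i] = ht.get(htmlTags[i]);
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<Tag> html = new Vector<>();

        for (int i = 0; i < 5; i++) html.add(new Tag("a", false));   // 5 opening anchors
        for (int i = 0; i < 6; i++) html.add(new Tag("a", true));    // 6 closing anchors
        html.add(new Tag("div", false));
        html.add(new Tag("div", true));
        html.add(new Tag("li", false));                              // never requested below

        int[] counts = count(html, "div", "a");
        System.out.println(Arrays.toString(counts));   // prints [0, -1]
    }
}
```

        Because 'div' was passed first and 'a' second, index 0 holds the 'div' count and index 1 holds the 'a' count; the unrequested 'li' tag does not appear anywhere in the result.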
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully. The HTML-Element Open-Close-Counts will be computed using this page.
        htmlTags - This may be one, or many, html elements whose open-close count needs to be computed. Any HTML 5.0 Element that is not present in this list - will not have a count computed. The count results are stored in an integer array that will be a "parallel array" to this input array.
        Returns:
        An array of the count of each html-element present in the input vectorized-html parameter 'html'. For instance, If the following values were passed to this method:

        • A vectorized-html page that had 5 '<A ...>' (Anchor-Link) open-elements, and 6 '</A>' closing-elements.
        • And at least one of the String's in the parameter 'htmlTags' was the letter 'a'
        • ==> Then the array-position corresponding to the position in array 'htmlTags' that had the 'a' (Anchor-Link) would have a value of '-1'
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If any of the 'htmlTags' are 'singleton' (Self-Closing) Tags, this exception will throw.
        Code:
        Exact Method Body:
         // Check that these are all valid HTML Tags, throw an exception if not.
         htmlTags = ARGCHECK.htmlTags(htmlTags);
        
         // Temporary Hash-table, used to store the count of each htmlTag
         Hashtable<String, Integer> ht = new Hashtable<>();
        
         // Initialize the temporary hash-table.  This will be discarded at the end of the method,
         // and converted into a parallel array.  (Parallel to the input String... htmlTags array).
         // Also, check to make sure the user hasn't requested a count of Singleton HTML Elements.
         for (String htmlTag : htmlTags)
         {
             if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
                 "One of the tags you have passed: [" + htmlTag + "] is a singleton-tag, " +
                 "and is only allowed opening versions of the tag."
             );
        
             ht.put(htmlTag, Integer.valueOf(0));
         }
        
         Integer I;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
         for (Object o : html) if (o instanceof TagNode)
         {
             TagNode tn = (TagNode) o;
        
             // Get the current count from the hash-table
             I = ht.get(tn.tok);
        
             // The hash-table only holds elements we are counting, if null, then skip.
             if (I == null) continue;
        
             // Save the new, computed count, in the hash-table
             ht.put(tn.tok, Integer.valueOf(I.intValue() + (tn.isClosing ? -1 : 1)));
                 // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                 // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
         }
        
         // Convert the hash-table to an integer-array, and return this to the user
         int[] ret = new int[htmlTags.length];
        
         for (int i=0; i < ret.length; i++)
             ret[i] = 0;
        
         for (int i=0; i < htmlTags.length; i++)
             if ((I = ht.get(htmlTags[i])) != null) 
                 ret[i] = I.intValue();
            
         return ret;
        
      • checkNonZero

        public static java.util.Hashtable<java.lang.String,​java.lang.Integer> checkNonZero​
                    (java.util.Hashtable<java.lang.String,​java.lang.Integer> ht)
        
        Creates a Hashtable that has a count of all open and closed HTML tags found on the page - whose count-value is not equal to zero. It is best to see this as a helper method that reports unbalanced HTML Elements on a page, and provides no count when all non-singleton HTML Elements found in the HTML have a 1-to-1 open-close tag mapping. I.E. When the count is '0' (zero) - that HTML Element will not have a count in the returned table.

        VALIDITY NOTE: There are some browser-parse advocates who would state that not all elements must be closed. For instance, there are pages on the internet that will not include closing '</LI>' elements in Ordered-Lists, or Unordered-Lists. This hints at the commonly-heard-phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.

        CLONE NOTE: This method clones the original Hashtable, and removes the elements whose count was equal to zero. This allows the user to perform other operations with the original values contained by the original table, rather than modifying it.
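
        The copy-then-filter behavior described in the note can be sketched in isolation. This is only an illustration and not this method's body: the class 'NonZeroFilterSketch' and its 'nonZero' method are hypothetical names, and the sketch swaps in the Hashtable copy-constructor and Collection.removeIf in place of clone() and an Enumeration.

```java
import java.util.Hashtable;

class NonZeroFilterSketch
{
    // Copies 'ht' and drops every entry whose count is zero.  'ht' itself is not modified.
    static Hashtable<String, Integer> nonZero(Hashtable<String, Integer> ht)
    {
        Hashtable<String, Integer> ret = new Hashtable<>(ht);

        // values() is a view backed by 'ret', so removing a value removes its entry
        ret.values().removeIf(v -> v == 0);
        return ret;
    }

    public static void main(String[] args)
    {
        Hashtable<String, Integer> counts = new Hashtable<>();
        counts.put("div", 0);   // balanced
        counts.put("a", -1);    // one more closing tag than opening tags
        counts.put("i", 1);     // one more opening tag than closing tags

        Hashtable<String, Integer> unbalanced = nonZero(counts);

        System.out.println(unbalanced.containsKey("div"));  // prints false
        System.out.println(unbalanced.size());              // prints 2
        System.out.println(counts.size());                  // prints 3 (original is untouched)
    }
}
```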
        Parameters:
        ht - This should be a Hashtable that was produced by a call to one of the two available check(...) methods.
        Returns:
        A Hashtable map of the count of each html-element present in this Vector. For instance, if this Vector had 5 '<A ...>' (Anchor-Link) elements, and six '</A>' then this Hashtable would have a String-key 'a' with an integer value of '-1'.
        Code:
        Exact Method Body:
         @SuppressWarnings("unchecked")
         Hashtable<String, Integer>  ret     = (Hashtable<String, Integer>) ht.clone();
         Enumeration<String>         keys    = ret.keys();
        
         while (keys.hasMoreElements())
         {
             String key = keys.nextElement();
             if (ret.get(key).intValue() == 0) ret.remove(key);
                 // Remove any keys (HTML element-names) that have a normal ('0') count.
         }
         return ret;
        
      • checkTag

        public static int checkTag​(java.util.Vector<? super TagNode> html,
                                   java.lang.String htmlTag)
        This will compute a count for just one, particular, HTML Element of whether that Element has been properly opened and closed. An open and close count (integer value) will be returned by this method.

        VALIDITY NOTE: There are some browser-parse advocates who would state that not all elements must be closed. For instance, there are pages on the internet that will not include closing '</LI>' elements in Ordered-Lists, or Unordered-Lists. This hints at the commonly-heard-phrase "Browser War," and the concept of validity, therefore, is not addressed in this class.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the html element whose open-close count needs to be kept.
        Returns:
        The open-close count for the requested html-element in this Vector. For instance, if the user had requested that HTML Anchor Links be counted, and if the input Vector had 5 '<A ...>' (Anchor-Link) elements, and 6 '</A>' then this method would return -1.
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
             "The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only " +
             "allowed opening versions of the tag."
         );
        
         TagNode tn;     int i = 0;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
         for (Object o : html) if (o instanceof TagNode) 
        
             // If we encounter an HTML Element whose tag is the tag whose count we are 
             // computing, then....
             if ((tn = (TagNode) o).tok.equals(htmlTag))
                 i += tn.isClosing ? -1 : 1;
                 // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                 // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
         return i;
        
      • depth

        public static java.util.Hashtable<java.lang.String,​int[]> depth​
                    (java.util.Vector<? super TagNode> html)
        
        This method will calculate the "Maximum" and "Minimum" depth for every HTML 5.0 Tag found on a page. The Max-Depth is the maximum number of opening tags found for a particular element that have not yet been matched by a closing version of the same Element. In the example below, the maximum "open-count" for the HTML 'divider' Element (<DIV>) is '2'. This is because a second <DIV> element is opened before the first is closed.

        HTML Elements:
         <DIV class="MySection"><H1>These are my ideas:</H1>
         <!-- Above is an outer divider, below is an inner divider -->
         <DIV class="MyNumbers">Here are the points:
         <!-- HTML Content Here -->
         </DIV></DIV>
         
        


        VALIDITY NOTE: Generally, there are very few elements where the maximum depth should ever be greater than 1. For many standard elements such as the "Anchor Tag" (HTML '<A HREF=...>') having a maximum depth other than 1 would generally be thought of as "Invalid HTML." What to do about such occurrences shall be left to the programmer. Of course, there are elements that commonly reach a depth greater than 1, for instance: '<SPAN STYLE=...>'. Another counter-example would be the HTML 'table' element, which often contains 'inner-tables' within the specific rows and columns of the outer table. In such an HTML page, the elements 'tr', 'td', 'table' and possibly 'th' would all have depths greater than 1.

        ALSO NOTE: This maximum and minimum depth count will not pay any attention to whether HTML open and close tags "enclose each-other" or are "interleaved." The actual mechanics of the for-loop performing the count shall hopefully explain the computation, in the included highlighted java source-code below.
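
        The running-count computation can be sketched for a single element type. The class 'DepthSketch' and its 'depth' method are hypothetical illustrations rather than library code; each boolean stands for one tag of the same element, in page order, with 'true' meaning a closing tag.

```java
import java.util.Arrays;

class DepthSketch
{
    // 'closing[i]' is true when the i-th tag (of one element type) is a closing tag.
    // Returns { minimum depth, maximum depth, final count }.
    static int[] depth(boolean... closing)
    {
        int[] r = new int[3];

        for (boolean c : closing)
        {
            r[2] += c ? -1 : 1;               // running open-count
            if (r[2] < r[0]) r[0] = r[2];     // a new minimum
            if (r[2] > r[1]) r[1] = r[2];     // a new maximum
        }

        return r;
    }

    public static void main(String[] args)
    {
        // <DIV> <DIV> </DIV> </DIV>  (the nested dividers from the example above)
        System.out.println(Arrays.toString(depth(false, false, true, true)));  // prints [0, 2, 0]

        // </DIV> <DIV>  (a closing tag that appears before any opening tag)
        System.out.println(Arrays.toString(depth(true, false)));               // prints [-1, 0, 0]
    }
}
```

        The second call shows why the minimum is worth keeping: the final count is zero, so the element looks balanced, yet the negative minimum reveals that a closing tag appeared before its opening tag.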
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        Returns:
        The returned Hashtable will contain an integer-array for each HTML Element that was found on the page. Each of these arrays shall be of length 3.

        1. Minimum Depth: return_array[0]
        2. Maximum Depth: return_array[1]
        3. Total Count: return_array[2]


        REDUNDANCY NOTE: The third element of the returned array should be identical to the result produced by an invocation of method: Balance.checkTag(html, htmlTag);
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         Hashtable<String, int[]> ht = new Hashtable<>();
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and not HTML Comments
         for (Object o : html) if (o instanceof TagNode) 
         {
             TagNode tn = (TagNode) o;
        
             // Don't keep a count on singleton tags.
             if (HTMLTags.isSingleton(tn.tok)) continue;
        
             int[] curMaxAndMinArr = ht.get(tn.tok);
        
             // If this is the first encounter of a particular HTML Element, create a MAX/MIN
             // integer array, and initialize its values to zero.
             if (curMaxAndMinArr == null)
             {
                 curMaxAndMinArr = new int[3];
                 curMaxAndMinArr[0] = 0;     // Current Minimum Depth Count for HTML Element "tn.tok" is zero
                 curMaxAndMinArr[1] = 0;     // Current Maximum Depth Count for HTML Element "tn.tok" is zero
                 curMaxAndMinArr[2] = 0;     // Current Computed Depth Count for HTML Element "tn.tok" is zero
                 ht.put(tn.tok, curMaxAndMinArr);
             }
        
             // curCount += tn.isClosing ? -1 : 1;
             curMaxAndMinArr[2] += tn.isClosing ? -1 : 1;
                 // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                 // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
             // If the current depth-count is a "New Minimum" (a new low! :), then save it in the
             // minimum pos of the output-array.
             if (curMaxAndMinArr[2] < curMaxAndMinArr[0]) curMaxAndMinArr[0] = curMaxAndMinArr[2];
        
             // If the current depth-count (for this tag) is a "New Maximum" (a new high), save it
             // to the max-pos of the output-array.
             if (curMaxAndMinArr[2] > curMaxAndMinArr[1]) curMaxAndMinArr[1] = curMaxAndMinArr[2];
         }
        
         return ht;
        
      • depth

        public static java.util.Hashtable<java.lang.String,​int[]> depth​
                    (java.util.Vector<? super TagNode> html,
                     java.lang.String... htmlTags)
        
        This method will calculate the "Maximum" and "Minimum" depth for every HTML Tag listed in the var-args String[] htmlTags parameter. The Max-Depth is the maximum number of opening tags found for a particular element that have not yet been matched by a closing version of the same Element. In the example below, the maximum 'open-count' for the HTML 'divider' Element (<DIV>) is '2'. This is because a second <DIV> element is opened before the first is closed.

        HTML Elements:
         <DIV class="MySection"><H1>These are my ideas:</H1>
         <!-- Above is an outer divider, below is an inner divider -->
         <DIV class="MyNumbers">Here are the points:
         <!-- HTML Content Here -->
         </DIV></DIV>
         
        


        VALIDITY NOTE: Generally, there are very few elements where the maximum depth should ever be greater than 1. For many standard elements such as the "Anchor Tag" (HTML '<A HREF=...>') having a maximum depth other than 1 would generally be thought of as "Invalid HTML." What to do about such occurrences shall be left to the programmer. Of course, there are elements that commonly reach a depth greater than 1, for instance: '<SPAN STYLE=...>'. Another counter-example would be the HTML 'table' element, which often contains 'inner-tables' within the specific rows and columns of the outer table. In such an HTML page, the elements 'tr', 'td', 'table' and possibly 'th' would all have depths greater than 1.

        ALSO NOTE: This maximum and minimum depth count will not pay any attention to whether HTML open and close tags "enclose each-other" or are "interleaved." The actual mechanics of the for-loop performing the count shall hopefully explain the computation, in the included highlighted java source-code below.

        FINALLY: This method differs from the method with the identical name in that it adds a var-args String... htmlTags parameter, which allows a user to decide which tags should be counted and returned in this Hashtable, and which should be ignored. ALSO: if one of the requested HTML tags from this var-args String parameter is not actually an HTML Element that is present in the html-page, then there will still be an integer-array value in the Hashtable for that element. However, its values will all be equal to zero.
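
        The effect of pre-seeding the table with the requested tags can be sketched as follows. The class 'RequestedDepthSketch', its nested 'Tag' type and its 'depth' method are hypothetical stand-ins for TagNode and for this method, and the argument checks made by the real method are omitted.

```java
import java.util.Arrays;
import java.util.Hashtable;
import java.util.Vector;

class RequestedDepthSketch
{
    // Hypothetical stand-in for the library's TagNode
    static class Tag
    {
        final String tok;
        final boolean isClosing;

        Tag(String tok, boolean isClosing)
        { this.tok = tok; this.isClosing = isClosing; }
    }

    // Each value is { minimum depth, maximum depth, final count } for one requested tag
    static Hashtable<String, int[]> depth(Vector<Tag> html, String... htmlTags)
    {
        Hashtable<String, int[]> ht = new Hashtable<>();

        // Seed an entry for every requested tag; a new int[3] is zero-filled by Java
        for (String s : htmlTags) ht.put(s, new int[3]);

        for (Tag t : html)
        {
            int[] arr = ht.get(t.tok);
            if (arr == null) continue;        // this tag was not requested

            arr[2] += t.isClosing ? -1 : 1;
            if (arr[2] < arr[0]) arr[0] = arr[2];
            if (arr[2] > arr[1]) arr[1] = arr[2];
        }

        return ht;
    }

    public static void main(String[] args)
    {
        Vector<Tag> html = new Vector<>();
        html.add(new Tag("div", false));
        html.add(new Tag("div", false));
        html.add(new Tag("div", true));
        html.add(new Tag("div", true));
        html.add(new Tag("span", false));
        html.add(new Tag("span", true));

        Hashtable<String, int[]> r = depth(html, "div", "table");

        System.out.println(Arrays.toString(r.get("div")));    // prints [0, 2, 0]
        System.out.println(Arrays.toString(r.get("table")));  // prints [0, 0, 0]
        System.out.println(r.containsKey("span"));            // prints false
    }
}
```

        Here 'table' was requested but never occurs, so it is still reported with an all-zero array, while 'span' occurs on the page but was not requested and is therefore absent from the result.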
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        Returns:
        The returned Hashtable will contain an integer-array for each HTML Element that was found on the page. Each of these arrays shall be of length 3.

        1. Minimum Depth: return_array[0]
        2. Maximum Depth: return_array[1]
        3. Total Count: return_array[2]


        REDUNDANCY NOTE: The third element of the returned array should be identical to the result produced by an invocation of method: Balance.checkTag(html, htmlTag);
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Check that these are all valid HTML Tags, throw an exception if not.
         htmlTags = ARGCHECK.htmlTags(htmlTags);
        
         Hashtable<String, int[]> ht = new Hashtable<>();
        
         // Initialize the hash-table with a zeroed array for each requested tag, so that a tag
         // which is absent from the page is still reported (with all of its values at zero).
         // Also, check to make sure the user hasn't requested a count of Singleton HTML Elements.
         for (String htmlTag : htmlTags)
         {
             if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
                 "One of the tags you have passed: [" + htmlTag + "] is a singleton-tag, " +
                 "and is only allowed opening versions of the tag."
             );
        
             // Insert an initialized array (init to zero) for this HTML Tag/Token
             int[] arr = new int[3];
             arr[0] = 0;     // Current Minimum Depth Count for HTML Element 'htmlTag' is zero
             arr[1] = 0;     // Current Maximum Depth Count for HTML Element 'htmlTag' is zero
             arr[2] = 0;     // Current Computed Depth Count for HTML Element 'htmlTag' is zero
             ht.put(htmlTag, arr);
         }
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text,
         // and not HTML Comments
         for (Object o: html) if (o instanceof TagNode) 
         {
             TagNode tn = (TagNode) o;
        
             int[] curMaxAndMinArr = ht.get(tn.tok);
        
             // If this is null, we are attempting to perform the count on an HTML Element that
             // wasn't requested by the user with the var-args 'String... htmlTags' parameter.
             // The Hashtable was initialized to only have those tags. (see about 5 lines above 
             // where the Hashtable is initialized)
        
             if (curMaxAndMinArr == null) continue;
        
             curMaxAndMinArr[2] += tn.isClosing ? -1 : 1;
                 // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                 // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
             // If the current depth-count is a "New Minimum" (a new low! :), then save it in the
             // minimum pos of the output-array.
             if (curMaxAndMinArr[2] < curMaxAndMinArr[0]) curMaxAndMinArr[0] = curMaxAndMinArr[2];
        
             // If the current depth-count (for this tag) is a "New Maximum" (a new high), save it
             // to the max-pos of the output-array.
             if (curMaxAndMinArr[2] > curMaxAndMinArr[1]) curMaxAndMinArr[1] = curMaxAndMinArr[2];
        
             // NOTE:    No need to update the hash-table, since this is an array - changing its
             //          values is already "reflected" into the Hashtable.
         }
        
         return ht;
        
      • depthInvalid

        public static java.util.Hashtable<java.lang.String,​int[]> depthInvalid​
                    (java.util.Hashtable<java.lang.String,​int[]> ht)
        
        Creates a Hashtable that has a maximum and minimum depth for all HTML tags found on the page. Any HTML Tags that meet ALL of these criteria shall be removed from the result-set Hashtable ...

        • Minimum Depth Is '0' - i.e. closing tag never precedes opening.
        • Count is '0' - i.e. there is a 1-to-1 ratio of opening and closing tags for the particular HTML Element.


        NOTE: This means that there is a 1:1 ratio of opening and closing versions of the tag, and also that there is no position in the vector where a closing tag comes before the tag that opens it.

        CLONE NOTE: This method clones the original hash-table, and removes from the clone the elements whose depth-calculations are valid, as described above. The original table is not modified, which allows the user to perform other operations with the original values.
        Parameters:
        ht - This should be a Hashtable that was produced by a call to one of the two available depth(...) methods.
        Returns:
        This shall return a Hashtable of HTML Tags that are potentially (but not guaranteed to be) invalid.
        Code:
        Exact Method Body:
         @SuppressWarnings("unchecked")
         Hashtable<String, int[]>    ret     = (Hashtable<String, int[]>) ht.clone();
         Enumeration<String>         keys    = ret.keys();
        
         // Using the "Enumeration" class allows the situation where elements can be removed from
         // the underlying data-structure - while iterating through that data-structure.  This is
         // not possible using a keySet Iterator.
         while (keys.hasMoreElements())
         {
             String  key = keys.nextElement();
             int[]   arr = ret.get(key);
        
             // arr[0] is the minimum depth, arr[2] is the final count
             if ((arr[0] >= 0) && (arr[2] == 0)) ret.remove(key);
         }
         return ret;
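
        The filter described above can be sketched using only java.util. This is a minimal illustration under stated assumptions, not the library's method body: the class name DepthInvalidSketch is hypothetical, and the array layout {minimum depth, maximum depth, count} is the one documented for the depth(...) methods.

```java
import java.util.Hashtable;

// Hypothetical sketch class; it is not part of Torello.HTML.
class DepthInvalidSketch
{
    // Returns a copy of 'ht' holding only the tags that look unbalanced.
    // The input table itself is never modified.
    static Hashtable<String, int[]> depthInvalid(Hashtable<String, int[]> ht)
    {
        // Shallow copy: the int[] values are shared with 'ht', but only read here.
        Hashtable<String, int[]> ret = new Hashtable<>(ht);

        // Drop every tag whose minimum depth is 0 and whose final count is 0.
        ret.values().removeIf(arr -> (arr[0] == 0) && (arr[2] == 0));

        return ret;
    }
}
```

        With entries "span" = {0, 1, 0}, "div" = {0, 2, 1} and "a" = {-1, 0, 0}, only "span" is dropped: "div" has a non-zero count, and "a" has a closing tag that precedes its opening tag.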
        
      • depthGreaterThanOne

        public static java.util.Hashtable<java.lang.String,​int[]> depthGreaterThanOne​
                    (java.util.Hashtable<java.lang.String,​int[]> ht)
        
        Creates a Hashtable that has a maximum and minimum depth for all HTML tags found on the page. Any HTML Tag that meets the criterion below shall be removed from the result-set Hashtable ...

        • Maximum Depth is precisely '1' - i.e. Each element of this tag is closed before a second is open.


        CLONE NOTE: This method clones the original Hashtable, and removes from the clone the elements whose maximum-depth is exactly one. The original table is not modified, which allows the user to perform other operations with the original values.
        Parameters:
        ht - This should be a Hashtable that was produced by a call to one of the two available depth(...) methods.
        Returns:
        This shall return a Hashtable of HTML Tags that are potentially (but not guaranteed to be) invalid.
        Code:
        Exact Method Body:
         @SuppressWarnings("unchecked")
         Hashtable<String, int[]>    ret     = (Hashtable<String, int[]>) ht.clone();
         Enumeration<String>         keys    = ret.keys();
        
         // Using the "Enumeration" class allows the situation where elements can be removed from
         // the underlying data-structure - while iterating through that data-structure.  This is not
         // possible using a keySet Iterator.
         while (keys.hasMoreElements())
         {
             String  key = keys.nextElement();
             int[]   arr = ret.get(key);
        
             if (arr[1] == 1) ret.remove(key);
         }
        
         return ret;
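
        The same clone-then-remove pattern can be written against the entry-set view. This is a sketch under stated assumptions: the class name MaxDepthFilterSketch is hypothetical, and arr[1] is taken to be the maximum depth, as documented for the depth(...) methods.

```java
import java.util.Hashtable;

// Hypothetical sketch class; it is not part of Torello.HTML.
class MaxDepthFilterSketch
{
    // Returns a copy of 'ht' without the tags whose maximum depth is exactly 1.
    static Hashtable<String, int[]> depthGreaterThanOne(Hashtable<String, int[]> ht)
    {
        Hashtable<String, int[]> ret = new Hashtable<>(ht);

        // Removing through the entry-set view also removes the mapping from 'ret'.
        ret.entrySet().removeIf(e -> e.getValue()[1] == 1);

        return ret;
    }
}
```

        Note that a tag whose maximum depth is 0 stays in the result, since the test is for equality with one.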
        
      • depthTag

        public static int[] depthTag​(java.util.Vector<? super TagNode> html,
                                     java.lang.String htmlTag)
        This method will calculate the "Maximum" and "Minimum" depth for a particular HTML Tag. The Max-Depth is the largest number of opening tags for an HTML Element that are found before a matching closing version of the same Element is encountered. For instance: <DIV ...><DIV ..> Some Page</DIV></DIV> has a maximum depth of '2'. This means there is a point in the vectorized-html where 2 successive divider elements have been opened before even one has been closed. The Min-Depth is the lowest value the running count reaches; it is negative when a closing tag precedes the opening tag it would match.

        VALIDITY NOTE: Generally, there are very few elements where the maximum depth should ever be greater than 1. For many standard elements such as the "Anchor Tag" (HTML '<A HREF=...>') having a maximum depth other than 1 would generally be thought of as "Invalid HTML." What to do about such occurrences shall be left to the programmer. Of course, there are elements that commonly reach a depth greater than 1, for instance: '<SPAN STYLE=...>'. Another counter-example would be the HTML 'table' element, which often contains 'inner-tables' within the specific rows and columns of the outer table. In such an HTML page, the elements 'tr', 'td', 'table' and possibly 'th' would all have depths greater than 1.

        ALSO NOTE: This maximum and minimum depth count will not pay any attention to whether HTML open and close tags "enclose each-other" or are "interleaved." The mechanics of the for-loop performing the count, in the highlighted Java source-code included below, show exactly how the computation is done.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the HTML element whose maximum and minimum depth-count needs to be computed.
        Returns:
        The returned integer-array shall be of length 3.

        1. Minimum Depth: return_array[0]
        2. Maximum Depth: return_array[1]
        3. Total Count: return_array[2]


        REDUNDANCY NOTE: The third element of the returned array should be identical to the result produced by an invocation of method: Balance.checkTag(html, htmlTag);
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException
             ("The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only allowed opening versions of the tag.");
        
         TagNode tn;     int i = 0;      int max = 0;        int min = 0;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and not HTML Comments
         for (Object o : html) if (o instanceof TagNode) 
             if ((tn = (TagNode) o).tok.equals(htmlTag))
             {
                 i += tn.isClosing ? -1 : 1;
                     // An opening-version (TC.OpeningTags, For Instance <DIV ...>) will ADD 1 to the count
                     // A closing-tag (For Instance: </DIV>) will SUBTRACT 1 from the count
        
                 if (i > max) max = i;
                 if (i < min) min = i;
             }
        
         // Generate the output array, and return
         int[] ret = new int[3];
         ret[0] = min;
         ret[1] = max;
         ret[2] = i;
         return ret;
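
        The running count is easiest to see on a tiny input. The sketch below is an illustration under stated assumptions, not the library's code: the class name DepthTagSketch is hypothetical, and each token is simplified to a bare tag name where a leading '/' marks a closing tag (the real method reads TagNode.tok and TagNode.isClosing instead).

```java
import java.util.List;

// Hypothetical sketch class; it is not part of Torello.HTML.
class DepthTagSketch
{
    // ASSUMPTION: tokens are bare tag names; a leading '/' marks a closing tag.
    // Returns {minimum depth, maximum depth, final count} for 'htmlTag'.
    static int[] depthTag(List<String> html, String htmlTag)
    {
        String closing = "/" + htmlTag;
        int depth = 0, min = 0, max = 0;

        for (String token : html)
        {
            if (token.equals(htmlTag))      depth++;    // opening tag adds 1
            else if (token.equals(closing)) depth--;    // closing tag subtracts 1
            else                            continue;   // some other token

            min = Math.min(min, depth);
            max = Math.max(max, depth);
        }

        return new int[] { min, max, depth };
    }
}
```

        For the tokens "div", "div", "/div", "/div" this yields {0, 2, 0}. For "/a", "a" it yields {-1, 0, 0}: the count is balanced, but the negative minimum shows that a closing tag came first.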
        
      • nonNestedCheck

        public static int[] nonNestedCheck​(java.util.Vector<? super TagNode> html,
                                           java.lang.String htmlTag)
        This will find the (likely) places where the "non-nested HTML Elements" have become nested. For the purposes of finding mismatched elements - such as an unclosed "Italics" Element, or an "Extra" Italics Element - this method will find places where a new HTML Tag has opened before a previous one has been closed - or vice-versa (where there is an 'extra' closed-tag).

        If nesting is normally acceptable for a tag (for instance the HTML divider '<DIV>...</DIV>' construct), then the results of this method are not meaningful. Fortunately, for the vast majority of HTML Elements (<I>, <B>, <A>, etc.), nesting the tags is neither allowed nor encouraged.

        The following example should make the application clear. Suppose a user has identified that there is an unclosed HTML italics element (<I>...</I>) somewhere on a page, and that page has numerous italics elements; this method can pinpoint the failure quickly. Note that the file-name is a Java-Doc generated output HTML file. The documentation for this package received a copious amount of attention due to the sheer number of method-names and class-names used throughout.

        Example:
         String           fStr    = FileRW.loadFileToString("javadoc/Torello/HTML/TagNode.html");
         Vector<HTMLNode> v       = HTMLPage.getPageTokens(fStr, false);
         int[]            posArr  = Balance.nonNestedCheck(v, "i");
         
         // Below, the class 'Debug' is used to pretty-print the vectorized-html page.  Here, the output will find the lone, 
         // non-closed, HTML italics <I> ... </I> tag-element, and output it to the terminal-window.  The parameter '5' means
         // the nearest 5 elements (in either direction) are printed, in addition to the elements at the indices in the posArr.
         // Parameter 'true' implies that two curly braces are printed surrounding the matched node.
         System.out.println(Debug.print(v, posArr, 5, " Skip a few ", true, Debug::K));
         
        
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the HTML element whose maximum and minimum depth-count was not 1 and 0, respectively. The precise location where the depth became either negative or greater than 1 will be returned in the integer array. In plain terms: when two opening-tags or two closing-tags are identified, successively, the index where the second tag was found is recorded into the output array.
        Returns:
        This will return an array of vectorized-html index-locations / index-pointers, one for each place where an extra opening tag, or an extra closing tag, occurs. This will facilitate finding tags that are not intended to be nested. If tag-nesting is expected (for example with HTML divider, 'DIV', elements), then the results returned by this method will not be useful.
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        See Also:
        FileRW.loadFileToString(String), HTMLPage.getPageTokens(CharSequence, boolean), Debug.print(Vector, int[], int, String, boolean, BiConsumer)
        Code:
        Exact Method Body:
         // Java Streams are an easier way to keep variable-length lists.  They use
         // "builders" - and this one is for an "IntStream"
         IntStream.Builder b = IntStream.builder();
        
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
             "The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only " +
             "allowed opening versions of the tag."
         );
        
         Object o;     TagNode tn;     int len = html.size();      TC last = null;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text,
         // and not HTML Comments
        
         for (int i=0; i < len; i++)
             if ((o = html.elementAt(i)) instanceof TagNode) 
                 if ((tn = (TagNode) o).tok.equals(htmlTag))
                 {
                     if ((tn.isClosing)      && (last == TC.ClosingTags))    b.add(i);
                     if ((! tn.isClosing)    && (last == TC.OpeningTags))    b.add(i);
                     last = tn.isClosing ? TC.ClosingTags : TC.OpeningTags;
                 }
        
         return b.build().toArray();
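
        The "two in a row" rule can be tried on a small token list. The following is a sketch under stated assumptions rather than the library's code: the class name NonNestedSketch is hypothetical, and tokens are simplified to bare tag names where a leading '/' marks a closing tag.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch class; it is not part of Torello.HTML.
class NonNestedSketch
{
    // ASSUMPTION: tokens are bare tag names; a leading '/' marks a closing tag.
    // Returns the indices where a second opening (or second closing) tag appears in a row.
    static int[] nonNestedCheck(List<String> html, String htmlTag)
    {
        IntStream.Builder b = IntStream.builder();
        String closing = "/" + htmlTag;
        Boolean lastWasClosing = null;      // null until the tag is seen for the first time

        for (int i = 0; i < html.size(); i++)
        {
            String  token     = html.get(i);
            boolean isClosing = token.equals(closing);

            if (!isClosing && !token.equals(htmlTag)) continue;     // some other token

            // Same kind of tag twice in succession: record the index of the second one.
            if ((lastWasClosing != null) && (lastWasClosing == isClosing)) b.add(i);

            lastWasClosing = isClosing;
        }

        return b.build().toArray();
    }
}
```

        For the tokens "i", "text", "/i", "i", "text", "i", "/i" the result is the single index 5, the position of the italics tag that was opened while the previous one was still open.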
        
      • locationsAndDepth

        public static Ret2<int[],​int[]> locationsAndDepth​
                    (java.util.Vector<? super TagNode> html,
                     java.lang.String htmlTag)
        
        For likely greater than 95% of HTML tags, finding situations where that tag has 'nested tags' is highly unlikely. Unfortunately, for two or three of the most common tags in use (for instance <DIV> and <SPAN>), finding where a mis-match has occurred (tracking down an "Unclosed divider") is an order of magnitude more difficult than finding an unclosed anchor '<A HREF...>'. This method shall return two parallel arrays. The first array will contain vector indices. The second array contains the depth (nesting level) of that tag at that position. In this way, finding an unclosed divider is tantamount to finding where the closing-dividers evaluate to a depth of '1' (one) rather than '0' (zero).

        NOTE: This method can be highly useful for SPAN and DIV, while the "non-standard depth locations" method can be extremely useful for simple, non-nested tags such as Anchor, Paragraph, Section, etc... - HTML Elements that are mostly never nested.

        Example:
         String               file        = LFEC.loadFile("~/HTML/MyHTMLFile.html");  // Load an HTML File to a String
         Vector<HTMLNode>     v           = HTMLPage.getPageTokens(file, false);      // Parse, and convert to vectorized-html
         Ret2<int[], int[]>   r           = Balance.locationsAndDepth(v, "div");      // Run this method
         int[]                posArr      = (int[]) r.a;                              // This array has vector-indices
         int[]                depthArr    = (int[]) r.b;                              // This (parallel) array has the depth at that index.
         
         for (int i=0; i < posArr.length; i++) System.out.println(
             "(" + posArr[i] + ", " + depthArr[i] + "):\t" +    // Prints the Vector-Index, and the Depth
             C.BRED + v.elementAt(posArr[i]).str + C.RESET      // Prints the actual HTML divider.
         );
         
        

        The above code would produce a list of HTML Divider elements, along with their index in the Vector, and the exact depth (number of nested, open 'DIV' elements) at that location. This is usually helpful when trying to find unclosed HTML Tags.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? super TagNode' means is this method can receive a Vector<TagNode> or a Vector<HTMLNode>, without throwing an exception, or producing erroneous results. Note that if a Vector<Object> is passed, and there are no instances of class TagNode contained by that Vector, then this method will simply exit gracefully.
        htmlTag - This is the HTML element that has an imbalanced OPEN-CLOSE ratio in the tree.
        Returns:
        Two parallel arrays, as follows:

        1. Ret2.a (int[])

          This shall be an integer array of Vector-indices where the HTML Element has been found.

        2. Ret2.b (int[])

          This shall contain an array of the value of the depth for the 'htmlTag' at the particular Vector-index identified in the first-array.
        Throws:
        HTMLTokException - If any of the tags passed are not valid HTML tags.
        SingletonException - If this 'htmlTag' is a 'singleton' (Self-Closing) Tag, this exception will throw.
        Code:
        Exact Method Body:
         // Java Streams are an easier way to keep variable-length lists.  They use
         // "builders" - and this one is for an "IntStream"
         IntStream.Builder locations         = IntStream.builder();
         IntStream.Builder depthAtLocation   = IntStream.builder();
        
         // Check that this is a valid HTML Tag, throw an exception if invalid
         htmlTag = ARGCHECK.htmlTag(htmlTag);
        
         if (HTMLTags.isSingleton(htmlTag)) throw new SingletonException(
             "The tag you have passed: [" + htmlTag + "] is a singleton-tag, and is only " +
             "allowed opening versions of the tag."
         );
        
         Object o;     TagNode tn;     int len = html.size();      int depth = 0;
        
         // Iterate through the HTML List, we are only counting HTML Elements, not text, and
         // not HTML Comments
         for (int i=0; i < len; i++) if ((o = html.elementAt(i)) instanceof TagNode) 
             if ((tn = (TagNode) o).tok.equals(htmlTag))
             {
                 depth += tn.isClosing ? -1 : 1;
                 locations.add(i);
                 depthAtLocation.add(depth);
             }
        
         return new Ret2<int[], int[]>
             (locations.build().toArray(), depthAtLocation.build().toArray());
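
        To see how the two parallel arrays expose an unclosed tag, here is a small sketch under stated assumptions. The class name LocationsSketch is hypothetical, tokens are simplified to bare tag names where a leading '/' marks a closing tag, and a plain int[][] stands in for the library's Ret2<int[], int[]>.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch class; it is not part of Torello.HTML.
class LocationsSketch
{
    // ASSUMPTION: tokens are bare tag names; a leading '/' marks a closing tag.
    // Returns { locations, depthAtLocation } as two parallel arrays.
    static int[][] locationsAndDepth(List<String> html, String htmlTag)
    {
        IntStream.Builder locations       = IntStream.builder();
        IntStream.Builder depthAtLocation = IntStream.builder();
        String            closing         = "/" + htmlTag;
        int               depth           = 0;

        for (int i = 0; i < html.size(); i++)
        {
            String token = html.get(i);

            if (token.equals(htmlTag))      depth++;
            else if (token.equals(closing)) depth--;
            else                            continue;   // some other token

            locations.add(i);               // where the tag was found
            depthAtLocation.add(depth);     // depth after processing that tag
        }

        return new int[][] { locations.build().toArray(), depthAtLocation.build().toArray() };
    }
}
```

        For the tokens "div", "div", "/div", "p", "/p" the locations are {0, 1, 2} and the depths are {1, 2, 1}. The last closing divider leaves the depth at 1 rather than 0, which signals that one divider was never closed.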
        
      • toStringDepth

        public static java.lang.String toStringDepth​
                    (java.util.Hashtable<java.lang.String,​int[]> depthReport)
        
        Converts a depth report to a String, for printing.
        Parameters:
        depthReport - This should be a Hashtable returned by any of the depth-methods.
        Returns:
        This shall return the report as a String.
        Code:
        Exact Method Body:
         StringBuilder sb = new StringBuilder();
         for (String htmlTag : depthReport.keySet())
         {
             int[] arr = depthReport.get(htmlTag);
             sb.append(
                 "HTML Element: [" + htmlTag + "]:\t" +
                 "Min-Depth: " + arr[0] + ",\tMax-Depth: " + arr[1] + ",\tCount: " + arr[2] + "\n"
             );
         }
         return sb.toString();
        
      • toStringBalance

        public static java.lang.String toStringBalance​
                    (java.util.Hashtable<java.lang.String,​java.lang.Integer> balanceCheckReport)
        
        Converts a balance report to a String, for printing.
        Parameters:
        balanceCheckReport - This should be a Hashtable returned by any of the balance-check methods.
        Returns:
        This shall return the report as a String.
        Code:
        Exact Method Body:
         StringBuilder sb = new StringBuilder();
        
         for (String htmlTag : balanceCheckReport.keySet()) sb.append(
             "HTML Element: [" + htmlTag + "]:\t" + balanceCheckReport.get(htmlTag).intValue() + "\n"
         );
        
         return sb.toString();
        
      • toStringBalance

        public static java.lang.String toStringBalance​
                    (int[] balanceCheckReport,
                     java.lang.String... htmlTags)
        
        Converts a balance report to a String, for printing.
        Parameters:
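        balanceCheckReport - This should be an int[] array returned by one of the balance-check methods. It must be parallel to the 'htmlTags' parameter.
        htmlTags - The HTML tags that were used to generate the report, in the same order.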
        Returns:
        This shall return the report as a String.
        Throws:
        java.lang.IllegalArgumentException - This exception throws if the lengths of the two input arrays are not equal. It is imperative that the balance report being printed was created by the html-tags that are listed in the HTML Token var-args parameter. If the two arrays are the same length, but the tags used to create the report array are not the same ones being passed to the var-args parameter 'htmlTags' - the logic will not know the difference, and no exception is thrown.
        Code:
        Exact Method Body:
         if (balanceCheckReport.length != htmlTags.length) throw new IllegalArgumentException(
             "The balance report that you are checking was not generated using the html token " +
             "list provided, they are different lengths.  balanceCheckReport.length: " +
             "[" + balanceCheckReport.length + "]\t htmlTags.length: [" + htmlTags.length + "]"
         );
        
         StringBuilder sb = new StringBuilder();
        
         for (int i=0; i < balanceCheckReport.length; i++)
             sb.append("HTML Element: [" + htmlTags[i] + "]:\t" + balanceCheckReport[i] + "\n");
        
         return sb.toString();