Package Torello.HTML

Class HTMLTags


  • public class HTMLTags
    extends java.lang.Object
    HTMLTags - Documentation.

    The purpose of this class is to maintain the list of valid HTML tags in Java memory. There are fewer than 200 of these, and they help the HTML Parse class recognize valid HTML tags when scraping. This class also keeps in memory some "pre-instantiated" Java-HTML HTMLNode (TagNode) instances. The class TagNode contains only "final variables" (it is immutable), because at least 80% of the HTML on any given page is just a tag / element instance that never needs to change in memory. Call the public method hasTag(String, TC) to obtain a valid instance of class TagNode.



    • Method Detail

      • printAllToTerminal

        public static void printAllToTerminal​(boolean printDescriptions)
        This simply prints all data that is stored in the JAR file to terminal output. It uses the method with the nearly identical name, printAll(Appendable, boolean), but passes 'System.out' as the Appendable instance. Because 'System.out' does not throw IOException when printing, the checked exception is caught (and ignored) here, for convenience.
        Parameters:
        printDescriptions - If this is set to TRUE, then this method will ensure that the JAR Descriptions-Data-File has already been loaded into memory. If it has not, then the description-String's will be loaded into memory. These String's contain a one-sentence-long text-description of each HTML Element listed in this class. If this parameter is FALSE, the data-file will not be visited, and the HTML Element descriptions will not be sent to the output stream.
        See Also:
        printAll(Appendable, boolean)
        Code:
        Exact Method Body:
         try { printAll(System.out, printDescriptions); } catch (IOException e) { }
        
      • printAll

        public static void printAll​(java.lang.Appendable a,
                                    boolean printDescriptions)
                             throws java.io.IOException
        This simply prints all data that is stored in the JAR data-file to a java.lang.Appendable.
        Parameters:
        a - This parameter provides an instance that will receive the text output. This parameter may not be null, or a NullPointerException will be thrown. This parameter expects an implementation of Java's interface java.lang.Appendable, which allows for a wide range of options when logging intermediate messages.
        Class or Interface Instance              Use & Purpose
        'System.out'                             Sends text to the standard-out terminal
        Torello.Java.StorageWriter               Sends text to System.out, and saves it, internally.
        FileWriter, PrintWriter, StringWriter    General purpose java text-output classes
        FileOutputStream, PrintStream            More general-purpose java text-output classes

        IMPORTANT: The interface Appendable requires that the checked exception IOException be caught (or declared) when using its append(CharSequence) methods.
        printDescriptions - If this is set to TRUE, then this method will ensure that the JAR Descriptions-Data-File has already been loaded into memory. If it has not, then the description-String's will be loaded into memory. These String's contain a one-sentence-long text-description of each HTML Element listed in this class. If this parameter is FALSE, the data-file will not be visited, and the HTML Element descriptions will not be sent to the output stream.
        Throws:
        java.io.IOException - The general purpose interface java.lang.Appendable requires checking for an IOException throw when printing information. If the 'Appendable' provided to this method fails, this exception shall propagate out.
        Code:
        Exact Method Body:
         a.append("TAGS: ");
         for (String tag : tags)                     a.append(tag + ", ");
         a.append("\n\nDEPRECATED: ");
         for (String deprecatedTag : deprecated)     a.append(deprecatedTag + ", ");
         a.append("\n\nHTML5: ");
         for (String html5Tag : html5Tags)           a.append(html5Tag + ", ");
         a.append("\n\nSINGLETON-TAGS: ");
         for (String selfClosingTag : singletonTags) a.append(selfClosingTag + ", ");
         a.append("\n\nBLOCK-TAGS: ");
         for (String blockTag : blockTags)           a.append(blockTag + ", ");
         a.append("\n\nINLINE-TAGS: ");
         for (String inlineTag : inlineTags)         a.append(inlineTag + ", ");
         a.append("\n\ntagNodesOpening: ");
         for (String s : tagNodesOpening.keySet())   a.append(tagNodesOpening.get(s).toString() + ", ");
         a.append("\n\ntagNodesClosing: ");
         for (String s : tagNodesClosing.keySet())   a.append(tagNodesClosing.get(s).toString() + ", ");
         a.append("\n\ntagNodesOpeningUC: ");
         for (String s : tagNodesOpeningUC.keySet()) a.append(tagNodesOpeningUC.get(s).toString() + ", ");
         a.append("\n\ntagNodesClosingUC: ");
         for (String s : tagNodesClosingUC.keySet()) a.append(tagNodesClosingUC.get(s).toString() + ", ");
        
         if (printDescriptions)
         {
             loadDescriptions(); // Will only load if descriptions have not already been loaded.
             a.append("\n\n");
             for (String s : descriptions.keySet())
                 a.append(s + ((s.length() >= 7) ? ":\t" : ":\t\t") + descriptions.get(s) + "\n");
         }
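
        As a rough illustration of the Appendable contract described above, the sketch below uses a hypothetical stand-in method, printTags, rather than HTMLTags.printAll itself. The class name AppendableSketch, both method names, and the sample tag list are assumptions made for illustration. It shows that a StringBuilder and 'System.out' can both serve as the output sink, and that the caller has to deal with the checked IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical stand-in (not library code) with the same shape as printAll(Appendable, boolean).
final class AppendableSketch
{
    // Writes a comma-separated tag list to any Appendable; append may throw IOException.
    static void printTags(Appendable a, List<String> tags) throws IOException
    {
        a.append("TAGS: ");
        for (String tag : tags) a.append(tag).append(", ");
    }

    // Collects the output in a StringBuilder, which is an Appendable that never fails.
    static String printTagsToString(List<String> tags)
    {
        StringBuilder sb = new StringBuilder();
        try { printTags(sb, tags); }
        catch (IOException e) { throw new UncheckedIOException(e); }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException
    {
        List<String> tags = List.of("a", "br", "div");
        printTags(System.out, tags);   // System.out is a PrintStream, which implements Appendable
        System.out.println();
        System.out.println(printTagsToString(tags));
    }
}
```

        Wrapping the IOException, as printTagsToString does, is one alternative to silently ignoring it; which is preferable depends on whether the sink can actually fail.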
        
      • loadDescriptions

        public static void loadDescriptions()
        The data-structure (a Java TreeMap<String, String>) that holds the individual text-descriptions of each HTML tag is not loaded into memory from the JAR file when the class-loader loads this class. Instead, it is loaded on request by this method. A programmer who would like to report information about HTML tags, including a short, one or two sentence description of each HTML Element, may use the method public static String getDescription(String htmlTag);

        IMPORTANT: According to its method body, getDescription(String) invokes loadDescriptions() itself before performing the lookup, so an explicit call to this method is only needed to load the data ahead of time.

        NOTE: The only purpose of keeping these sentences in a jar file is that they are a little long, and really are never used at all - unless you are interested in doing reporting. By keeping them in the jar-file, unless requested, this will save some on "over-head."

        ALSO: If the descriptions have already loaded, this method will just exit and return.
        See Also:
        LFEC.readObjectFromFile_JAR(Class, String, boolean, Class)
        Code:
        Exact Method Body:
         if (descriptions.size() == 0)
             descriptions.putAll((TreeMap<String, String>) LFEC.readObjectFromFile_JAR
                 (Torello.Data.DataFileLoader.class, "data03.tmdat", true, TreeMap.class));
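
        The load-on-first-use idea behind this method can be sketched as follows. This is a minimal, hypothetical illustration: the class LazyDescriptionsSketch, the LOADER supplier (which stands in for reading the JAR data-file) and the sample descriptions are all assumptions, not library code.

```java
import java.util.Locale;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical sketch of load-on-first-use; LOADER stands in for reading a JAR data-file.
final class LazyDescriptionsSketch
{
    private static final TreeMap<String, String> descriptions = new TreeMap<>();

    // Counts how often the (presumably expensive) loader actually ran.
    static int loadCount = 0;

    // Invented sample data, not the library's real descriptions.
    private static final Supplier<TreeMap<String, String>> LOADER = () ->
    {
        loadCount++;
        TreeMap<String, String> m = new TreeMap<>();
        m.put("br", "Sample description of the br element.");
        m.put("div", "Sample description of the div element.");
        return m;
    };

    // Loads only while the map is still empty, so repeated calls are cheap.
    static void loadDescriptions()
    {
        if (descriptions.isEmpty()) descriptions.putAll(LOADER.get());
    }

    // Loads on demand, then performs a case-insensitive lookup.
    static String getDescription(String tag)
    {
        loadDescriptions();
        return descriptions.get(tag.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args)
    {
        System.out.println(getDescription("BR"));
        System.out.println(getDescription("div"));
        System.out.println("loader ran " + loadCount + " time(s)");
    }
}
```

        Note that this sketch, like the emptiness check it mirrors, is not synchronized; two threads calling it at the same moment could both run the loader.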
        
      • maxTokenLength

        public static byte maxTokenLength()
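        Returns the String-length of the longest HTML token saved in the internal state TreeSet<String> of HTML Tokens. The value is not recomputed on each call; it is kept in the field MAX_TOKEN_LENGTH, which addTag and removeTag update when necessary.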
        Returns:
        The length of the longest HTML Token String.
        Code:
        Exact Method Body:
         return MAX_TOKEN_LENGTH;
        
      • addTag

        public static boolean addTag​(java.lang.String htmlTag)
        Adds a new HTML element to the list of elements that may be parsed, created and checked. This is not always advisable, as the complete list of HTML-5 tags is already stored internally, but if you would like to add or remove certain tags, there are two methods for doing this: this one and removeTag(String).
        Parameters:
        htmlTag - Any HTML tag that you would like to see parsed by the HTML page parser. If the parser encounters a construct such as: <YOUR_NEW_TAG ATTRIBUTES="..."> it will treat that as a new HTML element.
        Returns:
        TRUE if the element was indeed a new element to the list, and FALSE if the HTML-tokens-list already contained this HTML element. If so, this method call will just return gracefully - with no changes being made to the underlying list of acceptable HTML tokens.
        Throws:
        HTMLTokException - If the parameter contains non-alpha-numeric characters, begins with a number, or is longer than 127 characters.
        Code:
        Exact Method Body:
         Matcher m = HTML_TAG_ALPHA_NUMERIC.matcher(htmlTag);
        
         if ((! m.find()) || (htmlTag.length() != m.group().length())) throw new HTMLTokException(
             "The HTML-Tag Parameter that was passed [" + htmlTag + "] doesn't conform to the " +
             "expected requirements for HTML-Tags.  It may only contain alpha-numeric characters, " +
             "and it must not begin with a number."
         );
        
         String tag = htmlTag.trim().toLowerCase();
        
         if (tag.length() > 127) throw new HTMLTokException(
             "The (trimmed) HTML-Tag Parameter that was passed [" + tag + "] is longer than 127 " +
             "characters.  This is not allowed here."
         );
        
         boolean ret = tags.add(tag);
        
         if (ret)
         {
             // NOTE: These four private, static fields are of type TreeMap<String, TagNode>
             //       tagNodesOpening, tagNodesOpeningUC, tagNodesClosing, tagNodesClosingUC
             //
             //       They can provide a significant savings for the Garbage Collector.  For any
             //       HTML Element that does not have any attributes, and has a standard 'case'
             //       (all upper-case, or all lower-case), the parser will "re-use" pre-existing
             //       instances of class TagNode, rather than building a new one.
             // FOR EXAMPLE: The parser will "re-use" the same instance of a "<BR>" TagNode, or
             //              any one, actually, as long as it does not have attributes.  Since 40%
             //              to 50% of class TagNode are "TC.ClosingTags", this can be a significant
             //              improvement
        
             // Build a Lower-Case, Pre-Instantiated, Zero-Attribute version of the HTML Element
             // Uses specialized package-only visible TagNode constructor.
             // Not available to the general public
             tagNodesOpening.put(tag, new TagNode(tag, TC.OpeningTags));
             tagNodesClosing.put(tag, new TagNode(tag, TC.ClosingTags));
        
             // Build an Upper-Case, Pre-Instantiated, Zero-Attribute version of the HTML Element
             tag = tag.toUpperCase();
             tagNodesOpeningUC.put(tag, new TagNode("<" + tag + ">"));
             tagNodesClosingUC.put(tag, new TagNode("</" + tag + ">"));
        
             // Update the MAX_TOKEN_LENGTH - but only if necessary.
             if (tag.length() > MAX_TOKEN_LENGTH) MAX_TOKEN_LENGTH = (byte) tag.length();
         }
        
         return ret;
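
        The validate-then-register flow of the method body above can be sketched in isolation. Everything named here is hypothetical: the class AddTagSketch, the TAG_NAME pattern (the real HTML_TAG_ALPHA_NUMERIC pattern is not shown in this document, so the regular expression is a guess based on the exception message), and the use of IllegalArgumentException in place of HTMLTokException.

```java
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch of validating and registering a tag name; not library code.
final class AddTagSketch
{
    // ASSUMPTION: letters and digits only, not starting with a digit.
    private static final Pattern TAG_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    private static final TreeSet<String> tags = new TreeSet<>();
    private static int maxTokenLength = 0;

    // Returns true if the tag was new, false if it was already registered.
    static boolean addTag(String htmlTag)
    {
        if (!TAG_NAME.matcher(htmlTag).matches())
            throw new IllegalArgumentException("Invalid tag name: [" + htmlTag + "]");

        if (htmlTag.length() > 127)
            throw new IllegalArgumentException("Tag name is longer than 127 characters");

        String  tag   = htmlTag.toLowerCase(Locale.ROOT);
        boolean added = tags.add(tag);

        // Update the cached maximum only when a longer tag was actually added.
        if (added && tag.length() > maxTokenLength) maxTokenLength = tag.length();

        return added;
    }

    static int maxTokenLength() { return maxTokenLength; }

    public static void main(String[] args)
    {
        System.out.println(addTag("Widget"));   // new tag
        System.out.println(addTag("widget"));   // same tag, different case
        System.out.println(maxTokenLength());
    }
}
```

        The sketch omits the pre-instantiated TagNode maps; it only illustrates why a second registration of the same name, in any case, leaves the list unchanged.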
        
      • removeTag

        public static boolean removeTag​(java.lang.String htmlTag)
        Removes an HTML element from the list of elements that may be parsed, created and checked. This is not always advisable, as the complete list of HTML-5 tags is already stored internally, but if you would like to add or remove certain tags, there are two methods for doing this: this one and addTag(String).
        Parameters:
        htmlTag - Any HTML tag that you no longer want to see parsed by the HTML page parser. HTML nodes that contain this tag as their element will cause the parser to ignore the node, and treat it like a TextNode.
        Returns:
        TRUE if the element was removed, and FALSE if it was not - because it wasn't in the HTML-tokens-list in the first place.
        Code:
        Exact Method Body:
         String  tag = htmlTag.trim().toLowerCase();
         boolean ret = tags.remove(tag);
        
         if (ret)
         {
             // "Lower-Case" and "Pre-Instantiated" (Zero-Attributes) version of TagNode
             tagNodesOpening.remove(tag); 
             tagNodesClosing.remove(tag);
        
             tag = tag.toUpperCase();
        
             // "Upper-Case", Pre-Instantiated, Zero-Attribute version of TagNode
             tagNodesOpeningUC.remove(tag); 
             tagNodesClosingUC.remove(tag);
        
             // After removal, there is a small chance the
             // MAX_TOKEN_LENGTH is, now, shorter
             if (tag.length() == MAX_TOKEN_LENGTH) setMaxTokenLength();
         }
        
         return ret;
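
        The final step of the method body above (re-checking the maximum token length only when a tag of maximal length is removed) might look like this in isolation. The class RemoveTagSketch and its helper add are hypothetical, and the stream-based rescan is only one assumed way that a method such as setMaxTokenLength() could work; its real body is not shown in this document.

```java
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch of keeping a cached maximum length consistent on removal.
final class RemoveTagSketch
{
    private static final TreeSet<String> tags = new TreeSet<>();
    private static int maxTokenLength = 0;

    // Helper for the sketch: registers a tag and grows the cached maximum if needed.
    static void add(String tag)
    {
        tags.add(tag);
        maxTokenLength = Math.max(maxTokenLength, tag.length());
    }

    static boolean removeTag(String htmlTag)
    {
        String  tag     = htmlTag.trim().toLowerCase(Locale.ROOT);
        boolean removed = tags.remove(tag);

        // Only removing a tag of maximal length can change the maximum, so rescan only then.
        if (removed && tag.length() == maxTokenLength)
            maxTokenLength = tags.stream().mapToInt(String::length).max().orElse(0);

        return removed;
    }

    static int maxTokenLength() { return maxTokenLength; }

    public static void main(String[] args)
    {
        add("br");
        add("blockquote");
        System.out.println(maxTokenLength());
        System.out.println(removeTag(" BLOCKQUOTE "));
        System.out.println(maxTokenLength());
    }
}
```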
        
      • addSingleton

        public static boolean addSingleton​(java.lang.String htmlTagSingleton)
        Adds an HTML-element to the list of singleton HTML-elements. A singleton may only have an "opening" tag, and may not have a closing-version tag. For instance, <IMG SRC="..."> is the classic singleton; its data is all stored internally as attribute values.
        Parameters:
        htmlTagSingleton - Any HTML tag that you would like to see listed as a singleton HTML-element.
        Returns:
        TRUE if the element was indeed a new element to the list, and FALSE if the HTML-singleton tokens-list already contained this HTML element. If so, this method call will just return gracefully - with no changes being made to the underlying list of singleton tokens.
        Throws:
        java.lang.IllegalArgumentException - If you have tried to "register" a singleton tag that isn't a fundamental HTML-tag, then this method will throw an exception directing you to first add your token to the HTML-tags/tokens internal-list.
        Code:
        Exact Method Body:
         String tag = htmlTagSingleton.trim().toLowerCase();
        
         if (! tags.contains(tag)) throw new IllegalArgumentException(
             "The HTML token you have attempted to add [" + tag + "] may not be added to the " +
             "singletons list, because it is not a known/registered HTML token, as of now.  First, " +
             "make sure it is listed as one of the parser's tokens by calling 'addTag(token)', and " +
             "then invoke this method with that token."
         );
        
         // Internally, there is a private & static TreeSet<String> which saves the names
         // of all HTML 'singleton' elements.  Use Java's TreeSet.add(E) method
         return singletonTags.add(tag);
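
        The ordering requirement stated under "Throws" (a tag has to be a registered HTML tag before it can be listed as a singleton) can be illustrated with a small, hypothetical sketch. The class SingletonSketch and its sample tag set are assumptions; only the precondition check mirrors the method body above.

```java
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch of the "register as a tag first" precondition; not library code.
final class SingletonSketch
{
    // Sample registry of known tags (invented subset).
    private static final TreeSet<String> tags = new TreeSet<>(List.of("br", "hr", "img"));
    private static final TreeSet<String> singletonTags = new TreeSet<>();

    // Returns true if the tag was newly listed as a singleton, false if already listed.
    static boolean addSingleton(String htmlTagSingleton)
    {
        String tag = htmlTagSingleton.trim().toLowerCase(Locale.ROOT);

        if (!tags.contains(tag))
            throw new IllegalArgumentException("[" + tag + "] is not a registered HTML tag");

        return singletonTags.add(tag);
    }

    public static void main(String[] args)
    {
        System.out.println(addSingleton("IMG"));
        System.out.println(addSingleton("img"));
        try { addSingleton("custom"); }
        catch (IllegalArgumentException e) { System.out.println(e.getMessage()); }
    }
}
```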
        
      • removeSingleton

        public static boolean removeSingleton​(java.lang.String htmlTagSingleton)
        Removes an HTML-element from the list of singleton HTML-elements. A singleton may only have an "opening" tag, and may not have a closing-version tag. For instance, <IMG SRC="..."> is the classic singleton; its data is all stored internally as attribute values.
        Parameters:
        htmlTagSingleton - Any HTML tag that you no longer want to see in the HTML-singleton tokens-list.
        Returns:
        TRUE if the element was removed, and FALSE if it was not - because it wasn't in the HTML-Singleton tokens-list in the first place.
        Code:
        Exact Method Body:
         String tag = htmlTagSingleton.trim().toLowerCase();
        
         // Internally, there is a private & static TreeSet<String> which saves the names
         // of all HTML 'singleton' elements.  Use Java's TreeSet.remove(Object) method
         return singletonTags.remove(tag);
        
      • hasTag

        public static TagNode hasTag​(java.lang.String tag,
                                     TC openOrClosed)
        The purpose of this method is to provide a little "optimization." Since TagNode instances are immutable (all of their fields are final), this class facilitates instantiating only one copy of each attribute-free node when building HTML page node-Vectors. Internal to this class are pre-instantiated TagNode's for each and every HTML tag available, in both upper-case and lower-case versions. For each of these there is an opening-version of the TagNode, and also a closing-version. This makes a total of four pre-instantiated TagNode's per HTML tag, which are stored in four java.util.TreeMap<String, TagNode> instances within this class.

        NOTE: Because TagNode is Serializable, these four instances (lower-case / upper-case, and opening-tag / closing-tag) of each and every HTML tag have already been created and written to a data-file that is saved within the 'JavaHTML.jar' distributions of this library. The pre-instantiated instances of class java.util.TreeMap are loaded from the jar into memory by the Class-Loader at runtime startup.

        NOTE: It is not mandatory to "reuse" instantiated HTML TagNode's, but for memory management, garbage-collection efficiency, and other optimizations, the classes in this package use the pre-instantiated versions of these objects whenever possible.
        Parameters:
        tag - Any valid HTML tag. If the String passed is not a valid HTML tag, then this method will return null.
        openOrClosed - If TC.OpeningTags is passed, then an "open" version of the HTML tag will be returned, and if TC.ClosingTags is passed, then a closing version will be returned. TC.Both is not allowed here: according to the method body, passing TC.Both throws an IllegalArgumentException, and passing null throws a NullPointerException.
        Returns:
        An opening (or closing) TagNode - or null if the passed String tag does not represent any valid HTML-Tag, or if the closing version of a singleton tag (such as IMG) is requested.
        Code:
        Exact Method Body:
         // FAIL-FAST: Check Input's immediately.  Throw Exception for invalid input.
         if (openOrClosed == null)
             throw new NullPointerException
                 ("Parameter 'openOrClosed' is null, but this is not allowed.");
        
         if (openOrClosed == TC.Both)
             throw new IllegalArgumentException
                 ("Parameter 'openOrClosed' was specified as TC.Both, but this is not allowed here.");
        
         // IMPORTANT NOTE:  For Singleton-Tags: There is no closing-version, so one SHOULD NOT be
         // requested.  (There is no '</IMG>' tag!)  However, this method DOES NOT throw
         // IllegalArgumentException in this case, but rather it just exits gracefully, and returns
         // null.
         String tagLC = tag.toLowerCase();
         if (singletonTags.contains(tagLC) && (openOrClosed == TC.ClosingTags)) return null;
        
         // First, Check if the 'tag' is all lower-case.  If it is, the string would be identical to
         // the 'tagLC' variable we have just created.
         if (tagLC.equals(tag)) 
         {
             // Debugging Information, Debug-println.  Un-comment to follow.  DO NOT DELETE THIS LINE.
             // System.out.println("Used a pre-instantiated TagNode, Lower-Case TreeMap");
             return (openOrClosed == TC.OpeningTags) ? tagNodesOpening.get(tag)
                                                     : tagNodesClosing.get(tag);
         }
        
         // Now, here, the variable could not have been all-lower-case.  NEXT, Check if it is
         // all-upper-case
         //
         // NOTE: There are pre-defined tables that include pre-instantiated TagNode's - both for
         //       lower-case tags and for upper-case tags.
        
         String tagUC = tag.toUpperCase();
        
         if (tagUC.equals(tag)) 
         {
             // Debugging Information, Debug-println.  Un-comment to follow.  DO NOT DELETE THIS LINE.
             // System.out.println("Used a pre-instantiated TagNode, Upper-Case TreeMap");
             return (openOrClosed == TC.OpeningTags) ? tagNodesOpeningUC.get(tag)
                                                     : tagNodesClosingUC.get(tag);
         }
        
         // SPECIAL CASE: (Very Rare / Unlikely, but possible)  The user has created an HTML Element
         // that has some lower-case alphabet letters, and some upper-case as well.  This does not
         // guarantee that it is a valid HTML Token, though, so check
         //
         // FOR EXAMPLE: If somebody typed <SeCtIoN>, we need to preserve the case, no matter how
         //              bizarre.  In such a case, a pre-packaged TagNode cannot be used, and instead
         //              a new TagNode must be instantiated.
                
         if (openOrClosed == TC.OpeningTags)
             return (tagNodesOpening.get(tagLC) == null)
                 ? null
                 : new TagNode("<" + tag + ">");
         else 
             return (tagNodesClosing.get(tagLC) == null)
                 ? null
                 : new TagNode("</" + tag + ">");
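
        The case-handling in the method body above (shared instances for all-lower-case and all-upper-case names, a fresh instance for mixed case, null for unknown names) can be sketched with a simplified stand-in. The classes TagCacheSketch and Node, the method openingTag and the three sample tags are hypothetical; Node only imitates the immutability of TagNode, and closing tags are left out for brevity.

```java
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch of reusing pre-instantiated, immutable nodes; not library code.
final class TagCacheSketch
{
    // Minimal immutable stand-in for TagNode.
    static final class Node
    {
        final String str;
        Node(String str) { this.str = str; }
    }

    private static final TreeMap<String, Node> openingLC = new TreeMap<>();
    private static final TreeMap<String, Node> openingUC = new TreeMap<>();

    static
    {
        for (String t : new String[] { "div", "span", "section" })
        {
            String u = t.toUpperCase(Locale.ROOT);
            openingLC.put(t, new Node("<" + t + ">"));
            openingUC.put(u, new Node("<" + u + ">"));
        }
    }

    static Node openingTag(String tag)
    {
        // All lower-case: return the shared lower-case instance (or null if unknown).
        String lc = tag.toLowerCase(Locale.ROOT);
        if (lc.equals(tag)) return openingLC.get(tag);

        // All upper-case: return the shared upper-case instance (or null if unknown).
        String uc = tag.toUpperCase(Locale.ROOT);
        if (uc.equals(tag)) return openingUC.get(tag);

        // Mixed case: preserve the caller's spelling in a new instance, if the tag is known.
        return openingLC.containsKey(lc) ? new Node("<" + tag + ">") : null;
    }

    public static void main(String[] args)
    {
        System.out.println(openingTag("div") == openingTag("div"));          // shared instance
        System.out.println(openingTag("SeCtIoN") == openingTag("SeCtIoN"));  // fresh instances
        System.out.println(openingTag("SeCtIoN").str);
        System.out.println(openingTag("blink"));                             // unknown tag
    }
}
```

        Sharing is only safe because the nodes cannot be modified; if Node had mutable fields, a change made through one reference would show up on every page that holds the same instance.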
        
      • getTag_MEM_HEAP_CHECKOUT_COPY

        public static java.lang.String getTag_MEM_HEAP_CHECKOUT_COPY​
                    (java.lang.String tag)
        
        This is an optimized, internal method that is used to prevent lots of duplicate HTML token-String's from being created by the parser. Internally, there ought to be just one-instance of String's like: "img", "br", "div", etc... This is used by the parser to reuse an already instantiated token String. This method probably has relatively little use outside of the internal HTML parser code.
        Parameters:
        tag - This is an HTML token. A String with the same content as this 'token' String, but possibly a different memory reference on the heap, shall be returned.
        Returns:
        The returned String obeys the following:

        • assert(tag.equalsIgnoreCase(returnedString)); // The same token is returned (the stored copy is looked up by the lower-cased tag)
        • The test (tag == returnedString) is probably FALSE, since the returned String is the shared, already-allocated instance, which is usually a different object on the heap than the argument.
          Note that Java does not make any contracts regarding String references, so callers should not rely on this. (Reusing the stored reference can only help.)

        IMPORTANT: If the tag passed is not a valid HTML tag, then this method shall return null.
        Code:
        Exact Method Body:
         if (BUILDING_DATA_FILE___SKIP_OPTIMIZATION_TEMPORARILY) return tag.toLowerCase();
             // Obviously, for the 200 or so "pre-instantiated" (having-no-attributes) instances of
             // class TagNode that are kept, internally, in the data-structures of this class, 'HTMLTags'
             // We cannot retrieve a "pre-allocated" copy of the tag-as-a-string from the heap, because
             // we are building the data-file for the first time!
        
         TagNode tn = tagNodesOpening.get(tag.toLowerCase());
        
         if (tn == null) return null;
        
         return tn.tok;
             // This "version" (of the exact same html-element-name is already on the heap)
             // Obviously, because, variable 'tn' has already been instantiated and is in the TreeMap
             // If this EXACT SAME REFERENCE IS USED FOR ALL "TagNode.tok" instances, quite a bit of 
             // wasted-space in the heap's lookup table will be eliminated as the same "token"
             // (which is the name of the HTML Element: "div," "img," "span," etc...) is reused over 
             // and over and over again.  Helps a little bit!  Not that complicated!
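
        The "one shared String per token" idea can be sketched without the parser. The class CanonicalTokenSketch, its method canonicalToken and the sample tokens are hypothetical; the real method reads the token from a stored TagNode, whereas this sketch keeps the canonical String directly in a map.

```java
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch of handing out one canonical String instance per token; not library code.
final class CanonicalTokenSketch
{
    // Maps each known token to the single instance that callers should share.
    private static final TreeMap<String, String> canonical = new TreeMap<>();

    static
    {
        for (String t : new String[] { "br", "div", "img" }) canonical.put(t, t);
    }

    // Returns the shared instance for a known token (ignoring case), or null if unknown.
    static String canonicalToken(String tag)
    {
        return canonical.get(tag.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args)
    {
        String fresh  = new String("div");      // deliberately a separate object on the heap
        String shared = canonicalToken(fresh);

        System.out.println(fresh.equals(shared));               // same characters
        System.out.println(shared == canonicalToken("DIV"));    // same reference every time
        System.out.println(canonicalToken("blink"));            // unknown token
    }
}
```

        A caller that stores the returned reference instead of its own copy lets the duplicate become garbage right away, which is the saving the comments above describe.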
        
      • isTag

        public static boolean isTag​(java.lang.String tag)
        Checks if a String is registered as a proper HTML tag according to the internally maintained lists.

        CASE INSENSITIVE: The test performed by this method shall ignore case.

        The HTML Elements which are listed (in the link below) indicate exactly what may be passed to this method's parameter 'tag' and result in a return value of TRUE. This list is the complete list of HTML Element Names that are maintained, by default, in this class's internal Lookup Table of HTML Elements.

        HTML Elements

        List Modification: The list of HTML Elements may, in fact, be altered. To add a new Element Name to the internal lookup table of valid HTML Elements, use addTag(String). To remove an HTML Element from the internal list, use removeTag(String).
        Returns:
        TRUE if this is a valid HTML tag. NOTE: All HTML-5 Element-Tag Strings will return TRUE as they are contained in the default internal list.
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the standard HTML Tags.  Just uses Java's TreeSet.contains(Object) method.
         return tags.contains(tag.toLowerCase());
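
        The case-insensitive membership test used by this method, and by the similar isHTML5, deprecated, isSingleton, isBlock and isInline methods, amounts to lower-casing the argument before a set lookup. A minimal sketch, assuming a hypothetical class TagLookupSketch and an invented four-element sample set:

```java
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch of a case-insensitive tag lookup; not library code.
final class TagLookupSketch
{
    // All entries are stored in lower-case, so arguments are lower-cased before the lookup.
    private static final TreeSet<String> tags = new TreeSet<>(List.of("a", "div", "img", "span"));

    static boolean isTag(String tag)
    {
        return tags.contains(tag.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args)
    {
        System.out.println(isTag("DIV"));
        System.out.println(isTag("Span"));
        System.out.println(isTag("blink"));
    }
}
```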
        
      • isHTML5

        public static boolean isHTML5​(java.lang.String tag)
        Checks if a String is a proper HTML-5 (only) tag. This list is rather short, and only contains HTML Elements which were added specifically for the release of HTML 5. Any HTML Element which is both a valid HTML Release 4 (or earlier) Element and an HTML 5 Element will not result in TRUE being returned by this method.

        CASE INSENSITIVE: The test performed by this method shall ignore case.

        The HTML Elements which are listed (in the link below) indicate exactly what may be passed to this method's parameter 'tag' and result in a return value of TRUE. This list is the complete list of HTML 5 Element Names that are maintained, by default, in this class's internal Lookup Table of HTML 5 Elements.

        Elements Added for HTML-5
        Parameters:
        tag - Any HTML-Tag as a String.
        Returns:
        TRUE if this is a tag that was added for HTML-5, and not included in HTML 4, or earlier
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the HTML-5 Tags.  Just uses Java's TreeSet.contains(Object) method.
         return html5Tags.contains(tag.toLowerCase());
        
      • deprecated

        public static boolean deprecated​(java.lang.String tag)
        Checks if a String is listed as an HTML Element that was deprecated for HTML 5

        CASE INSENSITIVE: The test performed by this method shall ignore case.

        The HTML Elements which are listed (in the link below) indicate exactly what may be passed to this method's parameter 'tag' and result in a return value of TRUE. This list is the complete list of Deprecated HTML Element Names that are maintained, by default, in this class's internal Lookup Table of Deprecated HTML Elements.

        Elements Deprecated for HTML-5
        Parameters:
        tag - Any HTML-Tag as a String.
        Returns:
        TRUE if this tag was deprecated for HTML-5
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the deprecated-for-HTML-5 Tags.  Just uses Java's TreeSet.contains(Object)
         // method.
         return deprecated.contains(tag.toLowerCase());
        
      • isSingleton

        public static boolean isSingleton​(java.lang.String tok)
        This method checks whether a specific HTML element is an "opening and closing" element, such as P, DIV or SPAN (along with myriad others), or whether it is one of the (very few) "singleton HTML elements", such as the HTML <IMG SRC="..."> element, which may not have a closing tag. Such tags are also called "Self-Closing" tags.

        CASE INSENSITIVE: The test performed by this method shall ignore case.

        The HTML Elements which are listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' and result in a return value of TRUE. This list is the complete list of Singleton HTML Element Names that are maintained, by default, in this class's internal Lookup Table of Self-Closing HTML Elements.

        Singleton Elements

        List Modification: The list of Singleton HTML Elements may, in fact, be altered. To add a new Singleton HTML Element Name to the internal lookup table of valid Singleton Elements, use addSingleton(String). To remove an HTML Element from the internal list, use removeSingleton(String).
        Parameters:
        tok - This is the HTML element name to be tested.
        Returns:
        TRUE if this is a 'singleton' HTML Element - a.k.a., only OpeningTag versions of the element exist, because singleton HTML elements don't need / may not have a closing tag. Singleton examples include: IMG, HR, INPUT etc...

        FALSE is returned if the tag is not a singleton.
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the 'singleton' HTML Tags.  Just uses Java's TreeSet.contains(Object) method.
         return singletonTags.contains(tok.toLowerCase());
        
      • isBlock

        public static boolean isBlock​(java.lang.String tok)
        This method checks whether specific HTML elements are among the 'Block' Tag elements list. An explanation of what a 'block' or 'inline' tag is, is beyond the scope of this document.

        CASE INSENSITIVE: The test performed by this method shall ignore case.

        The HTML Elements which are listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' and result in a return value of TRUE. This list is the complete list of Block HTML Element Names that are maintained, by default, in this class's internal Lookup Table of HTML Block Elements.

        HTML Block Elements
        Parameters:
        tok - This is the HTML element name to be tested.
        Returns:
        TRUE if this is a 'block' HTML Element, FALSE otherwise.
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the HTML 'Block' Tags.  Just uses Java's TreeSet.contains(Object) method.
         return blockTags.contains(tok.toLowerCase());
        
      • isInline

        public static boolean isInline​(java.lang.String tok)
        This method checks whether a specific HTML element is among the 'Inline' Tag elements list. Explaining what a 'block' or 'inline' tag is lies beyond the scope of this document.

        CASE INSENSITIVE: The test performed by this method shall ignore case.

        The HTML Elements listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' to produce a return value of TRUE. This is the complete list of Inline HTML Element Names maintained, by default, in this class's internal Lookup Table of HTML Inline Elements.

        HTML Inline Elements
        Parameters:
        tok - This is the HTML element name to be tested.
        Returns:
        TRUE if this is an 'inline' HTML Element, FALSE otherwise.
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the HTML 'Inline' Tags.  Just uses Java's TreeSet.contains(Object) method.
         return inlineTags.contains(tok.toLowerCase());
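
The membership tests above share one pattern: lower-case the token, then ask a TreeSet whether it contains it. The following is a minimal sketch of that pattern, not the library's code. The class name TagLookupSketch and the four sample tags per set are assumptions made for illustration, and the sketch passes Locale.ROOT to toLowerCase, which the documented method bodies do not do. As in the documented bodies, a null token would cause a NullPointerException.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical stand-in for the lookup pattern; NOT the library's HTMLTags class.
class TagLookupSketch
{
    // Assumed sample data only; the real lists live in the library's JAR data-files.
    private static final TreeSet<String> blockTags =
        new TreeSet<>(Arrays.asList("div", "p", "table", "ul"));

    private static final TreeSet<String> inlineTags =
        new TreeSet<>(Arrays.asList("a", "em", "span", "strong"));

    // Lower-casing first makes the test case-insensitive, since the sets hold
    // lower-case names.  Locale.ROOT is this sketch's choice: it keeps the result
    // independent of the JVM's default locale.
    static boolean isBlock(String tok)
    { return blockTags.contains(tok.toLowerCase(Locale.ROOT)); }

    static boolean isInline(String tok)
    { return inlineTags.contains(tok.toLowerCase(Locale.ROOT)); }

    public static void main(String[] args)
    {
        System.out.println(isBlock("DIV"));
        System.out.println(isInline("Span"));
        System.out.println(isBlock("span"));
    }
}
```

Because the sets are keyed by lower-case names, any mix of upper and lower case in the argument reaches the same entry.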
        
      • getDescription

        public static java.lang.String getDescription​(java.lang.String tag)
        Returns a brief, English-language description of an HTML Tag. These descriptions are stored in a small data-file.

        DATA-FILE LOAD: This method will attempt to load a particular data-file from the JAR-library into memory. This file contains a one-sentence description, stored as a java.lang.String, for each of the HTML Elements known to this class. Under normal operation, these description-String's remain on disk only.
        Parameters:
        tag - Any valid HTML tag.
        Returns:
        A short English-Language description of the Tag in HTML, or null if this tag is unknown.
        See Also:
        loadDescriptions()
        Code:
        Exact Method Body:
         // Loads the descriptions map, ONLY IF they have not already been loaded into memory from
         // the JAR data-files
         loadDescriptions();
        
         return descriptions.get(tag.toLowerCase());
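
The load-on-first-use behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the class name LazyDescriptionsSketch, the loadCount field, and the placeholder description text are all hypothetical, and a small in-memory map stands in for the JAR data-file.

```java
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical sketch of load-on-first-use; NOT the library's actual implementation.
class LazyDescriptionsSketch
{
    // Stays null until a description is first requested.
    private static TreeMap<String, String> descriptions = null;

    // Counts real loads, only so the behavior can be observed from outside.
    static int loadCount = 0;

    static synchronized void loadDescriptions()
    {
        if (descriptions != null) return;   // already in memory, nothing to do

        // Assumption: placeholder text standing in for the JAR data-file contents.
        TreeMap<String, String> loaded = new TreeMap<>();
        loaded.put("hr", "placeholder description for hr");
        loaded.put("img", "placeholder description for img");

        descriptions = loaded;
        loadCount++;
    }

    static String getDescription(String tag)
    {
        loadDescriptions();   // cheap after the first call
        return descriptions.get(tag.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args)
    {
        System.out.println(getDescription("HR"));
        System.out.println(getDescription("no-such-tag"));
        System.out.println(loadCount);
    }
}
```

An unknown tag yields null from the map lookup, which matches the documented return value for unknown tags.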
        
      • iterator

        public static java.util.Iterator<java.lang.String> iterator()
        Internally, tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.

        REMOVE NOTE: In order to prevent accidental removal of any HTML Tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
        Returns:
        an Iterator<String> that iterates over all the Tag-String's in alphabetical order.

        Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed, here, by clicking the link below:

        HTML Elements
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the standard HTML Tags.  Just uses Java's TreeSet.iterator() method.
         //
         // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
         //       TreeSet
         return new RemoveUnsupportedIterator<String>(tags.iterator());
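
The wrapping idea in the REMOVE NOTE can be sketched as a small delegating Iterator. This is not the source of RemoveUnsupportedIterator: the class name ReadOnlyIteratorSketch and the helper removeIsBlocked are hypothetical, and throwing UnsupportedOperationException is an assumption, since the text above only says that "an exception" is thrown.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical illustration of a remove-blocking wrapper; NOT the library's
// RemoveUnsupportedIterator class.
class ReadOnlyIteratorSketch<E> implements Iterator<E>
{
    private final Iterator<E> inner;

    ReadOnlyIteratorSketch(Iterator<E> inner) { this.inner = inner; }

    // Reads are passed straight through to the wrapped iterator.
    @Override public boolean hasNext() { return inner.hasNext(); }
    @Override public E next()          { return inner.next(); }

    // Assumption: the exception type is this sketch's choice.
    @Override public void remove()
    { throw new UnsupportedOperationException("remove() is not supported"); }

    // Helper for demonstration: returns true if remove() was refused.
    static boolean removeIsBlocked(TreeSet<String> set)
    {
        Iterator<String> it = new ReadOnlyIteratorSketch<>(set.iterator());
        it.next();

        try { it.remove(); return false; }
        catch (UnsupportedOperationException e) { return true; }
    }
}
```

Since the wrapper never forwards remove() to the underlying TreeSet iterator, the backing set keeps all of its members regardless of what the caller attempts.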
        
      • iteratorDescriptions

        public static java.util.Iterator<java.util.Map.Entry<java.lang.String, java.lang.String>> iteratorDescriptions()
        
        Builds an Iterator that returns HTML tags and their text-String descriptions.

        NOTE: This will force this class to load the "HTML-Element Descriptions Data File", bringing the list of Tag-Description String-Data into memory. Generally, in this class, if the methods invoked do not require the Tag-Description String-Data, then the Class-Loader will not load this extensive text-data into memory from the JAR data-files.
        Returns:
        an Iterator that iterates the HTML-Tag / HTML-Tag-Description key-value pairs as instances of "Map.Entry<String, String>"

        Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load is only performed on request. The contents of this data-file (and the list of Map.Entry's returned by the Iterator) may be viewed, here, by clicking the link below:

        HTML Elements with Descriptions
        See Also:
        loadDescriptions(), RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         loadDescriptions(); // Will only load if descriptions have not already been loaded.
         return new RemoveUnsupportedIterator<Map.Entry<String, String>>
             (descriptions.entrySet().iterator());
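
An Iterator of Map.Entry pairs can be walked only once, so a caller that needs repeated lookups may prefer to copy the pairs first. The sketch below shows one way to do that. The class name EntryIteratorSketch and the method snapshot are hypothetical, and a small TreeMap with placeholder text stands in for the library's descriptions.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for consuming an Iterator of tag / description pairs.
class EntryIteratorSketch
{
    // Copies every key-value pair into a new map, preserving iteration order.
    static Map<String, String> snapshot(Iterator<Map.Entry<String, String>> it)
    {
        Map<String, String> copy = new LinkedHashMap<>();

        while (it.hasNext())
        {
            Map.Entry<String, String> e = it.next();
            copy.put(e.getKey(), e.getValue());
        }

        return copy;
    }

    public static void main(String[] args)
    {
        // Assumption: placeholder data, not the contents of the JAR data-file.
        TreeMap<String, String> source = new TreeMap<>();
        source.put("p", "placeholder description for p");
        source.put("a", "placeholder description for a");

        for (Map.Entry<String, String> e : snapshot(source.entrySet().iterator()).entrySet())
            System.out.println(e.getKey() + ": " + e.getValue());
    }
}
```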
        
      • iteratorAddedForHTML5

        public static java.util.Iterator<java.lang.String> iteratorAddedForHTML5()
        Internally, HTML-5 tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.

        REMOVE NOTE: In order to prevent accidental removal of HTML-5 tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
        Returns:
        an Iterator<String> that cycles through the list of HTML Tag-String's that were added in HTML-5.

        Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed, here, by clicking the link below:

        Elements Added for HTML-5
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the HTML-5 Tags.  Just uses Java's TreeSet.iterator() method.
         //
         // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
         //       TreeSet
         return new RemoveUnsupportedIterator<String>(html5Tags.iterator());
        
      • iteratorDeprecatedForHTML5

        public static java.util.Iterator<java.lang.String> iteratorDeprecatedForHTML5()
        
        Internally, deprecated tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.

        REMOVE NOTE: In order to prevent accidental removal of deprecated tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
        Returns:
        an Iterator<String> that cycles through the list of HTML Tag-String's that were deprecated in HTML-5.

        Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed, here, by clicking the link below:

        Elements Deprecated for HTML-5
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the deprecated Tags.  Just uses Java's TreeSet.iterator() method.
         //
         // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
         //       TreeSet
         return new RemoveUnsupportedIterator<String>(deprecated.iterator());
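
One possible use of such an iterator is flagging deprecated element names found on a page. The sketch below is illustrative only. The class name DeprecatedScanSketch and the method findDeprecated are hypothetical, the two sample names are supplied by the sketch rather than read from the library, and it assumes the iterator yields lower-case names, as the lower-casing in the lookup methods suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; it accepts any Iterator<String> of deprecated tag names.
class DeprecatedScanSketch
{
    static List<String> findDeprecated(Iterator<String> deprecatedTags, List<String> tagsOnPage)
    {
        // Drain the one-shot iterator into a set for repeated lookups.
        Set<String> deprecated = new HashSet<>();
        deprecatedTags.forEachRemaining(deprecated::add);

        // Keep each page tag, in its original spelling, whose lower-cased
        // name is in the deprecated set.
        List<String> hits = new ArrayList<>();

        for (String tag : tagsOnPage)
            if (deprecated.contains(tag.toLowerCase(Locale.ROOT))) hits.add(tag);

        return hits;
    }

    public static void main(String[] args)
    {
        // Assumption: sample names chosen for this sketch.
        Iterator<String> sample = Arrays.asList("center", "font").iterator();

        System.out.println(findDeprecated(sample, Arrays.asList("DIV", "FONT", "p", "center")));
    }
}
```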
        
      • iteratorSingletonTags

        public static java.util.Iterator<java.lang.String> iteratorSingletonTags()
        Internally, singleton / self-closing tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.

        REMOVE NOTE: In order to prevent accidental removal of singleton tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
        Returns:
        an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as singleton elements, and may not have closing-tag versions.

        Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed, here, by clicking the link below:

        Singleton Elements
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the HTML 'Singleton' Tags.  Just uses Java's TreeSet.iterator() method.
         //
         // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
         //       TreeSet
         return new RemoveUnsupportedIterator<String>(singletonTags.iterator());
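
Since singleton elements have no closing-tag version, a caller that emits HTML could consult the singleton list before writing a closing tag. The following sketch assumes as much. The class name SingletonCloseSketch and both of its methods are hypothetical, and the sample names (img, hr, input) are taken from the singleton examples mentioned in this document rather than from the data-file.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper that decides whether a closing tag should be written.
class SingletonCloseSketch
{
    // Drains an iterator of tag names into a sorted set.
    static TreeSet<String> collect(Iterator<String> it)
    {
        TreeSet<String> set = new TreeSet<>();
        it.forEachRemaining(set::add);
        return set;
    }

    // Returns "" for singleton elements, otherwise the closing tag text.
    static String closingTagFor(String tok, TreeSet<String> singletons)
    {
        String name = tok.toLowerCase(Locale.ROOT);
        return singletons.contains(name) ? "" : "</" + name + ">";
    }

    public static void main(String[] args)
    {
        TreeSet<String> singletons = collect(Arrays.asList("img", "hr", "input").iterator());

        System.out.println(closingTagFor("DIV", singletons));
        System.out.println(closingTagFor("IMG", singletons).isEmpty());
    }
}
```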
        
      • iteratorBlockTags

        public static java.util.Iterator<java.lang.String> iteratorBlockTags()
        Internally, "HTML Block Tags" are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.

        REMOVE NOTE: In order to prevent accidental removal of Block-Tags via the Iterator's remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
        Returns:
        an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as block elements.

        Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed, here, by clicking the link below:

        HTML Block Elements
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the HTML 'Block' Tags.  Just uses Java's TreeSet.iterator() method.
         //
         // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
         //       TreeSet
         return new RemoveUnsupportedIterator<String>(blockTags.iterator());
        
      • iteratorInlineTags

        public static java.util.Iterator<java.lang.String> iteratorInlineTags()
        Internally, "HTML Inline Tags" are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.

        REMOVE NOTE: In order to prevent accidental removal of Inline-Tags via the Iterator's remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
        Returns:
        an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as inline elements.

        Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed, here, by clicking the link below:

        HTML Inline Elements
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         // Internally, this class has a private & static TreeSet<String> that stores a list
         // of all the HTML 'Inline' Tags.  Just uses Java's TreeSet.iterator() method.
         //
         // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
         //       TreeSet
         return new RemoveUnsupportedIterator<String>(inlineTags.iterator());