Package Torello.HTML
Class HTMLTags
- java.lang.Object
  - Torello.HTML.HTMLTags
public class HTMLTags extends java.lang.Object
HTMLTags - Documentation.
This class maintains the list of valid HTML tags in Java memory. There are under 200 of these, and the HTMLParse class uses them to recognize valid HTML tags when scraping. This class also keeps a set of "pre-instantiated" TagNode instances (TagNode being a Java-HTML HTMLNode) in memory. The class TagNode contains only final fields (it is immutable), because at least 80% of the HTML on any given page is just a tag / element instance that never needs to change in memory. Call the method public static TagNode hasTag(String, TC) to obtain a valid instance of class TagNode.
-
-
Method Summary
Basic Methods
  Modifier and Type                             Method
  static String                                 getDescription(String tag)
  static TagNode                                hasTag(String tag, TC openOrClosed)
List Known Tags
  Modifier and Type                             Method
  static Iterator<String>                       iterator()
  static Iterator<String>                       iteratorAddedForHTML5()
  static Iterator<String>                       iteratorBlockTags()
  static Iterator<String>                       iteratorDeprecatedForHTML5()
  static Iterator<Map.Entry<String, String>>    iteratorDescriptions()
  static Iterator<String>                       iteratorInlineTags()
  static Iterator<String>                       iteratorSingletonTags()
Check Tag Categories
  Modifier and Type                             Method
  static boolean                                deprecated(String tok)
  static boolean                                isBlock(String tok)
  static boolean                                isHTML5(String tok)
  static boolean                                isInline(String tok)
  static boolean                                isSingleton(String tok)
  static boolean                                isTag(String tag)
Add or Remove Tags (to/from the Internal-List)
  Modifier and Type                             Method
  static boolean                                addSingleton(String htmlTagSingleton)
  static boolean                                addTag(String htmlTag)
  static boolean                                removeSingleton(String htmlTagSingleton)
  static boolean                                removeTag(String htmlTag)
Print the Internal Tag List
  Modifier and Type                             Method
  static void                                   printAll(Appendable a, boolean printDescriptions)
  static void                                   printAllToTerminal(boolean printDescriptions)
Utilities
  Modifier and Type                             Method
  static String                                 getTag_MEM_HEAP_CHECKOUT_COPY(String tag)
  static void                                   loadDescriptions()
  static byte                                   maxTokenLength()
-
-
-
Method Detail
-
printAllToTerminal
public static void printAllToTerminal(boolean printDescriptions)
Prints all of the data stored in the JAR data-file to terminal output. It delegates to printAll(Appendable, boolean), passing 'System.out' as the Appendable instance. Because 'System.out' does not throw IOException when printing, the checked exception is caught here, for convenience.
- Parameters:
printDescriptions
- If TRUE, this method ensures that the JAR Descriptions-Data-File has been loaded into memory, loading the description-Strings if they have not been loaded already. These Strings contain a one-sentence-long text-description of each HTML Element listed in this class. If FALSE, the data-file will not be visited, and the HTML Element descriptions will not be sent to the output stream.
- See Also:
printAll(Appendable, boolean)
- Code:
- Exact Method Body:
    try { printAll(System.out, printDescriptions); }
    catch (IOException e) { }
-
printAll
public static void printAll(java.lang.Appendable a, boolean printDescriptions) throws java.io.IOException
Prints all of the data stored in the JAR data-file to a java.lang.Appendable.
- Parameters:
a
- The instance that will receive the text output. This parameter may not be null, or a NullPointerException will be thrown. Any implementation of Java's interface java.lang.Appendable is accepted, which allows for a wide range of options when logging intermediate messages.
  Class or Interface Instance              Use & Purpose
  'System.out'                             Sends text to the standard-out terminal
  Torello.Java.StorageWriter               Sends text to System.out, and saves it, internally
  FileWriter, PrintWriter, StringWriter    General purpose java text-output classes
  FileOutputStream, PrintStream            More general-purpose java text-output classes
IMPORTANT: The interface Appendable requires that the checked exception IOException be caught (or declared) when using its append(CharSequence) methods.
printDescriptions
- If TRUE, this method ensures that the JAR Descriptions-Data-File has been loaded into memory, loading the description-Strings if they have not been loaded already. These Strings contain a one-sentence-long text-description of each HTML Element listed in this class. If FALSE, the data-file will not be visited, and the HTML Element descriptions will not be sent to the output stream.
- Throws:
java.io.IOException
- The general purpose interface java.lang.Appendable requires handling a possible IOException when printing information. If the 'Appendable' provided to this method fails, this exception shall propagate out.
- Code:
- Exact Method Body:
    a.append("TAGS: ");
    for (String tag : tags) a.append(tag + ", ");

    a.append("\n\nDEPRECATED: ");
    for (String deprecatedTag : deprecated) a.append(deprecatedTag + ", ");

    a.append("\n\nHTML5: ");
    for (String html5Tag : html5Tags) a.append(html5Tag + ", ");

    a.append("\n\nSINGLETON-TAGS: ");
    for (String selfClosingTag : singletonTags) a.append(selfClosingTag + ", ");

    a.append("\n\nBLOCK-TAGS: ");
    for (String blockTag : blockTags) a.append(blockTag + ", ");

    a.append("\n\nINLINE-TAGS: ");
    for (String inlineTag : inlineTags) a.append(inlineTag + ", ");

    a.append("\n\ntagNodesOpening: ");
    for (String s : tagNodesOpening.keySet()) a.append(tagNodesOpening.get(s).toString() + ", ");

    a.append("\n\ntagNodesClosing: ");
    for (String s : tagNodesClosing.keySet()) a.append(tagNodesClosing.get(s).toString() + ", ");

    a.append("\n\ntagNodesOpeningUC: ");
    for (String s : tagNodesOpeningUC.keySet()) a.append(tagNodesOpeningUC.get(s).toString() + ", ");

    a.append("\n\ntagNodesClosingUC: ");
    for (String s : tagNodesClosingUC.keySet()) a.append(tagNodesClosingUC.get(s).toString() + ", ");

    if (printDescriptions)
    {
        loadDescriptions(); // Will only load if descriptions have not already been loaded.

        a.append("\n\n");
        for (String s : descriptions.keySet())
            a.append(s + ((s.length() >= 7) ? ":\t" : ":\t\t") + descriptions.get(s) + "\n");
    }
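Because printAll accepts any java.lang.Appendable, its output can be captured in memory instead of being printed. The following is a minimal sketch of that pattern, and it does not call the Torello library. The class AppendableSketch, the methods printTagList and capture, and the sample tags are all hypothetical names invented for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical stand-in for illustration: NOT part of Torello.HTML.
class AppendableSketch
{
    // Writes a comma-separated list to any Appendable, in the style of the body above.
    static void printTagList(Appendable a, Iterable<String> tags) throws IOException
    {
        a.append("TAGS: ");
        for (String tag : tags) a.append(tag + ", ");
    }

    // Captures the output in a StringBuilder, which implements Appendable.
    static String capture(Iterable<String> tags)
    {
        StringBuilder sb = new StringBuilder();

        // StringBuilder never throws IOException, but the interface still requires handling it.
        try { printTagList(sb, tags); }
        catch (IOException e) { throw new UncheckedIOException(e); }

        return sb.toString();
    }

    public static void main(String[] args)
    {
        // A TreeSet iterates in sorted order, as the internal tag lists do.
        System.out.println(capture(new TreeSet<>(Arrays.asList("div", "br", "a"))));
    }
}
```

A StringWriter or FileWriter could be passed in the same way, since both implement Appendable.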
-
loadDescriptions
public static void loadDescriptions()
The data-structure (a Java TreeMap<String, String>) that holds the individual text-descriptions of each HTML tag is not loaded into memory from the JAR file when the class-loader loads this class. Instead, it is loaded on request. If the programmer would like to report information about HTML tags, including a short, one or two sentence description of each HTML Element, use the method public static String getDescription(String htmlTag);
IMPORTANT: Descriptions are only available after loadDescriptions() has run. As its method body shows, getDescription(String) invokes loadDescriptions() itself before performing the lookup, so a separate call is not required.
NOTE: These sentences are kept in a JAR data-file because they are a little long, and are never used unless you are interested in doing reporting. Leaving them in the JAR file until requested saves some overhead.
ALSO: If the descriptions have already been loaded, this method just returns.
- See Also:
LFEC.readObjectFromFile_JAR(Class, String, boolean, Class)
- Code:
- Exact Method Body:
    if (descriptions.size() == 0)
        descriptions.putAll((TreeMap<String, String>) LFEC.readObjectFromFile_JAR
            (Torello.Data.DataFileLoader.class, "data03.tmdat", true, TreeMap.class));
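The load-on-first-request guard above can be sketched without the library as follows. This is only an illustration under stated assumptions: the class LazyDescriptionsSketch, the method readDataFile, the counter loadCount, and the two sample descriptions are invented here, and the simulated read stands in for the call to LFEC.readObjectFromFile_JAR.

```java
import java.util.TreeMap;

// Hypothetical stand-in for illustration: NOT part of Torello.HTML.
class LazyDescriptionsSketch
{
    private static final TreeMap<String, String> descriptions = new TreeMap<>();

    // Counts how often the (simulated) data-file is actually read.
    static int loadCount = 0;

    // Simulates reading the data-file. The sample text is invented for this sketch.
    private static TreeMap<String, String> readDataFile()
    {
        loadCount++;

        TreeMap<String, String> data = new TreeMap<>();
        data.put("br", "Inserts a single line break.");
        data.put("div", "Defines a section in a document.");
        return data;
    }

    // Loads only while the map is still empty, mirroring the size() == 0 guard above.
    static void loadDescriptions()
    {
        if (descriptions.size() == 0) descriptions.putAll(readDataFile());
    }

    // Triggers the load on demand, then performs a case-insensitive lookup.
    static String getDescription(String tag)
    {
        loadDescriptions();
        return descriptions.get(tag.toLowerCase());
    }

    public static void main(String[] args)
    {
        System.out.println(getDescription("BR"));
        System.out.println(getDescription("div"));
        System.out.println("data-file reads: " + loadCount);
    }
}
```

One consequence of guarding on size() is that an empty data-file would be re-read on every call; a boolean flag would avoid that, at the cost of one more field.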
-
maxTokenLength
public static byte maxTokenLength()
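Returns the String-length of the longest HTML token saved in the internal TreeSet<String> of HTML Tokens. The value is kept in an internal field that is updated when tags are added or removed, rather than being recomputed on each call.
- Returns: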
- The length of the longest HTML Token String.
- Code:
- Exact Method Body:
    return MAX_TOKEN_LENGTH;
-
addTag
public static boolean addTag(java.lang.String htmlTag)
Adds a new HTML element to the list of elements that may be parsed, created and checked. This is not always advisable, as the complete list of HTML-5 tags is already stored internally, but if you would like to add or remove certain tags, this method and removeTag(String) allow it.
- Parameters:
htmlTag
- Any HTML tag that you would like to see parsed by the HTML page parser. If the parser encounters a construct such as: <YOUR_NEW_TAG ATTRIBUTES="..."> it will treat that as a new HTML element. The tag is stored in lower-case.
- Returns:
- TRUE if the element was new to the list, and FALSE if the HTML-tokens-list already contained this HTML element. In the latter case, this method returns gracefully, with no changes made to the underlying list of acceptable HTML tokens.
- Throws:
HTMLTokException
- If the parameter contains non-alpha-numeric characters or begins with a number, or if the (trimmed) tag is longer than 127 characters.
- Code:
- Exact Method Body:
    Matcher m = HTML_TAG_ALPHA_NUMERIC.matcher(htmlTag);

    if ((! m.find()) || (htmlTag.length() != m.group().length()))

        throw new HTMLTokException(
            "The HTML-Tag Parameter that was passed [" + htmlTag + "] doesn't conform to the " +
            "expected requirements for HTML-Tags. It may only contain alpha-numeric characters, " +
            "and it must not begin with a number."
        );

    String tag = htmlTag.trim().toLowerCase();

    if (tag.length() > 127) throw new HTMLTokException(
        "The (trimmed) HTML-Tag Parameter that was passed [" + tag + "] is longer than 127 " +
        "characters. This is not allowed here."
    );

    boolean ret = tags.add(tag);

    if (ret)
    {
        // NOTE: These four private, static fields are of type TreeMap<String, TagNode>
        //       tagNodesOpening, tagNodesOpeningUC, tagNodesClosing, tagNodesClosingUC
        //
        // They can provide a significant savings for the Garbage Collector. For any
        // HTML Element that does not have any attributes, and has a standard 'case'
        // (all upper-case, or all lower-case), the parser will "re-use" pre-existing
        // instances of class TagNode, rather than building a new one.
        // FOR EXAMPLE: The parser will "re-use" the same instance of a "<BR>" TagNode, or
        //              any one, actually, as long as it does not have attributes. Since 40%
        //              to 50% of class TagNode are "TC.ClosingTags", this can be a significant
        //              improvement

        // Build a Lower-Case, Pre-Instantiated, Zero-Attribute version of the HTML Element
        // Uses specialized package-only visible TagNode constructor.
        // Not available to the general public
        tagNodesOpening.put(tag, new TagNode(tag, TC.OpeningTags));
        tagNodesClosing.put(tag, new TagNode(tag, TC.ClosingTags));

        // Build an Upper-Case, Pre-Instantiated, Zero-Attribute version of the HTML Element
        tag = tag.toUpperCase();
        tagNodesOpeningUC.put(tag, new TagNode("<" + tag + ">"));
        tagNodesClosingUC.put(tag, new TagNode("</" + tag + ">"));

        // Update the MAX_TOKEN_LENGTH - but only if necessary.
        if (tag.length() > MAX_TOKEN_LENGTH) MAX_TOKEN_LENGTH = (byte) tag.length();
    }

    return ret;
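The validation and bookkeeping steps in the body above can be illustrated with a small stand-alone sketch. Several things here are assumptions: the real HTML_TAG_ALPHA_NUMERIC pattern is not shown in this document, so the regular expression below is a guess based on the exception message; IllegalArgumentException is used in place of HTMLTokException; and the class name AddTagSketch and its helper isValidName are invented for this example. The sketch also omits the trim() call and the pre-instantiated TagNode maps.

```java
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration: NOT part of Torello.HTML.
class AddTagSketch
{
    // ASSUMPTION: letters and digits only, not starting with a digit.
    private static final Pattern TAG_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    private static final TreeSet<String> tags = new TreeSet<>();
    private static byte maxTokenLength = 0;

    // The first match must cover the whole string, otherwise the name is rejected.
    static boolean isValidName(String htmlTag)
    {
        Matcher m = TAG_NAME.matcher(htmlTag);
        return m.find() && (htmlTag.length() == m.group().length());
    }

    static boolean addTag(String htmlTag)
    {
        if (! isValidName(htmlTag))
            throw new IllegalArgumentException("Invalid tag name: [" + htmlTag + "]");

        String tag = htmlTag.toLowerCase();

        // The maximum length is tracked in a byte, which cannot hold more than 127.
        if (tag.length() > 127)
            throw new IllegalArgumentException("Tag name is longer than 127 characters");

        boolean added = tags.add(tag);

        // Update the cached maximum only when a longer token was actually added.
        if (added && (tag.length() > maxTokenLength)) maxTokenLength = (byte) tag.length();

        return added;
    }

    static byte maxTokenLength() { return maxTokenLength; }

    public static void main(String[] args)
    {
        System.out.println(addTag("MyWidget"));   // new tag
        System.out.println(addTag("mywidget"));   // same tag in another case
        System.out.println(maxTokenLength());
    }
}
```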
-
removeTag
public static boolean removeTag(java.lang.String htmlTag)
Removes an HTML element from the list of elements that may be parsed, created and checked. This is not always advisable, as the complete list of HTML-5 tags is already stored internally, but if you would like to add or remove certain tags, addTag(String) and this method allow it.
- Parameters:
htmlTag
- Any HTML tag that you no longer want to see parsed by the HTML page parser. When the parser encounters a node whose element is this tag, it will not treat the node as an HTML element, and will treat it like a TextNode instead.
- Returns:
- TRUE if the element was removed, and FALSE if it was not - because it wasn't in the HTML-tokens-list in the first place.
- Code:
- Exact Method Body:
    String tag = htmlTag.trim().toLowerCase();

    boolean ret = tags.remove(tag);

    if (ret)
    {
        // "Lower-Case" and "Pre-Instantiated" (Zero-Attributes) version of TagNode
        tagNodesOpening.remove(tag);
        tagNodesClosing.remove(tag);

        tag = tag.toUpperCase();

        // "Upper-Case", Pre-Instantiated, Zero-Attribute version of TagNode
        tagNodesOpeningUC.remove(tag);
        tagNodesClosingUC.remove(tag);

        // After removal, there is a small chance the
        // MAX_TOKEN_LENGTH is, now, shorter
        if (tag.length() == MAX_TOKEN_LENGTH) setMaxTokenLength();
    }

    return ret;
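The body above only calls setMaxTokenLength() when the removed tag was as long as the current maximum. The sketch below shows one way such a recomputation might work. The body of setMaxTokenLength() is not shown in this document, so the full re-scan in recomputeMaxTokenLength is an assumption, and the class RemoveTagSketch and its sample tags are invented for this example.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical stand-in for illustration: NOT part of Torello.HTML.
class RemoveTagSketch
{
    private static final TreeSet<String> tags =
        new TreeSet<>(Arrays.asList("a", "br", "div", "blockquote"));

    // Length of "blockquote", the longest sample tag.
    private static int maxTokenLength = 10;

    // ASSUMPTION: a full re-scan of the remaining tags.
    private static void recomputeMaxTokenLength()
    {
        int max = 0;
        for (String t : tags) if (t.length() > max) max = t.length();
        maxTokenLength = max;
    }

    static boolean removeTag(String htmlTag)
    {
        String tag = htmlTag.trim().toLowerCase();
        boolean removed = tags.remove(tag);

        // Only a tag of maximal length can lower the maximum, so re-scan only in that case.
        if (removed && (tag.length() == maxTokenLength)) recomputeMaxTokenLength();

        return removed;
    }

    static int maxTokenLength() { return maxTokenLength; }

    public static void main(String[] args)
    {
        System.out.println(removeTag(" BlockQuote "));
        System.out.println(maxTokenLength());
    }
}
```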
-
addSingleton
public static boolean addSingleton(java.lang.String htmlTagSingleton)
Adds an HTML-element to the list of singleton HTML-elements. A singleton may only have an "opening" tag, and may not have a closing-version tag. For instance, <IMG SRC="..."> is the classic singleton: its data is all stored internally as attribute values.
- Parameters:
htmlTagSingleton
- Any HTML tag that you would like to see listed as a singleton HTML-element.
- Returns:
- TRUE if the element was new to the list, and FALSE if the HTML-singleton tokens-list already contained this HTML element. In the latter case, this method returns gracefully, with no changes made to the underlying list of singleton tokens.
- Throws:
java.lang.IllegalArgumentException
- If you have tried to "register" a singleton tag that isn't in the internal list of HTML tags, this method will throw an exception directing you to first add your token to the HTML-tags/tokens internal-list.
- Code:
- Exact Method Body:
    String tag = htmlTagSingleton.trim().toLowerCase();

    if (! tags.contains(tag)) throw new IllegalArgumentException(
        "The HTML token you have attempted to add [" + tag + "] may not be added to the " +
        "singletons list, because it is not a known/registered HTML token, as of now. First, " +
        "make sure it is listed as one of the parser's tokens by calling 'addTag(token)', and " +
        "then invoking this method with that token."
    );

    // Internally, there is a private & static TreeSet<String> which saves the names
    // of all HTML 'singleton' elements. Use Java's TreeSet.add(E) method
    return singletonTags.add(tag);
-
removeSingleton
public static boolean removeSingleton(java.lang.String htmlTagSingleton)
Removes an HTML-element from the list of singleton HTML-elements. A singleton may only have an "opening" tag, and may not have a closing-version tag. For instance, <IMG SRC="..."> is the classic singleton: its data is all stored internally as attribute values.
- Parameters:
htmlTagSingleton
- Any HTML tag that you no longer want to see in the HTML-singleton tokens-list.
- Returns:
- TRUE if the element was removed, and FALSE if it was not - because it wasn't in the HTML-Singleton tokens-list in the first place.
- Code:
- Exact Method Body:
    String tag = htmlTagSingleton.trim().toLowerCase();

    // Internally, there is a private & static TreeSet<String> which saves the names
    // of all HTML 'singleton' elements. Use Java's TreeSet.remove(Object) method
    return singletonTags.remove(tag);
-
hasTag
public static TagNode hasTag(java.lang.String tag, TC openOrClosed)
The purpose of this method is to provide a little "optimization." Since 100% of class TagNode information is stored as constant/final, this class facilitates instantiating only one copy of each node when building HTML page node-Vectors.
Internal to this class are pre-instantiated TagNode instances for each and every HTML tag available, both in upper-case and in lower-case tag-versions. For each case there is an opening-version of the TagNode, and also a closing-version of the same TagNode. This makes a total of four pre-instantiated TagNode's per tag, which are stored in four java.util.TreeMap<String, TagNode> instances within this class.
NOTE: Because this class is Serializable, these four instances (lower-case / upper-case, and opening-tag / closing-tag) of each and every tag have already been created and written to a data-file that is saved within the 'JavaHTML.jar' distribution of this library. The pre-instantiated instances of class java.util.TreeMap are loaded from the jar into memory by the Class-Loader at runtime startup.
NOTE: It is not mandatory to "reuse" instantiated HTML TagNode's, but for memory management, garbage-collection efficiency, and other optimizations, the classes in this package use the pre-instantiated versions of these objects whenever possible.
- Parameters:
tag
- Any valid HTML tag. If the String passed is not a valid HTML tag, then this method will return null. If the tag is mixed-case (for example SeCtIoN), its case is preserved by building a new TagNode rather than returning a pre-instantiated one.
openOrClosed
- If TC.OpeningTags is passed, then an "open" version of the HTML tag will be returned, and if TC.ClosingTags is passed, then a closing version will be returned. TC.Both is not allowed here.
- Returns:
- An opening (or closing) TagNode, or null if the passed String tag does not represent any valid HTML-Tag. Null is also returned when the closing version of a singleton tag (such as IMG) is requested.
- Throws:
java.lang.NullPointerException
- If openOrClosed is null.
java.lang.IllegalArgumentException
- If openOrClosed is TC.Both.
- Code:
- Exact Method Body:
    // FAIL-FAST: Check Input's immediately. Throw Exception for invalid input.
    if (openOrClosed == null) throw new NullPointerException
        ("Parameter 'openOrClosed' is null, but this is not allowed.");

    if (openOrClosed == TC.Both) throw new IllegalArgumentException
        ("Parameter 'openOrClosed' was specified as TC.Both, but this is not allowed here.");

    // IMPORTANT NOTE: For Singleton-Tags: There is no closing-version, so one SHOULD NOT be
    // requested. (There is no '</IMG>' tag!) However, this method DOES NOT throw
    // IllegalArgumentException in this case, but rather it just exits gracefully, and returns
    // null.
    String tagLC = tag.toLowerCase();

    if (singletonTags.contains(tagLC) && (openOrClosed == TC.ClosingTags)) return null;

    // First, Check if the 'tag' is all lower-case. If it is, the string would be identical to
    // the 'tagLC' variable we have just created.
    if (tagLC.equals(tag))
    {
        // Debugging Information, Debug-println. Un-comment to follow. DO NOT DELETE THIS LINE.
        // System.out.println("Used a pre-instantiated TagNode, Lower-Case TreeMap");

        return (openOrClosed == TC.OpeningTags)
            ? tagNodesOpening.get(tag)
            : tagNodesClosing.get(tag);
    }

    // Now, here, the variable could not have been all-lower-case. NEXT, Check if it is
    // all-upper-case
    //
    // NOTE: There are pre-defined tables that include pre-instantiated TagNode's - both for
    //       lower-case tags and for upper-case tags.
    String tagUC = tag.toUpperCase();

    if (tagUC.equals(tag))
    {
        // Debugging Information, Debug-println. Un-comment to follow. DO NOT DELETE THIS LINE.
        // System.out.println("Used a pre-instantiated TagNode, Upper-Case TreeMap");

        return (openOrClosed == TC.OpeningTags)
            ? tagNodesOpeningUC.get(tag)
            : tagNodesClosingUC.get(tag);
    }

    // SPECIAL CASE: (Very Rare / Unlikely, but possible) The user has created an HTML Element
    // that has some lower-case alphabet letters, and some upper-case as well. This does not
    // guarantee that it is a valid HTML Token, though, so check
    //
    // FOR EXAMPLE: If somebody typed <SeCtIoN>, we need to preserve the case, no matter how
    //              bizarre. In such a case, a pre-packaged TagNode cannot be used, and instead
    //              a new TagNode must be instantiated.
    if (openOrClosed == TC.OpeningTags)
        return (tagNodesOpening.get(tagLC) == null) ? null : new TagNode("<" + tag + ">");
    else
        return (tagNodesClosing.get(tagLC) == null) ? null : new TagNode("</" + tag + ">");
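The lookup order in the body above (lower-case table, then upper-case table, then a freshly built node for mixed case) is an instance of instance sharing, sometimes called the flyweight pattern. The following stand-alone sketch illustrates it for opening tags only. It is not the library's code: the classes FlyweightSketch and Node, the method opening, and the three sample tags are invented for this example, and the real TagNode carries more state than a single String.

```java
import java.util.TreeMap;

// Hypothetical stand-in for illustration: NOT part of Torello.HTML.
class FlyweightSketch
{
    // Minimal immutable node, standing in for an attribute-free TagNode.
    static final class Node
    {
        final String str;
        Node(String str) { this.str = str; }
    }

    private static final TreeMap<String, Node> openingLC = new TreeMap<>();
    private static final TreeMap<String, Node> openingUC = new TreeMap<>();

    static
    {
        // Build one shared lower-case and one shared upper-case node per tag.
        for (String tag : new String[] { "div", "span", "section" })
        {
            openingLC.put(tag, new Node("<" + tag + ">"));
            openingUC.put(tag.toUpperCase(), new Node("<" + tag.toUpperCase() + ">"));
        }
    }

    // Returns a shared node for all-lower-case or all-upper-case names, a new node for
    // mixed-case names (so the case is preserved), and null for unknown names.
    static Node opening(String tag)
    {
        String lc = tag.toLowerCase();
        if (lc.equals(tag)) return openingLC.get(tag);

        String uc = tag.toUpperCase();
        if (uc.equals(tag)) return openingUC.get(tag);

        return openingLC.containsKey(lc) ? new Node("<" + tag + ">") : null;
    }

    public static void main(String[] args)
    {
        System.out.println(opening("div") == opening("div"));           // shared instance
        System.out.println(opening("SeCtIoN") == opening("SeCtIoN"));   // two new instances
        System.out.println(opening("SeCtIoN").str);
    }
}
```

Sharing is safe here only because the nodes are immutable; a mutable node handed to many callers could be changed by any one of them.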
-
getTag_MEM_HEAP_CHECKOUT_COPY
public static java.lang.String getTag_MEM_HEAP_CHECKOUT_COPY (java.lang.String tag)
This is an optimized, internal method that is used to prevent lots of duplicate HTML token-String's from being created by the parser. Internally, there ought to be just one instance of String's like: "img", "br", "div", etc. The parser uses this method to reuse an already instantiated token String. This method probably has relatively little use outside of the internal HTML parser code.
- Parameters:
tag
- This is an HTML token. An identical String to this 'token' String, but possibly a different memory reference on the heap, shall be returned. The lookup is performed using tag.toLowerCase().
- Returns:
- The returned String shall obey the following:
  assert(tag.equals(returned_string)); // Identical String is returned
  assert(! (tag == returned_string)); // Probably a different memory allocation on the heap. PROBABLY!
Note that Java does not make any contracts regarding String references! (This can only help...)
IMPORTANT: If the tag passed is not a valid HTML tag, then this method shall return null.
- Code:
- Exact Method Body:
    if (BUILDING_DATA_FILE___SKIP_OPTIMIZATION_TEMPORARILY) return tag.toLowerCase();
    // Obviously, for the 200 or so "pre-instantiated" (having-no-attributes) instances of
    // class TagNode that are kept, internally, in the data-structures of this class, 'HTMLTags'
    // We cannot retrieve a "pre-allocated" copy of the tag-as-a-string from the heap, because
    // we are building the data-file for the first time!

    TagNode tn = tagNodesOpening.get(tag.toLowerCase());

    if (tn == null) return null;

    return tn.tok;
    // This "version" (of the exact same html-element-name is already on the heap)
    // Obviously, because, variable 'tn' has already been instantiated and is in the TreeMap
    // If this EXACT SAME REFERENCE IS USED FOR ALL "TagNode.tok" instances, quite a bit of
    // wasted-space in the heap's lookup table will be eliminated as the same "token"
    // (which is the name of the HTML Element: "div," "img," "span," etc...) is reused over
    // and over and over again. Helps a little bit! Not that complicated!
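The idea of handing out one shared String instance per token can be sketched as follows. This is a simplified illustration, not the method above: the real code reads the token from a pre-instantiated TagNode (tn.tok), whereas this sketch assumes a plain String-to-String map. The class CanonicalTokenSketch, the method checkout, and the three sample tokens are invented for this example.

```java
import java.util.TreeMap;

// Hypothetical stand-in for illustration: NOT part of Torello.HTML.
class CanonicalTokenSketch
{
    // Maps each known token to the single String instance that should be shared.
    private static final TreeMap<String, String> canonical = new TreeMap<>();

    static
    {
        for (String tok : new String[] { "img", "br", "div" }) canonical.put(tok, tok);
    }

    // Returns the shared instance for a known token, or null for an unknown one.
    static String checkout(String tag) { return canonical.get(tag.toLowerCase()); }

    public static void main(String[] args)
    {
        // Two distinct String objects with the same characters.
        String first  = new String("div");
        String second = new String("DIV");

        // Both lookups hand back the very same reference.
        System.out.println(checkout(first) == checkout(second));
    }
}
```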
-
isTag
public static boolean isTag(java.lang.String tag)
Checks if a String is registered as a proper HTML tag according to the internally maintained lists.
CASE INSENSITIVE: The test performed by this method shall ignore case.
The HTML Elements listed (in the link below) indicate exactly what may be passed to this method's parameter 'tag' and result in a return value of TRUE. This is the complete list of HTML Element Names that are maintained, by default, in this class' internal Lookup Table of HTML Elements.
HTML Elements
List Modification: The list of HTML Elements may, in fact, be altered. To add a new Element Name to the internal lookup table of valid HTML Elements, use addTag(String). To remove an HTML Element from the internal list, use removeTag(String).
- Returns:
- TRUE if this is a valid HTML tag. NOTE: All HTML-5 Element-Tag Strings will return TRUE, as they are contained in the default internal list.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the standard HTML Tags. Just uses Java's TreeSet.contains(Object) method.
    return tags.contains(tag.toLowerCase());
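All of the category checks in this class follow the same shape as the body above: lower-case the input, then test membership in a TreeSet. A minimal stand-alone sketch of that shape is given here. The class TagLookupSketch and its tiny sample sets are invented for this example and do not reflect the library's actual tag lists.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical stand-in for illustration: NOT part of Torello.HTML.
class TagLookupSketch
{
    // Sample data only; all entries are stored in lower-case.
    private static final TreeSet<String> tags =
        new TreeSet<>(Arrays.asList("a", "br", "div", "img"));

    private static final TreeSet<String> singletonTags =
        new TreeSet<>(Arrays.asList("br", "img"));

    // Lower-casing the argument makes the membership test case-insensitive.
    static boolean isTag(String tag) { return tags.contains(tag.toLowerCase()); }

    static boolean isSingleton(String tok) { return singletonTags.contains(tok.toLowerCase()); }

    public static void main(String[] args)
    {
        System.out.println(isTag("DIV"));
        System.out.println(isSingleton("Img"));
        System.out.println(isSingleton("div"));
    }
}
```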
-
isHTML5
public static boolean isHTML5(java.lang.String tok)
Checks if a String is a proper HTML-5 (only) tag. This list is rather short, and only contains HTML Elements which were added specifically for the release of HTML 5. Any HTML Element which is valid in both HTML Release 4 (or earlier) and HTML 5 will not result in TRUE being returned by this method.
CASE INSENSITIVE: The test performed by this method shall ignore case.
The HTML Elements listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' and result in a return value of TRUE. This is the complete list of HTML 5 Element Names that are maintained, by default, in this class' internal Lookup Table of HTML 5 Elements.
Elements Added for HTML-5
- Parameters:
tok
- Any HTML-Tag as a String.
- Returns:
- TRUE if this is a tag that was added for HTML-5, and not included in HTML 4, or earlier
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the HTML-5 Tags. Just uses Java's TreeSet.contains(Object) method.
    return html5Tags.contains(tok.toLowerCase());
-
deprecated
public static boolean deprecated(java.lang.String tok)
Checks if a String is listed as an HTML Element that was deprecated for HTML 5.
CASE INSENSITIVE: The test performed by this method shall ignore case.
The HTML Elements listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' and result in a return value of TRUE. This is the complete list of Deprecated HTML Element Names that are maintained, by default, in this class' internal Lookup Table of Deprecated HTML Elements.
Elements Deprecated for HTML-5
- Parameters:
tok
- Any HTML-Tag as a String.
- Returns:
- TRUE if this tag was deprecated for HTML-5
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the deprecated-for-HTML-5 Tags. Just uses Java's TreeSet.contains(Object)
    // method.
    return deprecated.contains(tok.toLowerCase());
-
isSingleton
public static boolean isSingleton(java.lang.String tok)
This method checks whether a specific HTML element is an "opening and closing" element, such as P, DIV, SPAN, along with myriad others, or one of the (very few) "singleton HTML elements", such as the HTML <IMG SRC="..."> element, which may not have a closing tag. Such tags are also called "Self-Closing" tags.
CASE INSENSITIVE: The test performed by this method shall ignore case.
The HTML Elements listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' and result in a return value of TRUE. This is the complete list of Singleton HTML Element Names that are maintained, by default, in this class' internal Lookup Table of Self-Closing HTML Elements.
Singleton Elements
List Modification: The list of Singleton HTML Elements may, in fact, be altered. To add a new Singleton HTML Element Name to the internal lookup table of valid Singleton Elements, use addSingleton(String). To remove an HTML Element from the internal list, use removeSingleton(String).
- Parameters:
tok
- This is the HTML element name to be tested.
- Returns:
- TRUE if this is a 'singleton' HTML Element, a.k.a. an element for which only OpeningTag versions exist, because singleton HTML elements don't need / may not have a closing tag. Singleton examples include: IMG, HR, INPUT, etc.
FALSE is returned if the tag is not a singleton.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the 'singleton' HTML Tags. Just uses Java's TreeSet.contains(Object) method.
    return singletonTags.contains(tok.toLowerCase());
-
isBlock
public static boolean isBlock(java.lang.String tok)
This method checks whether a specific HTML element is among the 'Block' Tag elements list. An explanation of what a 'block' or 'inline' tag is, is beyond the scope of this document.
CASE INSENSITIVE: The test performed by this method shall ignore case.
The HTML Elements listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' and result in a return value of TRUE. This is the complete list of Block HTML Element Names that are maintained, by default, in this class' internal Lookup Table of HTML Block Elements.
HTML Block Elements
- Parameters:
tok
- This is the HTML element name to be tested.
- Returns:
- TRUE if this is a 'block' HTML Element, FALSE otherwise.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the HTML 'Block' Tags. Just uses Java's TreeSet.contains(Object) method.
    return blockTags.contains(tok.toLowerCase());
-
isInline
public static boolean isInline(java.lang.String tok)
This method checks whether a specific HTML element is among the 'Inline' Tag elements list. An explanation of what a 'block' or 'inline' tag is, is beyond the scope of this document.
CASE INSENSITIVE: The test performed by this method shall ignore case.
The HTML Elements listed (in the link below) indicate exactly what may be passed to this method's parameter 'tok' and result in a return value of TRUE. This is the complete list of Inline HTML Element Names that are maintained, by default, in this class' internal Lookup Table of HTML Inline Elements.
HTML Inline Elements
- Parameters:
tok
- This is the HTML element name to be tested.
- Returns:
- TRUE if this is an 'inline' HTML Element, FALSE otherwise.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the HTML 'Inline' Tags. Just uses Java's TreeSet.contains(Object) method.
    return inlineTags.contains(tok.toLowerCase());
-
getDescription
public static java.lang.String getDescription(java.lang.String tag)
Returns a brief, English-language description of an HTML Tag. These descriptions are stored in a small data-file.
DATA-FILE LOAD: This method will attempt to load a particular data-file from the JAR-library into memory. This file contains a one-sentence description, stored as a java.lang.String, for each of the HTML Elements known to this class. Under normal operation, these Strings remain on-disk, only.
- Parameters:
tag
- Any valid HTML tag.- Returns:
- A short English-Language description of the Tag in HTML, or null if this tag is unknown.
- See Also:
loadDescriptions()
- Code:
- Exact Method Body:
    // Loads the descriptions map, ONLY IF they have not already been loaded into memory from
    // the JAR data-files
    loadDescriptions();

    return descriptions.get(tag.toLowerCase());
-
iterator
public static java.util.Iterator<java.lang.String> iterator()
Internally, tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
REMOVE NOTE: In order to prevent accidental removal of any HTML Tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
- Returns:
- an Iterator<String> that iterates over all the Tag-String's in alphabetical order.
Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed, here, by clicking the link below:
HTML Elements
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the standard HTML Tags. Just uses Java's TreeSet.iterator() method.
    //
    // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
    //       TreeSet
    return new RemoveUnsupportedIterator<String>(tags.iterator());
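The wrapping technique described above can be sketched with plain java.util types. The source of RemoveUnsupportedIterator is not shown in this document, so the class ReadOnlyIterator below is a hypothetical stand-in that only illustrates the idea, and the choice of UnsupportedOperationException is an assumption. ReadOnlyIteratorDemo and its sample tags are likewise invented for this example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical stand-in for a remove-rejecting wrapper: NOT the library class.
class ReadOnlyIterator<E> implements Iterator<E>
{
    private final Iterator<E> inner;

    ReadOnlyIterator(Iterator<E> inner) { this.inner = inner; }

    // Iteration is delegated unchanged to the wrapped iterator.
    @Override public boolean hasNext() { return inner.hasNext(); }
    @Override public E next() { return inner.next(); }

    // Removal is rejected, so callers cannot modify the underlying collection.
    @Override public void remove()
    { throw new UnsupportedOperationException("remove() is not supported"); }
}

class ReadOnlyIteratorDemo
{
    static final TreeSet<String> tags = new TreeSet<>(Arrays.asList("div", "a", "br"));

    static Iterator<String> iterator() { return new ReadOnlyIterator<>(tags.iterator()); }

    // Returns true if remove() was rejected and the set was left untouched.
    static boolean removeIsRejected()
    {
        Iterator<String> it = iterator();
        it.next();

        try { it.remove(); return false; }
        catch (UnsupportedOperationException e) { return tags.size() == 3; }
    }

    public static void main(String[] args)
    {
        // Prints the sample tags in sorted order.
        for (Iterator<String> it = iterator(); it.hasNext(); ) System.out.println(it.next());

        System.out.println(removeIsRejected());
    }
}
```

Returning the TreeSet's own iterator directly would let a caller delete tags through remove(), which is the accident the wrapper is meant to rule out.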
-
iteratorDescriptions
public static java.util.Iterator<java.util.Map.Entry<java.lang.String,java.lang.String>> iteratorDescriptions ()
Builds an Iterator that returns HTML tags and their text-String descriptions.
NOTE: This will force this class to load the "HTML-Element Descriptions Data File", bringing the list of Tag-Description String-Data into memory. Generally, in this class, if the methods invoked do not require the Tag-Description String-Data, then this extensive text-data is not loaded into memory from the JAR data-files.
- Returns:
- an Iterator that iterates the HTML-Tag / HTML-Tag-Description key-value pairs as instances of "Map.Entry<String, String>"
Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load is only performed on request. The contents of this data-file (and the list of Map.Entry's returned by the Iterator) may be viewed, here, by clicking the link below:
HTML Elements with Descriptions
- See Also:
loadDescriptions(), RemoveUnsupportedIterator
- Code:
- Exact Method Body:
    loadDescriptions(); // Will only load if descriptions have not already been loaded.

    return new RemoveUnsupportedIterator<Map.Entry<String, String>>
        (descriptions.entrySet().iterator());
-
iteratorAddedForHTML5
public static java.util.Iterator<java.lang.String> iteratorAddedForHTML5()
Internally, HTML-5 tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
REMOVE NOTE: In order to prevent accidental removal of HTML-5 tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that were added in HTML-5.
Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed by clicking the link below:
Elements Added for HTML-5
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML-5 Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(html5Tags.iterator());
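The wrapping technique named in the REMOVE NOTE can be sketched as below. The class ReadOnlyIterator is a hypothetical stand-in written for illustration; the real RemoveUnsupportedIterator may differ, and the choice of UnsupportedOperationException is an assumption, since the documentation only says that "an exception" is thrown.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical stand-in for RemoveUnsupportedIterator; the library's class may differ.
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> inner;

    ReadOnlyIterator(Iterator<E> inner) { this.inner = inner; }

    @Override public boolean hasNext() { return inner.hasNext(); }

    @Override public E next() { return inner.next(); }

    // Refuse removal so the backing collection cannot be modified through this iterator.
    // (Assumption: the exception type; the source only states that an exception is thrown.)
    @Override public void remove() {
        throw new UnsupportedOperationException("remove() is not supported");
    }
}

final class ReadOnlyIteratorDemo {
    // Returns true when remove() was rejected and the backing set kept every element.
    static boolean removeIsBlocked() {
        TreeSet<String> html5Tags = new TreeSet<>();
        html5Tags.add("article");
        html5Tags.add("section");

        Iterator<String> iter = new ReadOnlyIterator<>(html5Tags.iterator());
        iter.next();

        try {
            iter.remove();
            return false; // removal went through, so the wrapper failed
        } catch (UnsupportedOperationException e) {
            return html5Tags.size() == 2;
        }
    }

    public static void main(String[] args) {
        System.out.println("remove blocked: " + removeIsBlocked());
    }
}
```

A wrapper is needed here because the iterator returned by TreeSet.iterator() does support remove(), so handing it out directly would let callers delete entries from the internal set.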
-
iteratorDeprecatedForHTML5
public static java.util.Iterator<java.lang.String> iteratorDeprecatedForHTML5 ()
Internally, deprecated tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
REMOVE NOTE: In order to prevent accidental removal of deprecated tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that were deprecated for HTML-5.
Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed by clicking the link below:
Elements Deprecated for HTML-5
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the deprecated Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(deprecated.iterator());
-
iteratorSingletonTags
public static java.util.Iterator<java.lang.String> iteratorSingletonTags()
Internally, singleton / self-closing tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
REMOVE NOTE: In order to prevent accidental removal of singleton tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as singleton elements, and may not have closing-tag versions.
Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed by clicking the link below:
Singleton Elements
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML 'Singleton' Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(singletonTags.iterator());
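Because the tags are kept in a TreeSet<String>, an iterator over them yields the tag names in ascending (natural String) order, regardless of insertion order. The sketch below shows this with a small local set; the class name SortedTagIterationDemo and the three sample tags are chosen only for illustration and do not reflect the contents of the library's data-file.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Illustration only: a local TreeSet standing in for the internal singleton-tag set.
final class SortedTagIterationDemo {
    // Drains a TreeSet iterator into a list, preserving the iteration order.
    static List<String> iterationOrder() {
        TreeSet<String> singletonTags = new TreeSet<>();
        singletonTags.add("img"); // inserted out of alphabetical order on purpose
        singletonTags.add("br");
        singletonTags.add("hr");

        List<String> order = new ArrayList<>();
        Iterator<String> iter = singletonTags.iterator();
        while (iter.hasNext()) order.add(iter.next());
        return order;
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder()); // prints "[br, hr, img]"
    }
}
```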
-
iteratorBlockTags
public static java.util.Iterator<java.lang.String> iteratorBlockTags()
Internally, "HTML Block Tags" are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
REMOVE NOTE: In order to prevent accidental removal of Block-Tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as block elements.
Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed by clicking the link below:
HTML Block Elements
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML 'Block' Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(blockTags.iterator());
-
iteratorInlineTags
public static java.util.Iterator<java.lang.String> iteratorInlineTags()
Internally, "HTML Inline Tags" are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
REMOVE NOTE: In order to prevent accidental removal of Inline-Tags via the Iterator.remove() method, the 'Iterator<String>' has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of the internal-set data-structure.
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as inline elements.
Data File Contents: The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class Loader. The contents of this data-file (and the list of String's returned by the Iterator) may be viewed by clicking the link below:
HTML Inline Elements
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML 'Inline' Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(inlineTags.iterator());
-
-