Package Torello.HTML

Class Features.Meta

  • Enclosing class:
    Features

    public static class Features.Meta
    extends java.lang.Object
    Features Meta - Documentation.

    This static nested class deals explicitly with inserting and retrieving HTML <META ...> tags in a web-page. This is sometimes referred to under the general heading of "SEO", or search-engine optimization. When the (albeit few) major web-search companies use web-crawlers to trawl the internet for pages to list in their indexing services, they usually start with the META-information that a web-page designer explicitly places on the page for exactly that purpose: telling companies like Google what the designer considers most salient about each page.

    Google and other search engines, as well as non-search companies, individuals and organizations, can and will use HTML META elements however they choose. Some META-tag information influences which web-sites Google's software returns on its search-results pages more than other META-tag information does.

    This class also includes some simplification of the 'Open Graph' protocol, so that users can also describe their content to anybody who wants to include links to their pages. Open-Graph can be useful to Google, but it is more often used to identify which elements of an HTML page should be displayed when someone posts a link to a web-page that has Open-Graph meta-tags in its HTML <HEAD> section. For instance, suppose an Open-Graph meta-tag in a web-page's header selects a particular picture (an <IMG SRC='...'> element). If a person surfing the web e-mails a link to that page, then when the e-mail service attaches the URL to the message, the picture specified in the Open-Graph image META-tag is the one that will be included in the e-mail.

    Static (Functional) API: The methods in this class are all (100%) defined with the Java key-word 'static'. Furthermore, there is no way to obtain an instance of this class, because there are no public constructors. The Spring-Boot / MVC framework is *not* utilized, because it flies directly in the face of the light-weight data-classes philosophy. This has many advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static', which means (by implication) that there is no internal state. Without any internal state, there is no need for constructors in the first place! (This is a frequent complaint among MVC programmers.)
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parsing, searching, updating and scraping. Also, memory management, memory leakage and the behavior of the Java Garbage Collector should be easier to understand, because the standard JDK class Vector is 'reused' for storing HTML web-page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object-Oriented Programming model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-styled, functional programming model (no objects), by re-using (quite profusely) the key-word static for all of its methods, and by sticking to Java's well-understood class Vector.

    Static Fields: The methods in this class do not create or maintain any internal state; however, a few static fields are defined. These fields are instantiated only once, when the class is loaded (and only if this class is used), and serve as data 'lookup' fields (static constants). View this class's source code to see the internally used data.

    The internally defined fields include seven HTML <META> tag templates, all of which are typed as java.lang.String constants (static final). There is also a java.util.TreeMap lookup-table that stores the list of Open-Graph properties.



    • Field Detail

      • metaTagItemProp

        public static final java.lang.String metaTagItemProp
        This HTML <META ...> tag is less frequently used, but it does provide properties that are needed and used by various web-services, such as search engines that read HTML microdata. It is the "ITEMPROP" meta-tag.
        See Also:
        getItemProp(Vector, String), insertItemProp(Vector, String, String), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String metaTagItemProp =
                    "<meta itemprop='INSERT-ITEMPROP-STRING-HERE' content='INSERT-CONTENT-STRING-HERE' >";
        
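        The field above is a fill-in-the-blanks template. Below is a minimal sketch of how such a template can be filled, assuming plain String.replace as in the method bodies further down this page. The class ItemPropSketch and its methods are hypothetical and not part of Torello.HTML; IllegalArgumentException stands in for the library's QuotesException, and unlike the documented insertItemProp, the sketch also checks the itemprop value for single quotes.

```java
// Hypothetical sketch (not part of Torello.HTML): fills the metaTagItemProp
// template with String.replace, the way the library's method bodies fill theirs.
final class ItemPropSketch
{
    // Copied verbatim from the field declaration above.
    static final String metaTagItemProp =
        "<meta itemprop='INSERT-ITEMPROP-STRING-HERE' content='INSERT-CONTENT-STRING-HERE' >";

    // The template wraps values in single quotes, so a value containing one
    // would end the attribute early. IllegalArgumentException is a stand-in
    // for the library's QuotesException.
    static void checkNoSingleQuote(String value)
    {
        if (value.indexOf('\'') != -1)
            throw new IllegalArgumentException("Single quote in attribute value: " + value);
    }

    static String buildItemPropTag(String itemProp, String content)
    {
        checkNoSingleQuote(itemProp);
        checkNoSingleQuote(content);
        return metaTagItemProp
            .replace("INSERT-ITEMPROP-STRING-HERE", itemProp)
            .replace("INSERT-CONTENT-STRING-HERE", content);
    }

    public static void main(String[] args)
    {
        // Prints: <meta itemprop='name' content='Java HTML' >
        System.out.println(buildItemPropTag("name", "Java HTML"));
    }
}
```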
      • descriptionMetaTag

        public static final java.lang.String descriptionMetaTag
        When search engines crawl the internet for web-pages and their key-word descriptions in order to index those pages, this HTML <META> tag is one of the first things they look for. You can add a meta description in the <head> section of your site's HTML. You should have complete control over your meta description; if you use an SEO plug-in, you will often be able to fill in a 'meta description' field, and even preview how your web-site would look on search-engine results pages (SERPs).

        A meta description can influence the decision of the searcher as to whether they want to click through on your content from search results or not. The more descriptive, attractive and relevant the description, the more likely someone will click through. Google has stated that meta descriptions are NOT a ranking signal. But, again, the quality of the description will influence click-through rate, so it is very important to use this element wisely.
        See Also:
        insertDescription(Vector, String), hasDescription(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String descriptionMetaTag =
                    "<meta name='description' content='INSERT-DESCRIPTION-OR-KEYWORDS-HERE'>";
        
      • UTF8MetaTag

        public static final java.lang.String UTF8MetaTag
        This was actually the most-used HTML '<META ...>' tag element on both the Spanish News Board and the Chinese News Board websites. When this meta-tag is included on a web-page, it instructs the web-browser to interpret the page's content as UTF-8 encoded characters. Generally, when working with strictly-English pages, the ASCII character-set (the first 128 characters of Unicode, which UTF-8 encodes as the same single bytes ASCII uses) will suffice. However, there are several complete character encodings for foreign-language text, with UTF-8 being far-and-away the front-runner.

        NOTE: The 'http-equiv' attribute may not actually be necessary for the browser to pick up the content='text/html; charset=utf-8' attribute. It has been this coder's experience, though, that when the text-String below is inserted into a web-page, it works with all major web-browsers.

        ALSO: Many web-hosting services, for instance Google Cloud Storage, have a file-by-file setting that may be configured (in Google Cloud Storage, each individual file hosted in a "Storage Bucket" has its own settings). When the file-by-file settings are used, this META-tag is not strictly necessary, because the web-server itself informs connecting web-browsers, through the HTTP Content-Type response header, that a particular file's content is to be interpreted as UTF-8 character data. Either approach, or both together, will work fine when rendering characters in Korean, Japanese, Mandarin, Spanish, Arabic, etc...
        
        你好,世界!
        こんにちは世界!
        안녕, 세상!
        ¡Hola Mundo!
        مرحبا بالعالم!
        Hello World!
        

        This java-doc HTML-page, generated by Torello.HTML.Tools.JavaDoc, does indeed have a UTF-8 HTML '<META ...>' tag inserted in its header.

        ACCORDING TO: Popular website 'http://www.w3schools.com', the following is true about setting the char-set for a web-site using HTML 'meta' tags:

        Differences Between HTML 4.01 and HTML5:

        Using http-equiv is no longer the only way to specify the character set of an HTML document:

        • HTML 4.01: <meta http-equiv="content-type" content="text/html; charset=UTF-8">
        • HTML5:     <meta charset="UTF-8">
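        Both forms above declare the same thing: which character set the browser should use to turn the page's bytes back into text. The short demonstration below (a hypothetical class that uses only the JDK) shows what goes wrong when the charset the browser assumes does not match the one used to encode the page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: encodes text as UTF-8, then decodes those bytes with an
// assumed charset, as a browser would if it guessed (or was told) that charset.
final class CharsetMismatchDemo
{
    static String decodeUtf8BytesAs(String text, Charset assumed)
    {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args)
    {
        String hola = "\u00A1Hola Mundo!";

        // Correct declaration: the text survives unchanged (prints true).
        System.out.println(decodeUtf8BytesAs(hola, StandardCharsets.UTF_8).equals(hola));

        // Wrong assumption: the inverted exclamation mark is two bytes in UTF-8
        // (C2 A1), and ISO-8859-1 decodes them as two separate characters.
        System.out.println(decodeUtf8BytesAs(hola, StandardCharsets.ISO_8859_1));
    }
}
```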

        See Also:
        insertUTF8MetaTag(Vector), hasUTF8MetaTag(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String UTF8MetaTag =
                    "<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />";
        
      • openGraphMetaTag

        public static final java.lang.String openGraphMetaTag
        The following content was copied from: http://ogp.me, the primary Open-Graph Protocol Website. It was word-for-word text-lifted on May 25th, 2019.

        Introduction

        The Open Graph protocol enables any web page to become a rich object in a social graph. For instance, this is used on Facebook to allow any web page to have the same functionality as any other object on Facebook.

        While many different technologies and schema exist and could be combined together, there isn't a single technology which provides enough information to richly represent any web page within the social graph. The Open Graph protocol builds on these existing technologies and gives developers one thing to implement. Developer simplicity is a key goal of the Open Graph protocol which has informed many of the technical design decisions.

        Basic Metadata

        To turn your web pages into graph objects, you need to add basic metadata to your page. We've based the initial version of the protocol on RDFa which means that you'll place additional <meta> tags in the <head> of your web page. The four required properties for every page are:

        og:title - The title of your object as it should appear within the graph, e.g., "The Rock".
        og:type - The type of your object, e.g., "video.movie". Depending on the type you specify, other properties may also be required.
        og:image - An image URL which should represent your object within the graph.
        og:url - The canonical URL of your object that will be used as its permanent ID in the graph, e.g., "http://www.imdb.com/title/tt0117500/"

        Optional Metadata

        The following properties are optional for any object and are generally recommended:

        og:audio - A URL to an audio file to accompany this object.
        og:description - A one to two sentence description of your object.
        og:determiner - The word that appears before this object's title in a sentence. An enum of (a, an, the, "", auto). If auto is chosen, the consumer of your data should chose between "a" or "an". Default is "" (blank).
        og:locale - The locale these tags are marked up in. Of the format language_TERRITORY. Default is en_US.
        og:locale:alternate - An array of other locales this page is available in.
        og:site_name - If your object is part of a larger web site, the name which should be displayed for the overall site. e.g., "IMDb".
        og:video - A URL to a video file that complements this object.

        Structured Properties

        Some properties can have extra metadata attached to them. These are specified in the same way as other metadata with property and content, but the property will have extra :.

        The og:image property has some optional structured properties:

        og:image:url - Identical to og:image.
        og:image:secure_url - An alternate url to use if the webpage requires HTTPS.
        og:image:type - A MIME type for this image.
        og:image:width - The number of pixels wide.
        og:image:height - The number of pixels high.
        og:image:alt - A description of what is in the image (not a caption). If the page specifies an og:image it should specify og:image:alt.

        Video


        Namespace URI: http://ogp.me/ns/video#
        og:type values:

        video.movie

        video:actor - profile array - Actors in the movie.
        video:actor:role - string - The role they played.
        video:director - profile array - Directors of the movie.
        video:writer - profile array - Writers of the movie.
        video:duration - integer >=1 - The movie's length in seconds.
        video:release_date - datetime - The date the movie was released.
        video:tag - string array - Tag words associated with this movie.

        video.episode

        video:actor - Identical to video.movie
        video:actor:role
        video:director
        video:writer
        video:duration
        video:release_date
        video:tag
        video:series - video.tv_show - Which series this episode belongs to.

        Music


        Namespace URI: http://ogp.me/ns/music#
        og:type values:

        music.song

        music:duration - integer >=1 - The song's length in seconds.
        music:album - music.album array - The album this song is from.
        music:album:disc - integer >=1 - Which disc of the album this song is on.
        music:album:track - integer >=1 - Which track this song is.
        music:musician - profile array - The musician that made this song.

        music.album

        music:song - music.song - The song on this album.
        music:song:disc - integer >=1 - The same as music:album:disc but in reverse.
        music:song:track - integer >=1 - The same as music:album:track but in reverse.
        music:musician - profile - The musician that made this song.
        music:release_date - datetime - The date the album was released.

        music.playlist

        music:song - Identical to the ones on music.album
        music:song:disc
        music:song:track
        music:creator - profile - The creator of this playlist.

        music.radio_station

        music:creator - profile - The creator of this station.

        No Vertical

        These are globally defined objects that just don't fit into a vertical but yet are broadly used and agreed upon.

        og:type values:

        article - Namespace URI: http://ogp.me/ns/article#
        article:published_time - datetime - When the article was first published.
        article:modified_time - datetime - When the article was last changed.
        article:expiration_time - datetime - When the article is out of date after.
        article:author - profile array - Writers of the article.
        article:section - string - A high-level section name. E.g. Technology
        article:tag - string array - Tag words associated with this article.

        book - Namespace URI: http://ogp.me/ns/book#
        book:author - profile array - Who wrote this book.
        book:isbn - string - The ISBN
        book:release_date - datetime - The date the book was released.
        book:tag - string array - Tag words associated with this book.

        profile - Namespace URI: http://ogp.me/ns/profile#
        profile:first_name - string - A name normally given to an individual by a parent or self-chosen.
        profile:last_name - string - A name inherited from a family or marriage and by which the individual is commonly known.
        profile:username - string - A short unique string to identify them.
        profile:gender - enum(male, female) - Their gender.

        website - Namespace URI: http://ogp.me/ns/website#
        No additional properties other than the basic ones. Any non-marked up webpage should be treated as og:type website.

        Types

        The following types are used when defining attributes in Open Graph protocol.

        Boolean - A Boolean represents a true or false value - Literals: true, false, 1, 0
        DateTime - A DateTime represents a temporal value composed of a date (year, month, day) and an optional time component (hours, minutes) - Literals: ISO 8601 (https://en.wikipedia.org/w/index.php?title=ISO_8601&oldid=898205005)
        Enum - A type consisting of bounded set of constant string values (enumeration members) - Literals: a string value that is a member of the enumeration
        Float - A 64-bit signed floating point number - Literals: all literals that conform to the following formats: 1.234, -1.234, 1.2e3, -1.2e3, 7E-10
        Integer - A 32-bit signed integer. In many languages integers over 32-bits become floats, so we limit Open Graph protocol for easy multi-language use - Literals: all literals that conform to the following formats: 1234, -123
        String - A sequence of Unicode characters - Literals: all literals composed of Unicode characters with no escape characters
        URL - A sequence of Unicode characters that identify an Internet resource - Literals: all valid URLs that utilize the http:// or https:// protocols
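        As a rough illustration of the literal formats in the table above, the validators below check a few of the types. They are a sketch based only on the table's wording: the class OgpLiteralSketch is hypothetical, DateTime and String are not covered, and an actual Open Graph consumer may be more or less lenient.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical validators for some Open Graph literal types, following the table above.
final class OgpLiteralSketch
{
    // Boolean: true, false, 1, 0
    static boolean isBoolean(String s)
    {
        return s.equals("true") || s.equals("false") || s.equals("1") || s.equals("0");
    }

    // Integer: 32-bit signed, e.g. 1234 or -123
    static boolean isInteger(String s)
    {
        if (!s.matches("-?\\d+")) return false;        // ASCII digits, optional minus
        try { Integer.parseInt(s); return true; }       // rejects values outside 32 bits
        catch (NumberFormatException e) { return false; }
    }

    // Float: 64-bit signed, e.g. 1.234, -1.2e3, 7E-10
    static boolean isFloat(String s)
    {
        // Double.parseDouble alone would also accept "NaN" and "Infinity",
        // so require plain decimal / exponent syntax first.
        if (!s.matches("-?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?")) return false;
        return !Double.isInfinite(Double.parseDouble(s));   // e.g. 1e999 overflows
    }

    // URL: a valid URL using the http:// or https:// protocol
    static boolean isHttpUrl(String s)
    {
        try
        {
            URI u = new URI(s);
            String scheme = u.getScheme();
            return u.getHost() != null
                && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        }
        catch (URISyntaxException e) { return false; }
    }
}
```

        For instance, isInteger("2147483648") is false because the value does not fit in 32 bits, and isHttpUrl("ftp://example.com/") is false because only the http:// and https:// protocols are allowed.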

        See Also:
        insertOGMetaTag(Vector, String, String), getAllOGMetaTags(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String openGraphMetaTag =
                    "<meta property='og:INSERT-OG-PROPERTY-HERE' content='INSERT-OG-VALUE-HERE' />";
        
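        A minimal sketch of filling the openGraphMetaTag template once per property, while checking that the four required properties (og:title, og:type, og:image, og:url) are present. This is not the library's insertOGMetaTag(Vector, String, String); the class OpenGraphSketch and the image URL in main are hypothetical. Property names are passed without the "og:" prefix because the template already supplies it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds Open-Graph <meta> tags from the template above.
final class OpenGraphSketch
{
    static final String openGraphMetaTag =
        "<meta property='og:INSERT-OG-PROPERTY-HERE' content='INSERT-OG-VALUE-HERE' />";

    // The four properties the Open Graph text above lists as required.
    static final List<String> REQUIRED = List.of("title", "type", "image", "url");

    // Keys are given WITHOUT the "og:" prefix; one tag per line is returned.
    static String toTags(Map<String, String> props)
    {
        for (String req : REQUIRED)
            if (!props.containsKey(req))
                throw new IllegalArgumentException("Missing required property og:" + req);

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet())
        {
            // Single-quoted attributes cannot contain a single quote.
            if (e.getValue().indexOf('\'') != -1)
                throw new IllegalArgumentException("Single quote in og:" + e.getKey());

            sb.append(openGraphMetaTag
                    .replace("INSERT-OG-PROPERTY-HERE", e.getKey())
                    .replace("INSERT-OG-VALUE-HERE", e.getValue()))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        Map<String, String> props = new LinkedHashMap<>();   // keeps insertion order
        props.put("title", "The Rock");
        props.put("type", "video.movie");
        props.put("url", "http://www.imdb.com/title/tt0117500/");
        props.put("image", "https://example.com/rock.jpg");  // hypothetical image URL
        System.out.print(toTags(props));
    }
}
```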
      • openGraphProperties

        public static final java.util.TreeMap<java.lang.String, java.lang.String> openGraphProperties
        All Open-Graph Property names.
        Code:
        Exact Field Declaration Expression:
        public static final TreeMap<String, String> openGraphProperties = new TreeMap<>();
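        The declaration does not show how the table is populated. Assuming its keys are full property names such as "og:image:width", a sorted TreeMap makes it cheap to list every structured property under a common prefix, because those keys form one contiguous range. The class below is a hypothetical sketch (its descriptions are taken from the Open Graph text above), not the library's actual table.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical property table; the real openGraphProperties contents may differ.
final class OgPropertyTableSketch
{
    static final TreeMap<String, String> props = new TreeMap<>();

    static
    {
        props.put("og:title", "The title of your object");
        props.put("og:image", "An image URL which should represent your object");
        props.put("og:image:url", "Identical to og:image");
        props.put("og:image:type", "A MIME type for this image");
        props.put("og:image:width", "The number of pixels wide");
        props.put("og:url", "The canonical URL of your object");
    }

    // All entries whose key starts with 'prefix'. Appending the largest char
    // value closes the range just past every key that shares the prefix.
    static SortedMap<String, String> withPrefix(String prefix)
    {
        return props.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args)
    {
        // Prints og:image:type, og:image:url, og:image:width (sorted order)
        withPrefix("og:image:").keySet().forEach(System.out::println);
    }
}
```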
        
      • keyWordsMetaTag

        public static final java.lang.String keyWordsMetaTag
        This helps identify relevant, pertinent or 'germane' words that describe the content of a web-site or web-page to a web-indexing or web-search organization.
        See Also:
        insertKeyWords(Vector, String[]), getAllKeyWords(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String keyWordsMetaTag =
                    "<meta name='keywords' content='INSERT-COMMA-SEPARATED-KEYWORDS-HERE'>";
        
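        Because the content attribute holds a comma-separated list, building and reading it amounts to a join and a split. The helpers below are a hypothetical sketch of that round trip, not the bodies of insertKeyWords(Vector, String[]) or getAllKeyWords(Vector). They reject keywords containing a comma, since such a keyword could not be split back out, and keywords containing a single quote, since the template uses single-quoted attributes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helpers for the comma-separated keywords 'content' value.
final class KeywordsSketch
{
    static final String keyWordsMetaTag =
        "<meta name='keywords' content='INSERT-COMMA-SEPARATED-KEYWORDS-HERE'>";

    static String buildKeywordsTag(String... keywords)
    {
        for (String kw : keywords)
            if (kw.indexOf(',') != -1 || kw.indexOf('\'') != -1)
                throw new IllegalArgumentException("Comma or single quote in keyword: " + kw);

        // Trim each keyword, drop blanks, and join with ", ".
        String joined = Arrays.stream(keywords)
            .map(String::trim)
            .filter(k -> !k.isEmpty())
            .collect(Collectors.joining(", "));

        return keyWordsMetaTag.replace("INSERT-COMMA-SEPARATED-KEYWORDS-HERE", joined);
    }

    // Splits a content value back into trimmed, non-empty keywords.
    static List<String> parseKeywords(String content)
    {
        return Arrays.stream(content.split(","))
            .map(String::trim)
            .filter(k -> !k.isEmpty())
            .collect(Collectors.toList());
    }
}
```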
      • authorMetaTag

        public static final java.lang.String authorMetaTag
        This helps identify the "author-name" of a web-site or web-page to web-indexing and web-search organizations.
        See Also:
        insertAuthor(Vector, String), hasAuthor(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String authorMetaTag =
                    "<meta name='author' content='INSERT-AUTHOR-NAME-HERE'>";
        
    • Method Detail

      • get

        public static java.lang.String get(java.util.Vector<HTMLNode> html,
                                           java.lang.String name)
        This method will find an HTML <META NAME=... CONTENT=...> element whose 'name' attribute has a String-value equal (ignoring case) to the String-value of parameter 'name'. Once this HTML 'META' element has been identified, the String-value of its 'content' attribute is extracted and returned.

        NOTE: If the page provided does not have an HTML 'META' element with the specified name property, or if such an element is identified, but the element that is found does not have a 'content' attribute, then this method shall return 'null', gracefully.

        ALSO: Before the comparison using the 'name' parameter is performed, the String is trimmed using java.lang.String.trim(), and the comparison performed is case-insensitive.
        Parameters:
        html - Any vectorized HTML page, or sub-page.
        name - The name of the <META NAME=...> tag-element.
        Returns:
        The String-value of the 'content'-attribute for a 'META'-tag whose 'name' attribute is equal to the specified name provided by parameter 'name'. If such information is not found on the page, then this method shall return null.
        See Also:
        getItemProp(Vector, String), getHTTPEquiv(Vector, String)
        Code:
        Exact Method Body:
         // Find the first <META NAME=... CONTENT=...> tag element where the name equals
         // the string-value provided by parameter name.
         TagNode tn = InnerTagGet.first
             (html, "meta", "name", TextComparitor.EQ_CI, name.trim());
        
         // If there are no <META NAME='NAME' CONTENT=...> elements found on the page,
         // then this method returns null.
         if (tn == null) return null;
        
         // Return the string-value of the attribute 'content'.  Note that if this
         // attribute isn't available, this method shall return 'null', gracefully.
         return tn.AV("content");
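        To make the matching rules concrete (trimmed parameter, case-insensitive comparison, first match wins, null when nothing matches or when 'content' is missing), here is a self-contained model of get(Vector, String). It assumes, hypothetically, that each <META> tag is represented as a Map of lower-case attribute names to values instead of a TagNode. Whether the library also trims the attribute value on the page is not stated above, so the model does not.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of get(Vector, String): one Map of attributes per <META> tag.
final class MetaGetSketch
{
    static String get(List<Map<String, String>> metaTags, String name)
    {
        String wanted = name.trim();                 // the parameter is trimmed

        for (Map<String, String> attrs : metaTags)
        {
            String n = attrs.get("name");

            // The first tag whose 'name' equals the parameter, ignoring case.
            if (n != null && n.equalsIgnoreCase(wanted))
                return attrs.get("content");         // null when 'content' is absent
        }

        return null;                                 // no matching tag at all
    }
}
```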
        
      • getAllMeta

        public static java.util.Vector<java.util.Properties> getAllMeta
                    (java.util.Vector<HTMLNode> page)
        
        This simple method will retrieve a java.util.Properties object for each and every HTML <META ...> tag found within a parsed-vectorized HTML page.
        Parameters:
        page - Any Vectorized-HTML page. It is expected that this page contain a few META Tags. If not, the method will still return a Vector<Properties>, but it will have length zero.
        Returns:
        A Vector containing one java.util.Properties instance for each <META> tag found on the page, as returned by TagNode.allAV().
        See Also:
        TagNode.allAV(), TagNodeGet
        Code:
        Exact Method Body:
         Vector<Properties> ret = new Vector<>();
        
         // Retrieve all TagNode's that are HTML <META ...> Elements.  Invoke TagNode.allAV()
         // on each of these nodes to retrieve a java.util.Properties instance.
         // NOTE: These "Properties" could possibly be combined into a single Properties
         //       instance, but because of the ever-changing nature of Web-Page 
         //       Meta-Information tags, this is not employed here.  It is an exercise
         //       left to the programmer.
         for (TagNode tn : TagNodeGet.all(page, TC.OpeningTags, "meta"))
             ret.add(tn.allAV());
        
         return ret;
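        The comment above leaves combining the per-tag Properties as an exercise. One possible approach, sketched below under the assumption that TagNode.allAV() reports attribute names in lower case, keys each tag by its name, property (Open Graph) or itemprop attribute and keeps the first content value seen. The class MergeMetaSketch is hypothetical; since Vector implements List, the Vector returned by getAllMeta(Vector) could be passed straight in.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical merge of per-tag Properties into one key -> content table.
final class MergeMetaSketch
{
    static Properties merge(List<Properties> perTag)
    {
        Properties merged = new Properties();

        for (Properties p : perTag)
        {
            // Assumption: attribute names are lower case, as allAV() is presumed to return them.
            String key = p.getProperty("name");
            if (key == null) key = p.getProperty("property");   // Open-Graph tags
            if (key == null) key = p.getProperty("itemprop");   // microdata tags

            String content = p.getProperty("content");

            // Tags such as <meta charset=...> have no key/content pair and are skipped.
            // The first occurrence of a key wins.
            if (key != null && content != null && !merged.containsKey(key))
                merged.setProperty(key, content);
        }
        return merged;
    }
}
```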
        
      • insertMetaTagName

        public static void insertMetaTagName
                    (java.util.Vector<HTMLNode> html,
                     MetaTagName m,
                     java.lang.String contentAttributeValue)
        
        This performs a simple insertion of a single HTML Meta-Tag of the kind that has both a name="..." and a content="..." attribute.
        Parameters:
        html - This is any vectorized-html page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        m - Any of the enumerated-type constants of MetaTagName, each of which specifies the value of the Meta-Tag's 'name' attribute.
        contentAttributeValue - This is the value that will be used to set the 'content' attribute
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <META NAME=... CONTENT=...> tag would have to be inserted.
        QuotesException - If the 'contentAttributeValue' String uses a single-quote mark ('), anywhere in the String.
        See Also:
        metaTagName, getAllMetaTagNames(Vector), DotPair, TagNode
        Code:
        Exact Method Body:
         // Builds and inserts a TagNode HTML Element that looks like:
         // <meta name='INSERT-NAME-STRING-HERE' content='INSERT-CONTENT-STRING-HERE'>
        
         // Single Quotes are used, so the attribute-value may not contain single quotes.
         checkForSingleQuote(contentAttributeValue);
            
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "<META name=... content=...> tag"));
        
         // Build a <META> tag, as in the comment above
         TagNode metaTN  = new TagNode(
             metaTagName
                 .replace("INSERT-NAME-STRING-HERE", m.name)
                 .replace("INSERT-CONTENT-STRING-HERE", contentAttributeValue)
         );
        
         // Insert the meta-tag into the page.  Put it at the top of the header,
         // just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, metaTN, NEWLINE);
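        The insertion step can be modeled on a plain List<String> in which each element stands for one node. The sketch below (hypothetical class HeadInsertSketch, with NoSuchElementException standing in for NodeNotFoundException) requires both an opening <head> and a closing </head>, matching the requirement that the page contain a <HEAD> ... </HEAD> section, and then inserts NEWLINE, the tag and NEWLINE directly after <head>, the same position as header.start + 1.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical model: each list element stands for one HTML node.
final class HeadInsertSketch
{
    static final String NEWLINE = "\n";

    static void insertAfterHead(List<String> nodes, String metaTag)
    {
        int start = -1, end = -1;

        // Locate <head ...> and the first </head> after it (case-insensitive).
        for (int i = 0; i < nodes.size(); i++)
        {
            String n = nodes.get(i);
            if (start == -1 && n.matches("(?i)<head(\\s[^>]*)?>")) start = i;
            else if (start != -1 && n.matches("(?i)</head\\s*>")) { end = i; break; }
        }

        if (end == -1)
            throw new NoSuchElementException("No <HEAD> ... </HEAD> section; cannot insert " + metaTag);

        // Top of the header, directly after the opening <head> node.
        nodes.addAll(start + 1, List.of(NEWLINE, metaTag, NEWLINE));
    }
}
```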
        
      • insertMetaTagNames

        public static void insertMetaTagNames
                    (java.util.Vector<HTMLNode> html,
                     java.util.Hashtable<MetaTagName, java.lang.String> metaTags)
        
        This inserts a list of HTML Meta-Tags from a java Hashtable of Meta-Tag Name-Attribute / Content-Attribute pairs. All name-based meta-tags have both a name="..." attribute and a content="..." attribute.
        Parameters:
        html - This is any vectorized-html page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        metaTags - A hash-table mapping each MetaTagName enumerated-type constant to the value of the 'content' attribute that is to be inserted with it.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <META NAME=... CONTENT=...> tags would have to be inserted.
        QuotesException - If any of the values from the key-value pair hash-table contain a string that has a single-quotation mark, anywhere in the String.
        See Also:
        metaTagName, getAllMetaTagNames(Vector), insertMetaTagName(Vector, MetaTagName, String), TagNode
        Code:
        Exact Method Body:
         // Builds and inserts a TagNode HTML Element that looks like:
         // <meta name='INSERT-NAME-STRING-HERE' content='INSERT-CONTENT-STRING-HERE'>
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "<META name=... content=...> tag"));
        
         // Java Streams can be addictive...  It is an easier way to build a list.
         Stream.Builder<HTMLNode> b = Stream.builder();
         b.accept(NEWLINE);
        
         // Iterate the complete list of meta-tag names to insert
         for (MetaTagName m : metaTags.keySet())
         {
             String contentAttributeValue = metaTags.get(m);
             checkForSingleQuote(contentAttributeValue);
        
             // Build the new node
             TagNode metaTN = new TagNode(
                 metaTagName
                     .replace("INSERT-NAME-STRING-HERE", m.name)
                     .replace("INSERT-CONTENT-STRING-HERE", contentAttributeValue)
             );
        
             b.accept(metaTN);  b.accept(NEWLINE);
         }
                    
         // Insert the meta-tag names into the page.  Put it at the top of the header,
         // just after <HEAD>
         Util.insertNodes(html, header.start + 1, b.build().toArray(HTMLNode[]::new));
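        The method collects every new node first and inserts them with a single call, so a QuotesException thrown inside the loop leaves the page unchanged. A sketch of the same pattern on a List<String> follows; the class BatchInsertSketch and its metaTagName template text are hypothetical (the template is modeled on the comment at the top of the method body). Note that java.util.Hashtable does not define an iteration order, so the order of the inserted tags is not predictable there; the sketch accepts any Map, and a LinkedHashMap keeps the tags in the order they were put.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the collect-then-insert-once pattern.
final class BatchInsertSketch
{
    static final String NEWLINE = "\n";

    // Hypothetical template, shaped like the comment in the method body above.
    static final String metaTagName =
        "<meta name='INSERT-NAME-STRING-HERE' content='INSERT-CONTENT-STRING-HERE'>";

    static void insertAll(List<String> nodes, int headStart, Map<String, String> tags)
    {
        Stream.Builder<String> b = Stream.builder();
        b.accept(NEWLINE);

        for (Map.Entry<String, String> e : tags.entrySet())
        {
            // Validation happens before anything is inserted into 'nodes'.
            if (e.getValue().indexOf('\'') != -1)
                throw new IllegalArgumentException("Single quote in content for " + e.getKey());

            b.accept(metaTagName
                .replace("INSERT-NAME-STRING-HERE", e.getKey())
                .replace("INSERT-CONTENT-STRING-HERE", e.getValue()));
            b.accept(NEWLINE);
        }

        // One insertion, directly after the <head> node.
        nodes.addAll(headStart + 1, b.build().collect(Collectors.toList()));
    }
}
```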
        
      • getAllMetaTagNames

        public static java.util.Hashtable<MetaTagName, java.lang.String> getAllMetaTagNames
                    (java.util.Vector<? extends HTMLNode> html)
        
        This will retrieve all Meta-Tags that have name/content attribute pairs.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is that this method can receive a Vector<TagNode>, Vector<TextNode> or Vector<CommentNode> without throwing an exception or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        Returns:
        a java.util.Hashtable of all the Meta-Tag Name/Content pairs that did not have null values.
        Throws:
        java.lang.IllegalArgumentException - The method MetaTagName.valueOf(...) will throw an Illegal Argument Exception if any of the <META NAME=...> elements use a value of "NAME" that is not listed or identified in the Enumerated Type "MetaTagName".

        ALTERNATIVE: As Internet companies come and go, pinning down a complete list of valid Meta Tags that use the "NAME" attribute is possibly a misguided approach. Rather than eliminating the Enumerated-Type MetaTagName, it should be easier to just use the standard TagNode search below:

        Example:
         // This code should be used as an alternative to this method if there are non-standard
         // HTML Meta Tag Names.  It uses the more fundamental InnerTagGet Method.
        
         // This will retrieve all <META ...> HTML Elements that have a "NAME" Property.
         Vector<TagNode> metaTags = InnerTagGet.all(page, "meta", "name");
         
         // This will print out those results:
         for (TagNode metaTag : metaTags) System.out.println
             ("Name:\t" + metaTag.AV("name") + "\tContent:\t" + metaTag.AV("content"));
         
        
        See Also:
        MetaTagName, metaTagName, insertMetaTagName(Vector, MetaTagName, String), InnerTagGet
        Code:
        Exact Method Body:
         Hashtable<MetaTagName, String> ret = new Hashtable<>();
        
         // Converting the output "Vector<TagNode>" to a "Stream<TagNode>" by calling the
         // .stream() method mainly because java streams provide the very simple
         // 'filter(Predicate)' and 'forEach(Consumer)' methods.  Vector.removeIf and
         // Vector.forEach could also have been easily used as well.
            
         // InnerTagGet.all returns a vector containing all <META NAME=...> TagNode's where
         // the value of the 'name' attribute is one of the pre-defined MetaTagName
         // EnumeratedTypes.
        
         // NOTE: This is done via a java.util.function.Predicate<String> and a lambda
         //       expression
        
         InnerTagGet .all        (html, "meta", "name", (String nameAttributeValue) ->
                                     MetaTagName.valueOf(nameAttributeValue.toLowerCase().trim()) != null)
                     .stream     ()
                     .filter     ((TagNode tn) -> tn.AV("content") != null)
                     .forEach    ((TagNode tn) -> ret.put(
                                     MetaTagName.valueOf(tn.AV("name").toLowerCase().trim()),
                                     tn.AV("content")
                                 ));
         return ret;
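The method body above relies on the fact that java.lang.Enum.valueOf(String) throws an IllegalArgumentException, rather than returning null, when no constant matches. The `!= null` test inside the predicate therefore never filters anything out; an unrecognized 'name' value aborts the whole call instead. The sketch below contrasts that strict behavior with a tolerant lookup that scans values(). The enum SampleMetaName is a hypothetical stand-in, not the library's MetaTagName.

```java
import java.util.Optional;

public class MetaNameLookup {

    // Hypothetical stand-in for MetaTagName; the real enum's constants are not listed here.
    enum SampleMetaName { description, keywords, robots, author }

    // Strict lookup, mirroring the method body above: Enum.valueOf throws
    // IllegalArgumentException when no constant has exactly the given name.
    static SampleMetaName strict(String nameAttributeValue) {
        return SampleMetaName.valueOf(nameAttributeValue.toLowerCase().trim());
    }

    // Tolerant lookup: unknown names yield Optional.empty() instead of an exception.
    static Optional<SampleMetaName> tolerant(String nameAttributeValue) {
        String key = nameAttributeValue.toLowerCase().trim();
        for (SampleMetaName m : SampleMetaName.values())
            if (m.name().equals(key)) return Optional.of(m);
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(strict(" Description "));   // description
        System.out.println(tolerant("twitter:card"));   // Optional.empty
        try {
            strict("twitter:card");
        } catch (IllegalArgumentException e) {
            System.out.println("valueOf rejected: twitter:card");
        }
    }
}
```

A tolerant lookup of this kind is one way to skip non-standard names without abandoning the enum entirely.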
        
      • insertItemProp

        public static void insertItemProp(java.util.Vector<HTMLNode> html,
                                          java.lang.String itemProp,
                                          java.lang.String contentAttributeValue)
        This performs a simple insertion of an HTML Meta-Tag of one specific type: a meta-tag that has both an itemprop="..." and a content="..." attribute-value pair.
        Parameters:
        html - This is any vectorized-html page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        itemProp - This is the property that is passed using the 'itemprop' attribute.
        contentAttributeValue - This is the value that will be used to set the 'content' attribute.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <META ITEMPROP=... CONTENT=...> tag would have to be inserted.
        QuotesException - If the 'contentAttributeValue' String uses a single-quote mark ('), anywhere in the String.
        See Also:
        metaTagItemProp, getItemProp(Vector, String), DotPair, TagNode
        Code:
        Exact Method Body:
         // Builds and inserts a TagNode HTML Element that looks like:
         // <meta itemprop='INSERT-ITEMPROP-STRING-HERE' content='INSERT-CONTENT-STRING-HERE' >
        
         // Single Quotes are used, so the attribute-value may not contain single quotes.
         checkForSingleQuote(contentAttributeValue);
            
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "<META itemprop=... content=...> tag"));
        
         // Build a <META> tag, as in the comment above
         TagNode metaTN  = new TagNode(
             metaTagItemProp
                 .replace("INSERT-ITEMPROP-STRING-HERE", itemProp)
                 .replace("INSERT-CONTENT-STRING-HERE", contentAttributeValue)
         );
        
         // Insert the meta-tag into the page.  Put it at the top of the header,
         // just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, metaTN, NEWLINE);
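The tag-building step is plain placeholder substitution on a String template. The sketch below uses only java.lang.String; the template constant is a hypothetical stand-in for metaTagItemProp, and IllegalArgumentException stands in for the library's QuotesException. Note that the substitution only has an effect when the template actually contains the INSERT-ITEMPROP-STRING-HERE placeholder, so the itemprop template (not metaTagHTTPEquiv) is the one that must be used.

```java
public class ItemPropTagBuilder {

    // Hypothetical template; the library's actual 'metaTagItemProp' constant is not shown here.
    static final String META_TAG_ITEMPROP =
        "<meta itemprop='INSERT-ITEMPROP-STRING-HERE' content='INSERT-CONTENT-STRING-HERE' >";

    // Mirrors the documented QuotesException rule: the value is wrapped in single quotes,
    // so a single quote inside it would terminate the attribute early.
    static void checkForSingleQuote(String s) {
        if (s.indexOf('\'') != -1)
            throw new IllegalArgumentException("Value may not contain a single quote: " + s);
    }

    // Validates the content, then fills both placeholders of the template.
    static String build(String itemProp, String contentAttributeValue) {
        checkForSingleQuote(contentAttributeValue);
        return META_TAG_ITEMPROP
            .replace("INSERT-ITEMPROP-STRING-HERE", itemProp)
            .replace("INSERT-CONTENT-STRING-HERE", contentAttributeValue);
    }

    public static void main(String[] args) {
        // prints: <meta itemprop='name' content='Example Page' >
        System.out.println(build("name", "Example Page"));
    }
}
```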
        
      • getItemProp

        public static java.lang.String getItemProp
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String itemProp)
        
        This method will find an HTML <META ITEMPROP=... CONTENT=...> element whose 'ITEMPROP' property has a String-value equal-to, ignoring case, the String-value of the provided String-parameter 'itemProp'. After this HTML 'META' element has been identified, the String-value of its 'content' property will be extracted and returned.

        NOTE: If the page provided does not have an HTML 'META' element with the specified property, or if such an element is identified, but the element that is found does not have a 'content' attribute, then this method shall return 'null', gracefully.

        ALSO: This method is nearly identical to method get(Vector, String) - the sole difference being that the attribute that identifies the 'META' element we are looking for is called "ITEMPROP", instead of "NAME".

        FINALLY: Before the comparison using the 'ITEMPROP' parameter is performed, the String is trimmed using java.lang.String.trim(), and the comparison performed is case-insensitive.
        Parameters:
        html - Any vectorized HTML page, or sub-page.
        itemProp - The property-name of the <META ITEMPROP=...> tag-element.
        Returns:
        The String-value of the 'CONTENT'-attribute for a 'META'-tag whose 'ITEMPROP' attribute is equal to the specified name provided by parameter 'itemProp'. If such information is not found on the page, then this method shall return null.
        See Also:
        get(Vector, String), getHTTPEquiv(Vector, String)
        Code:
        Exact Method Body:
         // Find the first <META ITEMPROP=... CONTENT=...> tag element whose 'itemprop' value
         // equals (ignoring case) the string-value provided by parameter 'itemProp'.
         TagNode tn = InnerTagGet.first
             (html, "meta", "itemprop", TextComparitor.EQ_CI, itemProp.trim());
        
         // If there are no <META ITEMPROP='itemProp' CONTENT=...> elements found on the page,
         // then this method returns null.
         if (tn == null) return null;
        
         // Return the string-value of the attribute 'content'.  Note that if this
         // attribute isn't available, this method shall return 'null', gracefully.
         return tn.AV("content");
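The lookup logic can be sketched without the library by modelling each <META> tag as a map of attribute names to values. This model is a hypothetical simplification, not the library's TagNode class; it keeps the same rules described above: the parameter is trimmed, the comparison ignores case, the first match wins, and null is returned when either the tag or its 'content' attribute is missing.

```java
import java.util.List;
import java.util.Map;

public class ItemPropLookup {

    // Hypothetical model: each <META> tag is a map of lower-case attribute names to values,
    // listed in document order.  This is not the library's TagNode class.
    static String getItemProp(List<Map<String, String>> metaTags, String itemProp) {
        String wanted = itemProp.trim();
        for (Map<String, String> tag : metaTags) {
            String v = tag.get("itemprop");
            // First match wins, compared case-insensitively
            if (v != null && v.equalsIgnoreCase(wanted))
                return tag.get("content");   // null if the 'content' attribute is absent
        }
        return null;                          // no matching <META ITEMPROP=...> element
    }

    public static void main(String[] args) {
        List<Map<String, String>> tags = List.of(
            Map.of("itemprop", "Name", "content", "Example Page"),
            Map.of("itemprop", "image")
        );
        System.out.println(getItemProp(tags, " name "));  // Example Page
        System.out.println(getItemProp(tags, "image"));   // null
        System.out.println(getItemProp(tags, "author"));  // null
    }
}
```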
        
      • insertHTTPEquiv

        public static void insertHTTPEquiv(java.util.Vector<HTMLNode> html,
                                           java.lang.String httpEquiv,
                                           java.lang.String contentAttributeValue)
        This performs a simple insertion of an HTML Meta-Tag of one specific type: a meta-tag that has both an http-equiv="..." and a content="..." attribute-value pair.
        Parameters:
        html - This is any vectorized-html page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        httpEquiv - This is the property that is passed using the http-equiv attribute.
        contentAttributeValue - This is the value that will be used to set the 'content' attribute.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <META HTTP-EQUIV=... CONTENT=...> tag would have to be inserted.
        QuotesException - If the 'contentAttributeValue' String uses a single-quote mark ('), anywhere in the String.
        See Also:
        metaTagHTTPEquiv, getHTTPEquiv(Vector, String), DotPair, TagNode
        Code:
        Exact Method Body:
         // Builds and inserts a TagNode HTML Element that looks like:
         // <meta http-equiv='INSERT-HTTP-EQUIV-STRING-HERE' content='INSERT-CONTENT-STRING-HERE' >
        
         // Single Quotes are used, so the attribute-value may not contain single quotes.
         checkForSingleQuote(contentAttributeValue);
            
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "<META http-equiv=... content=...> tag"));
        
         // Build a <META> tag, as in the comment above
         TagNode metaTN  = new TagNode(
             metaTagHTTPEquiv
                 .replace("INSERT-HTTP-EQUIV-STRING-HERE", httpEquiv)
                 .replace("INSERT-CONTENT-STRING-HERE", contentAttributeValue)
         );
        
         // Insert the meta-tag into the page.  Put it at the top of the header,
         // just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, metaTN, NEWLINE);
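Every insert method in this class follows the same placement rule: locate the opening <HEAD> tag, fail if there is none, and insert a newline, the new tag, and another newline immediately after it (position header.start + 1). The sketch below models a page as a List<String> of nodes instead of a Vector<HTMLNode>, and uses NoSuchElementException as a stand-in for NodeNotFoundException; both substitutions are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class HeadInsertion {

    // Simplified model of a vectorized page: one String per node.
    // Finds the opening <head> tag (case-insensitive) and inserts the new tag right after it.
    static void insertAfterHead(List<String> page, String metaTag) {
        int headPos = -1;
        for (int i = 0; i < page.size(); i++) {
            String n = page.get(i).toLowerCase();
            // "<head ..." allows attributes, while "<header>" is not matched
            if (n.equals("<head>") || n.startsWith("<head ")) { headPos = i; break; }
        }
        if (headPos == -1)
            throw new NoSuchElementException("No <HEAD> section; cannot insert " + metaTag);
        page.addAll(headPos + 1, List.of("\n", metaTag, "\n"));
    }

    public static void main(String[] args) {
        List<String> page = new ArrayList<>(List.of(
            "<html>", "<head>", "<title>", "T", "</title>", "</head>", "</html>"));
        insertAfterHead(page, "<meta http-equiv='refresh' content='30' >");
        System.out.println(page.get(3));  // <meta http-equiv='refresh' content='30' >
    }
}
```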
        
      • getHTTPEquiv

        public static java.lang.String getHTTPEquiv
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String httpEquiv)
        
        This method will find an HTML <META HTTP-EQUIV=... CONTENT=...> element whose 'HTTP-EQUIV' property has a String-value equal-to, ignoring case, the String-value of the provided String-parameter 'httpEquiv'. After this HTML 'META' element has been identified, the String-value of its 'content' property will be extracted and returned.

        NOTE: If the page provided does not have an HTML 'META' element with the specified property, or if such an element is identified, but the element that is found does not have a 'content' attribute, then this method shall return 'null', gracefully.

        ALSO: This method is nearly identical to method get(Vector, String) - the sole difference being that the attribute that identifies the 'META' element we are looking for is called "HTTP-EQUIV", instead of "NAME".

        FINALLY: Before the comparison using the 'HTTP-EQUIV' parameter is performed, the String is trimmed using java.lang.String.trim(), and the comparison performed is case-insensitive.
        Parameters:
        html - Any vectorized HTML page, or sub-page.
        httpEquiv - The property-name of the <META HTTP-EQUIV=...> tag-element.
        Returns:
        The String-value of the 'CONTENT'-attribute for a 'META'-tag whose 'HTTP-EQUIV' attribute is equal to the specified name provided by parameter 'httpEquiv'. If such information is not found on the page, then this method shall return null.
        See Also:
        get(Vector, String), getItemProp(Vector, String)
        Code:
        Exact Method Body:
         // Find the first <META HTTP-EQUIV=... CONTENT=...> tag element whose 'http-equiv'
         // value equals (ignoring case) the string-value provided by parameter 'httpEquiv'.
         TagNode tn = InnerTagGet.first
             (html, "meta", "http-equiv", TextComparitor.EQ_CI, httpEquiv.trim());
        
         // If there are no <META HTTP-EQUIV='httpEquiv' CONTENT=...> elements found on the
         // page, then this method returns null.
         if (tn == null) return null;
        
         // Return the string-value of the attribute 'content'.  Note that if this
         // attribute isn't available, this method shall return 'null', gracefully.
         return tn.AV("content");
        
      • insertRobots

        public static void insertRobots(java.util.Vector<HTMLNode> html,
                                        boolean index,
                                        boolean follow)
        One common HTML <META> tag is the one which informs Google, Yahoo, and other search engines which of your pages you would like them to index, and which pages you would like them not to index. Worrying about what Google does and does not index may seem daunting, but this meta-tag gives a page author a simple way to request that a page not be indexed, or that its links not be followed.

        The robots meta tag informs search engines which pages on your site should be indexed. This meta tag serves a similar purpose to robots.txt; it is generally used to prevent a search engine from indexing individual pages, while robots.txt will prevent it from indexing a whole site or section of a site.

        The first tag below instructs search-engine crawlers neither to index the page nor to follow any links on it; the second explicitly permits both.

        HTML Elements:
         <meta name="robots" content="noindex, nofollow" />
         <meta name="robots" content="index, follow" />
         
        
        Parameters:
        html - This is any vectorized-html page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        index - This is a boolean parameter that, when TRUE, forces this method to place an "index" string into the finally-exported HTML element. If FALSE, a "noindex" string will be put into the HTML element.
        follow - This is also a boolean parameter. When TRUE this will force the method to put a "follow" string into the finally-exported HTML element. When FALSE "nofollow" will be inserted.

        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <meta name='robots' content='index/noindex, follow/nofollow'> would have to be inserted.
        See Also:
        robotsMetaTag, getAllRobots(Vector), getAllRobotsNOMHE(Vector), TagNode
        Code:
        Exact Method Body:
         // Builds a robots meta tag.  These are used by google and search engines
         // <meta name='robots' content='INSERT-CONTENT-STRING-HERE' />
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "Robots <META ... > tag"));
        
         // Build a 'robots' TagNode
         TagNode robotsTN    = new TagNode(
             robotsMetaTag.replace(
                 "INSERT-CONTENT-STRING-HERE",
                 (index ? "index" : "noindex") + ", " + (follow ? "follow" : "nofollow")
             ));
        
         // Insert the robots-tag into the page.  Put it at the top of the header, just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, robotsTN, NEWLINE);
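The two boolean parameters produce exactly four possible 'content' values. The sketch below isolates the same ternary expressions used in the method body so the mapping can be checked directly; the class name RobotsContent is purely illustrative.

```java
public class RobotsContent {

    // Same ternary expressions as the method body above, isolated for illustration.
    static String robotsContent(boolean index, boolean follow) {
        return (index ? "index" : "noindex") + ", " + (follow ? "follow" : "nofollow");
    }

    public static void main(String[] args) {
        // Prints all four variants of the robots meta tag
        for (boolean index : new boolean[] { true, false })
            for (boolean follow : new boolean[] { true, false })
                System.out.println("<meta name='robots' content='" + robotsContent(index, follow) + "' />");
    }
}
```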
        
      • insertRobots

        public static void insertRobots(java.util.Vector<HTMLNode> html,
                                        Robots... rArr)
        This will add an HTML robots <META> tag whose 'content' attribute lists each of the specified values.

        IMPORTANT NOTE: This method performs no validity checking, primarily because attempting to decide what is absolutely correct or absolutely incorrect seems a little far-fetched. Although the number of values the 'robots' attribute may actually contain is very small, throwing a Malformed-HTML exception for some errors while ignoring others is deliberately avoided in this particular method.

        ASIDE: If a programmer were to pass both 'Robots.follow' and 'Robots.noFollow', both values would be inserted into the 'content' attribute of the robots Meta-Tag. This would, clearly, be faulty HTML.
        Parameters:
        html - This is any vectorized-html page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        rArr - This is an array of the enumerated-type 'Robots'. It may contain any number of the items that may be listed in the 'content' attribute of an HTML robots Meta-Tag. If any of the array elements are null, they will be skipped and ignored.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <meta name='robots' content='...'> would have to be inserted.
        See Also:
        robotsMetaTag, getAllRobots(Vector), insertRobots(Vector, boolean, boolean), DotPair
        Code:
        Exact Method Body:
         // Builds a series-of-robots meta tag.  These are used by google and search engines
         // <meta name='robots' content='INSERT-CONTENT-STRING-HERE' />
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "Robots <META ... > tag"));
        
         String robotsStr = StrCSV.toCSV(rArr, (int i, Robots r) -> r.name, false, null);
        
         // Build the <META> TagNode
         TagNode robotsTN = new TagNode
             (robotsMetaTag.replace("INSERT-CONTENT-STRING-HERE", robotsStr));
        
         // Insert the robots-tag into the page.  Put it at the top of the header, just
         // after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, robotsTN, NEWLINE);
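The varargs version joins the selected values into one comma-separated 'content' string and, per the parameter documentation, skips null elements. The sketch below uses a hypothetical enum SampleRobots in place of the library's Robots type, and Collectors.joining(", ") in place of StrCSV.toCSV; the exact separator StrCSV.toCSV produces is not shown in this documentation, so ", " is an assumption.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.stream.Collectors;

public class RobotsJoin {

    // Hypothetical enum; the library's Robots type carries its own attribute strings.
    enum SampleRobots {
        INDEX("index"), NO_INDEX("noindex"), FOLLOW("follow"), NO_FOLLOW("nofollow"), NO_ARCHIVE("noarchive");
        final String value;
        SampleRobots(String value) { this.value = value; }
    }

    // Joins the values into a comma-separated 'content' string, skipping null elements.
    // No validity checking is done, so contradictory values (follow + nofollow) pass through.
    static String robotsContent(SampleRobots... rArr) {
        return Arrays.stream(rArr)
            .filter(Objects::nonNull)
            .map(r -> r.value)
            .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        // prints: noindex, noarchive
        System.out.println(robotsContent(SampleRobots.NO_INDEX, null, SampleRobots.NO_ARCHIVE));
    }
}
```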
        
      • getAllRobots

        public static java.util.Vector<Robots> getAllRobots
                    (java.util.Vector<? extends HTMLNode> html)
                throws MalformedHTMLException
        
        This method looks for HTML <META NAME='robots' ...> tags, and returns the values listed in their 'content=...' attributes, converted to the enumerated-type 'Robots'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is that this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        Returns:
        This will return a vector of the robots values named or specified by the HTML Meta-Tags present on this page.

        NOTE: Please do not be disturbed by the Java Streams used in this method's body. They are of limited use, but once a programmer is accustomed to them, they actually improve code-readability (once in a while!). A series of simple for-loops that add, eliminate duplicates, and sort would accomplish the same task.
        Throws:
        MalformedHTMLException - If any invalid robot-strings are found on the page, this method will throw an exception. The impetus behind this is to prevent accidentally ignoring newly found tags, or incorrect tags. The extraction of the robots meta tag from an HTML page can be performed manually, if throwing an exception is causing problems. The code to do this is listed in the documentation of this method.
        See Also:
        robotsMetaTag, insertRobots(Vector, boolean, boolean)
        Code:
        Exact Method Body:
         // Here, again, Java Streams are useful - primarily because a 'filter' operation is
         // applied to a Vector.  Vector.removeIf would also work, BUT this pipeline also
         // extracts attribute values: the original TagNode's are discarded and replaced by
         // the values of their 'content' attributes.
         //
         // ALSO SALIENT: "Arrays.asList" wraps each String-array in a List<String>, and
         //               "TreeSet::addAll" puts each separate String of each List into the TreeSet.
         // NOTE: The TreeSet also functions as a "duplicate checker," although this is also
         //       provided by Stream.distinct()
            
         // InnerTagGet.all; Returns a vector of TagNode's that resemble: <META NAME="robots" ...>
         // EQ_CI_TRM: Check the 'name' Attribute-Value using a Case-Insensitive, Equality 
         //            String-Comparison
         //            Trim the 'name' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         TreeSet<String> temp = InnerTagGet
                 .all        (html, "meta", "name", TextComparitor.EQ_CI_TRM, "robots")
                 .stream     ()
                 .map        ((TagNode tn)           -> tn.AV("content"))
                 .filter     ((String contents)      -> (contents != null) && (contents.trim().length() > 0))
                 .map        ((String contents)      -> Arrays.asList(StrCSV.CSV(contents.toLowerCase())))
                 .collect    (TreeSet<String>::new, TreeSet::addAll, TreeSet::addAll);
        
         // A checked exception (MalformedHTMLException) cannot be thrown from inside a stream
         // lambda without awkward wrapping, so a plain for-loop is used for this step.
        
         Vector<Robots> ret = new Vector<>();
         for (String s : temp) ret.add(Robots.getRobot(s));
             // If an invalid robot-attribute is found, this will
             // throw a MalformedHTMLException
        
         return ret;
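The strict variant can be sketched in plain Java: split each 'content' value on commas, lower-case and trim the pieces, collect them in a TreeSet (which sorts and removes duplicates), then convert each one, failing on the first unknown value. The enum SampleRobots and the checked exception MalformedRobotsException are hypothetical stand-ins for Robots and MalformedHTMLException, and String.split replaces StrCSV.CSV. The conversion uses a for-loop for the reason given in the comment above: a checked exception cannot propagate out of a java.util.function lambda.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.Vector;

public class RobotsParseStrict {

    // Hypothetical checked exception standing in for MalformedHTMLException.
    static class MalformedRobotsException extends Exception {
        MalformedRobotsException(String msg) { super(msg); }
    }

    // Hypothetical subset of robots directives.
    enum SampleRobots { index, noindex, follow, nofollow, noarchive }

    // 'contents' holds the 'content' attribute of each <meta name='robots'> tag found.
    static Vector<SampleRobots> parse(List<String> contents) throws MalformedRobotsException {
        TreeSet<String> temp = new TreeSet<>();   // sorts and removes duplicates
        for (String c : contents) {
            if (c == null || c.trim().isEmpty()) continue;
            for (String part : c.toLowerCase().split(","))
                if (!part.trim().isEmpty()) temp.add(part.trim());
        }
        // Plain loop: the checked exception can be thrown directly from here.
        Vector<SampleRobots> ret = new Vector<>();
        for (String s : temp) {
            try {
                ret.add(SampleRobots.valueOf(s));
            } catch (IllegalArgumentException e) {
                throw new MalformedRobotsException("Unknown robots directive: " + s);
            }
        }
        return ret;
    }

    public static void main(String[] args) throws MalformedRobotsException {
        // prints: [follow, noindex]
        System.out.println(parse(Arrays.asList("NoIndex, follow", "noindex")));
    }
}
```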
        
      • getAllRobotsNOMHE

        public static java.util.Vector<Robots> getAllRobotsNOMHE
                    (java.util.Vector<? extends HTMLNode> html)
        
        This will retrieve the robots meta-tag attribute values present on a web-page. If any of them do not match a value listed in the enumerated-type 'Robots', this will not cause a MalformedHTMLException to throw; instead, that value is silently eliminated and ignored. Before using this method, take care that all the necessary robots values are listed in the enumerated type, and that the page contains no "undefined, but necessary" robots values!
        Parameters:
        html - This is the vectorized-html webpage.
        Returns:
        A vector of all the valid robots attribute values found on the web-page.
        See Also:
        robotsMetaTag, insertRobots(Vector, boolean, boolean), TagNode.AV(String)
        Code:
        Exact Method Body:
         // Java Streams, used here, filter out irrelevant meta tags, and also convert the
         // HTML Meta TagNode's into their "CONTENT" Attribute String value.  The TreeSet
         // eliminates duplicates and sorts the String's as well.
         //
         // ALSO SALIENT: "Arrays.asList" wraps each String-array in a List<String>, and
         //               "TreeSet::addAll" puts each separate String of each List into the TreeSet
         // NOTE: The 'getRobotNOMHE' suppresses a possible exception, and converts such a
         //       situation to 'null.'  The suppressed-exception is the "MalformedHTMLException"
            
         // InnerTagGet.all; Returns a vector of TagNode's that resemble:
         // <META NAME="robots" ...>
         //
         // EQ_CI_TRM: Check the 'name' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'name' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         return InnerTagGet
                 .all        (html, "meta", "name", TextComparitor.EQ_CI_TRM, "robots")
                 .stream     ()
                 .map        ((TagNode tn)           -> tn.AV("content"))
                 .filter     ((String contents)      -> (contents != null) && (contents.trim().length() > 0))
                 .map        ((String contents)      -> Arrays.asList(StrCSV.CSV(contents.toLowerCase())))
                 .collect    (TreeSet<String>::new, TreeSet::addAll, TreeSet::addAll)
                 .stream     ()
                 .map        ((String robotParam)    -> Robots.getRobotNOMHE(robotParam))
                 .filter     ((Robots robot)         -> robot != null)
                 .collect    (Collectors.toCollection(Vector<Robots>::new));
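Because the lenient variant never throws, the whole conversion can stay inside one stream pipeline: unknown values map to null and are filtered out. The sketch below follows the same shape, with two labelled substitutions: flatMap over String.split replaces the StrCSV.CSV / Arrays.asList / TreeSet::addAll combination, and the hypothetical lookupOrNull plays the role of Robots.getRobotNOMHE for a hypothetical enum SampleRobots.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;
import java.util.Vector;
import java.util.stream.Collectors;

public class RobotsParseLenient {

    // Hypothetical subset of robots directives.
    enum SampleRobots { index, noindex, follow, nofollow, noarchive }

    // Returns null instead of throwing for unknown directives.
    static SampleRobots lookupOrNull(String s) {
        for (SampleRobots r : SampleRobots.values())
            if (r.name().equals(s)) return r;
        return null;
    }

    // 'contents' holds the 'content' attribute of each <meta name='robots'> tag found.
    static Vector<SampleRobots> parse(List<String> contents) {
        return contents.stream()
            .filter(c -> c != null && !c.trim().isEmpty())
            .flatMap(c -> Arrays.stream(c.toLowerCase().split(",")))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toCollection(TreeSet::new))   // sort + de-duplicate
            .stream()
            .map(RobotsParseLenient::lookupOrNull)
            .filter(Objects::nonNull)                          // silently drop unknown values
            .collect(Collectors.toCollection(Vector::new));
    }

    public static void main(String[] args) {
        // prints: [follow, noindex]
        System.out.println(parse(Arrays.asList("noindex, follow, max-snippet:50", null)));
    }
}
```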
        
      • insertDescription

        public static void insertDescription(java.util.Vector<HTMLNode> html,
                                             java.lang.String description)
        Another common HTML <META> tag is the one that provides a brief description of the page in question. This method facilitates adding a "meta" tag that contains two attributes:

        1. inner-tag 'name' whose value must be 'description'
        2. inner-tag 'content' whose value should be a brief textual description of the content of the page
        Parameters:
        html - This may be any vectorized-html webpage, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        description - This is a textual-description of the web-page to which this HTML <META name='description' content='...'> tag is being added. If Google, or any other Internet search site, returns your web-page as part of its search results, this description is usually the text that is displayed. Furthermore, the key-words listed here are somehow (in a way that is not known to this programmer) used when the search algorithms index your particular page.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <meta name='description' content='... description_as_string ...'> would have to be inserted.
        QuotesException - If the description-string uses a single-quote mark, anywhere in the String.
        See Also:
        descriptionMetaTag, hasDescription(Vector), Features.checkForSingleQuote(String), TagNode
        Code:
        Exact Method Body:
         // Meta-Tag for Descriptions.  This will be inserted into the HTML page.
         // <meta name='description' content='INSERT-DESCRIPTION-OR-KEYWORDS-HERE'>
        
         checkForSingleQuote(description);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "Description <META ... > tag"));
        
         // Build the Meta Tag for a description to google and search engines
         TagNode metaTN = new TagNode
             (descriptionMetaTag.replace("INSERT-DESCRIPTION-OR-KEYWORDS-HERE", description));
        
         // Insert the description-tag into the page.  Put it at the top of the header,
         // just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, metaTN, NEWLINE);
        
      • hasDescription

        public static java.lang.String hasDescription
                    (java.util.Vector<? extends HTMLNode> html)
                throws MalformedHTMLException
        
        This method attempts to retrieve the content-description of an HTML page. If no meta-tag defining the description of the page is found, then this method shall return null. If the meta-tag is found but has no 'content' attribute, or if more than one such meta-tag is found, then a 'MalformedHTMLException' will be thrown.
        Parameters:
        html - Any vectorized-html web-page.
        Returns:
        The content-description that has been extracted from the html meta-tag <META NAME="description" CONTENT="the-description">. If this tag is not found, then null is returned. If this tag is found, but does not possess a content-attribute, a MalformedHTMLException is thrown.
        Throws:
        MalformedHTMLException - This is thrown if there are multiple definitions of the description meta-tag. There ought to be only a single definition, and if multiple are found, it would be better to identify why, and do the data-extraction manually. This is in lieu of arbitrarily picking one of them and returning that tag's content attribute-value.
        This exception will also be thrown if the description meta-tag that is found does not have a 'content' attribute.

        Both of these are unlikely occurrences, but they do happen. MalformedHTMLException is a 'checked' exception, so a call to this method must either be wrapped in a try-catch block, or the exception must be declared in the calling method's throws clause.
        See Also:
        descriptionMetaTag, insertDescription(Vector, String), InnerTagGet
        Code:
        Exact Method Body:
         // InnerTagGet.all; Returns a vector of TagNode's that resemble:
         // <META NAME="description" ...>
         //
         // EQ_CI_TRM: Check the 'name' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'name' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
         Vector<TagNode> v = InnerTagGet.all
             (html, "meta", "name", TextComparitor.EQ_CI_TRM, "description");
        
         if (v.size() == 0) return null;
        
         if (v.size() > 1) throw new MalformedHTMLException(
             "You have asked for the value of the HTML 'description' <META ...> tag, but " +
             "unfortunately there were multiple instances of this tag on your page.  " +
             "This is poorly formatted HTML, and not allowed here."
         );
        
         String s = v.elementAt(0).AV("content");
        
         if (s == null) throw new MalformedHTMLException(   
             "An HTML meta-tag was found with an attribute 'name' whose value was " +
             "'description,' but unfortunately this meta-tag did not posses attribute 'content'"
         );
                
         return s;
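The three outcomes of this method (null when absent, an exception when duplicated or missing 'content', otherwise the value) can be sketched with each <META> tag modelled as an attribute map. Both the map model and MalformedDescriptionException are hypothetical stand-ins for TagNode and MalformedHTMLException. As with EQ_CI_TRM above, the 'name' value is trimmed and compared ignoring case.

```java
import java.util.List;
import java.util.Map;

public class DescriptionLookup {

    // Hypothetical checked exception standing in for MalformedHTMLException.
    static class MalformedDescriptionException extends Exception {
        MalformedDescriptionException(String msg) { super(msg); }
    }

    // Each <META> tag is modelled as a map of lower-case attribute names to values.
    static String hasDescription(List<Map<String, String>> metaTags) throws MalformedDescriptionException {
        Map<String, String> found = null;
        for (Map<String, String> tag : metaTags) {
            String name = tag.get("name");
            if (name == null || !name.trim().equalsIgnoreCase("description")) continue;
            if (found != null)
                throw new MalformedDescriptionException("Multiple description <META> tags");
            found = tag;
        }
        if (found == null) return null;                  // no description tag at all
        String content = found.get("content");
        if (content == null)
            throw new MalformedDescriptionException("Description <META> tag has no 'content'");
        return content;
    }

    public static void main(String[] args) throws MalformedDescriptionException {
        List<Map<String, String>> tags = List.of(
            Map.of("name", "Description", "content", "A page about meta tags."),
            Map.of("name", "robots", "content", "index, follow"));
        System.out.println(hasDescription(tags));  // A page about meta tags.
    }
}
```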
        
      • insertUTF8MetaTag

        public static void insertUTF8MetaTag(java.util.Vector<HTMLNode> html)
        This method inserts a UTF-8 Meta Tag, which tells any web-browser that renders the page that its text is encoded in UTF-8, and may therefore contain characters outside the traditional single-byte character-sets (ASCII itself defines only 128 characters). UTF-8 can encode Chinese, Japanese, and just about every other language in Asia, Europe, and most of the world.
        Parameters:
        html - This may be any html web-page, but it must contain an HTML <HEAD> ... </HEAD> sub-section. A meta-tag can only be inserted into the page's HTML 'header' section, so if there is none, this method throws a 'NodeNotFoundException'.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <META http-equiv='Content-Type' content='text/html; charset=utf-8' /> would have to be inserted.
        See Also:
        hasUTF8MetaTag(Vector), UTF8MetaTag, TagNode, DotPair
        Code:
        Exact Method Body:
         // Meta-Tag to assert that the UTF-8 Charset is being used:
         // <meta http-equiv='Content-Type' content='text/html; charset=utf-8' />
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "UTF-8 <META> tag"));
        
         // Insert the UTF-8 tag into the page.  Put it at the top of the header, just
         // after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, new TagNode(UTF8MetaTag), NEWLINE);
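A minimal sketch of the insertion step follows. It assumes a hypothetical page model: a flat java.util.List<String> of nodes in which each tag, such as "<head>", is its own element. This stands in for Vector<HTMLNode>. Like the method body, it inserts the tag just after the opening <HEAD> tag, padded with newlines. It throws NoSuchElementException in place of NodeNotFoundException.

```java
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;

final class Utf8InsertSketch {

    // The tag text described in the documentation above.
    static final String UTF8_META =
        "<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />";

    // Hypothetical model: the page is a flat list of nodes, each either a
    // complete tag (e.g. "<head>") or a run of text.
    static void insertUTF8MetaTag(List<String> html) {
        for (int i = 0; i < html.size(); i++) {
            String node = html.get(i).toLowerCase(Locale.ROOT);

            // Match "<head>" or "<head ...>", but not "<header>".
            if (node.startsWith("<head>") || node.startsWith("<head ")) {
                // Insert just after the opening <HEAD> tag, surrounded by newlines.
                html.addAll(i + 1, List.of("\n", UTF8_META, "\n"));
                return;
            }
        }
        throw new NoSuchElementException(
            "No <HEAD> section was found, so the UTF-8 <META> tag cannot be inserted.");
    }
}
```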
        
      • hasUTF8MetaTag

        public static boolean hasUTF8MetaTag
                    (java.util.Vector<? extends HTMLNode> html)
        
        This will detect whether a UTF-8 HTML <META ...> tag is included in this page.

        HTML Elements:
         <meta http-equiv="content-type" content="text/html; charset=UTF-8">
         <meta charset="UTF-8">
         
        
        Parameters:
        html - This may be any vectorized-html web-page.
        Returns:
        TRUE if a <META> tag identifies this page as using the UTF-8 character-set, and FALSE otherwise.
        See Also:
        insertUTF8MetaTag(Vector), UTF8MetaTag, StrCmpr.containsAND_CI(String, String[]), TagNode.AV(String)
        Code:
        Exact Method Body:
         String s;
        
         // InnerTagGet.all: Returns a vector of TagNode's that resemble:
         // <META http-equiv="content-type" ...>
         //
         // EQ_CI_TRM: Check the 'http-equiv' Attribute-Value using a Case-Insensitive, 
         //            Equality String-Comparison
         //            Trim the 'http-equiv' Attribute-Value String of possible leading & 
         //            trailing White-Space before performing the comparison.
        
         Vector<TagNode> v = InnerTagGet.all
             (html, "meta", "http-equiv", TextComparitor.EQ_CI_TRM, "content-type");
        
         for (TagNode tn : v)
             if ((s = tn.AV("content")) != null)
                 if (StrCmpr.containsAND_CI(s, "charset", "utf-8"))
                     return true;
        
         // InnerTagGet.all retrieves all TagNode's that resemble <META charset="utf-8" ...>
         // EQ_CI_TRM: Equality-Test, Case-Insensitive, Trim any White-Space before 
         // performing comparison.
        
         v = InnerTagGet.all(html, "meta", "charset", TextComparitor.EQ_CI_TRM, "utf-8");
         for (TagNode tn : v)
             if ((s = tn.AV("charset")) != null)
                 if (StrCmpr.containsAND_CI(s, "utf-8"))
                     return true;
        
         return false;
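The following sketch checks for both accepted forms. It uses a hypothetical model in which each <META> element is a Map of lower-cased attribute names to values, instead of the library's TagNode and InnerTagGet. It is an illustration of the test, not the library's implementation.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class Utf8DetectSketch {

    // Hypothetical model: each <META> element is a Map of lower-cased attribute
    // names to attribute values.
    static boolean hasUTF8MetaTag(List<Map<String, String>> metaTags) {
        for (Map<String, String> tag : metaTags) {
            String httpEquiv = tag.get("http-equiv");
            String content   = tag.get("content");

            // Form 1: <meta http-equiv="content-type" content="...charset=utf-8...">
            if (httpEquiv != null && httpEquiv.trim().equalsIgnoreCase("content-type")
                    && content != null) {
                String c = content.toLowerCase(Locale.ROOT);
                if (c.contains("charset") && c.contains("utf-8")) return true;
            }

            // Form 2: <meta charset="utf-8">
            String charset = tag.get("charset");
            if (charset != null && charset.trim().equalsIgnoreCase("utf-8")) return true;
        }
        return false;
    }
}
```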
        
      • insertOGMetaTag

        public static void insertOGMetaTag(java.util.Vector<HTMLNode> html,
                                           java.lang.String ogProperty,
                                           java.lang.String ogValueAsStr)
        This will insert a single open-graph meta tag into an HTML page.

        IMPORTANT: The name of the property MUST NOT begin with the characters "og:", because they will be prepended when the HTML <META PROPERTY='...' CONTENT='...' /> element is instantiated. Please see exact method body below.
        Parameters:
        html - This may be any html web-page, but it must contain an HTML <HEAD> ... </HEAD> sub-section. A meta-tag can only be inserted into the page's HTML 'header' section, so if there is none, this method throws a 'NodeNotFoundException'.
        ogProperty - This is the name of the open-graph protocol property being inserted. It is generally a short alphanumeric name, or a series of alphanumeric names separated by a period '.' character.
        ogValueAsStr - The value of the property, as a String. The openGraphMetaTag definition earlier in this class lists the types that open-graph properties may use. Whatever the field's type, it must be converted to a String before it is passed to this method.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <meta property='og:INSERT-OG-PROPERTY-HERE' content='INSERT-OG-VALUE-HERE'> would have to be inserted.
        QuotesException - If either 'ogProperty' or 'ogValueAsStr' contains a single-quote character anywhere in the String.
        See Also:
        openGraphMetaTag, getAllOGMetaTags(Vector), Features.checkForSingleQuote(String), TagNode
        Code:
        Exact Method Body:
         // Open graph tag looks like this:
         // <meta property='og:INSERT-OG-PROPERTY-HERE' content='INSERT-OG-VALUE-HERE' />
        
         checkForSingleQuote(ogProperty);
         checkForSingleQuote(ogValueAsStr);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace
                 ("INSERT-STR", "Open-Graph <META property='og:...' ...> tag")
         );
        
         // Build the Open-Graph Meta Tag
         TagNode metaTN = new TagNode(
             openGraphMetaTag
                 .replace("INSERT-OG-PROPERTY-HERE", ogProperty)
                 .replace("INSERT-OG-VALUE-HERE", ogValueAsStr)
         );
        
         // Insert the tag into the page.  Put it at the top of the header, just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, metaTN, NEWLINE);
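The tag-building half of this method can be sketched as a pure String function. The template text follows the comment in the method body. The name buildOGMetaTag is hypothetical. The check that rejects an "og:" prefix is an added guard; as far as the body above shows, the library itself does not perform it. This sketch throws IllegalArgumentException in place of QuotesException.

```java
final class OpenGraphTagSketch {

    // Template mirroring the open-graph tag shown in the method body's comment.
    static final String OG_TEMPLATE =
        "<meta property='og:INSERT-OG-PROPERTY-HERE' content='INSERT-OG-VALUE-HERE' />";

    static String buildOGMetaTag(String ogProperty, String ogValueAsStr) {
        // A single quote would terminate the single-quoted attribute values early.
        if (ogProperty.indexOf('\'') != -1 || ogValueAsStr.indexOf('\'') != -1)
            throw new IllegalArgumentException("Single quotes are not allowed here.");

        // Added guard (not in the library body): the template already supplies
        // "og:", so passing "og:title" would otherwise produce "og:og:title".
        if (ogProperty.regionMatches(true, 0, "og:", 0, 3))
            throw new IllegalArgumentException("Do not include the 'og:' prefix.");

        return OG_TEMPLATE
            .replace("INSERT-OG-PROPERTY-HERE", ogProperty)
            .replace("INSERT-OG-VALUE-HERE", ogValueAsStr);
    }
}
```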
        
      • getAllOGMetaTags

        public static java.util.Properties getAllOGMetaTags
                    (java.util.Vector<? extends HTMLNode> html)
        
        This will search any vectorized-html page for <META property='og:...' content='...'> tags, and retrieve them to place inside a java.util.Properties table.
        Parameters:
        html - The vectorized-html web-page.
        Returns:
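        This returns a java.util.Properties object holding every Open-Graph property found. Each key is the property name with its leading 'og:' removed, and each value is the tag's 'content' attribute.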
        See Also:
        openGraphMetaTag, insertOGMetaTag(Vector, String, String), TagNode.AV(String), InnerTagGet
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble:
         // <META property="og:..." ...>
         //
         // SW_CI_TRM: Check the 'property' Attribute-Value using a Case-Insensitive,
         //            'Starts-With' String-Comparison
         //            Trim the 'property' Attribute-Value String of possible leading & 
         //            trailing White-Space before performing the comparison.
        
         Vector<TagNode> v   = InnerTagGet.all
                                 (html, "meta", "property", TextComparitor.SW_CI_TRM, "og:");
        
         Properties      ret = new Properties();
        
         for (TagNode tn : v)
             ret.put(
                 tn.AV("property").substring(3),
                 tn.AV("content")
             );
        
         return ret;
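The following sketch performs the same retrieval. It uses a hypothetical model in which each <META> element is a Map of lower-cased attribute names to values. It differs from the method body in two deliberate ways:

- It skips tags that lack a 'content' attribute. java.util.Properties throws NullPointerException when asked to store a null value, so the body above would throw if AV("content") returned null.
- It trims the 'property' value before removing the "og:" prefix, in case the attribute value carries leading white-space.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

final class OpenGraphReadSketch {

    // Hypothetical model: each <META> element is a Map of lower-cased attribute
    // names to attribute values.
    static Properties getAllOGMetaTags(List<Map<String, String>> metaTags) {
        Properties ret = new Properties();

        for (Map<String, String> tag : metaTags) {
            String property = tag.get("property");
            String content  = tag.get("content");

            // Properties rejects null values, so incomplete tags are skipped.
            if (property == null || content == null) continue;

            // Case-insensitive "starts-with og:" test on the trimmed value.
            String p = property.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("og:"))
                ret.setProperty(p.substring(3), content);
        }
        return ret;
    }
}
```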
        
      • insertKeyWords

        public static void insertKeyWords(java.util.Vector<HTMLNode> html,
                                          java.lang.String... keyWords)
        This will insert a key-words meta-tag into the page. Key-words usually summarize the main points that a web-page author wants to convey to search-engines, and to anyone else on the internet, about the page that contains the meta-tag.

        IMPORTANT: This method performs only a few minor validity checks on the key-words. It looks for white-space and for certain punctuation marks. If either is found inside any of the key-words provided to the 'String... keyWords' parameter, an IllegalArgumentException is thrown.

        NOTE: This list of disallowed punctuation marks inside key-words consists of:
         if (StrCmpr.containsOR(keyWord, ";", ",", "'", "\"", "!", "<", ">", "(", ")", "*", "/", "\\"))
              throw new IllegalArgumentException(...);
         
        
        Parameters:
        html - This may be any html web-page, but it must contain an HTML <HEAD> ... </HEAD> sub-section. A meta-tag can only be inserted into the page's HTML 'header' section, so if there is none, this method throws a 'NodeNotFoundException'.
        keyWords - This is a list of germane key-words that help identify, indicate or describe the content of the web-page in which they are placed.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <meta name='keywords' content='INSERT-COMMA-SEPARATED-KEYWORDS-HERE'> would have to be inserted.
        java.lang.IllegalArgumentException - If any of the key-words provided to the java var-args key-words parameter contain invalid punctuation characters, or white-space.
        See Also:
        keyWordsMetaTag, getAllKeyWords(Vector), StringParse.hasWhiteSpace(String), StrCmpr.containsOR(String, String[]), StrCSV.toCSV(String[], boolean, boolean, Integer)
        Code:
        Exact Method Body:
         // The meta-tag for key-words.  Search Engines look for these key-words when indexing
         // <meta name='keywords' content='INSERT-COMMA-SEPARATED-KEYWORDS-HERE'>
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "key-words meta-tag"));
        
         for (String keyWord : keyWords) if (StringParse.hasWhiteSpace(keyWord)) 
             throw new IllegalArgumentException(
                 "You have tried to insert keywords into an HTML meta 'key-word' property, " +
                 "but unfortunately one of the words provided [" + keyWord + "] contains " +
                 "white-space.  This is not allowed here."
             );
        
         for (String keyWord : keyWords)
             if (StrCmpr.containsOR(keyWord, ";", ",", "'", "\"", "!", "<", ">", "(", ")", "*", "/", "\\"))
                 throw new IllegalArgumentException(
                     "You have tried to insert keywords into an HTML meta 'key-word' " +
                     "property, but unfortunately one of the words provided [" + keyWord + "] " +
                     "contains error-prone punctuation, and cannot be used here."
                 );
        
         // All this does is build a list - Comma Separated values.
         String listAsString = StrCSV.toCSV(keyWords, true, false, null);
        
         // Build the TagNode, it will contain all key-words listed in the input var-args String array
         TagNode metaTN = new TagNode
             (keyWordsMetaTag.replace("INSERT-COMMA-SEPARATED-KEYWORDS-HERE", listAsString));
        
         // Insert the tag into the page.  Put it at the top of the header, just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, metaTN, NEWLINE);
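The validation and list-building steps can be sketched as follows. This is an approximation, not the library's code:

- White-space is detected with a \s regular expression in place of StringParse.hasWhiteSpace.
- The rejected punctuation is copied from the method body.
- The key-words are joined with a plain comma. The exact output of StrCSV.toCSV(keyWords, true, false, null) is not shown here.

The name buildKeyWordsMetaTag is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class KeyWordsSketch {

    // The punctuation rejected by the exact method body above.
    private static final List<String> BAD =
        Arrays.asList(";", ",", "'", "\"", "!", "<", ">", "(", ")", "*", "/", "\\");

    private static final Pattern WHITE_SPACE = Pattern.compile("\\s");

    static String buildKeyWordsMetaTag(String... keyWords) {
        for (String kw : keyWords) {
            if (WHITE_SPACE.matcher(kw).find())
                throw new IllegalArgumentException("Key-word contains white-space: [" + kw + "]");

            for (String bad : BAD)
                if (kw.contains(bad))
                    throw new IllegalArgumentException(
                        "Key-word contains '" + bad + "': [" + kw + "]");
        }

        // Assumption: a plain comma-join approximates StrCSV.toCSV's output.
        String csv = String.join(",", keyWords);
        return "<meta name='keywords' content='" + csv + "'>";
    }
}
```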
        
      • getAllKeyWords

        public static java.lang.String[] getAllKeyWords
                    (java.util.Vector<? extends HTMLNode> html)
        
        This method will find any / all HTML <META name="keywords" ...> meta-tags, and extract the page key-words they contain. These key-words are returned as a String[] array.
        Parameters:
        html - This is a vectorized-html web-page. It will be searched for key-word meta-tags.
        Returns:
        The list of words that were stored in the 'keywords' HTML <META ...> tags. If no key-words were found in any keyword meta-tag, an empty array is returned.

        NOTE: If the code below looks complicated, remember that Java's 'streams' package tends to make simple things look difficult. Once the meanings of the words "stream," "map," "filter," and "collect" are committed to memory, it reads much like ordinary Java. The pipeline is equivalent to a few for-loops that:

        • Get all HTML <META name="keywords" content="..."> elements
        • Extract the value stored in each element's 'content' attribute
        • Remove blank and null values
        • Split each comma-separated value into a String[], and convert it to a List<String>
        • Collect all of the lists into a single Vector, and convert that Vector into a String[]
        See Also:
        insertKeyWords(Vector, String[]), keyWordsMetaTag, TagNode, TagNode.AV(String), StrCSV.CSV(String)
        Code:
        Exact Method Body:
         // Java Streams here both filter irrelevant meta tags, and also convert the type from
         // TagNode to String... using the 'map' function.  Ultimately, those strings are 'collected'
         // into a Vector, which is then converted into the returned String[].
         // ALSO SALIENT: "Arrays.asList" produces a List<String>, and "Vector::addAll" puts each
         //               separate String into that Vector.
        
         // InnerTagGet.all: Returns a vector of TagNode's that resemble: <META name="keywords" ...>
         // EQ_CI_TRM: Check the 'name' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'name' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
        
         return InnerTagGet.all(html, "meta", "name", TextComparitor.EQ_CI_TRM, "keywords")
                 .stream     ()
                 .map        ((TagNode tn)           -> tn.AV("content"))
                 .filter     ((String contents)      -> (contents != null) && (contents.trim().length() > 0))
                 .map        ((String contents)      -> Arrays.asList(StrCSV.CSV(contents)))
                 .collect    (Vector::new, Vector::addAll, Vector::addAll)
                 .stream     ()
                 .toArray    (String[]::new);
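The map / collect / stream / toArray round trip above can be written more directly with flatMap. This sketch is not the library's implementation. It makes two assumptions:

- The input is the list of 'content' values already extracted from the keyword <META> tags.
- StrCSV.CSV behaves like splitting on ',' and trimming each word.

```java
import java.util.Arrays;
import java.util.List;

final class KeyWordsReadSketch {

    // Input: the 'content' values of every <META name="keywords"> tag (nulls allowed).
    static String[] getAllKeyWords(List<String> contents) {
        return contents.stream()
            .filter(c -> c != null && !c.trim().isEmpty())  // drop nulls and blanks
            .flatMap(c -> Arrays.stream(c.split(",")))     // split each list on commas
            .map(String::trim)                              // assumption: trim each word
            .filter(w -> !w.isEmpty())                      // drop empty entries
            .toArray(String[]::new);
    }
}
```

flatMap merges each tag's words into one stream. This removes the need for the intermediate Vector.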
        
      • insertAuthor

        public static void insertAuthor(java.util.Vector<HTMLNode> html,
                                        java.lang.String author)
        This method will insert an "author" HTML meta-element into the <HEAD> ... </HEAD> section of this page.
        Parameters:
        html - This is any java vectorized-html web-page.
        author - This is the author of this web-page.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <meta name='author' content='INSERT-AUTHOR-NAME-HERE'> would have to be inserted.
        QuotesException - If no author meta-tag can be built from the author's name. This happens when the name contains both a single-quote and a double-quote. Use one or the other, but not both, or this exception will be thrown.

        MOST IMPORTANT: Most authors' names contain no quotes at all! Checking for them here prevents unexplainable exceptions later on.
        See Also:
        authorMetaTag, hasAuthor(Vector), SD, DotPair
        Code:
        Exact Method Body:
         // The 'Author' Meta tag shall be inserted into the html page.
         // <meta name='author' content='INSERT-AUTHOR-NAME-HERE'>
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "author meta-tag"));
        
         if ((author.indexOf("'") != -1) && (author.indexOf("\"") != -1)) throw new QuotesException(
             "The author string provided here contains both a single-quote and a double-quote, " +
             "but this cannot be inserted into any HTML element.  Please remove one or the other."
         );
        
         // Use the more complicated TagNode constructor to build the "author" tag.
         SD          quote   = (author.indexOf("'") == -1) ? SD.SingleQuotes : SD.DoubleQuotes;
         Properties  p       = new Properties();
        
         p.put("name", "author");
         p.put("content", author);
        
         // This constructor accepts a properties instance.
         TagNode authorTN = new TagNode("meta", p, quote, true);
        
         // Insert the tag into the page.  Put it at the top of the header, just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, authorTN, NEWLINE);
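The quote-selection logic can be sketched as a plain String builder. This is a hypothetical helper, not the TagNode constructor. It fixes the attribute order as name, then content; the order produced from a Properties instance is not specified here. It throws IllegalArgumentException in place of QuotesException.

```java
final class AuthorTagSketch {

    static String buildAuthorMetaTag(String author) {
        boolean hasSingle = author.indexOf('\'') != -1;
        boolean hasDouble = author.indexOf('"') != -1;

        // No quote character can delimit a value that contains both kinds.
        if (hasSingle && hasDouble) throw new IllegalArgumentException(
            "The author string contains both a single-quote and a double-quote.");

        // Prefer single quotes; switch to double quotes when the name has an apostrophe.
        char q = hasSingle ? '"' : '\'';
        return "<meta name=" + q + "author" + q + " content=" + q + author + q + ">";
    }
}
```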
        
      • hasAuthor

        public static final java.lang.String hasAuthor
                    (java.util.Vector<? extends HTMLNode> html)
                throws MalformedHTMLException
        
        This retrieves a web-page's author-name from its 'author' <META> tag, which is the tag that identifies the author to web-indexing and web-search organizations.
        Parameters:
        html - This is a vectorized-html webpage. It will be searched for an author's name meta-tag.
        Returns:
        This returns the author's name of a web-page, as delineated in the 'author meta-tag', or 'null' if the passed web-page parameter does not have an author meta tag.
        Throws:
        MalformedHTMLException - If multiple 'author meta-tag' elements are found, this method throws an exception rather than "pick a favorite author from a list." HTML does not specify how such a conflict should be resolved. Throwing an exception is therefore simpler than, for example, returning a String-array or vector-of-String, which would also be a workable alternative. If this exception is thrown, it is better to know about the situation and perform the search again manually. The code for this method is listed below.
        See Also:
        insertAuthor(Vector, String), authorMetaTag, TagNode.AV(String)
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble:
         // <META name="author" ...>
         // EQ_CI_TRM: Check the 'name' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'name' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
         Vector<TagNode> v = InnerTagGet.all
             (html, "meta", "name", TextComparitor.EQ_CI_TRM, "author");
        
         if (v.size() > 1) throw new MalformedHTMLException(
             "This method has identified multiple author meta-tags.  To handle this " +
             "situation, the search should be performed manually using InnerTagGet, with " +
             "your code deciding what to do about the HTML web-page having multiple 'author' " +
             "meta-tags."
         );
        
         // No HTML TagNode's were found that resembled <META NAME='author' ...>
         if (v.size() == 0) return null;
        
         // Just return the first one that was found, always check for 'null' first to
         // avoid the embarrassing NullPointerException.
         String author = v.elementAt(0).AV("content");
         if (author == null) return null;
         return author.trim();
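The exception documentation above mentions returning every author as an alternative to throwing. A minimal sketch of that alternative follows. It assumes a hypothetical model in which each <META> element is a Map of lower-cased attribute names to values. getAllAuthors is an illustrative name, not a method of this class.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class AuthorReadSketch {

    // Returns every author name found, so the caller decides how to handle several.
    static List<String> getAllAuthors(List<Map<String, String>> metaTags) {
        return metaTags.stream()
            // Keep <META name="author"> tags (case-insensitive, trimmed).
            .filter(t -> t.get("name") != null
                      && t.get("name").trim().equalsIgnoreCase("author"))
            .map(t -> t.get("content"))
            .filter(c -> c != null)      // ignore author tags lacking 'content'
            .map(String::trim)
            .collect(Collectors.toList());
    }
}
```

An empty list plays the role of the 'null' return of hasAuthor. A list with more than one entry replaces the MalformedHTMLException case.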