Package Torello.HTML

Class Features


  • public class Features
    extends java.lang.Object
    Features - Documentation.

    This class handles some of the extremely common features found in HTML web-pages. The collection of capabilities listed here is sometimes referred to as "SEO" or "Search Engine Optimization." Generally, the features do not actually work as well as Search Engine Companies would like you to believe. Sure, there are SEO tags for companies that cater to very specialized, and very "niche markets." If you have a website that sells cup-cakes in Dallas, and you specialize in cup-cakes, your SEO settings will probably work all-right - probably!

    If you have decided to write a Java-Based HTML Search Engine, and would like your Java Libraries to be ranked at the top of search-engine requests any-time a user types the words "Java and HTML" into a browser, there is not a lot SEO will be able to do for you - not even using the features in this "Features" class!

    Static (Functional) API: The methods in this class are all (100%) defined with the Java Key-Word / Key-Concept 'static'. Furthermore, there is no way to obtain an instance of this class, because there are no public (nor private) constructors. Java's Spring-Boot, MVC feature is *not* utilized because it flies directly in the face of the light-weight data-classes philosophy. This has many advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) - by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood class Vector.
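
    As a rough illustration of the 'static' style described above, the sketch below shows a stateless utility class whose single method works on a plain java.util.Vector. The names PageUtil and countOpeningTags are hypothetical, and Vector<String> stands in for Vector<HTMLNode> so that the example compiles with the JDK alone. The private constructor is a common Java idiom for blocking instantiation and is an assumption here, not a statement about how this library is written.

```java
import java.util.Vector;

// Hypothetical stateless utility class: no mutable fields, no instances.
final class PageUtil
{
    // Common idiom for blocking instantiation (an assumption, not taken from this library).
    private PageUtil() { }

    // Counts entries that look like opening tags: "<head>" counts,
    // while "</head>", comments ("<!-- ... -->") and plain text do not.
    static int countOpeningTags(Vector<String> page)
    {
        int count = 0;

        for (String node : page)
            if (node.startsWith("<") && !node.startsWith("</") && !node.startsWith("<!"))
                count++;

        return count;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>();
        page.add("<html>");
        page.add("<head>");
        page.add("</head>");
        page.add("Some text");
        page.add("</html>");

        System.out.println(countOpeningTags(page));   // prints 2
    }
}
```

    Because the method receives all of its data as a parameter and keeps nothing between calls, it can be invoked from anywhere without first constructing or configuring an object.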

    Static Fields: The methods in this class do not create any internal state that is maintained - however there are a few private & static fields defined. These fields are instantiated only once during the Class Loader phase (and only if this class shall be used), and serve as data 'lookup' fields (static constants). View this class' source-code in the link provided below to see internally used data.

    The internally defined fields include five HTML Header Tags. All five are typed as java.lang.String finals / constants. There is also an error-message String-constant for consistent exception reporting and a newline '\n' TextNode, so as to avoid re-instantiating the new-line character over and over.



    • Field Detail

      • NO_HEADER_MESSAGE

        public static final java.lang.String NO_HEADER_MESSAGE
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String NO_HEADER_MESSAGE =
                "You are attempting to insert an HTML INSERT-STR, but such an element belongs in the " +
                "page's header.  Unfortunately, the page or sub-page you have passed does not have a " +
                "<HEAD>...</HEAD> sub-section.  Therefore, there is no place to insert the elements.";
        
      • favicon

        public static final java.lang.String favicon
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a "logo-image" at the top-left corner of the web-browser's tab for the page when it loads. This image is commonly called a 'favicon.'
        See Also:
        insertFavicon(Vector, String), hasFavicon(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String favicon =
                "<link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />";
        
      • cssExternalSheet

        public static final java.lang.String cssExternalSheet
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Cascading Style Sheet (a ".css" file) to your page. The web-browser that ultimately loads the HTML that you are exporting will render the style elements across all the HTML elements in your page that match the CSS selectors. Rather than explaining how CSS works, this documentation simply provides the String that is ultimately instantiated as a TagNode.
        See Also:
        insertCSSLink(Vector, String), getAllCSSLinks(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String cssExternalSheet =
                "<link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' />";
        
      • javaScriptExternalPage

        public static final java.lang.String javaScriptExternalPage
        This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Java-Script '.js' file. The web-browser will download this Java-Script page from the URL that you ultimately provide and load all variable definitions, and dispatch to any methods that are invoked by the event-handlers when user or operating system events are fired.

        IMPORTANT NOTE: Inserting an external java-script page has one important difference vis-a-vis inserting an external CSS page. Inserting a link to a '.js' page requires both the opening and the closing HTML <SCRIPT ..></SCRIPT> tag-elements. This is expected and required even when (and especially when) there is no actual java-script code being placed on the '.html' page itself. Effectively, regardless of whether you are putting actual java-script code on your HTML page, or just inserting a link knowing the browser will download the external '.js' file for you, you still must create both the opening and the closing HTML <SCRIPT SRC='...'></SCRIPT> elements and insert them into your vectorized-html web-page.

        HTML Elements:
         <!-- This is a short note about including the HTML SCRIPT element in your web-pages. -->
         <HTML>
         <HEAD>
         <!-- Version #1 Inserting a java-script 'variables & functions' external-page -->
         <SCRIPT TYPE='text/javascript' SRC='/script/javaScriptFiles/functions.js'>
         </SCRIPT>
         <!-- Right here (line above) we always need the closing Script-tag, even when there is no actual java-script
         present, and the methods/variables are going to be downloaded from the java-script file identified
         by the SRC="..." attribute! --> 
        
         <SCRIPT TYPE='text/javascript'>
         var someVar1;
         var someVar2;
         function someFunction()
         { return;    }
         </SCRIPT> <!-- Either way, the closing-script tag is expected. -->
         </HEAD>
         </HTML>
         
        
        See Also:
        insertExternalJavaScriptLink(Vector, String), getAllExternalJSLinks(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String javaScriptExternalPage =
                "<script type='text/javascript' src='INSERT-URL-STRING-HERE'>";
        
      • canonicalTag

        public static final java.lang.String canonicalTag
        If you have pages on your site that are almost identical, then you may need to inform search engines which one to prioritize. Or you might have syndicated content on your site which was republished elsewhere. You can do both of these things without incurring a duplicate content penalty – as long as you use a canonical tag.

        Instead of confusing Google and missing your ranking on the SERP's, you are guiding the crawlers as to which URL counts as the “main” one. This places the emphasis on the right URL and prevents the others from cannibalizing your SEO.

        Use canonical tags to avoid having problems with duplicate content that may affect your rankings.

        NOTE: Content of this java-documentation description was copied from a page on web-domain 'http://searchenginewatch.com'. It was lifted on May 24th, 2019. See link below, if still valid:
        https://searchenginewatch.com/2018/04/04/a-quick-and-easy-guide-to-meta-tags-in-seo/
        See Also:
        insertCanonicalURL(Vector, String), hasCanonicalURL(Vector), Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final String canonicalTag = 
                "<link rel='canonical' href='INSERT-URL-STRING-HERE' />";
        
      • NEWLINE

        protected static final TextNode NEWLINE
        This is a new-line HTMLNode, stored as a TextNode that contains the single character '\n'.
        Code:
        Exact Field Declaration Expression:
        protected static final TextNode NEWLINE = new TextNode("\n");
        
    • Method Detail

      • checkForSingleQuote

        protected static void checkForSingleQuote​(java.lang.String s)
        This method checks whether the parameter String contains a single-quotation punctuation mark anywhere in the String. If so, an exception is thrown. This is generally an internal-helper method.
        Parameters:
        s - This is any java-string, but generally it is one used to insert into an HTML 'content' attribute.
        Throws:
        QuotesException - If the passed parameter string contains any instance of single-quotation.
        Code:
        Exact Method Body:
         int pos;
        
         if ((pos = s.indexOf("'")) != -1) throw new QuotesException(
             "The passed string-parameter may not contain a single-quote punctuation mark.  " +
             "Yours was: [" + s + "], and has a single-quotation mark at string-position " +
             "[" + pos + "]"
         );
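
        The String constants in this class wrap their attribute-values in single-quotes, so a single-quote inside an inserted value would end the attribute early. The sketch below reproduces the check shown above using only the JDK. The class name QuoteCheckSketch is hypothetical, and IllegalArgumentException is a stand-in for the library's QuotesException.

```java
// Stand-alone sketch of the check above.
// IllegalArgumentException is an assumed substitute for QuotesException.
final class QuoteCheckSketch
{
    private QuoteCheckSketch() { }

    // Throws if 's' contains a single-quote, and reports where it was found.
    static void checkForSingleQuote(String s)
    {
        int pos = s.indexOf('\'');

        if (pos != -1) throw new IllegalArgumentException(
            "The passed string-parameter may not contain a single-quote.  " +
            "Yours was: [" + s + "], with a single-quote at string-position [" + pos + "]"
        );
    }

    public static void main(String[] args)
    {
        // No single-quote: returns without throwing.
        checkForSingleQuote("/images/logo.png");

        // Contains a single-quote: the exception message reports its position.
        try
            { checkForSingleQuote("/images/bob's-logo.png"); }
        catch (IllegalArgumentException e)
            { System.out.println(e.getMessage()); }
    }
}
```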
        
      • insertFavicon

        public static void insertFavicon​(java.util.Vector<HTMLNode> html,
                                         java.lang.String imageURLAsString)
        This inserts a favicon HTML link element into the right location so that a particular web-page will render a "browser icon image" in the left corner of the page's browser-tab when the page loads into a browser.
        Parameters:
        html - The vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        imageURLAsString - This is the String that will be copied into the public static final String 'favicon' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='icon' href='image_url'> would have to be inserted.
        QuotesException - If the image URL uses a single-quote mark, anywhere in the URL-string.
        See Also:
        favicon, checkForSingleQuote(String)
        Code:
        Exact Method Body:
         // Insert the Favicon <LINK ...> element into the <HEAD> section of the input html page.
         // <link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(imageURLAsString);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "favicon <LINK> element"));
        
         // Build a new Favicon TagNode.
         TagNode faviconTN = new TagNode(
             favicon
                 .replace("INSERT-URL-STRING-HERE", imageURLAsString)
                 .replace("INSERT-IMAGE-TYPE-HERE", IF.getGuess(imageURLAsString).extension)
         );
        
         // Insert the Favicon into the page.  Put it at the top of the header, just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, faviconTN, NEWLINE);
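
        The method body above follows three steps: locate the <HEAD> tag, fill in the two placeholders of the 'favicon' String, and insert the result directly after <HEAD>. The sketch below walks through the same steps with JDK classes only. It assumes a Vector<String> in place of Vector<HTMLNode>, uses IllegalStateException where the library throws NodeNotFoundException, and replaces IF.getGuess(...) with a hypothetical helper named guessType that simply reads the file extension.

```java
import java.util.Arrays;
import java.util.Vector;

final class FaviconSketch
{
    // Same template text as the 'favicon' field documented on this page.
    static final String FAVICON =
        "<link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />";

    private FaviconSketch() { }

    // Hypothetical helper: treats the text after the last '.' as the image type.
    static String guessType(String url)
    { return url.substring(url.lastIndexOf('.') + 1).toLowerCase(); }

    static void insertFavicon(Vector<String> page, String url)
    {
        // Step 1: find the opening <HEAD> tag.
        int head = -1;

        for (int i = 0; i < page.size(); i++)
            if (page.get(i).equalsIgnoreCase("<head>")) { head = i; break; }

        if (head == -1) throw new IllegalStateException("No <HEAD> section found");

        // Step 2: fill in both placeholders of the template.
        String tag = FAVICON
            .replace("INSERT-URL-STRING-HERE", url)
            .replace("INSERT-IMAGE-TYPE-HERE", guessType(url));

        // Step 3: insert newline, tag, newline directly after <HEAD>.
        page.addAll(head + 1, Arrays.asList("\n", tag, "\n"));
    }

    public static void main(String[] args)
    {
        Vector<String> page =
            new Vector<>(Arrays.asList("<html>", "<head>", "</head>", "</html>"));

        insertFavicon(page, "/img/logo.png");

        // prints: <link rel='icon' type='image/png' href='/img/logo.png' />
        System.out.println(page.get(3));
    }
}
```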
        
      • hasFavicon

        public static java.lang.String hasFavicon​
                    (java.util.Vector<? extends HTMLNode> html)
        
        This method will search for an HTML <LINK REL="icon" ...> element, specifically expecting the link element to contain an inner-tag / attribute name 'REL' whose value is 'icon'. If it finds one, it will return the value of the other attribute named 'HREF=...'.
        Parameters:
        html - Any html page, but preferably one that contains a <LINK REL="icon" ...> element.
        Returns:
        This method will return the String value of the 'HREF=...' attribute found inside the '<LINK>' element, if this page or sub-page has such an element, with such an attribute. If no LINK element with REL='icon' is found on this page, or if none of those found has an 'HREF' attribute, then 'null' will be returned.

        NOTE: In the event that multiple copies of the HTML <LINK> element are found, and more than one has a 'REL' attribute whose value equals the String 'icon' (ignoring case and surrounding white-space), this method will just return the first 'HREF' value it finds.
        See Also:
        InnerTagGet, favicon, TagNode.AV(String)
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble: <LINK rel="icon" ...>
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison.
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
         Vector<TagNode> list = InnerTagGet.all
             (html, "link", "rel", TextComparitor.EQ_CI_TRM, "icon");
        
         // If there were no HTML "<LINK ...>" elements with REL='ICON' attributes, then
         // there was no favicon.
         if (list.size() == 0) return null;
        
         // Just in case there were multiple favicon <LINK ...> tags, just return the first
         // one found.  Inside of a <LINK REL="icon" HREF="..."> the 'HREF' Attribute contains
         // the Image-URL.  Use TagNode.AV("HREF") to retrieve that image url.
         String s;
         for (TagNode tn : list) if ((s = tn.AV("href")) != null) return s;
        
         // If for some reason, none of these <LINK REL='ICON' ...> elements had an "HREF" 
         // attribute, then just return null.
         return null;
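
        The comparison described in the comments above (case-insensitive equality after trimming white-space) can be sketched with JDK classes alone. In this sketch each <LINK> element is represented by a Map of its attributes, which is an assumption made for illustration; the class and method names are hypothetical and are not part of this library.

```java
import java.util.List;
import java.util.Map;

final class HasFaviconSketch
{
    private HasFaviconSketch() { }

    // Returns the 'href' of the first link whose 'rel' equals "icon"
    // (ignoring case and surrounding white-space), or null if there is none.
    static String firstIconHref(List<Map<String, String>> linkTags)
    {
        for (Map<String, String> attrs : linkTags)
        {
            String rel = attrs.get("rel");

            // Equality test, not a 'contains' test.
            if ((rel == null) || !rel.trim().equalsIgnoreCase("icon")) continue;

            String href = attrs.get("href");
            if (href != null) return href;
        }

        return null;
    }

    public static void main(String[] args)
    {
        List<Map<String, String>> tags = List.of(
            Map.of("rel", "stylesheet", "href", "/a.css"),   // wrong 'rel'
            Map.of("rel", " ICON "),                         // matches, but has no 'href'
            Map.of("rel", "Icon", "href", "/fav.ico")        // first match with an 'href'
        );

        System.out.println(firstIconHref(tags));   // prints /fav.ico
    }
}
```

        Under an equality comparison such as this one, a value like rel='shortcut icon' does not match, because the trimmed value is not equal to 'icon'.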
        
      • insertCSSLink

        public static void insertCSSLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalCSSFileURLAsString)
        
        This inserts an HTML '<LINK ...>' element into the right location for linking an externally-defined Cascading Style Sheet '.css' page.
        Parameters:
        html - Any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        externalCSSFileURLAsString - This is the String that will be copied into the public static final String 'cssExternalSheet' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='stylesheet' type='text/css' href='local-url/someFile.css' /> would have to be inserted.
        QuotesException - If the CSS-sheet URL uses a single-quote mark, anywhere in the URL-string.
        See Also:
        cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair, TagNode
        Code:
        Exact Method Body:
         // Inserts an external CSS Link into the <HEAD> section of this html page vector
         // <link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(externalCSSFileURLAsString);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace
                 ("INSERT-STR", "externally-linked CSS page <LINK> element")
         );
        
         TagNode cssTN   = new TagNode
             (cssExternalSheet.replace("INSERT-URL-STRING-HERE", externalCSSFileURLAsString));
        
         // Insert the Style-Sheet link into the page.  Put it at the top of the header,
         // just after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
        
      • insertCSSLink

        public static void insertCSSLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalCSSFileURLAsString,
                     java.lang.String mediaInnerTagValue)
        
        This inserts an HTML '<LINK ...>' element into the right location for linking an externally-defined Cascading Style Sheet '.css' page.
        Parameters:
        html - Any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        externalCSSFileURLAsString - This is the String that will be copied into the String 'cssExternalSheetWithMediaAttribute' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
        mediaInnerTagValue - Externally linked CSS pages, which are included using the HTML <LINK ...> element may explicitly request a 'media' attribute be inserted into the link element. That media attribute may take one of five values. The media attribute in a link tag specifies when the CSS rules are to be applied.

        Here are the most common values for attribute 'media,' below:

        Attribute Value    Intended CSS Meaning
        screen             Indicates for use on a computer screen.
        projection         For projected presentations.
        handheld           For handheld devices (typically with small screens).
        print              To style printed web pages.
        all                (Default value) This is what most people choose. You can leave off the media attribute completely if you want your styles to be applied for all media types.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link ... type='text/css' rel="stylesheet" href="local-url/someFile.css" media="some-media"> node would have to be inserted.
        QuotesException - If the CSS-sheet URL, or the media attribute-value, uses a single-quote mark anywhere inside the String.
        See Also:
        cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair, TagNode
        Code:
        Exact Method Body:
         // Inserts an external CSS Link (with 'media' attribute) into the <HEAD> section of
         // this html page vector 
         // <link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' media='INSERT-MEDIA-ATTRIBUTE-VALUE-HERE' />
        
         checkForSingleQuote(externalCSSFileURLAsString);
         checkForSingleQuote(mediaInnerTagValue);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace
                 ("INSERT-STR", "externally-linked CSS page <LINK> element")
         );
        
         // Build the TagNode
         TagNode cssTN   = new TagNode(
             cssExternalSheetWithMediaAttribute
                 .replace("INSERT-URL-STRING-HERE", externalCSSFileURLAsString)
                 .replace("INSERT-MEDIA-ATTRIBUTE-VALUE-HERE", mediaInnerTagValue)
         );
        
         // Insert the Style-Sheet link into the page.  Put it at the top of the header, just
         // after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
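
        The two placeholder substitutions used above can be tried in isolation. In the sketch below the template text is copied from the comment at the top of the method body, since the field 'cssExternalSheetWithMediaAttribute' is not listed in the Field Detail section. The class name CssMediaSketch and the method name build are hypothetical, and IllegalArgumentException is a stand-in for QuotesException.

```java
final class CssMediaSketch
{
    // Template text taken from the comment in the method body above (an assumption
    // about the contents of 'cssExternalSheetWithMediaAttribute').
    static final String TEMPLATE =
        "<link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' " +
        "media='INSERT-MEDIA-ATTRIBUTE-VALUE-HERE' />";

    private CssMediaSketch() { }

    // Builds the <LINK> element text for a style-sheet URL and a media value.
    static String build(String url, String media)
    {
        // Both values end up inside single-quoted attributes, so neither may contain one.
        if ((url.indexOf('\'') != -1) || (media.indexOf('\'') != -1))
            throw new IllegalArgumentException("Single-quotes are not allowed");

        return TEMPLATE
            .replace("INSERT-URL-STRING-HERE", url)
            .replace("INSERT-MEDIA-ATTRIBUTE-VALUE-HERE", media);
    }

    public static void main(String[] args)
    {
        // prints: <link rel='stylesheet' type='text/css' href='/css/print.css' media='print' />
        System.out.println(build("/css/print.css", "print"));
    }
}
```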
        
      • getAllCSSLinks

        public static java.util.Vector<TagNode> getAllCSSLinks
                    (java.util.Vector<? extends HTMLNode> html)
        
        This will retrieve all linked CSS pages from a vectorized-html web-page.
        Parameters:
        html - This may be any vectorized-html web-page.
        Returns:
        This will return the <LINK REL='stylesheet' ...> elements as a Vector of TagNode. If none are found, the returned Vector is expected to be empty.
        See Also:
        insertCSSLink(Vector, String), insertCSSLink(Vector, String, String), InnerTagGet
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble: 
         //                  <LINK rel="stylesheet" ...>
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
         return InnerTagGet.all(html, "link", "rel", TextComparitor.EQ_CI_TRM, "stylesheet");
        
      • insertExternalJavaScriptLink

        public static void insertExternalJavaScriptLink​
                    (java.util.Vector<HTMLNode> html,
                     java.lang.String externalJSFileURLAsString)
        
        This inserts an HTML '<SCRIPT ...></SCRIPT>' element pair into the right location for linking an externally-defined java-script '.js' file.
        Parameters:
        html - Any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        externalJSFileURLAsString - This is the String that will be copied into the public static final String 'javaScriptExternalPage' and converted into an HTML 'TagNode'. It will then be inserted, together with a closing </SCRIPT> tag, as the first element of the html page header.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <SCRIPT ... SRC="local-url/someScriptFile.js"></SCRIPT> nodes would have to be inserted.
        QuotesException - If the java-script page-URL uses a single-quote mark, anywhere in the url-string.
        See Also:
        javaScriptExternalPage, getAllExternalJSLinks(Vector), checkForSingleQuote(String), TagNode, TextNode, DotPair, HTMLTags.hasTag(String, TC)
        Code:
        Exact Method Body:
         // Builds an external Java-Script link, and inserts it into the header portion of
         // this html page.
         // <script type='text/javascript' src='INSERT-URL-STRING-HERE'>
        
         checkForSingleQuote(externalJSFileURLAsString);
        
         // Build an HTML <SCRIPT ...> node, and a </SCRIPT> node.
         HTMLNode n = new TagNode(javaScriptExternalPage.replace
                         ("INSERT-URL-STRING-HERE", externalJSFileURLAsString));
        
         HTMLNode closeN = HTMLTags.hasTag("script", TC.ClosingTags);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException(
             NO_HEADER_MESSAGE.replace(
                 "INSERT-STR", "externally-linked Java-Script <SCRIPT> ... </SCRIPT> elements")
         );
        
         // Insert the Java-Script link into the page.  Put it at the top of the header, just
         // after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, n, closeN, NEWLINE);
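
        Unlike the <LINK> based methods, this one inserts two nodes: the opening <SCRIPT SRC='...'> tag and its closing </SCRIPT> tag. The sketch below shows the resulting node order using a Vector<String> in place of Vector<HTMLNode>. The class name ScriptLinkSketch, the method name insert and its headIndex parameter are hypothetical; the index of <HEAD> is passed in directly to keep the example short.

```java
import java.util.Arrays;
import java.util.Vector;

final class ScriptLinkSketch
{
    // Same template text as the 'javaScriptExternalPage' field documented on this page.
    static final String OPEN  = "<script type='text/javascript' src='INSERT-URL-STRING-HERE'>";
    static final String CLOSE = "</script>";

    private ScriptLinkSketch() { }

    // Inserts: newline, opening tag, closing tag, newline (directly after <HEAD>).
    static void insert(Vector<String> page, int headIndex, String url)
    {
        String open = OPEN.replace("INSERT-URL-STRING-HERE", url);
        page.addAll(headIndex + 1, Arrays.asList("\n", open, CLOSE, "\n"));
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList("<head>", "</head>"));
        insert(page, 0, "/js/app.js");

        // The opening and closing tags are adjacent, with nothing between them.
        System.out.println(page.get(2) + page.get(3));
    }
}
```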
        
      • getAllExternalJSLinks

        public static java.lang.String[] getAllExternalJSLinks​
                    (java.util.Vector<? extends HTMLNode> html)
        
        Placing java-script directly onto an HTML page and linking to an external '.js' file are extremely similar tasks. The construct is <SCRIPT TYPE='text/javascript'> ... </SCRIPT> either way. When the functions and variables are written into the HTML page directly, they go exactly where the ellipsis '...' appears in that construct. When a link is made to an external '.js' file (search for 'linking external java-script pages from the same host' for background), both the open and close <SCRIPT> ... </SCRIPT> tag-elements must still be included, but the text-content in place of the ellipsis should be left blank. The URL of the java-script file is given in the src='js_file_url' inner-tag value.

        This method will retrieve any and all script nodes that meet these criteria:

        1. The "script body" must be empty, meaning there is no java-script between the open and close script-tags
        2. The src='' attribute must contain some non-null, non-zero-length value
        Parameters:
        html - This is any vectorized-html web-page.
        Returns:
        This will return the 'SRC' attribute-values (URLs) of the externally linked java-script pages, as an array of Strings.
        See Also:
        InnerTagGetInclusive, javaScriptExternalPage, insertExternalJavaScriptLink(Vector, String), TagNode, TextNode, TagNode.AV(String), HTMLNode.str
        Code:
        Exact Method Body:
         // InnerTagGetInclusive.all: Returns a vector of TagNode's that resemble:
         //                              <SCRIPT TYPE="javascript" ...>
         // CN_CI: Check the 'type' Attribute-Value using a Case-Insensitive, "Contains"
         //        String-Comparison
         //        'contains' rather than 'equals' testing is done because this value may be
         //        "javascript", but it may also be "text/javascript"
         // Inclusive: This means that everything between the <SCRIPT type="javascript"> ... and
         //            the closing </SCRIPT> tag are returned in a vector of vectors.
        
         Vector<Vector<HTMLNode>> v = InnerTagGetInclusive.all
             (html, "script", "type", TextComparitor.CN_CI, "javascript");
        
         Stream.Builder<String> b = Stream.builder();
        
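         TOP:
         for (Vector<HTMLNode> scriptSection : v)
         {
             String srcValue = null;

             for (HTMLNode n : scriptSection)
             {
                 // Remember the first 'src' Attribute-Value that is found
                 if (n.isTagNode() && (srcValue == null))
                     srcValue = ((TagNode) n).AV("src");

                 // A non-empty script-body means this is not an external link.
                 // Skip only this <SCRIPT> section, and move on to the next one.
                 if (n.isTextNode() && (n.str.trim().length() > 0))
                     continue TOP;
             }

             // Criteria #2: keep only non-null, non-zero-length 'src' values
             if ((srcValue != null) && (srcValue.length() > 0)) b.add(srcValue);
         }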
        
         return b.build().toArray(String[]::new);
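
        The two criteria listed in the description (an empty script body and a non-empty 'src' value) can be sketched without the library classes. Here each <SCRIPT> section is modelled by a small hypothetical class named Script that holds the 'src' value and the body text; that class, ExternalJsSketch and externalLinks are all illustrative names, not part of this library.

```java
import java.util.ArrayList;
import java.util.List;

final class ExternalJsSketch
{
    // Hypothetical stand-in for one <SCRIPT> ... </SCRIPT> section.
    static final class Script
    {
        final String src;    // value of the SRC attribute, or null if absent
        final String body;   // text between the opening and closing tags

        Script(String src, String body) { this.src = src; this.body = body; }
    }

    private ExternalJsSketch() { }

    // Keeps the 'src' of every section that satisfies both criteria.
    static String[] externalLinks(List<Script> scripts)
    {
        List<String> out = new ArrayList<>();

        for (Script s : scripts)
        {
            boolean emptyBody = s.body.trim().isEmpty();               // criteria #1
            boolean hasSrc    = (s.src != null) && !s.src.isEmpty();   // criteria #2

            if (emptyBody && hasSrc) out.add(s.src);
        }

        return out.toArray(new String[0]);
    }

    public static void main(String[] args)
    {
        List<Script> scripts = List.of(
            new Script("/a.js", ""),        // external link: kept
            new Script(null, "var x;"),     // inline script: skipped
            new Script("/b.js", "  \n"),    // white-space only body: kept
            new Script(null, "")            // no src and no body: skipped
        );

        // prints /a.js, /b.js
        System.out.println(String.join(", ", externalLinks(scripts)));
    }
}
```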
        
      • insertCanonicalURL

        public static void insertCanonicalURL​(java.util.Vector<HTMLNode> html,
                                              java.lang.String canonicalURLAsStr)
        This method will insert a "canonical url" into the web-page passed via the html-parameter. This canonical-url will be placed in an html <link rel='canonical' href='the_url'> element. This element must be placed in the head section of the passed html page, and if the vectorized-html page that was passed does not contain a 'header' section, a 'NodeNotFoundException' (a run-time / unchecked Exception) will be thrown.
        Parameters:
        html - This is any vectorized-html web-page, but it is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will generate / throw a 'NodeNotFoundException' because when a meta-tag is inserted onto a page, it must be inserted in the page's HTML 'header' section.
        canonicalURLAsStr - This text-String will be substituted for the 'href' attribute-value inside an HTML <LINK> element, and then inserted into the vectorized-page at the top of the <HEAD>...</HEAD> section.
        Throws:
        NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='canonical' href='canonical_url'> would have to be inserted.
        QuotesException - If the canonical-page URL uses a single-quote mark, anywhere in the url-string.
        See Also:
        canonicalTag, hasCanonicalURL(Vector), checkForSingleQuote(String), TagNode, DotPair
        Code:
        Exact Method Body:
         // Inserts a link element into the header of this page
         // <link rel='canonical' href='INSERT-URL-STRING-HERE' />
        
         checkForSingleQuote(canonicalURLAsStr);
        
         // The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
         DotPair header = TagNodeFindInclusive.first(html, "head");
        
         if (header == null) throw new NodeNotFoundException
             (NO_HEADER_MESSAGE.replace("INSERT-STR", "Canonical-url <LINK> element"));
        
         // Builds the canonical <LINK ...> element
         TagNode linkTN  = new TagNode
             (canonicalTag.replace("INSERT-URL-STRING-HERE", canonicalURLAsStr));
        
         // Insert the canonical-url into the page.  Put it at the top of the header, just
         // after <HEAD>
         Util.insertNodes(html, header.start + 1, NEWLINE, linkTN, NEWLINE);
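
The steps in the method body above (reject single-quotes, locate <HEAD>, build the <LINK> element, insert just after <HEAD>) can be illustrated with a minimal, JDK-only sketch. This is not the Torello.HTML implementation: the class name CanonicalInsertSketch, the use of a List<String> as a stand-in for a vectorized-html page, and the substituted exception types (IllegalArgumentException, IllegalStateException) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: each list element is one HTML node rendered as a String.
// This is NOT the Torello.HTML data-model, only a JDK-only illustration of the steps.
class CanonicalInsertSketch
{
    static final String CANONICAL_TAG =
        "<link rel='canonical' href='INSERT-URL-STRING-HERE' />";

    static void insertCanonicalURL(List<String> html, String url)
    {
        // Step 1: the href value is wrapped in single-quotes, so a single-quote
        // inside the URL would end the attribute-value early.
        // (The real method throws QuotesException; this sketch substitutes a JDK type.)
        if (url.indexOf('\'') != -1) throw new IllegalArgumentException
            ("The canonical URL may not contain a single-quote: " + url);

        // Step 2: locate the opening <head> tag.  "<header>" must not match.
        int headPos = -1;

        for (int i = 0; i < html.size(); i++)
            if (html.get(i).trim().matches("(?i)<head(\\s[^>]*)?>"))
            { headPos = i; break; }

        // (The real method throws NodeNotFoundException here.)
        if (headPos == -1) throw new IllegalStateException
            ("No <HEAD> section found, so the <LINK> element cannot be inserted.");

        // Step 3: build the canonical <link ...> element
        String link = CANONICAL_TAG.replace("INSERT-URL-STRING-HERE", url);

        // Step 4: insert newline, link, newline directly after <head>
        html.addAll(headPos + 1, Arrays.asList("\n", link, "\n"));
    }

    public static void main(String[] args)
    {
        List<String> page = new ArrayList<>(Arrays.asList(
            "<html>", "<head>", "<title>", "Example", "</title>", "</head>",
            "<body>", "</body>", "</html>"
        ));

        insertCanonicalURL(page, "https://example.com/cup-cakes");
        System.out.println(String.join("", page));
    }
}
```

Inserting at the position just after <head>, rather than appending before </head>, mirrors the 'header.start + 1' index used in the method body, and means the closing tag never has to be located.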
        
      • hasCanonicalURL

        public static java.lang.String hasCanonicalURL​
                    (java.util.Vector<? extends HTMLNode> html)
                throws MalformedHTMLException
        
        This method checks whether a vectorized-html page has an HTML <LINK REL='canonical' ...> tag, and returns its URL if so. A canonical URL tells search-engines which URL is the preferred address for a page whose content may be reachable through several different URLs, so that the duplicates are indexed as a single page rather than as many.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        Returns:
        This returns the text placed inside the HREF='some_url' attribute-value of the canonical HTML link tag. If there is no HTML <LINK REL='canonical' HREF='some_url'> tag, this method returns null.
        Throws:
        MalformedHTMLException - Thrown if more than one HTML tag matches the link and rel='canonical' search criteria. It is also thrown if an HTML element <link rel='canonical'> is found, but that element does not have an href='...' attribute, or that attribute's value has zero length.
        See Also:
        InnerTagGet, canonicalTag, insertCanonicalURL(Vector, String), TagNode.AV(String)
        Code:
        Exact Method Body:
         // InnerTagGet.all: Returns a vector of TagNode's that resemble:
         //                  <LINK rel="canonical" ...>
         // EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
         //            String-Comparison
         //            Trim the 'rel' Attribute-Value String of possible leading & trailing
         //            White-Space before performing the comparison.
         Vector<TagNode> v = InnerTagGet.all
             (html, "link", "rel", TextComparitor.EQ_CI_TRM, "canonical");
        
         if (v.size() == 0) return null;
        
         if (v.size() > 1) throw new MalformedHTMLException(
             "The web-page you have passed has precisely " + v.size() +
             " canonical-url link elements, but it may not have more than 1.  This is " +
             "invalid HTML."
         );
        
         String s = v.elementAt(0).AV("href");
        
         if (s == null) throw new MalformedHTMLException(
             "The HTML link element that was retrieved, contained a " +
             "rel='canonical' inner-tag / value pair, but did not have an href='...' " +
             "attribute.  This is invalid HTML."
         );
        
         if (s.length() == 0) throw new MalformedHTMLException(
             "The HTML link element that was retrieved contained a zero-length " +
             "string as an attribute-value for the href='...' attribute.   This is not " +
             "invalid, but poorly formatted HTML."
         );
        
         return s;
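
The decision sequence in the method body above (zero matches returns null, more than one match is an error, a missing or empty href is an error) can be sketched with the JDK alone. This is only an approximation under stated assumptions: CanonicalLookupSketch, its attr helper, and the List<String> page representation are hypothetical, a simple regular expression stands in for the library's InnerTagGet and TagNode.AV(String), and IllegalStateException stands in for MalformedHTMLException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only illustration; each list element is one HTML tag as a String.
class CanonicalLookupSketch
{
    // Matches name='value' or name="value" attribute pairs inside a single tag
    private static final Pattern ATTR =
        Pattern.compile("([\\w-]+)\\s*=\\s*(['\"])(.*?)\\2");

    // Matches any <link ...> tag, ignoring case
    private static final Pattern LINK = Pattern.compile("(?is)<link\\b.*");

    // Returns the value of the named attribute, or null if the tag lacks it
    static String attr(String tag, String name)
    {
        Matcher m = ATTR.matcher(tag);
        while (m.find()) if (m.group(1).equalsIgnoreCase(name)) return m.group(3);
        return null;
    }

    static String hasCanonicalURL(List<String> html)
    {
        // Collect every <link> whose 'rel' value equals "canonical", compared
        // case-insensitively after trimming white-space (the EQ_CI_TRM idea).
        List<String> found = new ArrayList<>();

        for (String node : html)
        {
            if (! LINK.matcher(node.trim()).matches()) continue;
            String rel = attr(node, "rel");
            if ((rel != null) && rel.trim().equalsIgnoreCase("canonical")) found.add(node);
        }

        if (found.isEmpty()) return null;

        // (The real method throws MalformedHTMLException in the three cases below.)
        if (found.size() > 1) throw new IllegalStateException
            ("Found " + found.size() + " canonical link elements; at most 1 is allowed.");

        String href = attr(found.get(0), "href");

        if (href == null) throw new IllegalStateException
            ("The canonical link element has no href='...' attribute.");

        if (href.isEmpty()) throw new IllegalStateException
            ("The canonical link element has a zero-length href='...' attribute.");

        return href;
    }

    public static void main(String[] args)
    {
        List<String> page = Arrays.asList(
            "<head>", "<link rel='stylesheet' href='site.css'>",
            "<LINK REL=' Canonical ' HREF=\"https://example.com/\">", "</head>"
        );

        System.out.println(hasCanonicalURL(page));
    }
}
```

A caller should treat a null return as "no canonical URL declared" and handle the exception separately, since the two outcomes describe different situations: an absent tag versus a tag that is present but unusable.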