Package Torello.HTML

Class Listeners


  • public class Listeners
    extends java.lang.Object
    Listeners - Documentation.

    This class allows a user to search for listeners in a page or sub-page. It uses the exact same hierarchy of programmer-call options to decide what to look for. Search parameters are left as differing method-calls with differing argument marshalling.

    NOTE: Quite a number of large web-sites no longer embed JavaScript in the page itself. Searching through a major hub and looking for JavaScript listeners will usually return 0 results. There are often JavaScript files downloaded from the <HEAD>...<SCRIPT></SCRIPT> tags, but generally, if there is scripted content, the script will operate on the class=..., id=..., and data-SOME_TAG=... attributes of the HTML Element. In this way, inserting script directly into the body-text of the HTML page is avoided. If you are scraping a page you have written yourself, and it does have JavaScript, then by all means test it out. However, if these methods are returning 0 results, that is not unusual: for many of the large news web-sites and search-engines which were tested, listeners inside HTML Elements seemed uncommon.

    FIND, GET: 'Find' implies that the (int) position(s) within the Vector will be returned as the search result(s). 'Get' implies that the actual TagNode itself shall be returned.
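
    The difference between the two result styles can be sketched as follows. This is only an illustration under stated assumptions: plain Strings stand in for HTMLNode / TagNode, and the class FindVersusGetSketch and its crude hasListener test are hypothetical names, not part of this library.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.stream.IntStream;

// Hypothetical illustration; Strings stand in for the library's node classes.
class FindVersusGetSketch {
    // Crude stand-in test; the real class consults a table of listener names.
    static boolean hasListener(String node) {
        return node.contains(" onclick=");
    }

    // "Find": report the positions of the matches within the Vector.
    static int[] find(Vector<String> page) {
        return IntStream.range(0, page.size())
            .filter(i -> hasListener(page.elementAt(i)))
            .toArray();
    }

    // "Get": report the matching nodes themselves.
    static Vector<String> get(Vector<String> page) {
        Vector<String> ret = new Vector<>();
        for (String node : page)
            if (hasListener(node)) ret.add(node);
        return ret;
    }

    public static void main(String[] args) {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<div>", "<a href=\"#\" onclick=\"go()\">", "Click", "</a>", "</div>"));
        System.out.println(Arrays.toString(find(page)));
        System.out.println(get(page));
    }
}
```

    Index results are convenient when the caller intends to update the Vector in place, while node results are convenient when only the matched elements are of interest.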

    • int sPos, int ePos: When these parameters are present, only HTMLNode's between these specified Vector indices will be considered for matching the search criteria.
    • String htmlTags: When this parameter is present, only HTML TagNode's whose "primary tag" matches this string will be considered.


    Static (Functional) API: The methods in this class are all (100%) defined with the Java Key-Word / Key-Concept 'static'. Furthermore, there is no way to obtain an instance of this class, because there are no public (nor private) constructors. Java's Spring-Boot, MVC feature is *not* utilized because it flies directly in the face of the light-weight data-classes philosophy. This has many advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood class Vector.

    Static Field: The methods in this class do not create any internal state that is maintained - but there is a single private & static field defined. This field is instantiated only once during the Class Loader phase (and only if this class shall be used), and serves as a data 'lookup' field (like a static constant). View this class' source-code in the link provided below to see internally used data.

    The sole internally-defined private, static field is a lookup-table listing the names of all HTML 'listener' attributes. This table remains stored in the Java-HTML JAR library file until this class is loaded by the Class Loader.



    • Method Detail

      • listAllAvailable

        public static java.util.Iterator<java.lang.String> listAllAvailable()
        Returns an Iterator over the names of the JavaScript listeners that are known to this class.
        Code:
        Exact Method Body:
         return new RemoveUnsupportedIterator<String>(l.iterator());
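
        The class name RemoveUnsupportedIterator suggests a wrapper whose remove() is disabled, so that callers cannot alter the internal table while iterating. A minimal sketch of that idea follows. ReadOnlyIteratorSketch is a hypothetical stand-in written for this example (not the library class), and the TreeSet used as the backing table is also an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical stand-in for an iterator wrapper that forbids removal.
class ReadOnlyIteratorSketch<E> implements Iterator<E> {
    private final Iterator<E> inner;

    ReadOnlyIteratorSketch(Iterator<E> inner) {
        this.inner = inner;
    }

    @Override public boolean hasNext() { return inner.hasNext(); }
    @Override public E next()          { return inner.next(); }

    // Refuse removal so that the backing collection cannot be modified.
    @Override public void remove() {
        throw new UnsupportedOperationException("remove() is not supported");
    }

    public static void main(String[] args) {
        // Assumed, abbreviated table of listener names.
        TreeSet<String> names = new TreeSet<>(Arrays.asList("onclick", "onload"));
        Iterator<String> it = new ReadOnlyIteratorSketch<>(names.iterator());
        while (it.hasNext()) System.out.println(it.next());
    }
}
```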
        
      • addNewListenerName

        public static boolean addNewListenerName​(java.lang.String listenerName)
        This just allows the user to add a name of a new listener that was not already stored in the internal-set of known java-script listeners. When searching a page for listeners, this class will only (obviously) be able to find ones whose names are known.
        Parameters:
        listenerName - The name of a listener that is not already known to this class.
        Returns:
        TRUE if the listener name was not already stored in the internal set (and has now been added), FALSE if attempting to add a listener that is already in the set.
        Code:
        Exact Method Body:
         return l.add(listenerName.toLowerCase());
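
        The return value follows directly from the contract of java.util.Set.add, applied after lower-casing the name. The sketch below shows that behavior in isolation. ListenerNameSetSketch, its NAMES field, and the two starting names are assumptions made for the example; the library's actual table is private and larger.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of a lower-cased set of listener names.
class ListenerNameSetSketch {
    // Assumed, abbreviated stand-in for the private lookup table.
    private static final Set<String> NAMES =
        new TreeSet<>(Arrays.asList("onclick", "onload"));

    // Returns true only when the lower-cased name was not yet in the set.
    static boolean addName(String listenerName) {
        return NAMES.add(listenerName.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(addName("onMouseOver")); // a name not in the starting set
        System.out.println(addName("ONCLICK"));     // same as "onclick" once lower-cased
    }
}
```

        Because names are normalized before insertion, the same listener cannot be registered twice under different capitalizations.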
        
      • extract

        public static java.util.Properties extract​(TagNode tn)
        This will test whether listeners are present in the TagNode, and if so - return them.
        Examples (Input TagNode, Output Properties):
        Input TagNode:     <frameset cols="20%,80%" title="Documentation frame" onload="top.loadFrames()">
        Output Properties: onload: top.loadFrames()
        Input TagNode:     <a href="javascript:void(0);" onclick="return j2gb('http://www.gov.cn');">
        Output Properties: onclick: return j2gb('http://www.gov.cn');
        Parameters:
        tn - This may be any TagNode, but it will be tested for JavaScript listeners.
        Returns:
        Will return a java.util.Properties object that contains a key-value table of any/all listeners present in the TagNode. If there are no listeners, this method will not return null, it will return an empty Properties object.
        See Also:
        TagNode.AV(String), StrCmpr.containsIgnoreCase(String, String)
        Code:
        Exact Method Body:
         Properties p = new Properties();    String s;
        
         for (String listener : l)
             if (StrCmpr.containsIgnoreCase(tn.str, listener))
                 if ((s = tn.AV(listener)) != null) 
                     // This **may** seem redundant, but it is not, because what if it was phony?
                     // What if the "listener" key-word was actually buried in some "ALT=..." text?
                     // It's an "optimization"
                     p.put(listener, s);
        
         return p;
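
        The two-step test in the method body (a cheap substring check, then confirmation that the name is a real attribute) can be tried without the library using the sketch below. It is an approximation under stated assumptions: ExtractListenersSketch is a hypothetical class, the tag is a plain String rather than a TagNode, the name list is abbreviated, and a simple regular expression that handles only double-quoted values stands in for TagNode.AV(String).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the library's implementation.
class ExtractListenersSketch {
    // Assumed, abbreviated list of listener attribute names.
    static final List<String> NAMES = Arrays.asList("onclick", "onload");

    // Matches name="value" pairs (double-quoted values only, for brevity).
    static final Pattern ATTR =
        Pattern.compile("([A-Za-z][-A-Za-z0-9_:.]*)\\s*=\\s*\"([^\"]*)\"");

    static Properties extract(String tagStr) {
        Properties p = new Properties();
        String lower = tagStr.toLowerCase();

        for (String name : NAMES) {
            if (!lower.contains(name)) continue;   // cheap pre-filter on the raw text
            Matcher m = ATTR.matcher(tagStr);
            while (m.find())                       // confirm it is a real attribute name
                if (m.group(1).equalsIgnoreCase(name)) p.put(name, m.group(2));
        }

        return p;   // empty, never null, when no listener is present
    }

    public static void main(String[] args) {
        System.out.println(extract("<frameset cols=\"20%,80%\" onload=\"top.loadFrames()\">"));
        System.out.println(extract("<img alt=\"no onclick here\" src=\"a.png\">"));
    }
}
```

        The second call illustrates the "phony" case mentioned in the code comments: the word onclick appears inside the alt text, passes the substring check, and is then rejected because it is not an attribute name.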
        
      • extractAll

        public static java.util.Properties[] extractAll​
                    (java.util.Vector<TagNode> list)
        
        If you have performed a JavaScript Listener 'Get', this method will cycle through the list that was returned and generate a Properties[] array of identical length, in which each element is the result of calling extract(tn) on the corresponding element of the parameter 'list'.
        Parameters:
        list - A list of TagNode's that are expected to contain JavaScript listeners. If some of the TagNode's in this input Vector have no listeners, the return array will still be a parallel (same-size) array; however, some of its elements will be Properties with no key/value pairs in them (zero-size).
        Returns:
        An array of Properties, one for each element in 'list'.
        See Also:
        extract(TagNode)
        Code:
        Exact Method Body:
         Properties[] ret = new Properties[list.size()];
         for (int i=0; i < list.size(); i++) ret[i] = extract(list.elementAt(i));
         return ret;
        
      • find

        public static int[] find​(java.util.Vector<? extends HTMLNode> html,
                                 int sPos,
                                 int ePos)
        Find all HTML Elements (TagNode elements) that have listeners. The search is limited to the sub-list of the page between 'sPos' and 'ePos'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        A list of index-pointers into the underlying parameter 'html' where each node pointed to by the list contains a TagNode element with a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        hasListener(TagNode), LV
        Code:
        Exact Method Body:
         IntStream.Builder   b = IntStream.builder();        // Use Java Streams to keep lists of int's
         LV                  l = new LV(html, sPos, ePos);   // Loop from sPos to ePos-1
         HTMLNode            n;                              // Temporary Variable
        
         for (int i=l.start; i < l.end; i++)
             if (    (n = html.elementAt(i)).isTagNode()             // Only Search TagNode's, not Text or Comment
                 &&  hasListener((TagNode) n)    )                   // TagNode's with a listener
                 b.add(i);                                           // Save the index
        
         return b.build().toArray();
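
        The range rules documented above (inclusive 'sPos', exclusive 'ePos', a negative 'ePos' reset to the Vector size, and the three exception cases) can be sketched without the library as follows. RangeFindSketch is a hypothetical class that follows the documented rules; it is not the library's LV helper, and the Predicate parameter is a stand-in for hasListener(TagNode).

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.function.Predicate;
import java.util.stream.IntStream;

// Hypothetical illustration of the documented sPos / ePos rules.
class RangeFindSketch {
    static <T> int[] find(Vector<? extends T> v, int sPos, int ePos, Predicate<? super T> test) {
        int size = v.size();
        if (ePos < 0) ePos = size;   // a negative ePos means "up to the end"

        // The three documented exception cases.
        if (sPos < 0 || sPos >= size || ePos == 0 || ePos > size || sPos > ePos)
            throw new IndexOutOfBoundsException(
                "sPos=" + sPos + ", ePos=" + ePos + ", size=" + size);

        IntStream.Builder b = IntStream.builder();
        for (int i = sPos; i < ePos; i++)          // visits sPos ... ePos-1
            if (test.test(v.elementAt(i))) b.add(i);
        return b.build().toArray();
    }

    public static void main(String[] args) {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<a onclick=\"f()\">", "link text", "</a>", "<body onload=\"g()\">"));
        Predicate<String> hasListener =
            s -> s.contains("onclick=") || s.contains("onload=");

        System.out.println(Arrays.toString(find(page, 0, -1, hasListener))); // whole Vector
        System.out.println(Arrays.toString(find(page, 1, 3, hasListener)));  // indices 1 and 2 only
    }
}
```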
        
      • find

        public static int[] find​(java.util.Vector<? extends HTMLNode> html,
                                 int sPos,
                                 int ePos,
                                 java.lang.String... htmlTags)
        Find all HTML Elements (TagNode elements) that have listeners. The search is limited to the sub-list of the page between 'sPos' and 'ePos', and matches are further limited to HTML Elements whose tag is among those listed in parameter 'htmlTags'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        htmlTags - A list of HTML Elements, as a varargs String... Array, that constitute a match. Any HTML Element in the web-page that has a listener attribute, but whose HTML tag/token is not present in this list will not be considered a match, and will not be returned in this method's search results.
        Returns:
        A list of index-pointers into the underlying parameter 'html' where each node pointed to by the list contains a TagNode element with a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos, and also limited to HTML Elements in parameter 'htmlTags'
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        HAS_TOK_MATCH(String, String[]), hasListener(TagNode), LV
        Code:
        Exact Method Body:
         IntStream.Builder   b = IntStream.builder();        // Use Java Streams to keep lists of int's
         LV                  l = new LV(html, sPos, ePos);   // Loop from sPos to ePos-1
         HTMLNode            n;                              // Temporary Variable
         TagNode             tn;                             // Same
        
         htmlTags = toLowerCase(htmlTags);
        
         for (int i=l.start; i < l.end; i++)
             if (    (n = html.elementAt(i)).isTagNode()             // Only Search TagNode's, not Text or Comment
                 &&  HAS_TOK_MATCH((tn = (TagNode) n).tok, htmlTags) // Make sure HTML Element is among list in 'htmlTags'
                 &&  hasListener(tn) )                               // TagNode's with a listener
                 b.add(i);                                           // Save the index
        
         return b.build().toArray();
        
      • get

        public static java.util.Vector<TagNode> get
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        Get all HTML Elements (TagNode elements) that have listeners. The search is limited to the sub-list of the page between 'sPos' and 'ePos'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        A list of TagNode elements that have a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        hasListener(TagNode), LV
        Code:
        Exact Method Body:
         Vector<TagNode> ret = new Vector<>();           // Keep Matching TagNode's here, for return
         LV              l   = new LV(html, sPos, ePos); // Loop from sPos to ePos-1
         HTMLNode        n;                              // Temporary Variable
         TagNode         tn;                             // Same
        
         for (int i=l.start; i < l.end; i++)
             if (    (n = html.elementAt(i)).isTagNode()             // Only Search TagNode's, not Text or Comment
                 &&  hasListener(tn = (TagNode) n)   )               // TagNode's with a listener
                 ret.add(tn);                                        // Save this TagNode
        
         return ret;
        
      • get

        public static java.util.Vector<TagNode> get
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos,
                     java.lang.String... htmlTags)
        
        Get all HTML Elements (TagNode elements) that have listeners. The search is limited to the sub-list of the page between 'sPos' and 'ePos', and matches are further limited to HTML Elements whose tag is among those listed in parameter 'htmlTags'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        htmlTags - A list of HTML Elements, as a varargs String Array, that constitute a match. Any HTML Element in the web-page that has a listener attribute, but whose HTML tag/token is not present in this list will not be considered a match, and will not be returned in this method's search results.
        Returns:
        A list of TagNode elements that have a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos, and also limited to HTML Elements in parameter 'htmlTags'
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        HAS_TOK_MATCH(String, String[]), hasListener(TagNode), LV
        Code:
        Exact Method Body:
         Vector<TagNode> ret = new Vector<>();           // Keep Matching TagNode's here, for return
         LV              l   = new LV(html, sPos, ePos); // Loop from sPos to ePos-1
         HTMLNode        n;                              // Temporary Variable
         TagNode         tn;                             // Same
        
         htmlTags = toLowerCase(htmlTags);
        
         for (int i=l.start; i < l.end; i++)
             if (    (n = html.elementAt(i)).isTagNode()     
                 &&  HAS_TOK_MATCH((tn = (TagNode) n).tok, htmlTags) // Make sure the HTML Element is among 'htmlTags'
                 &&  hasListener(tn) )                               // TagNode's with a listener
                 ret.add(tn);                                        // Save this TagNode
        
         return ret;
        
      • hasListener

        public static boolean hasListener​(TagNode tn)
        Checks whether a TagNode has a listener inner-tag / attribute.
        Parameters:
        tn - Any HTML Element TagNode
        Returns:
        TRUE If this TagNode has a listener, and FALSE otherwise.
        See Also:
        StrCmpr.containsIgnoreCase(String, String)
        Code:
        Exact Method Body:
         Properties p = new Properties();
        
         for (String listener : l)
             if (StrCmpr.containsIgnoreCase(tn.str, listener))
                 if (tn.AV(listener) != null)
                         // This **may** seem redundant, but it is not, because what if it was phony?
                         // What if the "listener" key-word was actually buried in some "ALT=..." text?
                     return true;
        
         return false;
        
      • toLowerCase

        protected static java.lang.String[] toLowerCase​(java.lang.String[] tags)
        Converts the varargs parameter to lower-case Strings.

        NOTE: This is varargs-safe, because a new String array is created, with new String-pointers.
        Parameters:
        tags - The varargs String parameter acquired from the search-methods in this class.
        Returns:
        a lower-case version of the input.
        Code:
        Exact Method Body:
         String[] ret = new String[tags.length];
        
         for (int i=0; i < tags.length; i++)
             if (tags[i] != null)
                 ret[i] = tags[i].toLowerCase();
             else throw new HTMLTokException(
                 "One of the HTML tokens you have passed to the variable-length parameter " +
                 "'htmlTags' was null."
             );
        
         return ret;
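
        The "varargs-safe" note refers to the fact that a caller may pass an existing String[] where a String... parameter is expected, in which case the method receives the caller's own array. Writing the lower-case values into a fresh array leaves that array untouched. The sketch below demonstrates this. LowerCaseCopySketch is a hypothetical class, and IllegalArgumentException is used only as a stand-in for the library's HTMLTokException.

```java
import java.util.Arrays;

// Hypothetical illustration of lower-casing into a new array.
class LowerCaseCopySketch {
    static String[] toLowerCaseCopy(String... tags) {
        String[] ret = new String[tags.length];   // never write into 'tags' itself

        for (int i = 0; i < tags.length; i++) {
            if (tags[i] == null)
                throw new IllegalArgumentException("tag at index " + i + " was null");
            ret[i] = tags[i].toLowerCase();
        }

        return ret;
    }

    public static void main(String[] args) {
        String[] callerTags = { "DIV", "Span" };
        String[] lowered = toLowerCaseCopy(callerTags);

        System.out.println(Arrays.toString(callerTags)); // the caller's array is left as it was
        System.out.println(Arrays.toString(lowered));
    }
}
```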
        
      • HAS_TOK_MATCH

        protected static boolean HAS_TOK_MATCH​(java.lang.String htmlTag,
                                               java.lang.String... htmlTags)
        Checks if the var-args parameter String... htmlTags matches a particular token
        Parameters:
        htmlTag - The token to be checked against the user's requested 'htmlTags' list parameter
        htmlTags - The list of acceptable HTML Tag Elements. This is a search specification parameter used by some of the search-methods in this class.
        Returns:
        TRUE if the tested token parameter 'htmlTag' is a member of the elements in list parameter 'htmlTags', and FALSE otherwise.
        Code:
        Exact Method Body:
         for (String s : htmlTags) if (s.equals(htmlTag)) return true; return false;
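
        Since the comparison above uses String.equals, it is case-sensitive, which is why the search methods in this class lower-case 'htmlTags' before calling it. The sketch below shows the effect. TokMatchSketch is a hypothetical class, and the assumption that the token being tested is already lower-case is made for this example only.

```java
// Hypothetical illustration of case-sensitive token matching.
class TokMatchSketch {
    static boolean hasTokMatch(String htmlTag, String... htmlTags) {
        for (String s : htmlTags)
            if (s.equals(htmlTag)) return true;
        return false;
    }

    public static void main(String[] args) {
        // Token assumed to be lower-case already.
        System.out.println(hasTokMatch("div", "a", "div"));   // list contains an exact match
        System.out.println(hasTokMatch("div", "A", "DIV"));   // list differs only by case
        System.out.println(hasTokMatch("div"));               // empty list
    }
}
```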