Class TagNodePeekL1Inclusive


  • public class TagNodePeekL1Inclusive
    extends java.lang.Object
    TagNode Peek L1 (Sibling) Inclusive - Documentation.

    TagNodePeekL1Inclusive =>

    1. TagNode: This implies that only HTML TagNode's will be used for searching. The TagNode.tok field is used as the search criterion. This public, final String field contains the name of the HTML Element - for instance, 'div', 'p', 'span', 'img', etc.
      InnerTag's - (a.k.a. 'attributes') - are not part of the search.
    2. Peek: This implies that BOTH the Vector-index / indices where a match occurred, AND the HTMLNode at that index are SIMULTANEOUSLY returned by these methods - using the data-type classes NodeIndex and SubSection.
    3. L1: The term 'L1' is simply short for Level-1, and it refers to how matches that occur inside or 'within' the bounds of a previous match are handled. To skip-over or avoid matches that occur inside of another, previously identified and returned, match - use an 'L1' search. If a container or "branch" node from an HTML Vector is wrapped inside another, the inner-container or "inner-branch" will not be included with the search results. This concept is similar, but not identical, to the Java-Script term "sibling" vis-a-vis DOM (Document Object Model) Trees.
      IMPORTANT NOTE: The classes in this Java HTML JAR Library do not build DOM Trees.
    4. Inclusive: The word "Inclusive" is used to indicate that all HTMLNode's between an opening and closing HTML-tag are requested. The concept is extremely similar to the Java-Script feature / "term" '.innerHTML', although in this (JavaHTML) JAR Library, no DOM Trees are ever constructed. This method will return all nodes between the first matching TagNode element, and its closing TagNode element pair.



    The letters L1 are literally just an abbreviation for "Level 1". When a "Level 1 Inclusive" Get or Find is needed, the user is actually requesting only those matching HTML-Tags that (if this were a DOM-Tree implementation, which it is not!) sit at the same tree-depth. Specifically, only matches at a depth of 1-level in the tree will be returned in the result set.
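
    The "depth of 1-level" idea can be sketched without any DOM Tree by counting how deeply a tag is nested inside elements of the same name. This is only an illustration under stated assumptions: the tokens are plain strings rather than the library's HTMLNode objects, tags carry no attributes, and the class and method names (DepthSketch, levelOneOpenings) are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: tokens are plain strings, not the library's HTMLNode objects.
class DepthSketch {

    // Returns the indices of opening tags named 'tag' that are NOT nested inside
    // another element of the same name (that is, a same-tag depth of exactly 1).
    static List<Integer> levelOneOpenings(List<String> tokens, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<Integer> result = new ArrayList<>();
        int depth = 0;

        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (tok.equalsIgnoreCase(open)) {
                depth++;                          // entering one more level of 'tag'
                if (depth == 1) result.add(i);    // only level-1 openings are kept
            } else if (tok.equalsIgnoreCase(close) && depth > 0) {
                depth--;                          // leaving one level of 'tag'
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One DIV wrapping an inner DIV, followed by a second top-level DIV.
        List<String> tokens = List.of(
            "<DIV>", "outer text",
            "<DIV>", "inner text", "</DIV>",
            "</DIV>",
            "<DIV>", "second text", "</DIV>");

        System.out.println(levelOneOpenings(tokens, "div"));   // prints [0, 6]
    }
}
```

    The inner DIV at index 2 is skipped because it sits at a same-tag depth of two, which is the behavior the term 'L1' describes.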



    AN EXAMPLE: If there were an HTML-Page that included the following TagNode's and TextNode's
    
    <HTML>
    <HEAD><TITLE>Node SearchExample</TITLE></HEAD>
    <BODY>
    
    <B>In this example, we will see the difference between:</B>
    <UL>
    <LI>An 'Inclusive Search', Some HTML list-text here!</LI>
    <LI>Versus an 'L1 Inclusive Search', More HTML list-text</LI>
    </UL>
    
    <BR /><HR><BR />
    
    <DIV>How are you doing today?<DIV>(Please provide an answer in the form below)</DIV></DIV>
    <DIV>If you have any questions or complaints, please let us know!</DIV>
    
    </BODY></HTML>
    
    For the elements of the "Unordered List" (HTML <UL> tag) - an "Inclusive Search" for "<LI>" Tag's and an "L1 Inclusive Search" for "<LI>" Tag's would produce the exact same result set. HOWEVER, an L1 Inclusive Search for HTML "<DIV>" Tag's would produce two sublists in the above HTML-Example, but a plain-old Inclusive Search for the same DIV's would produce three sublists!

    Example 1 (Inclusive-only, not L1) Results:
    
     // An ordinary "inclusive search" where the start-tag, end-tag - and everything between are returned
     // as two array-boundary end-points (specifically, a "DotPair").
     Vector<DotPair> sublists = TagNodeFindInclusive.all(page, "li");
    
     // sublists would contain the following array/vector boundaries as dotted-pairs:
     // sublists.elementAt(0):
     //      HTMLNode 0: TagNode.str = "<LI>";
     //      HTMLNode 1: TextNode.str = "An 'Inclusive Search', Some HTML list-text here!";
     //      HTMLNode 2: TagNode.str = "</LI>";
     
     // sublists.elementAt(1):
     //      HTMLNode 0: TagNode.str = "<LI>";
     //      HTMLNode 1: TextNode.str = "Versus an 'L1 Inclusive Search', More HTML list-text";
     //      HTMLNode 2: TagNode.str = "</LI>";
    


    Example 2 (Inclusive-only, not L1) Results:
    
     // Here, an "inclusive search" is performed.  Again, the start-tag, end-tag, and everything between them
     // are returned between the DotPair (array start/end boundaries).
     // Inner matches which are not HTML tree-siblings will also be included.
     Vector<DotPair> sublists = TagNodeFindInclusive.all(page, "div");
    
     // sublists would contain the following array/vector boundaries as dotted-pairs:
     // sublists.elementAt(0):
     //      HTMLNode 0: TagNode.str = "<DIV>";
     //      HTMLNode 1: TextNode.str = "How are you doing today?";
     //      HTMLNode 2: TagNode.str = "<DIV>";
     //      HTMLNode 3: TextNode.str = "(Please provide an answer in the form below)";
     //      HTMLNode 4: TagNode.str = "</DIV>";
     //      HTMLNode 5: TagNode.str = "</DIV>";
      
     // sublists.elementAt(1): (*** Note that these HTMLNode's are also included in the previous result set)
     //      HTMLNode 0: TagNode.str = "<DIV>";
     //      HTMLNode 1: TextNode.str = "(Please provide an answer in the form below)";
     //      HTMLNode 2: TagNode.str = "</DIV>";
     
     // sublists.elementAt(2):
     //      HTMLNode 0: TagNode.str = "<DIV>";
     //      HTMLNode 1: TextNode.str = "If you have any questions or complaints, please let us know!";
     //      HTMLNode 2: TagNode.str = "</DIV>";
    


    Example 3 (L1 Inclusive) Results:
    
     // Here, an "L1 inclusive (sibling) search" is performed.  Again, the start-tag, end-tag, and
     // everything between them are returned between the DotPair (array start/end boundaries), but inner
     // matches which are not HTML tree-siblings will be ignored.
     Vector<DotPair> l1Sublists = TagNodeFindL1Inclusive.all(page, "div");
    
     // l1Sublists would contain the following array/vector boundaries as dotted-pairs:
     // l1Sublists.elementAt(0):
     //      HTMLNode 0: TagNode.str = "<DIV>";
     //      HTMLNode 1: TextNode.str = "How are you doing today?";
     //      HTMLNode 2: TagNode.str = "<DIV>";
     //      HTMLNode 3: TextNode.str = "(Please provide an answer in the form below)";
     //      HTMLNode 4: TagNode.str = "</DIV>";
     //      HTMLNode 5: TagNode.str = "</DIV>";
     
     // l1Sublists.elementAt(1):
     //      HTMLNode 0: TagNode.str = "<DIV>";
     //      HTMLNode 1: TextNode.str = "If you have any questions or complaints, please let us know!";
     //      HTMLNode 2: TagNode.str = "</DIV>";
    


    Another way to explain the "L1 Inclusive" or "Level 1 Inclusive" specification is in terms of the iterator-pointer that advances through the Java-Vector. After a match, an L1 search advances the pointer to the end of the closing-version of the HTML-tag, while a "plain old Inclusive" search-specification advances the loop-pointer or iterator-pointer only to the very next HTMLNode. This means that in the DIV example above, the "<DIV> ... </DIV>" inside of a "<DIV> ... </DIV>" (sometimes called a "sub-div", or a DIV element with a tree-depth of two) would not be returned in the iterator or the vector!
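
    The pointer-advance rule described above can be sketched as a single loop with one switch between the two behaviors. This is a minimal illustration, not the library's implementation: tokens are plain strings instead of HTMLNode's, tags have no attributes, results are int[] {start, end} pairs instead of DotPair, and the names L1InclusiveSketch, findClose and search are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "Inclusive" versus "L1 Inclusive" over a list of string tokens.
class L1InclusiveSketch {

    // Index of the closing tag that pairs with the opening tag at 'openIdx', or -1 if none.
    static int findClose(List<String> tokens, int openIdx, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int depth = 0;
        for (int i = openIdx; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (tok.equalsIgnoreCase(open)) depth++;
            else if (tok.equalsIgnoreCase(close) && --depth == 0) return i;
        }
        return -1;
    }

    // Returns {start, end} index pairs, both ends inclusive.
    // l1 == true : after a match, jump past the closing tag (inner matches are skipped).
    // l1 == false: after a match, move to the very next token (inner matches are found too).
    static List<int[]> search(List<String> tokens, String tag, boolean l1) {
        String open = "<" + tag + ">";
        List<int[]> pairs = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int end = tokens.get(i).equalsIgnoreCase(open) ? findClose(tokens, i, tag) : -1;
            if (end == -1) { i++; continue; }   // not an opening tag, or never closed
            pairs.add(new int[] { i, end });
            i = l1 ? end + 1 : i + 1;
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
            "<DIV>", "How are you doing today?",
            "<DIV>", "(Please provide an answer in the form below)", "</DIV>",
            "</DIV>",
            "<DIV>", "If you have any questions or complaints, please let us know!", "</DIV>");

        System.out.println(search(tokens, "div", false).size());   // prints 3
        System.out.println(search(tokens, "div", true).size());    // prints 2
    }
}
```

    The only difference between the two modes is the last line of the loop body, which is why both searches agree whenever no matching element is nested inside another.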



    Methods Available

    Method Explanation
    all (...) Obtain all sub-lists which do not have any overlap (the meaning of 'L1') from the vectorized-html webpage that meet the criteria.

    Method Parameters

    Parameter Explanation
    Vector<? extends HTMLNode> html This represents any vectorized HTML page, sub-page, or list of partial-elements.
    int sPos, int ePos When these parameters are present, only HTMLNode's that are found between the specified Vector indices will be considered for matching with the search criteria.

    NOTE: Wherever the parameters int sPos, int ePos are used, 'ePos' will accept a negative value, but 'sPos' will not. When 'ePos' is passed a negative value, the internal LV ('Loop Variable Counter') will have its public final int end field set to the length of the vectorized-html page that was passed (html.size() of parameter html).

    EXCEPTIONS: An IndexOutOfBoundsException will be thrown if:

    • sPos is negative, or sPos is greater-than or equal-to the size of the input Vector
    • ePos is zero, or greater than the size of the input Vector
    • sPos is a larger integer than ePos
    String htmlTag When this parameter is present, only HTMLNode's which are both instances of class TagNode *and* have a TagNode.tok field whose value is equal to this parameter 'htmlTag', will be returned as matches.

    COMMON EXAMPLES: Some common examples of valid htmlTags are: a, div, img, table, tr, meta as well as all other valid HTML element-tokens.

    NOTE: This comparison is performed using a case-insensitive compare-method.

    EXCEPTIONS: If this parameter is not a valid HTML element, an HTMLTokException will be thrown.
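
    The sPos / ePos rules listed above can be expressed as a small validation routine. This is a sketch of the documented rules only. The class and method names (RangeSketch, normalize) are hypothetical, and the library's internal LV class may perform these checks differently.

```java
// Hypothetical sketch of the documented sPos / ePos rules; not the library's LV class.
class RangeSketch {

    // Validates the range against a Vector of the given size and returns {start, end}.
    // A negative ePos is replaced by 'size', as the NOTE above describes.
    static int[] normalize(int size, int sPos, int ePos) {
        if (sPos < 0 || sPos >= size)
            throw new IndexOutOfBoundsException("sPos out of range: " + sPos);

        int end = (ePos < 0) ? size : ePos;

        if (end == 0 || end > size)
            throw new IndexOutOfBoundsException("ePos out of range: " + ePos);

        if (sPos > end)
            throw new IndexOutOfBoundsException("sPos " + sPos + " is larger than ePos " + end);

        return new int[] { sPos, end };
    }

    public static void main(String[] args) {
        int[] whole = normalize(9, 0, -1);   // negative ePos means "to the end of the Vector"
        System.out.println(whole[0] + ".." + whole[1]);   // prints 0..9
    }
}
```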

    Return Values:

    1. Vector<SubSection> This is a "list of sub-lists", used when multiple results (multiple sub-lists) need to be returned to the calling procedure. As a 'Peek' result, each SubSection carries both the Vector location of a match and the HTMLNode's found there.
    2. A zero-length Vector<SubSection> means no matches were found on the page or sub-page. Zero-length vectors are returned from any method where multiple matches are possible as a result-set.
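
    The 'Peek' idea of returning location and nodes together, along with the zero-length-Vector convention, can be sketched as follows. The Match class here is a hypothetical stand-in and is not the library's SubSection class. Tokens are plain strings, and the index pairs are assumed to have been found by some earlier search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Hypothetical sketch of a "peek" style result: where a match is, plus what was found there.
class PeekResultSketch {

    static final class Match {
        final int start, end;        // Vector indices of the match, both inclusive
        final List<String> nodes;    // copy of the tokens located at start..end

        Match(int start, int end, List<String> nodes) {
            this.start = start;
            this.end = end;
            this.nodes = nodes;
        }
    }

    // Builds one Match per {start, end} pair. With no pairs, an empty Vector
    // (never null) is returned, so callers can test isEmpty() or simply loop.
    static Vector<Match> toMatches(List<String> tokens, int[][] pairs) {
        Vector<Match> out = new Vector<>();
        for (int[] p : pairs)
            out.add(new Match(p[0], p[1], new ArrayList<>(tokens.subList(p[0], p[1] + 1))));
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("<LI>", "first", "</LI>", "<LI>", "second", "</LI>");

        for (Match m : toMatches(tokens, new int[][] { { 0, 2 }, { 3, 5 } }))
            System.out.println(m.start + "-" + m.end + ": " + m.nodes);

        System.out.println(toMatches(tokens, new int[0][]).isEmpty());   // prints true
    }
}
```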


    Static (Functional) API: The methods in this class are all (100%) defined with the Java Key-Word / Key-Concept 'static'. Furthermore, there is no way to obtain an instance of this class, because there are no public (nor private) constructors. Java's Spring-Boot / MVC features are *not* utilized, because they fly directly in the face of the light-weight data-classes philosophy. The static approach has several advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) - by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood class Vector.

    Internal-State: A user may click on this class' source code (see link below) to view any and all fields defined internally by this class. A cursory inspection of the code shows that this class has precisely zero internally defined global fields (Spaghetti). All variables used by the methods in this class are local variables only, and therefore this class ought to be thought of as 'state-less'.

    View Actual Hi-Lited Code Files:






    • Method Summary

      Modifier and Type Method
      static Vector<SubSection> all(Vector<? extends HTMLNode> html, int sPos, int ePos, String htmlTag)
      static Vector<SubSection> all(Vector<? extends HTMLNode> html, String htmlTag)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait