Package Torello.HTML

Class DotPair

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, java.lang.Comparable<DotPair>, java.lang.Iterable<java.lang.Integer>

    public final class DotPair
    extends java.lang.Object
    implements java.io.Serializable, java.lang.Comparable<DotPair>, java.lang.Cloneable, java.lang.Iterable<java.lang.Integer>
    DotPair - Documentation.

    The purpose of this class is to keep the starting and ending points of an array sub-list together. In much older computer languages (LISP/Scheme), a 'dotted pair' is simply two values joined into a single pair. Here, the two values are integers that represent the start and end positions of a sub-list of a Vector.

    NOTE: Calling this class "ArraySubListEndPoints" would be more descriptive, but that name is tedious to type, so the class is called 'DotPair' instead.

    IMPORTANT NOTE: For every one of the Find, Get and Remove node methods, the input parameters sPos, ePos are designed such that:

    • the "sPos" is inclusive, meaning that the Vector index denoted by the value of this parameter is included in the sub-list.
    • the "ePos" is exclusive, meaning that the Vector index denoted by the value of this parameter is NOT included in the sub-list.

    HOWEVER, HERE: in class DotPair

    • the "start" is inclusive, meaning that the Vector index denoted by the value of this class field is included in the sub-list.
    • the "end" is ALSO inclusive, meaning that the Vector index denoted by the value of this class field is ALSO included in the sub-list.

    Generally the "sPos, ePos" method parameters and a DotPair.start or DotPair.end field have exactly identical meanings - EXCEPT for the above noted difference.
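
    The difference between the two conventions can be sketched as a pair of one-line conversions. This is a minimal illustration on plain int values; the class name RangeConventionSketch and its helper methods are hypothetical and are not part of this package.

```java
// Hypothetical stand-in for illustration only; not the library's DotPair class.
final class RangeConventionSketch
{
    // Converts an exclusive end position (ePos) to an inclusive DotPair-style 'end'.
    static int toInclusiveEnd(int ePos) { return ePos - 1; }

    // Converts an inclusive DotPair-style 'end' to an exclusive end position (ePos).
    static int toExclusiveEnd(int end) { return end + 1; }

    // Number of elements in a range that is inclusive at both ends.
    static int inclusiveSize(int start, int end) { return end - start + 1; }

    public static void main(String[] args)
    {
        int sPos = 3, ePos = 8;              // half-open: covers indices 3, 4, 5, 6, 7
        int start = sPos;                    // the start has the same meaning in both conventions
        int end   = toInclusiveEnd(ePos);    // 7

        // prints "[3, 7] size=5"
        System.out.println("[" + start + ", " + end + "] size=" + inclusiveSize(start, end));
    }
}
```

    Only the end value shifts by one when moving between the two conventions; the start value is unchanged.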
    See Also:
    NodeIndex, SubSection, Serialized Form



    • Field Detail

      • serialVersionUID

        public static final long serialVersionUID
        This fulfils the serialVersionUID requirement for classes that implement Java's interface java.io.Serializable. Java's built-in serialization is easy to use, and it can make saving program state while debugging much simpler. It can also be used in place of more complicated systems like "hibernate" to store data.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final long serialVersionUID = 1;
        
      • start

        public final int start
        This is intended to be the "starting index" into a sub-array of an HTML Vector of HTMLNode elements.
        Code:
        Exact Field Declaration Expression:
        public final int start;
        
      • end

        public final int end
        This is intended to be the "ending index" into a sub-array of an HTML Vector of HTMLNode elements.
        Code:
        Exact Field Declaration Expression:
        public final int end;
        
      • comp2

        public static java.util.Comparator<DotPair> comp2
        This is an alternative Comparator that can be used for sorting instances of this class. It should work with the Collections.sort(List, Comparator) method in the standard JDK package java.util.

        NOTE: This simply compares the size of one DotPair to a second. The smaller shall be sorted first, and the larger (longer-in-length) DotPair shall be sorted later. If they are of equal size, whichever of the two has an earlier 'start' position in the Vector is considered first.
        See Also:
        CommentNode.body
        Code:
        Exact Field Declaration Expression:
        public static Comparator<DotPair> comp2 = (DotPair dp1, DotPair dp2) ->
            {
                int ret = dp1.size() - dp2.size();
                return (ret != 0) ? ret : (dp1.start - dp2.start);
            };
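
        The ordering rule can be tried out on plain {start, end} arrays. This is a sketch only: the class SizeFirstOrderSketch, its comparator constant and its sortedCopy helper are hypothetical stand-ins, with int[] pairs used in place of DotPair so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in: each int[] is a {start, end} pair, inclusive at both ends.
final class SizeFirstOrderSketch
{
    // Same rule as described for comp2: smaller size first, then earlier start.
    static final Comparator<int[]> BY_SIZE_THEN_START = (int[] a, int[] b) ->
    {
        int ret = (a[1] - a[0] + 1) - (b[1] - b[0] + 1);
        return (ret != 0) ? ret : (a[0] - b[0]);
    };

    // Returns a sorted copy, leaving the caller's list untouched.
    static List<int[]> sortedCopy(List<int[]> ranges)
    {
        List<int[]> copy = new ArrayList<>(ranges);
        copy.sort(BY_SIZE_THEN_START);
        return copy;
    }

    public static void main(String[] args)
    {
        List<int[]> ranges = Arrays.asList(
            new int[] {10, 20}, new int[] {5, 6}, new int[] {0, 1});

        // The two size-2 ranges come first, ordered by start; the size-11 range is last.
        for (int[] r : sortedCopy(ranges)) System.out.println(Arrays.toString(r));
    }
}
```

        With real DotPair instances, the equivalent call would presumably be Collections.sort(list, DotPair.comp2).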
        
    • Constructor Detail

      • DotPair

        public DotPair​(int start,
                       int end)
        This constructor takes two integers and saves them into the public member fields.
        Parameters:
        start - This is intended to store the starting position of a vectorized-webpage sub-list or subpage.
        end - This will store the ending position of a vectorized-html webpage or subpage.
        Throws:
        java.lang.IndexOutOfBoundsException - A negative 'start' or 'end' parameter-value causes this exception to be thrown.
        java.lang.IllegalArgumentException - A 'start' parameter-value that is larger than the 'end' parameter causes this exception to be thrown.
        See Also:
        NodeIndex, SubSection
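
        The constructor body is not listed on this page. The sketch below shows one way the documented exception contract could be enforced; the class CheckedRangeSketch, its outcome helper, and the exception messages are assumptions made for illustration, not the library's code.

```java
// Hypothetical stand-in that enforces the documented constructor contract.
final class CheckedRangeSketch
{
    final int start;
    final int end;

    CheckedRangeSketch(int start, int end)
    {
        // Negative positions are rejected first, as the 'Throws' section describes.
        if (start < 0) throw new IndexOutOfBoundsException("Negative start: " + start);
        if (end < 0)   throw new IndexOutOfBoundsException("Negative end: " + end);

        // A start beyond the end cannot describe a sub-list.
        if (start > end) throw new IllegalArgumentException
            ("start [" + start + "] is larger than end [" + end + "]");

        this.start = start;
        this.end   = end;
    }

    // Reports "ok" or the simple name of the exception the constructor threw.
    static String outcome(int start, int end)
    {
        try { new CheckedRangeSketch(start, end); return "ok"; }
        catch (RuntimeException e) { return e.getClass().getSimpleName(); }
    }

    public static void main(String[] args)
    {
        System.out.println(outcome(2, 5));    // ok
        System.out.println(outcome(-1, 5));   // IndexOutOfBoundsException
        System.out.println(outcome(5, 2));    // IllegalArgumentException
    }
}
```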
    • Method Detail

      • hashCode

        public int hashCode()
        Implements the standard Java 'hashCode()' method. This provides a hash-code that is likely to avoid collisions.
        Overrides:
        hashCode in class java.lang.Object
        Returns:
        A hash-code that may be used for inserting 'this' instance into a hashed table, map or list.
        Code:
        Exact Method Body:
         return this.start + (1000 * this.end);
        
      • size

        public int size()
        The purpose of this is to remind the user that the array bounds are inclusive at BOTH ends of the sub-list. Often, in many java.lang.String operations, the start-position is included in the results, but the end position is not.

        NOTICE: For an instance of 'DotPair', both the start and end positions are INCLUSIVE, meaning both are included in the sub-list.
        Returns:
        The length of a sub-array that would be indicated by this dotted pair.
        Code:
        Exact Method Body:
         return this.end - this.start + 1;
        
      • toString

        public java.lang.String toString()
        Java's toString() requirement.
        Overrides:
        toString in class java.lang.Object
        Returns:
        A string representing 'this' instance of DotPair.
        Code:
        Exact Method Body:
         return "[" + start + ", " + end + "]";
        
      • equals

        public boolean equals​(java.lang.Object o)
        Java's public boolean equals(Object o) requirements.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        o - This may be any Java Object, but only ones of 'this' type whose internal-values are identical will force this method to return TRUE.
        Returns:
        TRUE if (and only if) parameter 'o' is an instance of DotPair and both have equal start and end field values.
        Code:
        Exact Method Body:
         if (o instanceof DotPair)
         {
             DotPair dp = (DotPair) o;
             return (this.start == dp.start) && (this.end == dp.end);
         }
         else return false;
        
      • clone

        public DotPair clone()
        Java's interface Cloneable requirements. This instantiates a new DotPair with identical 'start', 'end' fields.
        Overrides:
        clone in class java.lang.Object
        Returns:
        A new DotPair whose internal fields are identical to this one.
        Code:
        Exact Method Body:
         return new DotPair(this.start, this.end);
        
      • compareTo

        public int compareTo​(DotPair other)
        Java's interface Comparable<T> requirements. This is not the only comparison operation possible, but it does satisfy one reasonable requirement - SPECIFICALLY: which of two separate instances of DotPair starts first.

        NOTE: If two DotPair instances begin at the same Vector-index, then the shorter of the two shall come first.
        Specified by:
        compareTo in interface java.lang.Comparable<DotPair>
        Parameters:
        other - Any other DotPair to be compared to 'this' DotPair
        Returns:
        An integer that fulfils Java's interface Comparable<T> public int compareTo(T t) method requirements.
        Code:
        Exact Method Body:
         int ret = this.start - other.start;
         return (ret != 0) ? ret : (this.size() - other.size());
        
      • iterator

        public java.util.PrimitiveIterator.OfInt iterator()
        This shall return an int Iterator (which is properly named class java.util.PrimitiveIterator.OfInt) that iterates integers beginning with the value in this.start and ending with the value in this.end.
        Specified by:
        iterator in interface java.lang.Iterable<java.lang.Integer>
        Returns:
        An Iterator that iterates 'this' instance of DotPair from the beginning of the range, to the end of the range. The Iterator returned will produce Java's primitive type int.
        Code:
        Exact Method Body:
         return new PrimitiveIterator.OfInt()
         {
             private int cursor = start;
             public boolean hasNext()    { return this.cursor <= end; }
             public int nextInt()        { return cursor++;         }
         };
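
        A usage sketch for a primitive iterator over an inclusive range follows. The class InclusiveRangeIteratorSketch and its range and sum helpers are hypothetical stand-ins so that the example runs without this package. One deliberate difference from the body quoted above: this sketch throws NoSuchElementException when iterated past the end.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

// Hypothetical stand-in for an iterator over the inclusive range [start, end].
final class InclusiveRangeIteratorSketch
{
    static PrimitiveIterator.OfInt range(final int start, final int end)
    {
        return new PrimitiveIterator.OfInt()
        {
            private int cursor = start;

            @Override public boolean hasNext() { return cursor <= end; }

            @Override public int nextInt()
            {
                // Assumption: guard against over-iteration (the quoted body has no such check).
                if (cursor > end) throw new NoSuchElementException();
                return cursor++;
            }
        };
    }

    // Sums every index in the range; nextInt() avoids boxing each value to Integer.
    static int sum(int start, int end)
    {
        int total = 0;
        PrimitiveIterator.OfInt it = range(start, end);
        while (it.hasNext()) total += it.nextInt();
        return total;
    }

    public static void main(String[] args)
    {
        System.out.println(sum(3, 7));   // 3 + 4 + 5 + 6 + 7 = 25
    }
}
```

        Because DotPair implements Iterable<Integer>, an enhanced for-loop over a DotPair should also work, at the cost of boxing each index.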
        
      • isInside

        public boolean isInside​(int index)
        This will test whether a specific index is contained between this.start and this.end, inclusive.
        Parameters:
        index - This is any integer index value. It must not be negative.
        Returns:
        TRUE If the value of index is greater-than-or-equal-to the value stored in field 'start' and furthermore is less-than-or-equal-to the value of field 'end'
        Throws:
        java.lang.IndexOutOfBoundsException - If the value of 'index' is negative, this exception is thrown.
        Code:
        Exact Method Body:
         if (index < 0) throw new IndexOutOfBoundsException
             ("You have passed a negative index [" + index + "] here, but this is not allowed.");
        
         return (index >= start) && (index <= end);
        
      • enclosedBy

        public boolean enclosedBy​(DotPair other)
        This will test whether 'this' DotPair is completely enclosed by parameter DotPair 'other'.
        Parameters:
        other - Another DotPair. This parameter is expected to be a descriptor of the same vectorized-webpage as 'this' DotPair is. It is not mandatory, but if not, the comparison is likely meaningless.
        Returns:
        TRUE If (and only if) parameter 'other' encloses 'this'.
        Code:
        Exact Method Body:
         return (other.start <= this.start) && (other.end >= this.end);
        
      • encloses

        public boolean encloses​(DotPair other)
        This will test whether 'this' DotPair completely encloses parameter DotPair 'other'.
        Parameters:
        other - Another DotPair. This parameter is expected to be a descriptor of the same vectorized-webpage as 'this' DotPair is. It is not mandatory, but if not, the comparison is likely meaningless.
        Returns:
        TRUE If (and only if) parameter 'other' is enclosed completely by 'this'.
        Code:
        Exact Method Body:
         return (this.start <= other.start) && (this.end >= other.end);
        
      • overlaps

        public boolean overlaps​(DotPair other)
        This will test whether parameter 'other' has any overlapping Vector-indices with 'this' DotPair.
        Parameters:
        other - Another DotPair. This parameter is expected to be a descriptor of the same vectorized-webpage as 'this' DotPair is. It is not mandatory, but if not, the comparison is likely meaningless.
        Returns:
        TRUE If (and only if) parameter 'other' and 'this' have any overlap.
        Code:
        Exact Method Body:
         return  ((this.start >= other.start)    && (this.start <= other.end)) ||
                 ((this.end >= other.start)      && (this.end <= other.end));
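
        Read literally, the body quoted above checks only whether one of the endpoints of 'this' falls inside 'other'. When 'this' strictly contains 'other' (for example [0, 10] against [3, 5]), neither endpoint does, so the expression appears to evaluate to FALSE. The sketch below compares that endpoint test with the usual symmetric test for closed intervals. The class OverlapSketch and both method names are hypothetical, and the observation concerns only the listed code, not the library's intent.

```java
// Hypothetical stand-in operating on plain ints; all ranges are inclusive at both ends.
final class OverlapSketch
{
    // Endpoint-based test, transcribed from the quoted method body.
    // (s1, e1) plays the role of 'this'; (s2, e2) plays the role of 'other'.
    static boolean endpointTest(int s1, int e1, int s2, int e2)
    {
        return ((s1 >= s2) && (s1 <= e2)) || ((e1 >= s2) && (e1 <= e2));
    }

    // Symmetric test: two closed ranges overlap when neither ends before the other begins.
    static boolean symmetricTest(int s1, int e1, int s2, int e2)
    {
        return (s1 <= e2) && (s2 <= e1);
    }

    public static void main(String[] args)
    {
        // [0, 10] strictly contains [3, 5]
        System.out.println(endpointTest(0, 10, 3, 5));    // false
        System.out.println(symmetricTest(0, 10, 3, 5));   // true

        // Swapping the roles makes the endpoint test succeed
        System.out.println(endpointTest(3, 5, 0, 10));    // true
    }
}
```

        Callers that need an order-independent answer could combine overlaps with encloses, or apply the symmetric comparison directly to the public start and end fields.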
        
      • toVector

        public static java.util.Vector<HTMLNode> toVector
                    (java.util.Vector<? extends HTMLNode> html,
                     DotPair dp)
        
        This method converts a sublist, represented by a "dotted pair", into a Vector of HTMLNode.

        NOTE: The DotPair dp parameter contains fields start, end, which simply represent the starting and ending indices into the HTML page Vector. This method cycles through that Vector, beginning with the dp.start field, and ending with the dp.end field. Each HTMLNode reference within the sublist is inserted into the returned Vector.
        Parameters:
        html - Any Vectorized-HTML Web-Page, or sub-page
        dp - Any sublist within that HTML page.
        Returns:
        A Vector version of the original sublist that was represented by passed parameter 'dp'
        Code:
        Exact Method Body:
         Vector<HTMLNode>    ret = new Vector<>();
         LV                  l   = new LV(html, dp.start, dp.end + 1);
        
         for (int i=l.start; i < l.end; i++) ret.addElement(html.elementAt(i));
        
         return ret;
        
      • toVectors

        public static java.util.Vector<java.util.Vector<HTMLNode>> toVectors​
                    (java.util.Vector<? extends HTMLNode> html,
                     java.util.Vector<DotPair> sublists)
        
        This will cycle through a "list of sublists" and call the method toVector(Vector<? extends HTMLNode> html, DotPair dp) on each sublist in the input parameter 'sublists'. Those sublists will be collected into another Vector and returned.
        Parameters:
        html - Any Vectorized-HTML Web-Page, or sub-page
        sublists - A "List of sublists" within that HTML page.
        Returns:
        This method shall return a Vector containing vectors as sublists.
        Code:
        Exact Method Body:
         Vector<Vector<HTMLNode>> ret = new Vector<>();
        
         for (DotPair sublist : sublists) ret.addElement(toVector(html, sublist));
         return ret;
        
      • toSubSections

        public static java.util.Vector<SubSection> toSubSections
                    (java.util.Vector<? extends HTMLNode> html,
                     java.util.Vector<DotPair> sublists)
        
        This will cycle through a "list of sublists" and call the method toVector(Vector<? extends HTMLNode> html, DotPair dp) on each sublist in the input parameter 'sublists'. Each DotPair, together with the Vector produced for it, is wrapped in a new SubSection. Those SubSection instances will be collected into another Vector and returned.
        Parameters:
        html - Any Vectorized-HTML Web-Page, or sub-page
        sublists - A "List of sublists" within that HTML page.
        Returns:
        This method shall return a Vector of SubSection instances, one for each DotPair in 'sublists'.
        Code:
        Exact Method Body:
         Vector<SubSection> ret = new Vector<>();
        
         for (DotPair sublist : sublists)
             ret.addElement(new SubSection(sublist, toVector(html, sublist)));
        
         return ret;
        
      • toStream

        public static java.util.stream.IntStream toStream​
                    (java.lang.Iterable<DotPair> dpi,
                     boolean leastToGreatest)
        
        This method will convert a list of class HTML.DotPair instances to a java.util.stream.IntStream. The generated IntStream shall contain every Vector index (integer) of the underlying vectorized-html page that is "inside" (according to public boolean DotPair.isInside(int)) at least one of the DotPair instances passed via the 'Iterable' parameter.

        HINT: Many of the "Find" Methods available in the HTML.NodeSearch package return instances of Vector<DotPair>. These Vectors of DotPair are to be thought of as "lists of sub-lists of a vectorized-html web-page." This method can help identify each and every integer-index (place in the html-Vector) that is "inside any of these passed sublists."

        PRIMARY POINT: Many of the sublists (a.k.a. "The DotPair's of the input-parameter 'dpi'") will often overlap. Furthermore, many will have spaces/gaps between them. This method shall return an 'IntStream' of integers all of which are guaranteed to be members of at least one, but possibly multiple, of these DotPair sublists.

        STALE-DATA: Try to keep in mind, always, that when writing code that modifies vectorized-HTML, the moment any node is inserted or deleted all Vector indexes present in a programmer's code-memory or data-structure-memory become stale or "invalid!" There are myriad ways to handle this issue, many of which are beyond the scope of a Java Documentation Page.
        Parameters:
        dpi - Any source of 'DotPair' instances that implements the public interface java.lang.Iterable<DotPair>.
        leastToGreatest - When this parameter receives a TRUE value, the results returned from this IntStream will be sorted least to greatest. To generate an IntStream whose results are sorted from greatest to least, pass FALSE to this parameter.
        Returns:
        A java.util.stream.IntStream of the integers that are members of at least one DotPair in this Iterable<DotPair>
        Code:
        Exact Method Body:
         Iterator<DotPair>   iter    = dpi.iterator();
         TreeSet<DotPair>    ts      = new TreeSet<>();
        
         while (iter.hasNext()) ts.add(iter.next());
             // The tree-set will add the "DotPair" to the tree - and keep them sorted,
             // since that's what "TreeSet" does.
        
         Iterator<DotPair>   tsIter  = leastToGreatest ? ts.iterator() : ts.descendingIterator();
        
         IntStream.Builder   builder = IntStream.builder();
         DotPair             dp      = null;
        
         if (leastToGreatest)
             while (tsIter.hasNext())
                 for (int i=(dp=tsIter.next()).start; i <= dp.end; i++)
                     builder.add(i);
             // We are building a "forward-index" stream... DO AS MUCH SORTING... AS POSSIBLE!
        
         else
             while (tsIter.hasNext())
                 for (int i=(dp=tsIter.next()).end; i >= dp.start; i--)
                     builder.add(i);
             // we are building a "reverse-index" stream... Make sure to add the sub-lists in
             // reverse-order.
        
         if (leastToGreatest)
             return builder.build().sorted().distinct();
             // We have added them in order (mostly!!) - VERY-TRICKY, and this is the whole point... 
             // MULTIPLE, OVERLAPPING DOTPAIRS
             // We need to sort because the DotPair sublists have been added in "sorted order" but
             // the overall list is not sorted!
        
         else
             return builder.build().map(i -> -i).sorted().map(i -> -i).distinct();
             // Here, the exact same argument holds, but also, when "re-sorting" we have to futz
             // around with the fact that Java's 'IntStream' class does not have a specialized
             // reverse-sort() (or alternate-sort()) method... (Kind of another JDK bug).
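
        The same result (every index covered by at least one range, with duplicates removed and the requested ordering) can also be sketched with IntStream.rangeClosed and flatMapToInt. This is an alternative formulation for illustration, not the library's implementation. The class RangeUnionSketch is hypothetical, and {start, end} int arrays stand in for DotPair.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-in: each int[] is a {start, end} pair, inclusive at both ends.
final class RangeUnionSketch
{
    static IntStream indices(List<int[]> ranges, boolean leastToGreatest)
    {
        // Expand each inclusive range, then drop duplicates caused by overlapping ranges.
        IntStream ascending = ranges.stream()
            .flatMapToInt(r -> IntStream.rangeClosed(r[0], r[1]))
            .distinct()
            .sorted();

        if (leastToGreatest) return ascending;

        // IntStream has no comparator-based sort, so box, sort in reverse, and unbox.
        return ascending
            .boxed()
            .sorted(Comparator.reverseOrder())
            .mapToInt(Integer::intValue);
    }

    public static void main(String[] args)
    {
        List<int[]> ranges = Arrays.asList(
            new int[] {2, 5}, new int[] {4, 7}, new int[] {10, 11});

        // prints [2, 3, 4, 5, 6, 7, 10, 11]
        System.out.println(Arrays.toString(indices(ranges, true).toArray()));

        // prints [11, 10, 7, 6, 5, 4, 3, 2]
        System.out.println(Arrays.toString(indices(ranges, false).toArray()));
    }
}
```

        Boxing costs more than the negate-sort-negate approach in the quoted body. That approach relies on the indices being non-negative, which the DotPair constructor guarantees.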
        
      • iterator

        public static java.util.PrimitiveIterator.OfInt iterator​
                    (java.lang.Iterable<DotPair> dpi,
                     boolean leastToGreatest)
        
        Convenience Method. Invokes toStream(Iterable, boolean)
        Code:
        Exact Method Body:
         return toStream(dpi, leastToGreatest).iterator();
        
      • toPosArray

        public static int[] toPosArray​(java.lang.Iterable<DotPair> dpi,
                                       boolean leastToGreatest)
        Convenience Method. Invokes toStream(Iterable, boolean)
        Code:
        Exact Method Body:
         return toStream(dpi, leastToGreatest).toArray();