Package Torello.HTML

Interface URLFilter

  • All Superinterfaces:
    java.util.function.Predicate<java.net.URL>, java.io.Serializable
    Functional Interface:
    This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.

    @FunctionalInterface
    public interface URLFilter
    extends java.util.function.Predicate<java.net.URL>, java.io.Serializable
    URLFilter - Documentation.

    The purpose of this filter is to give a name to the concept of "skipping" or "avoiding" some of the links that are listed on a web-page. While crawling through HTML from a news-site, a crawler will eventually encounter links that are irrelevant. Implement this FunctionalInterface, or use a lambda-expression, to create a URL-filter that skips certain web-address URLs and/or image-URLs.

    WRAPPING class StrFilter: Using the class StrFilter to build URLFilter instances makes testing and filtering the Strings in a URL easier and quicker to write, and provides more error-checking.
    See Also:
    StrFilter
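As a minimal sketch of the lambda-expression approach described above: since URLFilter extends java.util.function.Predicate<java.net.URL>, the example below uses a plain Predicate<URL> as a stand-in so that it runs without the Torello library on the class-path. The class name, the helper 'url' and the host "ads.example.com" are illustrative assumptions, not part of this package.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Predicate;

public class SkipLinksSketch
{
    // Stand-in for a URLFilter (assumption: URLFilter accepts the same lambda shape,
    // because it extends Predicate<URL>).  TRUE means "keep", FALSE means "skip".
    static final Predicate<URL> SKIP_ADS =
        (URL url) -> ! url.getHost().equalsIgnoreCase("ads.example.com");

    // Hypothetical helper: builds a URL and converts the checked exception to an unchecked one.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        System.out.println(SKIP_ADS.test(url("https://news.example.com/story.html")));
        System.out.println(SKIP_ADS.test(url("https://ads.example.com/banner.html")));
    }
}
```

With the real interface, the same lambda could presumably be assigned to a variable of type URLFilter instead of Predicate<URL>.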



    • Field Detail

      • serialVersionUID

        static final long serialVersionUID
        This fulfils the 'serialVersionUID' requirement for classes that implement Java's interface java.io.Serializable. Java's built-in Serializable implementation is easy to use, and can make saving program state while debugging a lot easier. It can also be used in place of more complicated systems like "Hibernate" to store data.

        Functional Interfaces are usually not thought of as Data Objects that need to be saved, stored and retrieved; however, having the ability to store intermediate results along with the lambda-functions that helped get those results can make debugging easier.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final long serialVersionUID = 1;
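The idea of storing a lambda-function alongside intermediate results can be sketched as follows. The nested interface 'SerializableURLTest' is a hypothetical stand-in that has the same two super-interfaces as URLFilter; the class name and the helper methods are likewise assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Predicate;

public class SerializableFilterSketch
{
    // Hypothetical stand-in with the same super-interfaces as URLFilter.
    @FunctionalInterface
    interface SerializableURLTest extends Predicate<URL>, Serializable { }

    // Because the target type is Serializable, this lambda can be written to a stream.
    static final SerializableURLTest HTTPS_ONLY =
        (URL url) -> "https".equals(url.getProtocol());

    // Serializes the filter to a byte-array and reads it back again.
    static SerializableURLTest roundTrip(SerializableURLTest filter)
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();

            try (ObjectOutputStream out = new ObjectOutputStream(bytes))
                { out.writeObject(filter); }

            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray())))
                { return (SerializableURLTest) in.readObject(); }
        }
        catch (IOException | ClassNotFoundException e) { throw new IllegalStateException(e); }
    }

    // Hypothetical helper: builds a URL without a checked exception.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        SerializableURLTest restored = roundTrip(HTTPS_ONLY);
        System.out.println(restored.test(url("https://example.com/")));
    }
}
```

Note that a serialized lambda can only be read back by a program in which the class that defined the lambda is still available.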
        
      • imagesKEEP

        static final URLFilter imagesKEEP
        This URLFilter will KEEP any Image URLs whose String form ends with one of the standard image file-name extensions ('.jpg', '.jpeg', '.gif', '.png' or '.bmp', compared without regard to case).

        WARNING: There are occasions where an Image-URL is "handled" by a web-server internally, and the actual URL itself does not look like an image file-name at all. The inconvenient implication for this Predicate is that it might return erroneous results. An actual image whose URL does not end with one of these extensions could be rejected, and a URL that happens to end with one of these Strings, but is not an image, might be kept.
        See Also:
        StrCmpr.endsWithXOR_CI(String, String[])
        Code:
        Exact Field Declaration Expression:
        public static final URLFilter imagesKEEP = (URL url) ->
            {
                return StrCmpr.endsWithXOR_CI
                    (url.toString().trim(), ".jpg", ".jpeg", ".gif", ".png", ".bmp");
            };
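The WARNING above can be illustrated with a small stand-in. The method 'looksLikeImage' below is a hypothetical approximation of the extension test (a case-insensitive "ends with any of these" check); it does not call StrCmpr, and the exact semantics of StrCmpr.endsWithXOR_CI are not reproduced here. The URLs are made up for the example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

public class ImageSuffixSketch
{
    static final String[] IMAGE_SUFFIXES = { ".jpg", ".jpeg", ".gif", ".png", ".bmp" };

    // Hypothetical approximation: TRUE if the URL's String form ends with an image extension.
    static boolean looksLikeImage(URL url)
    {
        String s = url.toString().trim().toLowerCase(Locale.ROOT);

        for (String suffix : IMAGE_SUFFIXES) if (s.endsWith(suffix)) return true;

        return false;
    }

    // Hypothetical helper: builds a URL without a checked exception.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        // An ordinary image file-name: kept.
        System.out.println(looksLikeImage(url("https://example.com/photo.JPG")));

        // Same image, but a query-string follows the extension: rejected by a suffix test.
        System.out.println(looksLikeImage(url("https://example.com/photo.jpg?size=large")));

        // A server-generated image with no extension at all: also rejected.
        System.out.println(looksLikeImage(url("https://example.com/image?id=42")));
    }
}
```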
        
      • imagesREJECT

        static final URLFilter imagesREJECT
        This URLFilter will REJECT any Image URLs whose String form ends with one of the standard image file-name extensions ('.jpg', '.jpeg', '.gif', '.png' or '.bmp', compared without regard to case).

        WARNING: There are occasions where an Image-URL is "handled" by a web-server internally, and the actual URL itself does not look like an image file-name at all. The inconvenient implication for this Predicate is that it might return erroneous results. An actual image whose URL does not end with one of these extensions could be kept, and a URL that happens to end with one of these Strings, but is not an image, could be rejected.
        See Also:
        StrCmpr.endsWithNAND_CI(String, String[])
        Code:
        Exact Field Declaration Expression:
        public static final URLFilter imagesREJECT = (URL url) ->
            {
                return StrCmpr.endsWithNAND_CI
                    (url.toString().trim(), ".jpg", ".jpeg", ".gif", ".png", ".bmp");
            };
        
    • Method Detail

      • test

        boolean test​(java.net.URL url)
        FUNCTIONAL-INTERFACE BOOLEAN METHOD: This is the single abstract method of this functional-interface; it fulfils the 'test(...)' method of Predicate.

        This method will receive a URL. The purpose of this method is to provide an easy means to filter certain URLs out of a list of URLs.

        PRECISE NOTE: This method should return FALSE if the passed URL should be skipped. A return value of TRUE implies that the URL is not to be ignored or passed over, but rather 'kept.'

        NOTE: This behavior is compatible with the Java Stream's method "filter(Predicate<...>)".
        Specified by:
        test in interface java.util.function.Predicate<java.net.URL>
        Parameters:
        url - This is a URL that will be checked against the constraints specified by 'this' filter.
        Returns:
        When implementing this method, returning TRUE must mean that the URL has passed the filter's test-requirements (and will subsequently be retained by whatever code is carrying out the filter operation).
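The compatibility with Stream.filter noted above can be sketched as follows. A plain Predicate<URL> stands in for a URLFilter (which extends it); the class name, helper methods and URLs are assumptions made for this example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class StreamFilterSketch
{
    // TRUE means "keep": here, only URLs whose path ends with ".html" are kept.
    static final Predicate<URL> KEEP_HTML = (URL url) -> url.getPath().endsWith(".html");

    // Stream.filter retains exactly those elements for which the predicate returns TRUE.
    static List<URL> keepMatching(List<URL> urls, Predicate<URL> filter)
    { return urls.stream().filter(filter).collect(Collectors.toList()); }

    // Hypothetical helper: builds a URL without a checked exception.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        List<URL> urls = List.of(
            url("https://example.com/index.html"),
            url("https://example.com/logo.png")
        );

        System.out.println(keepMatching(urls, KEEP_HTML));
    }
}
```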
      • and

        default URLFilter and​(URLFilter other)
        This is the standard-java Predicate method 'and'. If a user wants to apply two URLFilters, this method will use a lambda-expression to create a new URLFilter that "logically-AND's" the 'other' Predicate with 'this' Predicate.
        Parameters:
        other - Some other URLFilter - one that does some other test.
        Returns:
        A new Java-Predicate that performs both tests ('this' and 'other'), and returns the logical-AND.
        Code:
        Exact Method Body:
         // FAIL-FAST: Check that the user-provided-parameters would not cause exceptions once
         // the lambda-predicate is invoked.
         if (other == null) throw new NullPointerException
             ("The parameter 'other' to URLFilter.and(other) was null.");
        
         return (URL url) -> this.test(url) && other.test(url);
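The fail-fast check in the body above means that a null argument is reported when the filters are combined, rather than later when the combined filter is first invoked. A minimal sketch, in which 'MiniFilter' is a hypothetical stand-in that copies only the 'and' logic shown above:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Predicate;

public class AndFailFastSketch
{
    // Hypothetical stand-in for URLFilter, mirroring the 'and' body shown above.
    @FunctionalInterface
    interface MiniFilter extends Predicate<URL>
    {
        default MiniFilter and(MiniFilter other)
        {
            if (other == null) throw new NullPointerException("The parameter 'other' was null.");

            return (URL url) -> this.test(url) && other.test(url);
        }
    }

    static final MiniFilter IS_HTTPS   = (URL url) -> "https".equals(url.getProtocol());
    static final MiniFilter IS_EXAMPLE = (URL url) -> "example.com".equals(url.getHost());
    static final MiniFilter BOTH       = IS_HTTPS.and(IS_EXAMPLE);

    // Returns TRUE if combining with null throws immediately, before any URL is tested.
    static boolean failsAtCompositionTime()
    {
        try { IS_HTTPS.and((MiniFilter) null); return false; }
        catch (NullPointerException e) { return true; }
    }

    // Hypothetical helper: builds a URL without a checked exception.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        System.out.println(BOTH.test(url("https://example.com/a")));
        System.out.println(failsAtCompositionTime());
    }
}
```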
        
      • or

        default URLFilter or​(URLFilter other)
        This is the standard-java Predicate method 'or'. If a user wants to apply two URLFilters, this method will use a lambda-expression to create a new URLFilter that "logically-OR's" the 'other' Predicate with 'this' Predicate.
        Parameters:
        other - Some other URLFilter - one that does some other test.
        Returns:
        A new Java-Predicate that performs both tests ('this' and 'other'), and returns the logical-OR.
        Code:
        Exact Method Body:
         // FAIL-FAST: Check that the user-provided-parameters would not cause exceptions once
         // the lambda-predicate is invoked.
         if (other == null) throw new NullPointerException
             ("The parameter 'other' to URLFilter.or(other) was null.");
        
         return (URL url) -> this.test(url) || other.test(url);
        
      • negate

        default URLFilter negate()
        This is the standard-java Predicate method 'negate'. This method will use a lambda-expression to create a new URLFilter that is the "logical-NOT" of the original test (a.k.a. 'this' Predicate).
        Specified by:
        negate in interface java.util.function.Predicate<java.net.URL>
        Returns:
        A new Java-Predicate that negates the results of 'this' Predicate.
        Code:
        Exact Method Body:
         return (URL url) -> ! this.test(url);
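One practical effect of declaring 'negate' with a narrower return type can be sketched as follows: the negated filter keeps the filter type, so it can be chained directly into the filter-typed 'or' (or 'and'). The interface 'MiniFilter' is a hypothetical stand-in that mirrors only the 'negate' and 'or' bodies shown on this page.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Predicate;

public class NegateSketch
{
    // Hypothetical stand-in for URLFilter.
    @FunctionalInterface
    interface MiniFilter extends Predicate<URL>
    {
        // Overrides Predicate.negate() with a narrower (covariant) return type.
        @Override
        default MiniFilter negate() { return (URL url) -> ! this.test(url); }

        default MiniFilter or(MiniFilter other)
        {
            if (other == null) throw new NullPointerException("The parameter 'other' was null.");

            return (URL url) -> this.test(url) || other.test(url);
        }
    }

    static final MiniFilter IS_HTTPS = (URL url) -> "https".equals(url.getProtocol());
    static final MiniFilter IS_LOCAL = (URL url) -> "localhost".equals(url.getHost());

    // negate() returns a MiniFilter, so the MiniFilter-typed 'or' can follow it directly.
    static final MiniFilter NOT_HTTPS_OR_LOCAL = IS_HTTPS.negate().or(IS_LOCAL);

    // Hypothetical helper: builds a URL without a checked exception.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        System.out.println(NOT_HTTPS_OR_LOCAL.test(url("http://example.com/")));
        System.out.println(NOT_HTTPS_OR_LOCAL.test(url("https://example.com/")));
        System.out.println(NOT_HTTPS_OR_LOCAL.test(url("https://localhost/admin")));
    }
}
```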
        
      • filter

        default int filter​(java.lang.Iterable<java.net.URL> urls)
        This is similar to the Java Streams method filter(Predicate<>), except that it modifies its argument in place. Any element of the input-parameter 'urls' for which the test of 'this' URLFilter evaluates to FALSE shall be removed.
        Parameters:
        urls - An Iterable of URL's which the user would like filtered using 'this' filter.
        Returns:
        The number of elements that were removed from parameter 'urls' based on the results of the URLFilter.test() of 'this' instance.
        Code:
        Exact Method Body:
         int             removeCount = 0;
         Iterator<URL>   iter        = urls.iterator();
        
         // If the filter test returns FALSE, then remove the URL from the collection.
         // Increment the removeCount Counter.
         while (iter.hasNext()) if (! test(iter.next())) { removeCount++; iter.remove(); }
        
         return removeCount;
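Because the body above calls Iterator.remove(), the Iterable that is passed must support removal. The sketch below copies the same loop into a hypothetical static helper, 'removeRejected', which takes a plain Predicate<URL> in place of a URLFilter; it shows the in-place removal on an ArrayList and the failure on an unmodifiable list.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class InPlaceFilterSketch
{
    // Hypothetical helper with the same loop as the method body shown above.
    static int removeRejected(Iterable<URL> urls, Predicate<URL> filter)
    {
        int             removeCount = 0;
        Iterator<URL>   iter        = urls.iterator();

        // If the filter test returns FALSE, remove the URL and count the removal.
        while (iter.hasNext()) if (! filter.test(iter.next())) { removeCount++; iter.remove(); }

        return removeCount;
    }

    static final Predicate<URL> KEEP_HTTPS = (URL url) -> "https".equals(url.getProtocol());

    // Returns TRUE if filtering an unmodifiable list fails once a URL has to be removed.
    static boolean failsOnUnmodifiableList()
    {
        try { removeRejected(List.of(url("http://example.com/")), KEEP_HTTPS); return false; }
        catch (UnsupportedOperationException e) { return true; }
    }

    // Hypothetical helper: builds a URL without a checked exception.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        List<URL> urls = new ArrayList<>(List.of(
            url("https://example.com/a"), url("http://example.com/b"), url("https://example.com/c")
        ));

        System.out.println(removeRejected(urls, KEEP_HTTPS) + " removed, remaining: " + urls);
        System.out.println(failsOnUnmodifiableList());
    }
}
```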
        
      • fromStrFilter

        static URLFilter fromStrFilter​(StrFilter sf)
        This wraps a StrFilter inside of a URLFilter. The String-comparison is performed on the String form of the entire URL, since the URL itself is handed to the StrFilter for testing.

        StrFilter NOTE: The class 'StrFilter' can be used in conjunction with the class-specific filters, for instance, this class 'URLFilter'.
        Parameters:
        sf - This is a String Predicate that has (usually, but not required) been built by one of the many String-Filter Factory-Build static-methods of class StrFilter. The Predicate's that are constructed via the build methods of StrFilter call the standard method java.lang.Object.toString() on the objects they receive for testing.
        Returns:
        This will return an instance of a URLFilter that will test the url as a String.
        See Also:
        StrFilter
        Code:
        Exact Method Body:
         if (sf == null) throw new NullPointerException(
             "The String-Filter Predicate Parameter 'sf' in static-factory builder method " +
             "'fromStrFilter' was passed a null value."
         );
        
         return (URL url) -> sf.test(url);
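The wrapping performed above can be sketched without the Torello library by using a Predicate<Object> in place of a StrFilter. This rests on the assumption, taken from the parameter description, that the String predicate calls toString() on whatever object it receives; 'fromStringTest' and 'CONTAINS_SPORTS' are hypothetical names for this illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Predicate;

public class WrapStringTestSketch
{
    // Stand-in for a StrFilter (assumption): tests the toString() form of any object.
    static final Predicate<Object> CONTAINS_SPORTS =
        (Object o) -> o.toString().contains("/sports/");

    // Hypothetical counterpart of fromStrFilter: the URL is handed to the String test as-is.
    static Predicate<URL> fromStringTest(Predicate<Object> sf)
    {
        if (sf == null) throw new NullPointerException("The parameter 'sf' was null.");

        return (URL url) -> sf.test(url);
    }

    // Hypothetical helper: builds a URL without a checked exception.
    static URL url(String s)
    {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    public static void main(String[] args)
    {
        Predicate<URL> filter = fromStringTest(CONTAINS_SPORTS);

        System.out.println(filter.test(url("https://news.example.com/sports/scores.html")));
        System.out.println(filter.test(url("https://news.example.com/politics/vote.html")));
    }
}
```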