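- All Superinterfaces:
- java.util.function.BiFunction<java.net.URL,java.util.Vector<HTMLNode>,java.util.Vector<java.lang.String>>, java.io.Serializable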
- Functional Interface:
- This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.
@FunctionalInterface public interface LinksGet extends java.util.function.BiFunction<java.net.URL,java.util.Vector<HTMLNode>,java.util.Vector<java.lang.String>>, java.io.Serializable
LinksGet - Documentation.
When class ScrapeURLs is asked to retrieve all article URLs from a newspaper website, it can do so using an instance of this interface, 'LinksGet', passed to it. Passing a non-null LinksGet reference is not mandatory, but it can help for news sites where it is hard to identify which URLs point to newspaper articles and which URLs point to advertisements or other extraneous, non-news locations. If an instance of this interface is not passed to the ScrapeURLs class, then the class will retrieve all URLs found on the page. Remember, it is not mandatory to pass a LinksGet to ScrapeURLs, and even if 'extraneous' links are retrieved, the programmer may still pass a URLFilter to ScrapeURLs to ensure that advertisements and other off-topic pages are avoided.
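The optional behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class `OptionalGetterSketch`, the methods `allLinks`, `collect`, `exampleUrl` and `demo`, the use of raw tag strings for the page, the `Predicate<String>` standing in for a URL filter, and the example URL are all hypothetical. It also assumes the filter runs after the getter.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;
import java.util.Vector;
import java.util.function.BiFunction;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGetterSketch
{
    // Matches href="..." inside a raw tag string (a simplification for this sketch).
    private static final Pattern HREF =
        Pattern.compile("href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Fallback used when no getter is supplied: every href found on the page.
    static Vector<String> allLinks(Vector<String> page)
    {
        Vector<String> ret = new Vector<>();
        for (String node : page) {
            Matcher m = HREF.matcher(node);
            if (m.find()) ret.add(m.group(1));
        }
        return ret;
    }

    // Hypothetical stand-in for the link-collection step; getter and filter may be null.
    static Vector<String> collect(
        URL url, Vector<String> page,
        BiFunction<URL, Vector<String>, Vector<String>> getter,
        Predicate<String> filter)
    {
        Vector<String> links = (getter == null) ? allLinks(page) : getter.apply(url, page);
        Vector<String> ret = new Vector<>();
        for (String link : links)
            if (filter == null || filter.test(link)) ret.add(link);
        return ret;
    }

    // Placeholder section URL; the host name is made up for this example.
    static URL exampleUrl()
    {
        try { return URI.create("https://news.example.com/").toURL(); }
        catch (MalformedURLException e) { throw new IllegalStateException(e); }
    }

    // Builds a two-link page and collects it with no getter and the given filter.
    static Vector<String> demo(Predicate<String> filter)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<a href=\"/news/story.html\">", "Story", "</a>",
            "<a href=\"/ads/promo.html\">", "Promo", "</a>"));
        return collect(exampleUrl(), page, null, filter);
    }

    public static void main(String[] args)
    {
        System.out.println(demo(null));                                // [/news/story.html, /ads/promo.html]
        System.out.println(demo(link -> link.startsWith("/news/")));  // [/news/story.html]
    }
}
```

With no getter, every link on the page is returned, and the filter alone removes the advertisement link.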
PRIMARY USE: Sites in which article URLs are located in very well understood and specified areas (or, rather, "sections") of the news-site main pages should make use of this interface. The URLFilter mechanism can require that only article URLs that match certain regular expressions will pass the article-URL scrape logic. This interface, LinksGet, allows a user to specify areas and locations on the page for finding the links, regardless of the structure or properties of the web page itself. There are two examples of links-getters, below, used for news-site scraping.
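Separate from the two site-specific examples, the idea of restricting link retrieval to one section of a page can be sketched with a lambda. This is a minimal sketch under stated assumptions: `Node` is a hypothetical stand-in for the library's HTMLNode (its real API is not shown in this document), `SectionLinksGet` only mirrors the shape of the declaration above, and the `<div class="articles">` marker, the example URL and the sample page are invented for illustration. Nested `<div>` elements are not handled.

```java
import java.io.Serializable;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;
import java.util.Vector;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionLinksSketch
{
    // Hypothetical stand-in for HTMLNode: the raw text of one tag or text node.
    static class Node
    {
        final String str;
        Node(String str) { this.str = str; }
    }

    // Same shape as the documented declaration, with Node in place of HTMLNode.
    @FunctionalInterface
    interface SectionLinksGet
        extends BiFunction<URL, Vector<Node>, Vector<String>>, Serializable { }

    // Captures the href value of an opening anchor tag.
    private static final Pattern ANCHOR_HREF =
        Pattern.compile("^<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the href of an opening <a ...> tag, or null for any other node.
    static String href(String tag)
    {
        Matcher m = ANCHOR_HREF.matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    // Keeps only links that appear between <div class="articles"> and the next </div>.
    static final SectionLinksGet ARTICLE_SECTION = (URL url, Vector<Node> page) ->
    {
        Vector<String> ret = new Vector<>();
        boolean inside = false;
        for (Node n : page) {
            if (n.str.equals("<div class=\"articles\">")) inside = true;
            else if (n.str.equals("</div>")) inside = false;
            else if (inside) {
                String h = href(n.str);
                if (h != null) ret.add(h);
            }
        }
        return ret;
    };

    // Applies the getter to a small hand-built page with one ad link and two article links.
    static Vector<String> demo()
    {
        URL url;
        try { url = URI.create("https://news.example.com/world").toURL(); }
        catch (MalformedURLException e) { throw new IllegalStateException(e); }

        Vector<Node> page = new Vector<>();
        for (String s : Arrays.asList(
                "<div class=\"nav\">", "<a href=\"/ads/promo.html\">", "</a>", "</div>",
                "<div class=\"articles\">",
                "<a href=\"/news/story-1.html\">", "Story One", "</a>",
                "<a href=\"/news/story-2.html\">", "Story Two", "</a>",
                "</div>"))
            page.add(new Node(s));

        return ARTICLE_SECTION.apply(url, page);
    }

    public static void main(String[] args)
    {
        System.out.println(demo());   // [/news/story-1.html, /news/story-2.html]
    }
}
```

The link inside the navigation block is skipped because it sits outside the chosen section, without any regular-expression test on the URL itself.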
Unlike the functional interface 'ArticleGet', this interface does not provide any simple or straightforward factory methods for generating instances of LinksGet. In fact, even using this interface might seem redundant in the parameter list, since the parameter "URLFilter" can accomplish the act of filtering which URLs are included and which are not. Knowing how every news site on the internet functions is beyond the scope of this project, so this interface shall remain intact and included among the parameters to the method ScrapeURLs.get(...), even though it is usually easier to implement one of the factory instances of
This example is used for scraping the Spanish (from Spain, not Mexico) news site "ABC.ES". The Java lambda syntax -> is used to construct this functional interface:
This example is used for scraping the Chinese government website 'www.Gov.CN'. The Java lambda-expression syntax -> is used to construct this functional interface:
static final long serialVersionUID
This fulfils the SerialVersion UID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like "hibernate" to store data.
Functional Interfaces are usually not thought of as Data Objects that need to be saved, stored and retrieved; however, having the ability to store intermediate results along with the lambda-functions that helped get those results can make debugging easier.
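As an illustration of storing a lambda along with other program state, the sketch below round-trips a serializable lambda through standard Java object streams. The names `SerializableLambdaSketch`, `Joiner`, `toBytes`, `fromBytes` and `demo` are hypothetical and chosen only for this example. The interface `Joiner` merely imitates the "BiFunction plus Serializable" shape of LinksGet using plain strings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.BiFunction;

class SerializableLambdaSketch
{
    // A functional interface that is also Serializable, so lambdas assigned to it can be saved.
    @FunctionalInterface
    interface Joiner extends BiFunction<String, String, String>, Serializable { }

    // Serializes any object to an in-memory byte array.
    static byte[] toBytes(Object o) throws IOException
    {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Reads a single object back from a byte array.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException
    {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    // Saves a lambda, restores it, and invokes the restored copy.
    static String demo()
    {
        Joiner original = (a, b) -> a + "/" + b;
        try {
            Joiner restored = (Joiner) fromBytes(toBytes(original));
            return restored.apply("news", "story.html");
        }
        catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(demo());   // news/story.html
    }
}
```

A restored lambda can only be deserialized where the class that defined it is available on the classpath, which is worth keeping in mind when saved state is reloaded in a later debugging session.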
- See Also:
- Constant Field Values
- Exact Field Declaration Expression:
apply
FUNCTIONAL-INTERFACE METHOD: This is the method that fulfills this functional interface's 'apply' method. The purpose of this method is to retrieve all of the relevant HTML anchor elements from a news website.
- Specified by:
URL of a section of a newspaper, or content, website.
page - The download of that URL into a vectorized-html page.
- A list of all the TagNode elements that have relevant