Class NewsSite

  • All Implemented Interfaces:
    java.io.Serializable

    public class NewsSite
    extends java.lang.Object
    implements java.io.Serializable
    NewsSite - Documentation.

    This class stores the complete set of object references needed to download a day's news content from a news website. Instances may be serialized and saved to disk.
    See Also:
    Serialized Form
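As a rough illustration of saving a Serializable object to disk and reading it back, here is a minimal sketch using standard Java serialization. The class SiteRecord, its two fields, and the helper names save and load are hypothetical stand-ins chosen for this example; they are not part of NewsSite, whose real fields include library types not reproduced here.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a Serializable data class; NOT the real NewsSite.
final class SiteRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final String siteName;
    final String siteURLAsStr;

    SiteRecord(String siteName, String siteURLAsStr) {
        this.siteName = siteName;
        this.siteURLAsStr = siteURLAsStr;
    }
}

final class SerializationSketch {
    // Writes the object to the given file using standard Java serialization.
    static void save(Serializable obj, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Reads back whatever object was stored in the file.
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("news-site", ".ser");
        save(new SiteRecord("Example News", "https://news.example.com/"), file);
        SiteRecord copy = (SiteRecord) load(file);
        System.out.println(copy.siteName + " -> " + copy.siteURLAsStr);
        Files.delete(file);
    }
}
```

Every object reachable from the saved instance must itself be serializable, otherwise writeObject throws java.io.NotSerializableException.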



    • Constructor Detail

      • NewsSite

        public NewsSite(java.lang.String siteName,
                        Country country,
                        java.lang.String siteURLAsStr,
                        LC languageCode,
                        java.lang.String description,
                        java.util.Vector<java.net.URL> sectionURLs,
                        URLFilter filter,
                        LinksGet linksGetter,
                        ArticleGet articleGetter,
                        StrFilter bannerAndAddFinder)
        Simple constructor for this data-class.
        Parameters:
        siteName - This site's name
        country - The country-of-origin for this news web-site.
        siteURLAsStr - The primary URL for the news web-site.
        languageCode - If the site publishes in a language other than English, the 'languageCode' parameter records that language.
        description - A brief description of the site.
        sectionURLs - The primary news sections of the web-site. Typical sections are "Life", "Health", "Business", "World News" and "Sports", but the list may contain any section URL.
        filter - If some of the URLs found while scraping a section need to be filtered out, this parameter is used to discard non-article, non-news links. As explained in the class ScrapeURLs, this is often a simple one-line lambda expression that identifies which URLs match a regular-expression pattern.
        linksGetter - A 'getter' for retrieving the links from a section web-page. It, too, is often just a one-line regular-expression lambda.
        articleGetter - This should implement the ArticleGet interface.
        bannerAndAddFinder - Filter for finding repetitive ads or banners.
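To make the "one-line lambda" idea for the filter parameter concrete, here is a minimal sketch of a regular-expression URL filter. It assumes java.util.function.Predicate<URL> as a stand-in for the library's URLFilter type, assumes that returning true means "keep this link", and uses a hypothetical /yyyy/mm/dd/ path rule and example.com addresses. The real URLFilter interface may differ on any of these points.

```java
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class UrlFilterSketch {
    // Hypothetical rule: article links carry a /yyyy/mm/dd/ date in their path.
    static final Pattern ARTICLE = Pattern.compile("/\\d{4}/\\d{2}/\\d{2}/");

    // Predicate<URL> stands in for the library's URLFilter; true means "keep" here.
    static final Predicate<URL> KEEP_ARTICLES =
        url -> ARTICLE.matcher(url.getPath()).find();

    // Returns only the links accepted by the filter, in their original order.
    static List<URL> keepArticles(List<URL> links) {
        List<URL> kept = new ArrayList<>();
        for (URL u : links) {
            if (KEEP_ARTICLES.test(u)) kept.add(u);
        }
        return kept;
    }

    public static void main(String[] args) throws Exception {
        List<URL> links = List.of(
            URI.create("https://news.example.com/2024/05/17/some-story.html").toURL(),
            URI.create("https://news.example.com/about-us").toURL());
        System.out.println(keepArticles(links));
    }
}
```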
    • Method Detail

      • sectionURLsIter

        public java.util.Iterator<java.net.URL> sectionURLsIter()
        Retrieves the section URLs (for example life, comedy, sports, business, world) for this news site.
        Returns:
        An Iterator<URL> of the different sections for a particular news-site.
        Code:
        Exact Method Body:
         return new RemoveUnsupportedIterator<URL>(sectionURLs.iterator());
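The method body above wraps the Vector's iterator before returning it. Judging by its name, the wrapper presumably disables remove() so that callers cannot alter the stored section list. Its source is not shown here, so the following is only a sketch of that idea under that assumption. The class ReadOnlyIterator and the demo data are hypothetical and are not the library's RemoveUnsupportedIterator.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Vector;

// Hypothetical wrapper; a sketch of the idea, not the library's implementation.
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> inner;

    ReadOnlyIterator(Iterator<E> inner) { this.inner = inner; }

    @Override public boolean hasNext() { return inner.hasNext(); }
    @Override public E next() { return inner.next(); }

    // Refusing remove() keeps callers from modifying the backing collection.
    @Override public void remove() {
        throw new UnsupportedOperationException("remove() is not supported");
    }
}

final class ReadOnlyIteratorDemo {
    public static void main(String[] args) {
        Vector<String> sections = new Vector<>(List.of("world", "sports"));
        Iterator<String> it = new ReadOnlyIterator<>(sections.iterator());
        it.next();
        try {
            it.remove();
        } catch (UnsupportedOperationException e) {
            System.out.println("remove() rejected; size is still " + sections.size());
        }
    }
}
```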
        
      • sectionURLsVec

        public java.util.Vector<java.net.URL> sectionURLsVec()
        Retrieves the section URLs (for example life, comedy, sports, business, world) for this news site.
        Returns:
        A Vector<URL> of the different sections for a particular news-site.
        Code:
        Exact Method Body:
         return (Vector<URL>) sectionURLs.clone();
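Because the method body returns a clone, the caller receives a separate Vector and can modify it without touching the list held by the object. The sketch below illustrates that behavior with a hypothetical class DefensiveCopySketch that stores String elements instead of URLs. Vector.clone() makes a shallow copy, so the elements themselves are shared between the two vectors.

```java
import java.util.List;
import java.util.Vector;

// Hypothetical class illustrating the clone-on-return pattern; not NewsSite itself.
final class DefensiveCopySketch {
    private final Vector<String> sections = new Vector<>(List.of("world", "sports"));

    // Returns a shallow copy: a new Vector holding the same element references.
    @SuppressWarnings("unchecked") // Vector.clone() is declared to return Object
    Vector<String> sectionsCopy() {
        return (Vector<String>) sections.clone();
    }

    int size() { return sections.size(); }

    public static void main(String[] args) {
        DefensiveCopySketch site = new DefensiveCopySketch();
        Vector<String> copy = site.sectionsCopy();
        copy.add("business"); // changes only the caller's copy
        System.out.println(copy.size() + " vs " + site.size());
    }
}
```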