Class Article

  • All Implemented Interfaces:
    java.io.Serializable

    public class Article
    extends java.lang.Object
    implements java.io.Serializable
    Article - Documentation.

    This class stores the results of downloading and scraping a news article from a news site. Instances of this class are produced by calls to the class ScrapeArticles. These results can be saved to a Vector, or stored to the file system for later use. Internally, they can contain the original news-site article web page and the pared-down article-body web page.
    See Also:
    Serialized Form



    • Field Detail

      • serialVersionUID

        protected static final long serialVersionUID
        This supplies the serialVersionUID that is strongly recommended for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like "Hibernate" to store data.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        protected static final long serialVersionUID = 1;
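
        As a rough illustration of the save-and-restore workflow described above, the sketch below round-trips a Serializable object through an in-memory byte array. The class SavedArticle and the helpers toBytes and fromBytes are hypothetical stand-ins written for this example, not part of this library; a real Article instance could be passed through the same standard java.io streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical stand-in for Article, reduced to two illustrative fields.
    static class SavedArticle implements Serializable {
        protected static final long serialVersionUID = 1;

        public final String titleElement;
        public final boolean wasErrorDownload;

        SavedArticle(String titleElement, boolean wasErrorDownload) {
            this.titleElement = titleElement;
            this.wasErrorDownload = wasErrorDownload;
        }
    }

    // Writes any Serializable object to an in-memory byte array.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Reads an object back from the bytes produced by toBytes.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SavedArticle before = new SavedArticle("Example Headline", false);
        SavedArticle after = (SavedArticle) fromBytes(toBytes(before));
        System.out.println(after.titleElement);
    }
}
```

        Swapping the byte-array streams for FileOutputStream and FileInputStream would store the object on the file system instead.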
        
      • wasErrorDownload

        public final boolean wasErrorDownload
        This indicates whether an error occurred when downloading an article. If this field is TRUE after instantiation, all other fields in this class should be treated as irrelevant.
        Code:
        Exact Field Declaration Expression:
        public final boolean                wasErrorDownload;
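
        One way a caller might honor this flag is to check it before reading any other field. The sketch below is only an illustration: the class Result is a hypothetical stand-in that mirrors two fields of Article, and usableTitles is a helper invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorFlagSketch {

    // Hypothetical stand-in that mirrors two of Article's public final fields.
    static class Result {
        public final boolean wasErrorDownload;
        public final String titleElement;

        Result(boolean wasErrorDownload, String titleElement) {
            this.wasErrorDownload = wasErrorDownload;
            this.titleElement = titleElement;
        }
    }

    // Collects titles only from results whose download succeeded.
    static List<String> usableTitles(List<Result> results) {
        List<String> titles = new ArrayList<>();
        for (Result r : results) {
            // When the flag is set, the remaining fields carry no useful data.
            if (r.wasErrorDownload) continue;
            titles.add(r.titleElement);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Result> results = List.of(
            new Result(false, "First Story"),
            new Result(true, null),
            new Result(false, "Second Story"));
        System.out.println(usableTitles(results));
    }
}
```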
        
      • url

        public final java.net.URL url
        This is the article's URL from the news website.
        Code:
        Exact Field Declaration Expression:
        public final URL                    url;
        
      • titleElement

        public final java.lang.String titleElement
        This is the title that was scraped from the main page. The title is the content of the <TITLE>...</TITLE> element on the article HTML page.
        Code:
        Exact Field Declaration Expression:
        public final String                 titleElement;
        
      • originalPage

        public final java.util.Vector<HTMLNode> originalPage
        This is the original, complete, vectorized HTML page as it was downloaded. It contains the unmodified article download.
        Code:
        Exact Field Declaration Expression:
        public final Vector<HTMLNode>       originalPage;
        
      • articleBody

        public final java.util.Vector<HTMLNode> articleBody
        This is the pared-down article body. It is what is retrieved from class ArticleGet.
        Code:
        Exact Field Declaration Expression:
        public final Vector<HTMLNode>       articleBody;
        
      • imageURLs

        public final java.util.Vector<java.net.URL> imageURLs
        The image URLs that were found in the news article. The easiest way to think about this field is that the following instructions were called on the article body after downloading the article:


         Vector<TagNode> imageNodes  = TagNodeGet.all(article, TC.OpeningTags, "img");
         Vector<URL>     imageURLs   = Links.resolveSRCs(imageNodes, articleURL);
         
         // The results of the above call are stored in this field / Vector<URL>.
         
        
        Code:
        Exact Field Declaration Expression:
        public final Vector<URL>            imageURLs;
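
        To make the idea of resolving SRC attributes against the article URL concrete, here is a minimal sketch that uses only java.net.URI. It is not the implementation of Links.resolveSRCs; the helper resolveAll and the example.com addresses are assumptions made for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.List;
import java.util.Vector;

public class ResolveSrcSketch {

    // Resolves each raw SRC attribute value against the URL of the article page.
    // Relative values become absolute; values that are already absolute are kept.
    static Vector<URL> resolveAll(List<String> srcValues, URL articleURL)
            throws URISyntaxException, MalformedURLException {
        URI base = articleURL.toURI();
        Vector<URL> resolved = new Vector<>();
        for (String src : srcValues) {
            resolved.add(base.resolve(src).toURL());
        }
        return resolved;
    }

    public static void main(String[] args) throws Exception {
        URL articleURL = URI.create("https://example.com/news/2020/story.html").toURL();
        List<String> srcValues = List.of(
            "images/photo.jpg",                    // relative to the article's directory
            "/static/logo.png",                    // relative to the site root
            "https://cdn.example.org/chart.png");  // already absolute

        for (URL u : resolveAll(srcValues, articleURL)) {
            System.out.println(u);
        }
    }
}
```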
        
      • imagePosArr

        public final int[] imagePosArr
        This array contains the "Image Positions" inside the vectorized article: one Vector index for each image that was found inside the article. The easiest way to think about this field is that the following instructions were called on the article body after downloading that article:


          int[] imagePosArr = TagNodeFind.all(page, TC.OpeningTags, "img");
         
        
        Code:
        Exact Field Declaration Expression:
        public final int[]                  imagePosArr;
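
        The relationship between a position array and the vector it indexes can be sketched with plain strings. This is a simplified illustration, not the behavior of TagNodeFind: the page is modeled as a Vector of String rather than HTMLNode, and findImagePositions is a hypothetical helper that uses a naive prefix test.

```java
import java.util.List;
import java.util.Vector;
import java.util.stream.IntStream;

public class ImagePositionsSketch {

    // Returns the index of every element that looks like an opening <img> tag.
    // The prefix check is deliberately naive and stands in for real tag parsing.
    static int[] findImagePositions(Vector<String> page) {
        return IntStream.range(0, page.size())
            .filter(i -> page.get(i).toLowerCase().startsWith("<img"))
            .toArray();
    }

    public static void main(String[] args) {
        Vector<String> page = new Vector<>(List.of(
            "<p>", "Caption text", "<img src='a.png'>", "</p>", "<IMG src='b.png'>"));

        // Each stored position can be used to look the node up again later.
        for (int pos : findImagePositions(page)) {
            System.out.println(pos + ": " + page.get(pos));
        }
    }
}
```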
        
      • originalPageStats

        public final PageStats originalPageStats
        This contains an instance of class PageStats that has been generated from the original newspaper article page.

        Java Line of Code:
         this.originalPageStats = new PageStats(originalPage);
         
        
        Code:
        Exact Field Declaration Expression:
        public final PageStats              originalPageStats;
        
      • processedArticleStats

        public final PageStats processedArticleStats
        This contains an instance of class PageStats that has been generated from the post-processed Newspaper Article.

        Java Line of Code:
         this.processedArticleStats = new PageStats(articleBody);
         
        
        Code:
        Exact Field Declaration Expression:
        public final PageStats              processedArticleStats;
        
    • Constructor Detail

      • Article

        public Article(java.net.URL url,
                       java.lang.String titleElement,
                       java.util.Vector<HTMLNode> originalPage,
                       java.util.Vector<HTMLNode> articleBody,
                       java.util.Vector<java.net.URL> imageURLs,
                       int[] imagePosArr)