Interface Pause

  • All Superinterfaces:
    java.io.Serializable

    public interface Pause
    extends java.io.Serializable
    Pause - Documentation.

    This interface allows a user to stop a download of a large number of URLs and later restart it without returning to the start of the article URL list. It exists as a separate interface, rather than as a few static methods inside the downloader class, so that the user can specify where the state is saved. If the Java Virtual Machine is halted while the downloader is iterating over hundreds of articles, having the intermediate state saved is beneficial.

    The interface Pause provides a simple factory method which returns an implementation of Pause that uses just a file-name and the file-system to save intermediate state. If this is unacceptable to the user, writing an implementation of Pause that does not depend on the file-system should be easy. The only requirements made herein are saving and retrieving, when requested, three integer "state-parameters" along with the results Vector.
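
    As an illustration of the point above, here is a minimal sketch of an implementation that keeps its state in memory instead of on the file-system. The types DownloadResult, PauseException, Ret4 and Pause declared in the block are hypothetical stand-ins, included only so the example compiles on its own; the real library's definitions may differ (in particular, the stand-in exception is unchecked and the enum constants are invented). The class name InMemoryPause is also an assumption.

```java
import java.util.Vector;

// Hypothetical stand-ins so this sketch compiles on its own. The real library
// supplies its own DownloadResult, PauseException, Ret4 and Pause types.
enum DownloadResult { SUCCESS, FAILED }

// Unchecked here only to keep the sketch short; the real exception may differ.
class PauseException extends RuntimeException {
    PauseException(String message) { super(message); }
}

// Stand-in for the library's four-field return type (fields a, b, c, d).
class Ret4<A, B, C, D> {
    final A a; final B b; final C c; final D d;
    Ret4(A a, B b, C c, D d) { this.a = a; this.b = b; this.c = c; this.d = d; }
}

// Stand-in that mirrors the three instance methods documented on this page.
interface Pause extends java.io.Serializable {
    void saveState(Vector<Vector<DownloadResult>> results,
                   int outerCounter, int innerCounter, int successCounter)
        throws PauseException;

    Ret4<Vector<Vector<DownloadResult>>, Integer, Integer, Integer> loadState()
        throws PauseException;

    void initialize() throws PauseException;
}

// Keeps the saved state in fields rather than in a file.
class InMemoryPause implements Pause {
    private static final long serialVersionUID = 1L;

    private Vector<Vector<DownloadResult>> results = new Vector<>();
    private int outerCounter, innerCounter, successCounter;

    @Override
    public void saveState(Vector<Vector<DownloadResult>> results,
                          int outerCounter, int innerCounter, int successCounter) {
        if (results == null) throw new PauseException("results may not be null");
        this.results = results;
        this.outerCounter = outerCounter;
        this.innerCounter = innerCounter;
        this.successCounter = successCounter;
    }

    @Override
    public Ret4<Vector<Vector<DownloadResult>>, Integer, Integer, Integer> loadState() {
        return new Ret4<>(results, outerCounter, innerCounter, successCounter);
    }

    @Override
    public void initialize() {
        // After initialization, all three counters must read as zero.
        results = new Vector<>();
        outerCounter = innerCounter = successCounter = 0;
    }
}
```

    An in-memory implementation like this one does not survive a JVM exit, so it is mainly useful for tests; a database-backed or network-backed variant would follow the same shape.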

    Also, no means is provided for actually halting the download process. The interface 'Pause' does not stop the program; it merely saves the relevant intermediate counter information and Vector-index information to a small file. This enables the downloader, when requested, to resume the article-download process where it left off, whether the user halted the process manually or the process crashed during the download.

    IMPORTANT: If there are hundreds and hundreds of articles to download, which can occur when a first-time scrape of a news website is being performed, the best way to "halt the downloading" is simply to press Control-C on the keyboard. The last successful download will be recorded in the "State Backup Monitor", which is what this interface is.



    • Field Detail

      • serialVersionUID

        static final long serialVersionUID
        This fulfils the serialVersionUID requirement for all classes that implement Java's interface java.io.Serializable. Using the Serializable implementation offered by Java is very easy, and can make saving program state while debugging a lot easier. It can also be used in place of more complicated systems like "hibernate" to store data.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
        public static final long serialVersionUID = 1;
        
    • Method Detail

      • saveState

        void saveState​(java.util.Vector<java.util.Vector<DownloadResult>> results,
                       int outerCounter,
                       int innerCounter,
                       int successCounter)
                throws PauseException
        This method needs to save the current download state. The three integers provided are all that the download logic needs in order to identify which newspaper article URLs have already been downloaded and, therefore, where to begin the download process after a pause or break. The instance of Vector that is required by this method's parameter list contains the "Download Results" for each news-Article in the URL list.
        Parameters:
        results - This is the two-dimensional Vector that contains instances of 'DownloadResult'. Each news-Article in each section of a newspaper website has a specific location in this two-dimensional Vector. As the downloader scrapes (or fails to scrape) news-Articles, the result of each scrape or scrape-attempt is inserted into this 2-D Vector.
        outerCounter - This is the outer-Vector index of the last URL downloaded.
        innerCounter - This is the inner-Vector index of the last URL downloaded.
        successCounter - This is the number of URLs that were downloaded without throwing any exceptions.
        Throws:
        PauseException
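
        To make the role of the two index counters concrete, the following is a small sketch of how a downloader might use them to decide which URLs still need to be attempted. It is not the library's download logic. The class ResumeSketch and the method remaining are hypothetical, plain lists of strings stand in for the article URL list, and the sketch assumes the download resumes at the saved position itself; whether the real downloader resumes at or just after that position is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library.
class ResumeSketch {

    // Returns the URLs still to be attempted, in order.
    // Assumption: work resumes AT (outerCounter, innerCounter), inclusive.
    static List<String> remaining(List<List<String>> sections,
                                  int outerCounter, int innerCounter) {
        List<String> todo = new ArrayList<>();

        for (int outer = outerCounter; outer < sections.size(); outer++) {
            List<String> section = sections.get(outer);

            // Only the first section visited starts part-way through.
            int start = (outer == outerCounter) ? innerCounter : 0;

            for (int inner = start; inner < section.size(); inner++) {
                todo.add(section.get(inner));
            }
        }
        return todo;
    }
}
```

        With both counters at zero, every URL is returned, which matches the requirement that a freshly initialized state must report zeros so that no articles are skipped.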
      • loadState

        Ret4<java.util.Vector<java.util.Vector<DownloadResult>>, java.lang.Integer, java.lang.Integer, java.lang.Integer> loadState()
                throws PauseException
        
        This method loads the state of the downloader. This can be helpful if the user wishes to "pause" the download while a long list of article URLs is being retrieved. Also, if the downloader exits due to an exception, the state of the download is preserved.
        Returns:
        An instance of Ret4<Vector<Vector<DownloadResult>>, Integer, Integer, Integer>

        • Ret4.a - The current state of the "Return Vector". This two-dimensional Vector fills up with instances of enumerated-type DownloadResult.
        • Ret4.b - The outer-Vector index of the last attempted newspaper article URL download.
        • Ret4.c - The inner-Vector index of the last attempted newspaper article URL download.
        • Ret4.d - The number of article URLs that have successfully downloaded.
        Throws:
        PauseException
      • initialize

        void initialize()
                 throws PauseException
        If the Pause implementation needs initialization, it ought to implement this method.

        IMPORTANT: The initialize process should ensure that a call to loadState() will return a Ret4 data-structure whose integer fields are all equal to zero. These fields are counters, and if they are nonzero when the download begins, many news-articles will not be scraped.

        ALSO: On initialization, the value for the 2-D Vector in the Ret4 data-structure need only be present; it does not matter what values have been inserted into it, nor what the sizes of the sub-Vectors are. Do note that its values will be clobbered by the downloader if and when the downloader determines that the download process is starting at the beginning.
        Throws:
        PauseException - This exception is thrown if the implementation of this interface fails to init or load.
      • getFSInstance

        static Pause getFSInstance​(java.lang.String saveFileName)
                            throws PauseException
        This is a static factory method that returns an instance of the interface Pause which uses the file-system to save state to a user-specified file-name.
        Parameters:
        saveFileName - This is just the name of the data-file where state shall be saved. Apart from the results Vector, this state consists of only three integers, so the data-file is small.
        Returns:
        A functioning instance of this interface - one that uses a flat file for saving state.
        Throws:
        PauseException
        Code:
        Exact Method Body:
         return new PauseFS(saveFileName);
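
        The internals of PauseFS are not shown on this page. Purely as an illustration of how three integer counters can be written to and read back from a flat file, here is a sketch that uses the standard java.io data streams. The class CounterFile and its methods are hypothetical, the file layout is an assumption, and the results Vector is deliberately left out to keep the example short.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; the real PauseFS file format may be entirely different.
class CounterFile {

    // Writes the three counters as three consecutive 4-byte integers.
    static void save(String fileName, int outer, int inner, int success) {
        try (DataOutputStream out =
                 new DataOutputStream(new FileOutputStream(fileName))) {
            out.writeInt(outer);
            out.writeInt(inner);
            out.writeInt(success);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the counters back in the same order: { outer, inner, success }.
    static int[] load(String fileName) {
        try (DataInputStream in =
                 new DataInputStream(new FileInputStream(fileName))) {
            return new int[] { in.readInt(), in.readInt(), in.readInt() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

        Saving after every download attempt keeps the file current, so that an interrupted process can resume from the most recently written counters.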