public class ForeignNewsArticle extends java.lang.Object
Foreign News Article - Documentation.
This class translates the contents of a news article into English from any language supported by the Google Cloud Server Translate API. The translation is deliberately simple: the user of this class is expected to "pick out the article content" and pass that vectorized-HTML sub-page to the processArticle(...) method of this class.
This class will:
- Translate the text from the native-language to English.
- Generate a side-by-side article with both original-language and English article content
- Save the page as an "index.html" file in the user-specified directory
- Download any photos present in the HTML
- Rename the photo files after downloading them to a local, user-specified directory.
- Update the page's HTML <IMG SRC="..."> nodes accordingly with the new image names.
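The last two steps (renaming images and updating the IMG nodes) can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: it works on a raw HTML string with a regular expression, whereas the library operates on a Vector<HTMLNode>, and the class name ImgRenameDemo, the method renumberImages, and the "1.jpg, 2.png, ..." naming scheme are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch on a raw HTML string; NOT the library's implementation.
final class ImgRenameDemo {
    // Matches an IMG tag up to and including a double-quoted SRC value.
    private static final Pattern IMG_SRC = Pattern.compile(
        "(<img\\b[^>]*?\\bsrc=\")([^\"]*)(\")", Pattern.CASE_INSENSITIVE);

    /** Rewrites each IMG SRC to a numbered local name and collects the original URLs. */
    static String renumberImages(String html, List<String> originalUrls) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(2);
            originalUrls.add(url);

            // Keep the file extension, ignoring any query string or fragment.
            String path = url.split("[?#]", 2)[0];
            int dot = path.lastIndexOf('.');
            int slash = path.lastIndexOf('/');
            String ext = (dot > slash) ? path.substring(dot) : "";

            // Assumed naming scheme: images are numbered in order of appearance.
            String localName = originalUrls.size() + ext;
            m.appendReplacement(out,
                Matcher.quoteReplacement(m.group(1) + localName + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        String html = "<p><IMG SRC=\"http://example.com/pics/a.jpg\"></p>";
        System.out.println(renumberImages(html, urls));
        System.out.println(urls);
    }
}
```

The collected URLs are what a downloader would fetch; the rewritten HTML is what would be saved alongside the renamed files.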
To translate a foreign-language news article into English or Spanish, this is the only class that is really needed. It performs a "simple translation" using the Google Cloud Server Translate API.
IMPERATIVE: This class makes calls to the GCSTAPI, and Google therefore requires an "API Key" so that it can bill your account for the translations. This Java package is not going to eat your API key, but it does expect one in order for these classes to work. The class GCSTAPI has a field, simply called public static String key, that must be set to a valid GCS Translate API Key; otherwise the API calls will fail. You may read more about this on Google's website, and in the class Torello.Languages.GCSTAPI.
FINALLY: This class makes calls to class ImageScraper, which uses a Time-Out monitor-thread to prevent locking up when downloading images. However, when your program exits, it may sit idle for anywhere between 1 second and 1 minute, because the Java JRE does not automatically kill all threads, even when program flow has reached its end.
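The general JDK behavior behind this note can be sketched in isolation. The example below does not use ImageScraper; TimeoutDemo and its methods are hypothetical names, and the thread pool is an assumption chosen only because its idle, non-daemon threads linger for up to sixty seconds, which resembles the delay described above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in: a JDK-level illustration only, NOT ImageScraper.
final class TimeoutDemo {
    // Pool threads are non-daemon, so they keep the JVM alive until the pool
    // is shut down or the idle threads expire (after sixty seconds).
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    /** Runs the task, returning fallback if it does not finish within timeoutMillis. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                  // interrupt the stuck task
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt flag
            return fallback;
        } catch (ExecutionException e) {
            return fallback;                      // the task itself threw
        }
    }

    /** Releases the pool's threads so the JVM can exit promptly. */
    static void shutdown() {
        POOL.shutdownNow();
    }

    public static void main(String[] args) {
        String fast = callWithTimeout(() -> "done", 1000, "timed out");
        String slow = callWithTimeout(
            () -> { Thread.sleep(5000); return "done"; }, 100, "timed out");
        System.out.println(fast + " / " + slow);
        shutdown(); // without this call, the JVM waits for idle pool threads to expire
    }
}
```

Removing the shutdown() call from main reproduces the symptom: the output is printed, but the process stays alive until the idle worker threads time out.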
To solve this problem immediately, call:
Static (Functional) API: The methods in this class are all (100%) defined with the Java Key-Word / Key-Concept 'static'. Furthermore, there is no way to obtain an instance of this class, because it exposes no accessible constructors. Java's Spring-Boot MVC feature is *not* utilized because it flies directly in the face of the light-weight data-classes philosophy. This has many advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:
- The methods here use the key-word 'static', which means (by implication) that there is no internal state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
- The 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
- The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.
The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood
Static Field: The methods in this class do not create any internal state that is maintained, but there is a single static field defined. This field is initialized only once, during the Class Loader phase (and only if this class is used), and serves as a data 'lookup' field (like a static constant). View this class' source-code in the link provided below to see internally used data.
The internal field is a public static final String that stores the HTML header portion of the output page.
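The pattern described in this section (all-static methods, no obtainable instance, one constant lookup field) might look like the following minimal sketch. The class HtmlPageUtil, the field PAGE_HEADER and its contents, and the method wrap are hypothetical; they are not the library's code and do not show the actual value of HEADER.

```java
// Hypothetical example class; names and header contents are illustrative only.
final class HtmlPageUtil {
    // Initialized once, when the class is loaded; acts as a constant lookup value.
    public static final String PAGE_HEADER =
        "<html>\n<head><meta charset=\"utf-8\"></head>\n<body>\n";

    // Private constructor: no instance can be created from outside this class.
    private HtmlPageUtil() { }

    /** Depends only on its argument and the constant above; keeps no state. */
    static String wrap(String bodyHtml) {
        return PAGE_HEADER + bodyHtml + "\n</body>\n</html>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("<p>Hello</p>"));
    }
}
```

Because wrap reads nothing but its parameter and a final constant, calling it twice with the same input always gives the same result, which is the property the 'static' style is meant to guarantee.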
Fields
Modifier and Type: static String
Field: HEADER
Methods
Modifier and Type: static Ret3<Vector<String>, Vector<String>, String>
Method: processArticle(Vector<HTMLNode> articleBody, URL url, String title, LC srcLang, Appendable log, String targetDirectory)
public static final java.lang.String HEADER
This is the HTML page header that is appended to the output page.
- Exact Field Declaration Expression:
public static Ret3<java.util.Vector<java.lang.String>,java.util.Vector<java.lang.String>,java.lang.String> processArticle (java.util.Vector<HTMLNode> articleBody, java.net.URL url, java.lang.String title, LC srcLang, java.lang.Appendable log, java.lang.String targetDirectory) throws java.io.IOException
This will download and translate a news article from a foreign news website. All that you need to do is provide the main "Article-Body" of the article, along with some information about it; the calls to the Google Cloud Server Translate API will be handled by the code.
IMPORTANT NOTE: This class makes calls to the GCSTAPI, which is an acronym meaning the Google Cloud Server Translate API. This server expects you to pay Google for the services that it provides. The translations are not free, but they are not too expensive either. You must be sure to set the class GCSTAPI -> String key field in order for the GCS Translate API Queries to succeed.
Your Directory Will Contain:
- Article Photos, stored by number as they appear in the article
- index.html - Article Body with Translations
articleBody - This should contain the content of the article, taken from the vectorized HTML page. Read more about cleaning an HTML news article in the class ArticleGet.
url - The URL of the article being scraped. It is used only to include a link to the article's original page in the output index.html file.
title - This is needed because obtaining the title can be done in myriad ways. Keeping it as an "external option" gives the coder/programmer more leeway.
srcLang - This is just the "two character" language code that Google Cloud Server expects to see.
log - This logs progress to terminal out. Null may be passed, in which case output will not be displayed. Any implementation of java.lang.Appendable will suffice. Note that the 'Appendable' interface requires heeding IOExceptions for its 'append(...)' methods.
targetDirectory - This is the directory where the image files and the 'index.html' file will be stored.
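The contract described for the log parameter (any Appendable, null allowed, IOException possible) can be sketched with the JDK alone. LogDemo and logLine are hypothetical names used only for illustration; they are not part of this class.

```java
import java.io.IOException;

// Hypothetical helper; illustrates the Appendable contract, not the library's code.
final class LogDemo {
    /** Appends one line to log, doing nothing when log is null. */
    static void logLine(Appendable log, String message) throws IOException {
        if (log == null) return;            // null means "no output requested"
        log.append(message).append('\n');   // Appendable.append declares IOException
    }

    public static void main(String[] args) throws IOException {
        logLine(System.out, "written to the terminal");  // PrintStream is an Appendable

        StringBuilder buffer = new StringBuilder();      // so is StringBuilder
        logLine(buffer, "captured in memory");
        System.out.print(buffer);

        logLine(null, "silently dropped");
    }
}
```

Passing System.out gives terminal output, a StringBuilder captures the log for later inspection, and null suppresses it; in every case the caller must still handle or declare IOException.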
- This will return an instance of: Ret3<Vector<String>, Vector<String>, String>
- First element: This vector contains a list of sentences, or sentence-fragments, in the original language of the news article.
- Second element: This vector contains a list of sentences, or sentence-fragments, in the target language, which is English.
- Third element: This array of strings contains a list of filenames, one for each image that was present on the original news article page, and therefore downloaded.
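One way to consume the two sentence vectors is to pair them into a side-by-side table. The sketch below uses plain lists instead of Ret3, and it assumes that the two vectors are the same length and aligned by index, which the text above does not state explicitly. SideBySideDemo, escape, and toTable are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; assumes both lists are index-aligned and equal in length.
final class SideBySideDemo {
    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a two-column HTML table: original text on the left, translation on the right. */
    static String toTable(List<String> original, List<String> translated) {
        if (original.size() != translated.size())
            throw new IllegalArgumentException("lists must have the same length");

        StringBuilder sb = new StringBuilder("<table>\n");
        for (int i = 0; i < original.size(); i++)
            sb.append("<tr><td>").append(escape(original.get(i)))
              .append("</td><td>").append(escape(translated.get(i)))
              .append("</td></tr>\n");
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toTable(
            Arrays.asList("Hola.", "Buenos días."),
            Arrays.asList("Hello.", "Good morning.")));
    }
}
```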