Package Torello.Java

Class LFEC.GCSSB

  • Enclosing class:
    LFEC

    public static class LFEC.GCSSB
    extends java.lang.Object
    GCSSB: Google Cloud Server Storage Bucket - Documentation


    The following public static (inner) class provides exactly the same set of features that class LFEC provides, namely data-file loading with failure messages and system halts, but for files that are stored in Google Cloud Server Storage Buckets.

    The classes and methods 'exported' by the HTML search-and-scrape package, and by the other packages in this Torello JAR file, neither depend upon nor require Google Cloud Services, the 'Cloud Shell' development-environment that Google provides (for free), or any of the Storage Buckets or storage-systems that Google operates. Literally all of the development work for package HTML was done on Google's Cloud Shell, but hopefully it will all run, in the spirit of Java's write-once, run-anywhere promise, anywhere the JAR files are loaded. If you are planning to use Google's Cloud Server either to develop your code or to host your web-site, the methods and classes here that link to its servers might prove invaluable. The "Storage Buckets" system, for instance, is pretty cheap: I have about 20 to 30 Gigabytes there right now, and have paid probably $1.50 per month for it...

    USING GCS: In order to use the Google Cloud Services platform, you will have to use one of the following two means of communicating commands and instructions to their servers:

    1. Make JSON calls to the appropriate server-names using whatever "usual HTTP-connect methods" you employ, and parse the JSON response object. Parsing simple JSON responses can be fairly easy if you know how to use regular-expressions, but the "Jackson" library also does this for free. Download a copy of the "Jackson" Java jar-file, and read the "Jackson JSON" JavaDocs on the internet (search for: Jackson Java JSON parse).
    2. ... or ... spend however long it takes to read, figure out and understand Google's GCS Java-Library, and write Java code that uses their JSON-free library to make direct calls to Google's GCS servers.
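For the first route, the GCS JSON API downloads an object's raw bytes from a URL of the form https://storage.googleapis.com/storage/v1/b/BUCKET/o/OBJECT?alt=media, where the object name is percent-encoded as a single path segment (so a '/' inside the name becomes %2F). A minimal JDK-only sketch of building that URL follows. The class GcsJsonUrls is hypothetical, and actually sending the request (for a private bucket, with an OAuth 2.0 bearer token in the Authorization header) is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a GCS JSON API "download object" URL.
final class GcsJsonUrls
{
    private static final String BASE = "https://storage.googleapis.com/storage/v1/b/";

    // Percent-encodes a string as ONE path segment, so '/' becomes %2F.
    // URLEncoder uses form-encoding (space -> '+'), so '+' is rewritten as %20;
    // a literal '+' in the input was already encoded as %2B and is unaffected.
    static String encodeSegment(String s)
    {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // alt=media asks for the object's content instead of its JSON metadata
    static String mediaUrl(String bucket, String objectName)
    {
        return BASE + encodeSegment(bucket) + "/o/" + encodeSegment(objectName) + "?alt=media";
    }
}
```

For example, GcsJsonUrls.mediaUrl("mybucket", "dir/my file.txt") yields https://storage.googleapis.com/storage/v1/b/mybucket/o/dir%2Fmy%20file.txt?alt=media.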


    HERE, JAVA: The methods in this class do not use the "JSON" way of communicating with Google's servers; instead, they use the Java class-libraries that Google exports. I do not know which is the "official version" of its jar-file, but you may download an "unofficial" version from my web-site, which does work. I have not tampered with it or altered the classes in any way, other than to remove some of the complicated "extras" that were added.

    DOWNLOAD HERE: This is a download of file "GCS.jar", which has the classes and methods needed to access the Translate API, the Storage Bucket API, and the "OAuth 2.0" authorization classes. If you call any of the methods here that access Storage Buckets, you will need to include some version of Google's GCS jar file. This is the jar-file I created; if you don't trust Torello.Directory, please don't use it:


    I am not really able to make any claims for or against the contents of this particular jar-file. I am, personally, not a fan of the Java-enabled versions of Google's services, primarily because I have almost no idea how to use them (Google's Java libraries are very poorly documented, and their classes and methods are not well thought-out as APIs; rather, they mostly expose Google's own internals). One can usually figure out GCS Services from the JSON documentation and simply make JSON calls to and from GCS. Anyway, the above jar-file contains many (but not all) of the classes needed to avoid falling back on JSON calls to GCS. Using Google's jar-files, it is possible to make "plain old Java method calls" instead of using JSON. Most Importantly: You do not need this jar file in your classpath for any of the methods or classes in this scrape package at all - unless - you have a GCS account and want to use their Storage Buckets. If so, the two methods in class 'Programming' that use the term: 'GCSSB' an acronym that means "Google Cloud Server Storage Buckets"

    FINALLY: The following code snippet shows how to provide "OAuth 2.0" credentials to obtain an instance of class Storage, which is needed to begin communicating with Google Cloud Server Storage Buckets:
    import com.google.cloud.storage.*;
    import com.google.auth.oauth2.*;
    import java.io.FileInputStream;
    ...

    String   PROJECT_ID        = "The name you gave the project in which your storage buckets reside";
    String   PATH_TO_JSON_KEY  = "Path to the JSON key file generated when you create a 'Service Account' for your project";

    Storage  storage;

    // GoogleCredentials.fromStream(...) may throw java.io.IOException;
    // try-with-resources closes the key-file stream once it has been read.
    try (FileInputStream keyStream = new FileInputStream(PATH_TO_JSON_KEY))
    {
        storage = StorageOptions
                    .newBuilder()
                    .setProjectId(PROJECT_ID)
                    .setCredentials(GoogleCredentials.fromStream(keyStream))
                    .build()
                    .getService();
    }
    


    SPECIAL NOTE: I never needed my GoogleCredentials "OAuth 2.0" key until I started using Google's Java jar-libraries to connect to Storage Buckets directly.

    • When writing to my web-domain, using the command line program GSUTIL was usually much easier and eliminated the strict dependency on Google's web-hosting platform that would be mandated if using their Java-Libraries.
    • When translating Chinese Government web-publications using the Translate API, a Google "API Key" usually sufficed. An "API Key" is a 30-to-40-character string that identifies the billable account to the server. Just don't share it on the web, and it will work fine.
    • Even the Vision / OCR (Optical Character Recognition) APIs seemed to work without OAuth 2.0. I did have to make HTTP connections and interpret JSON, rather than use bona-fide Java method invocations, but it was all fine. If you are going to use Java methods with the storage-buckets, get an "OAuth 2.0" key and save the JSON file to your file-system. Keep that key private; the previous code shows how to "log in to Google" from Java and start storing files in the cloud using their Java jar-files.


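    Interestingly, the following method invocation often works, even though it leaves the authentication step out of the Storage object entirely, and it sometimes still allows access to the files in your storage-buckets. A likely explanation is that, when no credentials are supplied explicitly, Google's client libraries fall back to Application Default Credentials: for example, a key file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable, credentials saved by 'gcloud auth application-default login', or credentials that Google-hosted environments provide automatically. You may play around with it: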
    import com.google.cloud.storage.*;
    ...

    private static final Storage storage = StorageOptions.getDefaultInstance().getService();
    


    NOTE: The easiest way to use this class is to include the following line in the import section of your '.java' file. When you import the complete path of a static inner class, the remainder of your code may refer to that class by its short name, without typing the entire enclosing-class name on every line.
    // With this import, a static method of this class is called as follows:
    // GCSSB.loadFileToString(storage, "mybucket", "myFile.txt");
    // Without this import (as for all static nested classes), you must use
    // the full name, including the enclosing class:
    // LFEC.GCSSB.loadFileToString( ... );

    import Torello.Java.LFEC.GCSSB;
    


    Static (Functional) API: The methods in this class are all (100%) defined with the Java keyword 'static'. Furthermore, there is no need to obtain an instance of this class, because it holds no internal state. Java's Spring-Boot MVC feature is *not* utilized because it flies directly in the face of the light-weight data-classes philosophy. This has many advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') somewhat 'over-applies' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects), by re-using (quite profusely) the keyword static with all of its methods, and by sticking to Java's well-understood class Vector.

    Internal-State: A user may click on this class' source code (see link below) to view any and all internally defined fields of this class. A cursory inspection of the code shows that this class has precisely zero internally defined global fields (Spaghetti). All variables used by the methods in this class are local variables only, and therefore this class ought to be thought of as 'state-less'.



    • Constructor Summary

      Constructors 
      Constructor
      GCSSB()  
    • Method Summary

      Modifier and Type Method
      static String loadFileToString(com.google.cloud.storage.Storage storage, String bucket, String completeFileName)
      static Vector<String> loadFileToVector(com.google.cloud.storage.Storage storage, String bucket, String completeFileName, boolean includeNewLine)
      static <T> T readObjectFromFile(com.google.cloud.storage.Storage storage, String bucket, String completeFileName, boolean zip, Class<T> returnClass)
      static void writeFile(CharSequence fileAsStr, com.google.cloud.storage.Storage storage, String bucket, String completeFileName, boolean ASCIIorUTF8)
      static void writeObjectToFile(Object o, com.google.cloud.storage.Storage storage, String bucket, String completeFileName, boolean zip)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

    • Method Detail

      • readObjectFromFile

        public static <T> T readObjectFromFile
                    (com.google.cloud.storage.Storage storage,
                     java.lang.String bucket,
                     java.lang.String completeFileName,
                     boolean zip,
                     java.lang.Class<T> returnClass)
        
        This will read a Java Serialized java.lang.Object from a location in a Google Cloud Server Storage Bucket.

        NOTE: This uses Java's generic type-parameter syntax. The type-parameter is supplied through the Class<T> returnClass parameter. To provide a value for this parameter, simply type the name of the Java class you expect to read from the data-file, followed by '.class'. For example, to read a java.util.Vector from a data-file, pass Vector.class to this parameter. Every Java class has such a class literal, and every object can report its own runtime class through getClass().
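The isInstance / cast pairing that this method relies on can be seen in isolation below. This is a minimal sketch; the names ReturnClassDemo and checkedCast are hypothetical, and the sketch throws a ClassCastException where the real method halts via ERROR_EXIT.

```java
// Hypothetical helper illustrating the Class<T> "returnClass" idiom.
final class ReturnClassDemo
{
    // isInstance checks the runtime type (and returns false for null);
    // cast then returns the object typed as T without an unchecked warning.
    static <T> T checkedCast(Object o, Class<T> returnClass)
    {
        if (! returnClass.isInstance(o))
            throw new ClassCastException(
                "Expected " + returnClass.getName() + " but found " +
                (o == null ? "null" : o.getClass().getName())
            );

        return returnClass.cast(o);
    }
}
```

A class literal carries no generic type arguments: passing Vector.class yields a raw Vector, so assigning the result to a Vector<String> is still an unchecked conversion at the call site.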
        Parameters:
        storage - This must be an instance of Google Cloud Server's class Storage. The description at the top of this class explains how to obtain such an instance. If that explanation is unclear, or no longer works, search for information about the Google Cloud Server Storage Buckets Java API, class 'Storage'. APIs do change from time to time.
        bucket - The bucket name of the bucket from a Google Cloud Server account.
        completeFileName - This String-parameter needs to be the complete String representation of the directory-name plus the file-name of the location where the java serialized Object file was saved, or will be saved.

        GCS Storage-Buckets: The bucket-name as-a-string should **not** be included as a part of this String-parameter. However, both the file-name, and the directory-name where this file is residing must be present.
        zip - When this parameter is TRUE the serialized object will be run through Java's java.util.zip.GZIPInputStream and GZIPOutputStream when reading/writing the serialized-Object. If this parameter is FALSE, GZIP Compression will not be used when serializing or de-serializing the Object.
        returnClass - This is the type expected to be found by Java in the Serialized Object Data-File. If an Object is read from this location, but it does not have the type indicated by this parameter, the program will also halt, and an explanatory exception message will be printed to the console/terminal.
        Returns:
        A de-serialized java.lang.Object that has been read from a GCS Storage Bucket, and cast to the type denoted by parameter returnClass.
        See Also:
        FileRW.readObjectFromFile(String, boolean), FileRW.readObjectFromFileNOCNFE(String, boolean), LFEC.readObjectFromFile(String, boolean, Class), LFEC.ERROR_EXIT(String)
        Code:
        Exact Method Body:
         try
         {
             byte[]                  bArr    = storage.get(bucket, completeFileName).getContent();
             ByteArrayInputStream    bis     = new ByteArrayInputStream(bArr);
             ObjectInputStream       ois     = zip   ? new ObjectInputStream(new GZIPInputStream(bis))
                                                     : new ObjectInputStream(bis);
             Object                  ret     = ois.readObject();
        
             if (! returnClass.isInstance(ret))
                 ERROR_EXIT(
                     "Serialized Object read from GCS Storage Bucket: " + bucket + "\n" +
                     "And file-name: " + completeFileName + "\n" +
                     "Using expected (" + (zip ? "zip-compression" : "no-compression") + ")\n" +
                     "Didn't have an object with class-name: " + returnClass + "\n" +
                     "But rather with className: " + ret.getClass().getName()
                 );
        
             ois.close(); bis.close();
             return returnClass.cast(ret);
         }
         catch (Exception e)
         {
             ERROR_EXIT(
                 "Serialized Object read from GCS Storage Bucket: " + bucket + "\n" +
                 "And file-name: " + completeFileName + "\n" +
                 "Using expected (" + (zip ? "zip-compression" : "no-compression") + ")\n" +
                 "And Expected class-name: " + returnClass + "\n" +
                 "Experienced an Exception: \n" + EXCC.toString(e)
             );
        
             return null; // Cannot reach this statement
         }
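In the body above, the streams are closed only on the success path, and any failure ends the program through ERROR_EXIT. The following JDK-only sketch performs the same byte[]-to-Object steps (optional GZIP, readObject, type check) using try-with-resources, and reports problems as exceptions instead of halting. It is an illustration, not part of LFEC.GCSSB; the class name BytesToObject is hypothetical, and its byte[] argument would come from storage.get(bucket, completeFileName).getContent() as in the method above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch: byte[] -> Object, without GCS and without ERROR_EXIT.
final class BytesToObject
{
    static <T> T read(byte[] bArr, boolean zip, Class<T> returnClass)
        throws IOException, ClassNotFoundException
    {
        InputStream in = new ByteArrayInputStream(bArr);

        // try-with-resources closes the stream chain even if readObject() throws
        try (ObjectInputStream ois = new ObjectInputStream(zip ? new GZIPInputStream(in) : in))
        {
            Object ret = ois.readObject();

            if (! returnClass.isInstance(ret))
                throw new ClassCastException(
                    "Expected " + returnClass.getName() + " but found " +
                    (ret == null ? "null" : ret.getClass().getName())
                );

            return returnClass.cast(ret);
        }
    }
}
```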
        
      • loadFileToString

        public static java.lang.String loadFileToString
                    (com.google.cloud.storage.Storage storage,
                     java.lang.String bucket,
                     java.lang.String completeFileName)
        
        This merely loads a text-file from Google's Storage Bucket infrastructure into a String. Make sure to check that the file you are loading does indeed have text-content.
        Parameters:
        storage - This must be an instance of Google Cloud Server's class Storage. The description at the top of this class explains how to obtain such an instance. If that explanation is unclear, or no longer works, search for information about the Google Cloud Server Storage Buckets Java API, class 'Storage'. APIs do change from time to time.
        bucket - The bucket name of the bucket from a Google Cloud Server account.
        completeFileName - This String-parameter needs to be the complete String representation of the directory-name plus the file-name of the location where the text-file is stored.

        GCS Storage-Buckets: The bucket-name as-a-string should **not** be included as a part of this String-parameter. However, both the file-name, and the directory-name where this file is residing must be present.
        Returns:
        The text file on Google Cloud Server's Storage Bucket file/directory returned as a java.lang.String
        Code:
        Exact Method Body:
         return new String(storage.get(bucket, completeFileName).getContent());
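new String(byte[]) decodes with the JVM's default charset. On Java 18 and later that default is UTF-8 (JEP 400), but on older JVMs it depends on the platform, so UTF-8 content such as Chinese text may come back garbled. A minimal sketch of decoding with an explicit charset instead, assuming the stored file is UTF-8 encoded; DecodeBytes is a hypothetical name, and its argument would be the getContent() result used above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes a blob's bytes as UTF-8 regardless of JVM defaults.
final class DecodeBytes
{
    static String utf8(byte[] content)
    {
        // Passing the charset explicitly avoids depending on the platform default
        return new String(content, StandardCharsets.UTF_8);
    }
}
```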
        
      • loadFileToVector

        public static java.util.Vector<java.lang.String> loadFileToVector
                    (com.google.cloud.storage.Storage storage,
                     java.lang.String bucket,
                     java.lang.String completeFileName,
                     boolean includeNewLine)
        
        This loads a text-file from Google's Storage Bucket infrastructure and splits it into a Vector of Strings, one element per line. Make sure to check that the file you are loading does indeed have text-content.
        Parameters:
        storage - This must be an instance of Google Cloud Server's class Storage. The description at the top of this class explains how to obtain such an instance. If that explanation is unclear, or no longer works, search for information about the Google Cloud Server Storage Buckets Java API, class 'Storage'. APIs do change from time to time.
        bucket - The bucket name of the bucket from a Google Cloud Server account.
        completeFileName - This String-parameter needs to be the complete String representation of the directory-name plus the file-name of the location where the text-file is stored.

        GCS Storage-Buckets: The bucket-name as-a-string should **not** be included as a part of this String-parameter. However, both the file-name, and the directory-name where this file is residing must be present.
        includeNewLine - When TRUE, each String in the returned Vector keeps its trailing '\n' (newline) character; when FALSE, the newline is dropped.
        Returns:
        The text file stored at the given Google Cloud Server Storage Bucket location, returned as a Vector of Strings (one element per line).
        See Also:
        loadFileToString(Storage, String, String)
        Code:
        Exact Method Body:
         String          s   = loadFileToString(storage, bucket, completeFileName);
         Vector<String>  ret = new Vector<>();
        
         int pos     = 0;
         int delta   = includeNewLine ? 1 : 0;
         int lastPos = 0;
        
         while ((pos = s.indexOf('\n', lastPos)) != -1)
         {
             ret.add(s.substring(lastPos, pos + delta));
             lastPos = pos + 1;
         }
        
         if (lastPos < s.length()) ret.add(s.substring(lastPos));
        
         return ret;
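An alternative sketch of the same line-splitting, built on String.split with a negative limit, can be tested without a Storage instance. The name SplitLines is hypothetical; like the method above, it splits only on '\n', so a '\r' from Windows-style line endings stays attached to each line.

```java
import java.util.Vector;

// Hypothetical helper: splits text into lines the way loadFileToVector does.
final class SplitLines
{
    static Vector<String> split(String s, boolean includeNewLine)
    {
        // A negative limit keeps trailing empty strings, so "a\n" yields ["a", ""]
        String[]       parts = s.split("\n", -1);
        Vector<String> ret   = new Vector<>();

        for (int i = 0; i < parts.length; i++)
        {
            if (i < parts.length - 1)
            {
                // Every piece except the last was followed by a '\n' in s
                ret.add(includeNewLine ? parts[i] + "\n" : parts[i]);
            }
            else if (! parts[i].isEmpty())
            {
                // The text after the final '\n'; skipped when s ends with '\n' or is empty
                ret.add(parts[i]);
            }
        }

        return ret;
    }
}
```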
        
      • writeFile

        public static void writeFile(java.lang.CharSequence fileAsStr,
                                     com.google.cloud.storage.Storage storage,
                                     java.lang.String bucket,
                                     java.lang.String completeFileName,
                                     boolean ASCIIorUTF8)
        This will write the contents of a Java 'CharSequence' (which includes String, StringBuffer & StringBuilder) to a file in Google Cloud Server's storage bucket system.
        Parameters:
        fileAsStr - The text to be written to the storage bucket.
        storage - This must be an instance of Google Cloud Server's class Storage. The description at the top of this class explains how to obtain such an instance. If that explanation is unclear, or no longer works, search for information about the Google Cloud Server Storage Buckets Java API, class 'Storage'. APIs do change from time to time.
        bucket - The bucket name of the bucket from a Google Cloud Server account.
        completeFileName - This String-parameter needs to be the complete String representation of the directory-name plus the file-name of the location where the text-file will be saved.

        GCS Storage-Buckets: The bucket-name as-a-string should **not** be included as a part of this String-parameter. However, both the file-name, and the directory-name where this file is residing must be present.
        ASCIIorUTF8 - When writing Java Strings to the file-system, it is generally not too important to worry about whether Java has stored an 'ASCII' encoded String or a 'UTF-8' encoded String. Most foreign-language news-sites require the latter ('UTF-8'), but any site that is strictly English can get by with plain old ASCII.

        IMPORTANT: When this boolean is TRUE, this method presumes the character-sequence you have passed is plain ASCII and converts it with String.getBytes(), which actually encodes with the JVM's default charset. When this boolean is FALSE, the String is explicitly encoded as 'UTF-8'.

        ALSO: I have not made any allowance for Unicode or Unicode little endian, because I have never needed them for either the Chinese or Spanish sites I scrape. UTF-8 has been the only other character set I have encountered.
        Code:
        Exact Method Body:
         BlobInfo blobInfo = BlobInfo.newBuilder
             (BlobId.of(bucket, completeFileName)).setContentType("text/plain").build();
        
         byte[] file = ASCIIorUTF8
                 ? fileAsStr.toString().getBytes()
                 : fileAsStr.toString().getBytes(java.nio.charset.Charset.forName("UTF-8"));
        
         Blob blob = storage.create(blobInfo, file);
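Because the TRUE branch above relies on getBytes() with no argument (the JVM's default charset), it is worth seeing what the two explicit encodings do to a non-ASCII character. In this minimal sketch, with the hypothetical class name EncodingDemo, US-ASCII replaces an unmappable character with '?', while UTF-8 keeps it at the cost of extra bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing explicit US-ASCII and UTF-8 encodings.
final class EncodingDemo
{
    // getBytes(Charset) substitutes the charset's replacement byte ('?') for
    // characters US-ASCII cannot represent
    static byte[] ascii(String s) { return s.getBytes(StandardCharsets.US_ASCII); }

    // UTF-8 represents every character; non-ASCII ones take 2 to 4 bytes each
    static byte[] utf8(String s)  { return s.getBytes(StandardCharsets.UTF_8); }
}
```

For "niño", the US-ASCII array is 4 bytes with a '?' (63) in place of 'ñ', while the UTF-8 array is 5 bytes. For pure-ASCII text both arrays are identical.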
        
      • writeObjectToFile

        public static void writeObjectToFile
                    (java.lang.Object o,
                     com.google.cloud.storage.Storage storage,
                     java.lang.String bucket,
                     java.lang.String completeFileName,
                     boolean zip)
                throws java.io.IOException
        
        This will write a Java Serializable Object to a location in a Google Cloud Server Storage Bucket.
        Parameters:
        o - This may be any Serializable Java Object. Serializable Java Objects are ones which implement the interface java.io.Serializable.
        storage - This must be an instance of Google Cloud Server's class Storage. The description at the top of this class explains how to obtain such an instance. If that explanation is unclear, or no longer works, search for information about the Google Cloud Server Storage Buckets Java API, class 'Storage'. APIs do change from time to time.
        bucket - The bucket name of the bucket from a Google Cloud Server account.
        completeFileName - This String-parameter needs to be the complete String representation of the directory-name plus the file-name of the location where the java serialized Object file was saved, or will be saved.

        GCS Storage-Buckets: The bucket-name as-a-string should **not** be included as a part of this String-parameter. However, both the file-name, and the directory-name where this file is residing must be present.
        zip - When this parameter is TRUE the serialized object will be run through Java's java.util.zip.GZIPInputStream and GZIPOutputStream when reading/writing the serialized-Object. If this parameter is FALSE, GZIP Compression will not be used when serializing or de-serializing the Object.
        Throws:
        java.io.IOException - If an I/O error occurs while serializing the Object.
        Code:
        Exact Method Body:
         BlobId                  blobId      = BlobId.of(bucket, completeFileName);
         BlobInfo                blobInfo    = BlobInfo.newBuilder(blobId).setContentType("text/plain").build();
         ByteArrayOutputStream   baos        = new ByteArrayOutputStream();
         ObjectOutputStream      oos         = zip
                                                 ? new ObjectOutputStream(new GZIPOutputStream(baos))
                                                 : new ObjectOutputStream(baos);
        
         oos.writeObject(o); oos.flush(); baos.flush(); oos.close();
        
         byte[]                  bArr        = baos.toByteArray();
         Blob                    blob        = storage.create(blobInfo, bArr);
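The byte-producing half of writeObjectToFile can likewise be tried without a Storage instance. In this sketch (ObjectToBytes is a hypothetical name), the parameter is typed Serializable so the compiler rejects non-serializable arguments, and closing the ObjectOutputStream finishes the GZIP stream before toByteArray() is called, matching the ordering in the body above. The resulting array is what would be handed to storage.create(blobInfo, bArr).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: Object -> byte[], optionally GZIP-compressed.
final class ObjectToBytes
{
    static byte[] write(Serializable o, boolean zip) throws IOException
    {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        OutputStream          out  = zip ? new GZIPOutputStream(baos) : baos;

        // Closing the ObjectOutputStream also closes the GZIPOutputStream, which
        // writes the GZIP trailer; toByteArray() must only be called after that
        try (ObjectOutputStream oos = new ObjectOutputStream(out))
        {
            oos.writeObject(o);
        }

        return baos.toByteArray();
    }
}
```

Since serialized bytes are binary, a content type such as "application/octet-stream" would arguably describe the blob more accurately than the "text/plain" set in the body above.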