public class SplashBridge extends java.lang.Object
Splash Bridge - Documentation.
This class is more like the MIME class in the Java Package, because it exists only to provide a good example of contacting a Splash Server that is already up and running. NOTE: The MIME class contains only lists of software tools, all of which were once very useful, but the class itself does nothing at all. This class likewise does nothing, other than download a copy of the Wikipedia page for Christopher Columbus. Since these JavaDoc pages contain the source code of the method bodies, please review that code to see how to arrange a proper request into a URL when polling a Splash HTTP Server.
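As a rough illustration of arranging such a request, the sketch below builds a URL for the `render.html` endpoint of the Splash HTTP API, which accepts `url` and `wait` query parameters. The class name, the method names, and the base address `http://localhost:8050` are assumptions made for this example (the port matches the Docker command given for starting the server); they are not taken from this class or its `SPLASH_URL` field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of this package.
class SplashUrlSketch
{
    // Assumption: a Splash server is listening locally on port 8050.
    public static final String ASSUMED_RENDER_ENDPOINT = "http://localhost:8050/render.html";

    // Percent-encodes the target page address and appends it as the 'url' parameter.
    // 'wait' asks Splash to pause (in seconds) after page load before returning HTML.
    public static String buildRenderUrl(String targetUrl, double waitSeconds)
    {
        String encoded = URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
        return ASSUMED_RENDER_ENDPOINT + "?url=" + encoded + "&wait=" + waitSeconds;
    }

    // Plain HTTP GET; throws IOException on any connection or HTTP error.
    public static String fetch(String requestUrl) throws IOException
    {
        HttpURLConnection con =
            (HttpURLConnection) URI.create(requestUrl).toURL().openConnection();
        con.setRequestMethod("GET");

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8)))
        {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) sb.append(line).append('\n');
            return sb.toString();
        }
        finally { con.disconnect(); }
    }

    public static void main(String[] args) throws IOException
    {
        String request =
            buildRenderUrl("https://en.wikipedia.org/wiki/Christopher_Columbus", 2.0);
        System.out.println(request);

        // Only contact the server when explicitly asked, since it may not be running.
        if (args.length > 0 && args[0].equals("--fetch"))
            System.out.println(fetch(request).length() + " characters of HTML received");
    }
}
```

Run without arguments, the program only prints the request URL; passing `--fetch` additionally performs the download, which requires a running Splash server at the assumed address.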
FIRST: With development running on an instance of Google Cloud Shell, getting a Splash Server up and running worked on the first try. The commands for starting Splash are documented on the main Splash documentation web-page: https://splash.readthedocs.io/en/stable/install.html. Because Google Cloud Shell already provides the required "docker" program, I only had to type the two expected commands, and the HTTP Server started right up.
SECOND: Splash claims to be a more lightweight alternative to the Selenium Package for both polling a web-server and executing any JavaScript available on the page. The scripting API it exports is in the "Lua" language. HOWEVER, since making calls to the server only requires an HTTP connection, AND SINCE the responses that a running Splash HTTP Server returns are just standard HTTP HTML responses, including an example here in this package seems reasonable. Java already handles calls to an HTTP server very well, and this package is great at parsing HTML results.
FINALLY: Since I am not a user of Selenium or Splash for intricate or complex JavaScript interactions with a web-page, no formal explanation of what is "buggy" about this external software tool is offered here. Generally, when scraping foreign news sources, there is no JavaScript at all to worry about! However, there have been quite a few times when, while gathering stories (from Wikipedia, for example), the web-scrape did not return the same output that was sent to a desktop web-browser. This 'Splash API' appears to be able to wait for all JavaScript functions on the page to execute before returning HTML to Java, which warrants a "Bridge Class" in this package. Actually making calls to individual methods on the page will require some knowledge of the Lua Programming Language, or changing to Selenium altogether. However, since this is mostly a REST/JSON API, making API calls to the HTTP Server, even when requesting Lua Scripts to execute, should not be difficult from a Java Class, if the Splash Documentation is correct.
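For the Lua case mentioned above, a minimal sketch follows. It assumes the `execute` endpoint of the Splash HTTP API, which takes the script in a `lua_source` parameter and exposes the remaining query parameters to the script as `args`. The class name, the method name, the server address, and the Lua script itself are illustrative assumptions, not part of this class.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of this package.
class SplashLuaSketch
{
    // Assumption: a Splash server is listening locally on port 8050.
    public static final String ASSUMED_EXECUTE_ENDPOINT = "http://localhost:8050/execute";

    // Illustrative Lua script: load the page, wait one second, return its HTML.
    public static final String LUA_SOURCE = String.join("\n",
        "function main(splash, args)",
        "  assert(splash:go(args.url))",
        "  assert(splash:wait(1.0))",
        "  return splash:html()",
        "end");

    // Both the script and the target address must be percent-encoded,
    // because they travel as query-string values.
    public static String buildExecuteUrl(String targetUrl, String luaSource)
    {
        return ASSUMED_EXECUTE_ENDPOINT
            + "?lua_source=" + URLEncoder.encode(luaSource, StandardCharsets.UTF_8)
            + "&url=" + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args)
    {
        System.out.println(buildExecuteUrl("https://example.com/", LUA_SOURCE));
    }
}
```

The resulting string can be requested with any ordinary Java HTTP client. For longer scripts, sending the parameters in a POST body may be preferable to a long query string.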
Fields Modifier and Type Field
public static final java.lang.String SPLASH_URL
Once the Splash HTTP Server is running (which requires the Docker loading and installation tool), all one has to do is prepend this URL, and the Splash Script Executor will be invoked on the HTML and Script that is received from that
- See Also:
- Constant Field Values
- Exact Field Declaration Expression:
UNIX or DOS Shell Command:
Install Docker. Make sure Docker version >= 17 is installed.
Pull the image:
$ sudo docker pull scrapinghub/splash
Start the container:
$ sudo docker run -it -p 8050:8050 --rm scrapinghub/splash
Here is an (approximate) commentary about how to run the Splash HTTP Server on a
Is there a Microsoft Windows version of the Splash HTTP Server (May, 2016)?
Can't find any mention of it in the docs, and bin/ also appears not meant for
Splash should work fine in Microsoft Windows if executed in a
Splash API install instructions should be the same, once the Docker Installer is installed.
Docker Installer Installation Instructions for info on how to install
java.io.IOException - If there are any HTTP errors when downloading or processing the HTML.
- Exact Method Body: