Package Torello.HTML

Class Escape


  • public final class Escape
    extends java.lang.Object
    Escape (HTML Page Escape Sequences) - Documentation.

    HTML defines many "escaped" symbols (escape sequences). This class converts an escaped sequence to the underlying ASCII or Unicode 'char' that it represents, and vice-versa.

    Static (Functional) API: Every method in this class is declared with the Java key-word 'static'. Furthermore, there is no way to obtain an instance of this class, because it exposes no public constructors. Java's Spring-Boot MVC feature is *not* utilized because it runs counter to the light-weight data-classes philosophy. This has several advantages over the rather ornate Component Annotations (@Component, @Service, @Autowired, etc... 'Java Beans') syntax:

    • The methods here use the key-word 'static' which means (by implication) that there is no internal-state. Without any 'internal state' there is no need for constructors in the first place! (This is often the complaint by MVC Programmers).
    • A 'Static' (Functional-Programming) API expects to use fewer data-classes, and light-weight data-classes, making it easier to understand and to program.
    • The Vectorized HTML data-model allows more user-control over HTML parse, search, update & scrape. Also, memory management, memory leakage, and the Java Garbage Collector ought to be intelligible through the 'reuse' of the standard JDK class Vector for storing HTML Web-Page data.

    The power that object-oriented programming extends to a user is (mostly) limited to data-representation. Thinking of "Services" as "Objects" (Spring-MVC, 'Java Beans') is somewhat 'over-applying' the Object Oriented Programming Model. Like most classes in the Java-HTML JAR Library, this class backtracks to a more C-Styled Functional Programming Model (no Objects) - by re-using (quite profusely) the key-word static with all of its methods, and by sticking to Java's well-understood class Vector.

    Static Fields: The methods in this class do not create any internal state that is maintained - however there are a few private & static fields defined. These fields are instantiated only once during the Class Loader phase (and only if this class shall be used), and serve as data 'lookup' fields (static constants). View this class' source-code in the link provided below to see internally used data.

    The defined internal fields include 3 Java Regular-Expressions for matching escaped HTML Strings, and a java.util.Hashtable (stored in the JAR) for looking up escape-String definitions.
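    A minimal sketch of what such static lookup fields might look like. The class name, field names, regular expressions and the four table entries are all assumptions made for illustration; the library's actual fields are private and may differ.

```java
import java.util.Hashtable;
import java.util.regex.Pattern;

// Hypothetical sketch: names and patterns are illustrative, not the library's actual fields.
final class EscapeTablesSketch
{
    // Matches hexadecimal sequences such as "&#x4E2D;" (group 1 holds the hex digits)
    static final Pattern HEX = Pattern.compile("&#x([0-9A-Fa-f]+);");

    // Matches decimal sequences such as "&#20013;" (group 1 holds the decimal digits)
    static final Pattern DEC = Pattern.compile("&#([0-9]+);");

    // Matches named sequences such as "&amp;" (group 1 holds the name)
    static final Pattern TEXT = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Tiny stand-in for the full lookup table: escape name to character
    static final Hashtable<String, Character> NAMED = new Hashtable<>();

    // Runs once, when the class is first loaded
    static
    {
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("quot", '"');
    }

    // No instances: the class only holds static constants
    private EscapeTablesSketch() { }
}
```

    Because the fields are static and final, they are built once at class-load time and then only read, which is what lets the methods stay stateless.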



    • Method Detail

      • printHTMLEsc

        public static void printHTMLEsc()
        Prints the HTML Escape Character lookup table to System.out. This is useful for debugging.

        View Escape Codes: The list included within the page attached (below) is a complete list of all text-String HTML Escape Sequences that are known to this class. This list does not include any Code Point, Hex or Decimal Number sequences.

        All HTML Escape Sequences
        Code:
        Exact Method Body:
         Enumeration<String> e = htmlEscChars.keys();
         while (e.hasMoreElements())
         {
             String tag = e.nextElement();
             System.out.println("&" + tag + "; ==> " + htmlEscChars.get(tag));
         }
        
      • escHTMLToChar

        public static char escHTMLToChar​(java.lang.String escHTML)
        Converts a single String from an HTML-escape sequence into the appropriate character.

        &[escape-sequence]; ==> actual ASCII or Unicode character.
        Parameters:
        escHTML - An HTML escape sequence.
        Returns:
        the ASCII or Unicode character represented by this escape sequence.

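        This method returns the NUL character, (char) 0, if the input does not represent a valid HTML Escape sequence. It also returns (char) 0 for a numeric sequence whose value does not fit into a single 16-bit 'char'.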
        Code:
        Exact Method Body:
         if (! escHTML.startsWith("&") || ! escHTML.endsWith(";")) return (char) 0;
        
         String  s = escHTML.substring(1, escHTML.length() - 1);
        
         // Temporary Variable.
         int     i = 0;
        
         // Since the EMOJI Escape Sequences use Code Point, they cannot, generally be
         // converted into a single Character.  Skip them.  
         if (HEX_CODE.matcher(s).find()) 
             if ((i = Integer.parseInt(s.substring(2), 16)) < Character.MAX_VALUE)
                 return (char) i;
             else
                 return 0;
        
         // Again, deal with Emoji's here...  Parse the integer, and make sure it is a
         // character in the standard UNICODE range.
         if (DEC_CODE.matcher(s).find()) 
             if ((i = Integer.parseInt(s.substring(1))) < Character.MAX_VALUE)
                 return (char) i;
             else
                 return 0;
        
         // Now check if the provided Escape String is listed in the htmlEscChars Hashtable.
         Character c = htmlEscChars.get(s);
        
         // If the character was found in the table that lists all escape sequence characters,
         // then return it.  Otherwise just return ASCII zero.
         return (c != null) ? c.charValue() : 0;
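        The decision order described above (hexadecimal, then decimal, then named lookup) can be sketched in a self-contained form. This is not the library's code: the class name, the 'toChar' method name and the four-entry table are assumptions, and a plain prefix test stands in for the HEX_CODE and DEC_CODE regular expressions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch of converting ONE escape sequence into a char.
final class SingleEscapeSketch
{
    // Tiny stand-in for the full named-sequence lookup table
    private static final Map<String, Character> NAMED = new HashMap<>();

    static
    {
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("quot", '"');
    }

    /** Returns the character for one escape sequence, or (char) 0 if it cannot be converted. */
    static char toChar(String esc)
    {
        if (esc.length() < 3 || !esc.startsWith("&") || !esc.endsWith(";")) return (char) 0;

        // Strip the leading '&' and trailing ';'   e.g. "#x41", "#65", "amp"
        String body = esc.substring(1, esc.length() - 1);
        int value;

        try
        {
            if (body.startsWith("#x"))      value = Integer.parseInt(body.substring(2), 16);
            else if (body.startsWith("#"))  value = Integer.parseInt(body.substring(1));
            else
            {
                Character named = NAMED.get(body);
                return (named != null) ? named.charValue() : (char) 0;
            }
        }
        catch (NumberFormatException e) { return (char) 0; }

        // Values above Character.MAX_VALUE (Emoji, etc.) do not fit into one 'char'
        return (value >= 0 && value <= Character.MAX_VALUE) ? (char) value : (char) 0;
    }
}
```

        Callers need to treat (char) 0 as "no conversion", since the return type leaves no room for a separate error signal.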
        
      • replaceAll_HEX

        public static java.lang.String replaceAll_HEX​(java.lang.String str)
        Returns a String in which every hexadecimal HTML escape sequence has been replaced with the actual ASCII/Unicode character it represents. Sequences whose value is larger than Character.MAX_VALUE are left unchanged.

        For Instance:
        Substring from Input:    Web-Browser Converts To:
        &#xAA;                   'ª' within a browser
        &#x67;                   'g' within a browser
        &#x84;                   '„' within a browser
        Parameters:
        str - any String that contains the HTML Escape Sequence &#x[HEXADECIMAL VALUE]; This is like the 'Ord()' function found in some languages (Pascal, for example), except in HTML.
        Returns:
        a String, with all of the hexadecimal escape sequences removed and replaced with ASCII/Unicode Characters.
        See Also:
        replaceAll_DEC(String str), StrReplace.r(String, String[], char[])
        Code:
        Exact Method Body:
         // This is the RegEx Matcher from the top.  It matches string's that look like: &#x\d+;
         Matcher m = HEX_CODE.matcher(str);
        
         // Save the escape-string regex search matches in a TreeMap.  We need to use a
         // TreeMap because it is much easier to check if a particular escape sequence has already
         // been found.  It is easier to find duplicates with TreeMap's.
         TreeMap<String, Character> escMap = new TreeMap<>();
        
         while (m.find())
         {
             // Use Base-16 Integer-Parse
             int i = Integer.valueOf(m.group(1), 16);
        
             // Do not un-escape EMOJI's... It makes a mess - they are sequences of characters
             // not single characters.
             if (i > Character.MAX_VALUE) continue;
        
             // Retrieve the Text Information about the HTML Escape Sequence
             String text = m.group();
        
             // Check if it is a valid HTML 5 Escape Sequence.
             if (! escMap.containsKey(text)) escMap.put(text, Character.valueOf((char) i));
         }
                
         // Build the matchStr's and replaceChar's arrays.  These are just the KEY's and
         // the VALUE's of the TreeMap<String, Character> which was just built.
         // NOTE: A TreeMap is used *RATHER THAN* two parallel arrays in order to avoid keeping
         //       duplicates when the replacement occurs.
        
         String[]    matchStrs       = escMap.keySet().toArray(new String[escMap.size()]);
         char[]      replaceChars    = new char[escMap.size()];
        
         // Lookup each "ReplaceChar" in the TreeMap, and put it in the output "replaceChars"
         // array.  The class StrReplace will replace all the escape squences with the actual
         // characters.
         for (int i=0; i < matchStrs.length; i++) replaceChars[i] = escMap.get(matchStrs[i]);
        
         return StrReplace.r(str, matchStrs, replaceChars);
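        For readers without the StrReplace helper at hand, the same outcome can be sketched with only the JDK. This is a swapped-in, single-pass technique (Matcher plus StringBuilder), not the TreeMap approach shown above; the class name, method name and regular expression are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only sketch of replacing hexadecimal escape sequences.
final class HexUnescapeSketch
{
    // At most six hex digits, so the parsed value always fits in an int
    private static final Pattern HEX = Pattern.compile("&#x([0-9A-Fa-f]{1,6});");

    static String unescapeHex(String str)
    {
        Matcher       m    = HEX.matcher(str);
        StringBuilder sb   = new StringBuilder(str.length());
        int           last = 0;   // end of the previously replaced sequence

        while (m.find())
        {
            int value = Integer.parseInt(m.group(1), 16);

            // Values above Character.MAX_VALUE do not fit in one 'char'; leave them escaped
            if (value > Character.MAX_VALUE) continue;

            // Copy the untouched text before the match, then the un-escaped character
            sb.append(str, last, m.start()).append((char) value);
            last = m.end();
        }

        // Copy whatever follows the last replaced sequence
        return sb.append(str, last, str.length()).toString();
    }
}
```

        A single pass has no need to track duplicates, because each match is replaced where it is found.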
        
      • replaceAll_DEC

        public static java.lang.String replaceAll_DEC​(java.lang.String str)
        This method functions the same as replaceAll_HEX(String) - except it replaces only HTML Escape sequences that are represented using decimal (base-10) values. 'replaceAll_HEX(...)' works on hexadecimal (base-16) values.

        For Instance:
        Substring from Input:    Web-Browser Converts To:
        &#48;                    '0' in your browser
        &#64;                    '@' in your browser
        &#123;                   '{' in your browser
        &#125;                   '}' in your browser
        Parameters:
        str - any String that contains the HTML Escape Sequence &#[DECIMAL VALUE];. If this parameter does not contain this sequence, this method will return the same String. The short example below shows the difference between an HTML escape-sequence that employs Base-10 numbers, and one using Base-16 numbers.

        Note the Difference:
        • &#x[hex base-16 value]; There is an 'x' as the third character in the String
        • &#[decimal base-10 value]; There is no 'x' in the escape-sequence String!
        Returns:
        a String, with all of the decimal escape sequences removed and replaced with ASCII/Unicode Characters.
        See Also:
        replaceAll_HEX(String str), StrReplace.r(String, String[], char[])
        Code:
        Exact Method Body:
         // This is the RegEx Matcher from the top.  It matches string's that look like: &#\d+;
         Matcher m = DEC_CODE.matcher(str);
        
         // Save the escape-string regex search matches in a TreeMap.  We need to use a
         // TreeMap because it is much easier to check if a particular escape sequence has already
         // been found.  It is easier to find duplicates with TreeMap's.
         TreeMap<String, Character> escMap = new TreeMap<>();
        
         while (m.find())
         {
             // Use Base-10 Integer-Parse
             int i = Integer.valueOf(m.group(1));
        
             // Do not un-escape EMOJI's... It makes a mess - they are sequences of characters
             // not single characters.
             if (i > Character.MAX_VALUE) continue;
        
             // Retrieve the Text Information about the HTML Escape Sequence
             String text = m.group();
        
             // Check if it is a valid HTML 5 Escape Sequence.
             if (! escMap.containsKey(text)) escMap.put(text, Character.valueOf((char) i));
         }
                
         // Build the matchStr's and replaceChar's arrays.  These are just the KEY's and
         // the VALUE's of the TreeMap<String, Character> which was just built.
         // NOTE: A TreeMap is used *RATHER THAN* two parallel arrays in order to avoid keeping
         //       duplicates when the replacement occurs.
        
         String[]    matchStrs       = escMap.keySet().toArray(new String[escMap.size()]);
         char[]      replaceChars    = new char[escMap.size()];
        
         // Lookup each "ReplaceChar" in the TreeMap, and put it in the output "replaceChars"
         // array.  The class StrReplace will replace all the escape sequences with the actual
         // characters.
         for (int i=0; i < matchStrs.length; i++) replaceChars[i] = escMap.get(matchStrs[i]);
        
         return StrReplace.r(str, matchStrs, replaceChars);
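        The only difference between the decimal and hexadecimal forms is the 'x' marker and the radix handed to the integer parser. A small sketch of that distinction follows; the class name, the 'valueOf' helper and the regular expression are hypothetical and not part of this class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: read the number out of one numeric escape sequence.
final class NumericEscapeSketch
{
    // Group 1 is "x" or "X" for hexadecimal, or empty for decimal; group 2 holds the digits.
    // At most seven digits, so the parsed value always fits in an int.
    private static final Pattern NUMERIC = Pattern.compile("&#([xX]?)([0-9A-Fa-f]{1,7});");

    /** Returns the numeric value of one escape sequence, or -1 if it is not well formed. */
    static int valueOf(String esc)
    {
        Matcher m = NUMERIC.matcher(esc);
        if (!m.matches()) return -1;

        // No 'x' means Base 10; an 'x' means Base 16
        int radix = m.group(1).isEmpty() ? 10 : 16;

        try
        { return Integer.parseInt(m.group(2), radix); }

        // For example "&#12AB;": hex digits, but no 'x' marker
        catch (NumberFormatException e)
        { return -1; }
    }
}
```

        With this helper, "&#123;" and "&#x7B;" both yield 123, the code for '{'.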
        
      • replaceAll_TEXT

        public static java.lang.String replaceAll_TEXT​(java.lang.String str)
        Replaces all text-word (named) HTML Escape Sequences, such as &amp; and &lt;, with the characters they represent.

        For Instance:
        ASCII or UNICODE:      Can be Escaped Using:
        " (double-quote)       &quot; (in HTML)
        & (ampersand)          &amp; (in HTML)
        < (less-than)          &lt; (in HTML)
        > (greater-than)       &gt; (in HTML)

        View Escape Codes: The list included within the page attached (below) is a complete list of all text-String HTML Escape Sequences that are known to this class. This list does not include any Code Point, Hex or Decimal Number sequences.

        All HTML Escape Sequences
        Parameters:
        str - any String that contains HTML Escape Sequences that need to be converted to their ASCII-UniCode character representations.
        Returns:
        a String, with all of the text-word escape sequences removed and replaced with ASCII/Unicode Characters.
        Throws:
        java.lang.IllegalStateException
        See Also:
        replaceAll_HEX(String str), StrReplace.r(String, boolean, String[], Torello.Java.Function.IntTFunction)
        Code:
        Exact Method Body:
         // We only need to find which escape sequences are in this string.
         // Use a TreeMap<String, String> to list them, so that each one is stored only once.
         Matcher                 m        = TEXT_CODE.matcher(str);
         TreeMap<String, String> escMap   = new TreeMap<>();
        
         while (m.find())
         {
             // Retrieve the Text Information about the HTML Escape Sequence
             String text     = m.group();
             String sequence = text.substring(1, text.length() - 1);
        
             // Check if it is a valid HTML 5 Escape Sequence.
             if ((! escMap.containsKey(text)) && htmlEscChars.containsKey(sequence))
                 escMap.put(text, sequence);
         }
                
         // Convert the TreeMap's key-set to a String[] array... and use StrReplace
         String[] escArr = new String[escMap.size()];
        
         return StrReplace.r(
             str, false, escMap.keySet().toArray(escArr),
             (int i, String sequence) -> htmlEscChars.get(escMap.get(sequence))
         );
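        A JDK-only sketch of the same idea: find each named sequence, look it up, and leave unknown names untouched. The class name, method name, regular expression and the four-entry table are assumptions standing in for TEXT_CODE and htmlEscChars; the single-pass StringBuilder loop replaces StrReplace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK-only sketch of replacing named (text-word) escape sequences.
final class NamedUnescapeSketch
{
    private static final Pattern NAME = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Tiny stand-in for the full lookup table
    private static final Map<String, Character> NAMED = new HashMap<>();

    static
    {
        NAMED.put("quot", '"');
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
    }

    static String unescapeNamed(String str)
    {
        Matcher       m    = NAME.matcher(str);
        StringBuilder sb   = new StringBuilder(str.length());
        int           last = 0;   // end of the previously replaced sequence

        while (m.find())
        {
            Character c = NAMED.get(m.group(1));

            // Unknown name: keep the text exactly as written
            if (c == null) continue;

            sb.append(str, last, m.start()).append(c.charValue());
            last = m.end();
        }

        return sb.append(str, last, str.length()).toString();
    }
}
```

        Since the scan never revisits text it has already produced, "&amp;lt;" becomes "&lt;" and stops there.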
        
      • replaceAll

        @Deprecated
        public static java.lang.String replaceAll​(java.lang.String s)
        Deprecated.
        Calls all of the HTML Escape Sequence convert/replace String functions at once.
        Parameters:
        s - This may be any Java String which may (or may not) contain HTML Escape sequences.
        Returns:
        a new String where all HTML escape-sequence substrings have been replaced with their natural character representations.
        See Also:
        replaceAll_DEC(String), replaceAll_HEX(String), replaceAll_TEXT(String)
        Code:
        Exact Method Body:
         return replaceAll_HEX(replaceAll_DEC(replaceAll_TEXT(s)));
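        One thing to keep in mind when chaining independent passes like this: the output of an earlier pass becomes input to a later one, so text can be un-escaped twice. The sketch below uses three trivial stand-in passes (each handles one example sequence only, and all names are hypothetical) to show the effect; it assumes the real lookup table contains "amp".

```java
// Hypothetical stand-ins that illustrate pass ordering; not the library's methods.
final class ChainedUnescapeSketch
{
    // Each stand-in pass handles exactly one example sequence
    static String text(String s) { return s.replace("&amp;", "&"); }
    static String dec(String s)  { return s.replace("&#65;", "A"); }
    static String hex(String s)  { return s.replace("&#x41;", "A"); }

    // Same nesting order as the method body above: TEXT first, then DEC, then HEX
    static String chained(String s) { return hex(dec(text(s))); }
}
```

        Here chained("&amp;#65;") yields "A": the text pass produces "&#65;", which the decimal pass then converts. A browser would display the same input as the literal text &#65;, which is also what a single left-to-right pass produces.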
        
      • replace

        public static java.lang.String replace​(java.lang.String s)
        This is an optimized HTML String-replacement method. It will substitute all HTML Escape Sequences with the actual characters they represent.

        Emoji Note: In keeping with the other methods in this class, if there are any HTML Emoji Escape Sequences, these shall not be replaced. Emoji's work on the principle of Code-Point, and though replacing such escape sequences is not difficult, their substitutions are never single character representations: a Code Point above Character.MAX_VALUE always needs two Java char's (a surrogate pair). There is an alternate method that can substitute the actual Java char's for a Code Point escape sequence.

        Code Point Note: For those familiar with Code Point, this method just skips any escaped sequence that uses either the Base 10 or the Base 16 representations where the parsed number is larger than Character.MAX_VALUE. It is important to remember that all Java String's are simply char arrays that are wrapped in a java.lang.String instance. Since the Primitive Type 'char' is fundamentally a 16-bit value, no escape sequence can be converted to a single 'char' if its number is larger than this value. Although Code Point works just fine in Java, it is left as a separate method in this class.

        FINALLY: Most standard web-pages use very little of the more advanced escape sequences. Emoji's are somewhat popular, but this issue isn't about whether the 'Code Point' based escape-sequences can be converted or handled, but rather it is about whether or not you really want to leave the comfortable world of HTML Escape Sequences for your Code Point related characters. Once a Code Point sequence has been un-escaped, it will only be visible in text-editors / viewers that are capable of rendering Code Point's or Emoji's (and not all text editors can do this!)
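        The 'char' versus Code Point distinction in the notes above can be checked with a few standard JDK calls. The class and method names in this sketch are hypothetical; Character.MAX_VALUE and Character.toChars are part of java.lang.

```java
// Hypothetical helper names; the JDK calls themselves are standard.
final class CodePointSketch
{
    /** True when the code point fits into a single 16-bit Java 'char'. */
    static boolean fitsInChar(int codePoint)
    { return codePoint >= 0 && codePoint <= Character.MAX_VALUE; }

    /** Converts any valid code point into a String of one or two 'char' values. */
    static String toJavaString(int codePoint)
    { return new String(Character.toChars(codePoint)); }
}
```

        The snowman (0x2603) fits into one 'char', while the grinning face (0x1F600) becomes a String of length 2 that still counts as one code point.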
        Parameters:
        s - This may be any Java String which may (or may not) contain HTML Escape sequences.
        Returns:
        a new String where all HTML escape-sequence substrings have been replaced with their natural character representations.
        Code:
        Exact Method Body:
         // The primary optimization is to do this the "C" way (As in The C Programming Language)
         // The String to Escape is converted to a character array, and the characters are shifted
         // as the Escape Sequences are replaced.  This is all done "in place" without creating
         // new substring's in memory.
         char[] c = s.toCharArray();
        
         // These two pointers are kept as the "Source Character" - as in the next character to
         // "Read" ... and the "Destination Character" - as in the next location to write.
         int sourcePos   = 0;
         int destPos     = 0;
        
         while (sourcePos < c.length)
        
             // All Escape Sequences begin with the Ampersand Symbol.  If the next character
             // does not begin with the Ampersand, we should skip and move on.  Copy the next source
             // character to the next destination location, and continue the loop.
             if (c[sourcePos] != '&')
             { c[destPos++]=c[sourcePos++];  continue; }
            
             // Here, an Ampersand has been found.  Now check if the character immediately 
             // following the Ampersand is a Pound Sign.  If it is a Pound Sign, that implies
             // this escape sequence is simply going to be a number.
             else if ((sourcePos < (c.length-1)) && (c[sourcePos + 1] == '#'))
             {
                 int     evaluatingPos   = sourcePos + 1;
                 boolean isHex           = false;
        
                 // If the Character after the Pound Sign is an 'X', it means that the number
                 // that has been escaped is a Base 16 (Hexadecimal) number.
                 // IMPORTANT: Check to see that the Ampersand wasn't the last char in the String
                 if (evaluatingPos + 1 < c.length)
                     if (c[evaluatingPos + 1] == 'x')
                     { isHex = true; evaluatingPos++; }
        
                 // Keep skipping the numbers, until a non-digit character is identified.
                 while ((++evaluatingPos < c.length) && Character.isDigit(c[evaluatingPos]));
        
                 // If the character immediately after the last digit isn't a ';' (Semicolon),
                 // then this entire thing is NOT an escaped HTML character.  In this case, make
                 // sure to copy the next source-character to the next destination location in the
                 // char[] array...  Then continue the loop to the next 'char' (after Ampersand)
                 if ((evaluatingPos == c.length) || (c[evaluatingPos] != ';'))
                 { c[destPos++]=c[sourcePos++];  continue; }
        
                 int escapedChar;
                 try
                 { 
                     // Make sure to convert Base-16 (hexadecimal) numbers using radix 16, using
                     // the standard java parse integer way.
                     escapedChar = isHex
                         ? Integer.parseInt(s.substring(sourcePos + 3, evaluatingPos), 16)
                         : Integer.parseInt(s.substring(sourcePos + 2, evaluatingPos));
                 }
                 // If for whatever reason java was unable to parse the digits in the escape
                 // sequence, then copy the next source-character to the next destination-location
                 // and move on in the loop.
                 catch (NumberFormatException e)
                 { c[destPos++]=c[sourcePos++];  continue; }
        
                 // If the character was an Emoji, then it would be a number greater than
                 // 2^16.  Emoji's use Code Points - which are multiple characters used up
                 // together.  Their escape sequences are always characters larger than 65,535.
                 // If so, just copy the next source-character to the next destination location, and
                 // move on in the loop.
                 if (escapedChar > Character.MAX_VALUE)
                 { c[destPos++]=c[sourcePos++];  continue; }
        
                 // Replace the next "Destination Location" with the (un) escaped char.
                 c[destPos++] = (char) escapedChar;
        
                 // Skip the entire HTML Escape Sequence by skipping to the location after the
                 // position where the "evaluation" (all this processing) was occurring.  This
                 // just happens to be the next-character immediately after the semi-colon
                 sourcePos = evaluatingPos + 1;  // evaluatingPos is pointing at the ';' (semicolon)
             }
        
             // An Ampersand was just found, but it was not followed by a '#' (Pound Sign).  This
             // means that it is not a "numbered" (to invent a term) HTML Escape Sequence.  Instead
             // we shall check if there is a valid Escape-String (before the next semi-colon) that
             // can be identified in the Hashtable 'htmlEscChars'
             else if (sourcePos < (c.length - 1))
             {
                 // We need to create a 'temp variable' and it will be called "evaluating position"
                 int evaluatingPos = sourcePos;
        
                 // All text (non "Numbered") HTML Escape String's are comprised of letter or digits
                 while ((++evaluatingPos < c.length) && Character.isLetterOrDigit(c[evaluatingPos]));
        
                 // If the character immediately after the last letter or digit is not a semi-colon,
                 // then there is no way this is an HTML Escape Sequence.  Copy the next source to
                 // the next destination location, and continue with the loop.
                 if ((evaluatingPos == c.length) || (c[evaluatingPos] != ';'))
                 { c[destPos++]=c[sourcePos++];  continue; }
        
                 // Get the replacement character from the lookup table.
                 Character replacement = htmlEscChars.get(s.substring(sourcePos + 1, evaluatingPos));
        
                 // The lookup table will return null if there this was not a valid escape sequence.
                 // If this was not a valid sequence, just copy the next character from the source
                 // location, and move on in the loop.
                 if (replacement == null)
                 { c[destPos++]=c[sourcePos++];  continue; }
        
                 c[destPos++] = replacement;
                 sourcePos = evaluatingPos + 1;
             }
             else
             { c[destPos++]=c[sourcePos++];  continue; }
        
         return new String(c, 0, destPos);
        
      • escChar

        public static java.lang.String escChar​(char c,
                                               boolean use16BitEscapeSequence)
        This method shall simply escape any char into an HTML Escape String.
        Input 'char'                   Returned String's
        '中' (Middle / China)          "&#20013;" (Base 10), "&#x4E2D;" (Base 16)
        '日' (Japan / Sun)             "&#26085;" (Base 10), "&#x65E5;" (Base 16)
        'Ñ' (Spanish Tilde)            "&#209;" (Base 10), "&#xD1;" (Base 16)
        'ñ' (Lower-Case Tilde)         "&#241;" (Base 10), "&#xF1;" (Base 16)
        '☃' (Snowman Glyph)            "&#9731;" (Base 10), "&#x2603;" (Base 16)

        IMPORTANT NOTE: The java primitive 'char' type, which, again, is a 16-bit type (2^16 = 65,536 values, 0 through 65,535), essentially equates to the primary plane (plane 0) of the 17 UNICODE planes. This is also known as the Basic Multi-Lingual Plane. Nearly any foreign language character needed by a programmer (including the common Chinese Character Glyphs) can be found there with a bit of searching. Any modern web-browser can display these characters, if they are escaped using the HTML Escape Sequences returned by this method.

        ALSO: As an aside, if a programmer includes the HTML Element: <META CHARSET="utf-8"> in the <HEAD>...</HEAD> portion of an HTML Page, it becomes easy to include such characters (from the Multi-Lingual Plane) without even needing to use escape-sequences for the characters. Any web-browser which knows before-hand that non-ASCII characters (higher than character #127 / 0x7F) are being transmitted as UTF-8 will interpret them accordingly. In this case escaping the char's becomes unnecessary.
        Parameters:
        c - Any Java Character. Note that the Java Primitive Type 'char' is a 16-bit type. This parameter equates to the UNICODE Characters 0x0000 up to 0xFFFF.
        use16BitEscapeSequence - If the user would like the returned, escaped, String to use Base 16 for the escaped digits, pass TRUE to this parameter. If the user would like to retrieve an escaped String that uses standard Base 10 digits, then pass FALSE to this parameter.
        Returns:
        The passed character parameter 'c' will be converted to an HTML Escape Sequence. For instance if the character '我', which is the Chinese Character for I, Me, Myself were passed to this method, then the String "&#25105;" would be returned.

        If the parameter 'use16BitEscapeSequence' had been passed TRUE, then this method would, instead, return the String "&#x6211;".
        Code:
        Exact Method Body:
         return use16BitEscapeSequence
             ? "&#x" + Integer.toHexString((int) c).toUpperCase() + ";"
             : "&#" + ((int) c) + ";";
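        As a usage illustration, the documented behavior (TRUE selects Base 16, FALSE selects Base 10) can be applied to a whole String, escaping each non-ASCII 'char'. Both the class name and the 'escapeNonAscii' helper are hypothetical and not part of this class; the local 'escChar' only mirrors the documented contract.

```java
// Hypothetical sketch built on the documented escChar contract.
final class EscapeNonAsciiSketch
{
    /** Mirrors the documented contract: useHex == true gives "&#x...;", false gives "&#...;". */
    static String escChar(char c, boolean useHex)
    {
        return useHex
            ? "&#x" + Integer.toHexString(c).toUpperCase() + ";"
            : "&#" + (int) c + ";";
    }

    /** Escapes every non-ASCII 'char' of a String; ASCII text is copied unchanged. */
    static String escapeNonAscii(String s, boolean useHex)
    {
        StringBuilder sb = new StringBuilder(s.length());

        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);

            // ASCII passes through.  Surrogate halves are not complete characters on
            // their own, so they are copied as-is rather than escaped one by one.
            if (c < 128 || Character.isSurrogate(c))    sb.append(c);
            else                                        sb.append(escChar(c, useHex));
        }

        return sb.toString();
    }
}
```

        For example, escapeNonAscii("España", false) produces "Espa&#241;a", matching the 'ñ' row of the table above.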
        
      • escCodePoint

        public static java.lang.String escCodePoint​(int codePoint,
                                                    boolean use16BitEscapeSequence)
        This method shall simply escape any Code Point integer into an HTML Escape String. Below is a list of a few examples of Code Points commonly used. As stated, most of the Basic Multi Lingual Plane - which is Plane 0 of the UNICODE Space - fits into the 16-bit java Primitive Type 'char'. For such situations, "Code Points" have very little application to software. Essentially, Java's 16-bit 'char' primitive type gives that to the programmer "for free" - without needing to think past, again, Java's primitive-type 'char'.

        Although "Code Points" were developed decades ago, today, one of the most common uses for them is the Emoji's being used on numerous web-sites. It is important to note that not all Emoji's will fit into a single Code Point, and, as such, equating a "Code Point" with an "Emoji" is actually incorrect. However, for the more complicated Emoji's available, all that is really going on is that sequences of code points are being sent and interpreted by the web-browser - as a single glyph or character-image.

        Escaping Emoji's: Just as with foreign language characters, the code-points themselves (without being escaped) can be included directly into a text file, as long as the HTML file indicates that UTF-8 data is being transmitted. In such cases, to avoid using these escape sequences at all, just include the characters directly, and place the following meta tag in the HTML <HEAD>...</HEAD> section: <META CHARSET="utf-8">.
        Input Code Point (int)                 Returned String's
        😀 (Grinning Face) (128512)            "&#128512;" (Base 10), "&#x1F600;" (Base 16)
        👍 (Thumb's Up) (128077)               "&#128077;" (Base 10), "&#x1F44D;" (Base 16)
        🌮 (Taco) (127790)                     "&#127790;" (Base 10), "&#x1F32E;" (Base 16)
        'A' (Upper-Case A) (ASCII# 65)         "&#65;" (Base 10), "&#x41;" (Base 16)
        '0' (Number Zero) (ASCII# 48)          "&#48;" (Base 10), "&#x30;" (Base 16)
        '中' (Middle-China) (20013)            "&#20013;" (Base 10), "&#x4E2D;" (Base 16)
        'ü' (German Umlaut) (252)              "&#252;" (Base 10), "&#xFC;" (Base 16)
        'Ñ' (Spanish Tilde) (209)              "&#209;" (Base 10), "&#xD1;" (Base 16)

        AGAIN: If the '.html' files you are providing to a web-browser indicate the <META CHARSET="utf-8">, it is not necessary to provide HTML escape sequences for an Emoji, or any 'Code Point' at all. Instead, if the text-editor you are using to edit your '.html' files can handle code points, they may be included directly into the 'html' file itself.

        IMPORTANT: There are numerous Emoji's that are represented by sequences of code-points, AND NOT just a single code point integer. In such cases, providing HTML escape sequences will actually prevent the browser from rendering the "conglomerate" Emoji.

        The Emoji's below should not be escaped (because they are sequences of code points, rather than just single code points). Instead, their code points must be included directly into the '.html' file itself - or they will not be properly rendered by the web-browser...
        Emoji: Code Point Sequence
        👁️‍🗨️ "Eye in Speech": U+1F441 U+200D U+1F5E8 ==> 👁 (Eye, U+1F441) + Zero-Width Joiner "glue" (U+200D) + 🗨 (Speech Bubble, U+1F5E8)
        👉🏿 "Index-Finger Pointing, Dark Hand": U+1F449 U+1F3FF ==> 👉 (Index Finger Pointing, U+1F449) + Dark Skin Color (U+1F3FF)
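
        To see that such an Emoji really is more than one code point, the sequence can be assembled and inspected with standard JDK calls. The sketch below is illustrative only; the class CodePointSequenceSketch and its method fromCodePoints are hypothetical names.

```java
// Hypothetical demo class, for illustration only; not part of Torello.HTML
final class CodePointSequenceSketch
{
    // Assembles a String from the given code points, in order
    static String fromCodePoints(int... codePoints)
    {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) sb.appendCodePoint(cp);
        return sb.toString();
    }

    public static void main(String[] args)
    {
        // Index Finger Pointing (U+1F449) followed by the dark skin modifier (U+1F3FF)
        String hand = fromCodePoints(0x1F449, 0x1F3FF);

        // One visible glyph, but several code points and even more Java chars
        System.out.println("code points: " + hand.codePointCount(0, hand.length()));
        System.out.println("Java chars:  " + hand.length());

        // List each code point of the sequence in U+ notation
        hand.codePoints().forEach(cp -> System.out.println(String.format("U+%X", cp)));
    }
}
```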
        Parameters:
        codePoint - This will take any integer. It will be interpreted as a UNICODE code point.

        NOTE: Java uses 16-bit values for its primitive 'char' type. These values cover the "first plane" of the UNICODE Space, which is referred to as the Basic Multilingual Plane. Any value passed to this method that is no greater than 65,535 (0xFFFF) would receive the same escape-String that it would from a call to the method escChar(char, boolean).
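
        The difference between a code point inside the Basic Multilingual Plane and one outside of it can be checked directly with java.lang.Character. The following is a small sketch under that assumption; the class BmpSketch and its method charsNeeded are hypothetical names.

```java
// Hypothetical demo class, for illustration only; not part of Torello.HTML
final class BmpSketch
{
    // Number of Java chars needed to hold the code point (1 inside the BMP, otherwise 2)
    static int charsNeeded(int codePoint)
    { return Character.charCount(codePoint); }

    public static void main(String[] args)
    {
        int snowman  = 0x2603;   // inside the Basic Multilingual Plane
        int grinning = 0x1F600;  // outside of it, in a supplementary plane

        System.out.println(Character.isBmpCodePoint(snowman)  + ", chars: " + charsNeeded(snowman));
        System.out.println(Character.isBmpCodePoint(grinning) + ", chars: " + charsNeeded(grinning));

        // A supplementary code point is stored as a surrogate pair of two chars
        char[] pair = Character.toChars(grinning);
        System.out.println(Character.isHighSurrogate(pair[0]) && Character.isLowSurrogate(pair[1]));
    }
}
```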
        use16BitEscapeSequence - If the user would like the returned, escaped, String to use Base 16 for the escaped digits, pass TRUE to this parameter. If the user would like to retrieve an escaped String that uses standard Base 10 digits, then pass FALSE to this parameter.
        Returns:
        The code point will be converted to an HTML Escape Sequence, as a java.lang.String. For instance, if this method is passed the code point for "the snowman" glyph (character ☃, code point 9731), which is below 65,535 (and, incidentally, does "fit" into a single Java 'char'), and FALSE for 'use16BitEscapeSequence', it would return the String "&#9731;".

        If the parameter 'use16BitEscapeSequence' had been passed TRUE, then this method would, instead, return the String "&#x2603;".
        Throws:
        java.lang.IllegalArgumentException - If 'codePoint' is not a valid Unicode code point. Java has a method for determining whether any integer is a valid code point: Character.isValidCodePoint(int). Not all integers "fit" into the 17 Unicode "planes"; only values from 0 through 0x10FFFF do. Note that each of the planes in 'Unicode Space' contains 65,536 (or 2^16) code points.
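
        The range check and the plane arithmetic described here can be sketched as follows. The class ValidCodePointSketch, its constants and its method planeOf are hypothetical names introduced for illustration; only Character.isValidCodePoint(int) and Character.MAX_CODE_POINT come from the JDK.

```java
// Hypothetical demo class, for illustration only; not part of Torello.HTML
final class ValidCodePointSketch
{
    static final int PLANE_SIZE  = 1 << 16;  // 65,536 code points per plane
    static final int PLANE_COUNT = 17;       // planes 0 through 16

    // Index of the plane holding a valid code point (plane 0 is the BMP)
    static int planeOf(int codePoint)
    {
        if (! Character.isValidCodePoint(codePoint))
            throw new IllegalArgumentException("Not a valid code point: " + codePoint);

        return codePoint >>> 16;
    }

    public static void main(String[] args)
    {
        int[] candidates = { -1, 0, 0x1F600, Character.MAX_CODE_POINT, 0x110000 };

        for (int cp : candidates) System.out.println(
            cp + " valid? " + Character.isValidCodePoint(cp)
        );
    }
}
```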
        Code:
        Exact Method Body:
         // Reject integers outside the valid Unicode range (0 through 0x10FFFF)
         if (! Character.isValidCodePoint(codePoint)) throw new IllegalArgumentException(
             "The integer you have passed to this method [" + codePoint + "] was deemed an " +
             "invalid Code Point after a call to: [java.lang.Character.isValidCodePoint(int)].  " + 
             "Therefore this method is unable to provide an HTML Escape Sequence."
         );
        
         // TRUE requests a Base 16 (hexadecimal) escape, FALSE a Base 10 (decimal) escape
         return use16BitEscapeSequence
             ? "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";"
             : "&#" + codePoint + ";";
        
      • hasHTMLEsc

        public static boolean hasHTMLEsc​(char c)
        Check the internal Escape Sequence Lookup Table. If there is an escape sequence String associated with the char provided to this method, then return TRUE. If there is no such Escape Sequence in the Lookup Table associated with parameter 'c', then return FALSE.

        The Lookup Table can identify whether char parameter 'c' has an associated HTML Escape Sequence, or not. Escape sequences are always short text-Strings that were selected by the W3C (long ago, in the 1990s).

        Returns TRUE if there is an associated String escape-sequence for char-parameter 'c', and FALSE otherwise. Please review the brief sample table below:
        Input Character: Method Return Value
        '&' (ampersand): TRUE
        'A' (letter-A): FALSE
        '<' (less-than-symbol): TRUE
        '9' (number-9): FALSE
        '>' (greater-than-symbol): TRUE

        View Escape Codes: The list included within the page attached (below) is a complete list of all text-String HTML Escape Sequences that are known to this class. This list does not include any Code Point, Hex or Decimal Number sequences.

        All HTML Escape Sequences
        Parameters:
        c - Any ASCII or UNICODE Character
        Returns:
        TRUE if there is a String escape sequence for this character, and FALSE otherwise.
        See Also:
        htmlEsc(char), htmlEscSeq
        Code:
        Exact Method Body:
         return htmlEscSeq.get(Character.valueOf(c)) != null;
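
        A lookup of this kind can be sketched with a plain java.util.Map. The class EscapeTableSketch, its method hasEsc and its three-entry table are hypothetical stand-ins; the library's actual htmlEscSeq table is far larger, and its concrete type is not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup table, for illustration only
final class EscapeTableSketch
{
    // Maps a character to the inner text of its named escape sequence
    private static final Map<Character, String> TABLE = new HashMap<>();

    static
    {
        TABLE.put('&', "amp");
        TABLE.put('<', "lt");
        TABLE.put('>', "gt");
    }

    // TRUE if the sketch table holds a named escape for 'c'
    static boolean hasEsc(char c)
    { return TABLE.containsKey(c); }

    public static void main(String[] args)
    {
        for (char c : new char[] { '&', 'A', '<', '9', '>' })
            System.out.println("'" + c + "' => " + hasEsc(c));
    }
}
```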
        
      • htmlEsc

        public static java.lang.String htmlEsc​(char c)
        Check the internal Escape Sequence Lookup Table. If there is an escape sequence String associated with the char provided to this method, then return it.

        For Instance:
        Input Character: Method Return Value
        '&' (ampersand): "amp"
        'A' (letter-A): null
        '<' (less-than-symbol): "lt"
        '9' (number-9): null
        '>' (greater-than-symbol): "gt"

        View Escape Codes: The list included within the page attached (below) is a complete list of all text-String HTML Escape Sequences that are known to this class. This list does not include any Code Point, Hex or Decimal Number sequences.

        All HTML Escape Sequences
        Parameters:
        c - Any ASCII or UNICODE Character
        Returns:
        The String that is used by web-browsers to escape this ASCII / Unicode character, if there is one saved in the internal Lookup Table. If the character provided does not have an associated HTML Escape String, then 'null' is returned.

        NOTE: The entire escape-String is not provided, just the inner-characters. The leading '&' (Ampersand) and the trailing ';' (Semi-Colon) are not appended to the returned String.
        See Also:
        hasHTMLEsc(char), htmlEscSeq
        Code:
        Exact Method Body:
         return htmlEscSeq.get(Character.valueOf(c));
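
        Because only the inner characters are returned, a caller has to add the leading '&' and the trailing ';' and has to handle the null case. One way this might look is sketched below; the class FullEscapeSketch, its methods escapeChar and escapeText, and its three-entry table are hypothetical and merely stand in for the library's lookup table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical caller-side helper, for illustration only
final class FullEscapeSketch
{
    // Tiny stand-in table holding only the inner names, without '&' and ';'
    private static final Map<Character, String> NAMES = new HashMap<>();

    static
    {
        NAMES.put('&', "amp");
        NAMES.put('<', "lt");
        NAMES.put('>', "gt");
    }

    // Returns the complete escape for 'c', or 'c' itself when no named escape is known
    static String escapeChar(char c)
    {
        String name = NAMES.get(c);
        return (name == null) ? String.valueOf(c) : ("&" + name + ";");
    }

    // Escapes every character of 'text' that has an entry in the table
    static String escapeText(String text)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) sb.append(escapeChar(text.charAt(i)));
        return sb.toString();
    }

    public static void main(String[] args)
    { System.out.println(escapeText("a<b & c>d")); }
}
```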