Class Verbs


  • public class Verbs
    extends java.lang.Object
    Verbs (Spanish) - Documentation.

    The primary use of this class is to add HTML <SPAN DATA-RV="regular_verb"> and <SPAN DATA-IV="irregular_verb"> elements to a page of text. This is mainly of value because the simple Java-Script files provided can attach zIndex-based popup windows to those elements once the scripts are incorporated into your Spanish Language Pages.

    For more information, please visit the pages at SpanishNewsBoard.com to see the concept of "Verb Conjugation Popup Windows."



    • Field Detail

      • skip

        public static java.lang.String[] skip
        This software is not perfect; human language raises a whole new order of issues. There are many features that could be added to make a better translator, but I have been busy writing an HTML Scrape Package instead. This array holds words that are extremely common in Spanish but, in about 80% to 90% of cases, aren't verbs. A "Lexical Analysis" could probably do a much better job of deciding when a word is guaranteed to be a verb, but for now these words are "just skipped" and never identified as verbs at all.

        NOTE: You may change this at your discretion; just re-assign the array.
        Code:
        Exact Field Declaration Expression:
        public static String[] skip =
            { "como", "casa", "para", "uno", "una", "cosa", "nada", "entre", "dallas" };
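
        As a rough illustration of the note above, the sketch below re-assigns a skip array to add one more word. SkipListSketch, isSkipped and withExtraWord are hypothetical stand-ins written for this example; only the array contents come from the declaration above, and "sobre" is just an illustrative extra entry.

```java
import java.util.Arrays;

public class SkipListSketch {
    // Stand-in for Verbs.skip; the values are copied from the field declaration.
    public static String[] skip =
        { "como", "casa", "para", "uno", "una", "cosa", "nada", "entre", "dallas" };

    // Hypothetical helper: true if the lower-case word is on the skip list.
    public static boolean isSkipped(String wordInLowerCase) {
        for (String w : skip) if (w.equals(wordInLowerCase)) return true;
        return false;
    }

    // Hypothetical helper: returns a new array with one extra word appended.
    public static String[] withExtraWord(String[] current, String extra) {
        String[] copy = Arrays.copyOf(current, current.length + 1);
        copy[current.length] = extra;
        return copy;
    }

    public static void main(String[] args) {
        // Re-assign the array, as the NOTE describes, to skip one more word.
        skip = withExtraWord(skip, "sobre");
        System.out.println(isSkipped("sobre"));
        System.out.println(isSkipped("hablar"));
    }
}
```

        With the real class, the equivalent step would presumably be a plain assignment to Verbs.skip before any spans are added.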
        
    • Method Detail

      • loadConjugations

        public static void loadConjugations()
        Loads the Conjugations String into memory. This must be in memory before working with Verb-Spans.
        See Also:
        LFEC.loadFile_JAR(Class, String)
        Code:
        Exact Method Body:
         conjugations = LFEC.loadFile_JAR(Torello.Data.DataFileLoader.class, CONJUGATIONS);
        
      • releaseConjugations

        public static void releaseConjugations()
        Releases the memory for the (rather large) Java-String containing the verb conjugations. Calls gc().
        See Also:
        GC()
        Code:
        Exact Method Body:
         conjugations = null; GC();
        
      • loadIrregularInfinitives

        public static void loadIrregularInfinitives()
        Loads the list of Irregular Infinitives, from .JAR. This must be in memory before working with Verb-Spans.
        See Also:
        LFEC.readObjectFromFile_JAR(Class, String, boolean, Class)
        Code:
        Exact Method Body:
         irregularInfinitives = (TreeSet<String>) LFEC.readObjectFromFile_JAR
             (Torello.Data.DataFileLoader.class, IRREG_INFINITIVES, true, TreeSet.class);
        
      • releaseIrregularInfinitives

        public static void releaseIrregularInfinitives()
        Releases the memory for the (rather large) TreeSet of Irregular Infinitives. Calls gc().
        See Also:
        GC()
        Code:
        Exact Method Body:
         irregularInfinitives.clear(); irregularInfinitives = null; GC();
        
      • releaseInfinitives

        public static void releaseInfinitives()
        Releases the memory for the TreeSet of infinitives. Calls gc().
        See Also:
        GC()
        Code:
        Exact Method Body:
         infinitives.clear(); infinitives = null; GC();
        
      • releaseDefinitions

        public static void releaseDefinitions()
        Releases the memory for the TreeMap of definitions. Calls gc().
        See Also:
        GC()
        Code:
        Exact Method Body:
         definitions.clear(); definitions = null; GC();
        
      • infinitives

        public static java.util.Iterator<java.lang.String> infinitives()
        Generates an iterator of Spanish Verb Infinitives. Items may not be removed via the iterator's 'remove()' method.
        Returns:
        An iterator of all Spanish Verbs loaded into the infinitives TreeSet.
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         return new RemoveUnsupportedIterator<String>(infinitives.iterator());
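
        The read-only behaviour described above can be pictured with a small wrapper. This is only a sketch of the general idea: NoRemoveIterator is a hypothetical stand-in, and the actual RemoveUnsupportedIterator class may be written differently.

```java
import java.util.Iterator;
import java.util.TreeSet;

public class ReadOnlyIteratorSketch {
    // Hypothetical stand-in for RemoveUnsupportedIterator: forwards hasNext() and
    // next() to the wrapped iterator, and rejects remove().
    static final class NoRemoveIterator<E> implements Iterator<E> {
        private final Iterator<E> inner;
        NoRemoveIterator(Iterator<E> inner) { this.inner = inner; }
        @Override public boolean hasNext() { return inner.hasNext(); }
        @Override public E next() { return inner.next(); }
        @Override public void remove() { throw new UnsupportedOperationException(); }
    }

    // Wraps the set's iterator so callers cannot modify the underlying TreeSet.
    public static Iterator<String> readOnly(TreeSet<String> set) {
        return new NoRemoveIterator<>(set.iterator());
    }

    public static void main(String[] args) {
        TreeSet<String> verbs = new TreeSet<>();
        verbs.add("hablar");
        verbs.add("comer");

        Iterator<String> it = readOnly(verbs);
        System.out.println(it.next());   // a TreeSet iterates in sorted order
        try { it.remove(); }
        catch (UnsupportedOperationException e) { System.out.println("remove() rejected"); }
    }
}
```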
        
      • irregularInfinitives

        public static java.util.Iterator<java.lang.String> irregularInfinitives()
        Generates an iterator of Spanish Irregular-Verbs in Infinitive Form. Items may not be removed via the iterator's 'remove()' method.
        Returns:
        An iterator of all Irregular Spanish Verbs loaded into the irregular-infinitives TreeSet.
        See Also:
        RemoveUnsupportedIterator
        Code:
        Exact Method Body:
         return new RemoveUnsupportedIterator<String>(irregularInfinitives.iterator());
        
      • getDefinition

        public static java.lang.String getDefinition
                    (java.lang.String infinitiveInLowerCase)
        
        Gets the quick-definition of a Spanish Verb.
        EXPECTATIONS:

        • The "definitions" data file must already be loaded into memory
        • To be precise, loadIDefinitions() needs to have been called!
        • word MUST be in lower-case Spanish - otherwise results might be inaccurate!
        • TRY: ES.toLowerCaseSpanish(String) to make sure.
        Parameters:
        infinitiveInLowerCase - This may be any Spanish Verb - as long as it is in the infinitive form.
        Returns:
        Will return the string stored as the value in the TreeMap<String, String> definitions, and null if this infinitive is not found in the dictionary.
        See Also:
        ES.toLowerCaseSpanish(String)
        Code:
        Exact Method Body:
         return definitions.get(infinitiveInLowerCase);
        
      • getInfinitive

        public static java.lang.String getInfinitive
                    (java.lang.String wordInLowerCase)
        
        Get the infinitive form of a verb-string.
        EXPECTATIONS:

        • The "conjugations" data file must already be loaded into memory
        • To be precise, loadConjugations() needs to have been called!
        • word MUST be in lower-case Spanish - otherwise results might be inaccurate!
        • TRY: ES.toLowerCaseSpanish(String) to make sure.
        Parameters:
        wordInLowerCase - This can be any word (in Spanish... or any language for that matter).

        It is expected to be a conjugated form of a Spanish verb. If it is, the infinitive form of that verb will be returned.
        Returns:
        • Returns the Infinitive of a verb - if the word passed is a direct conjugation of that verb.
        • Returns null if there is no matching verb conjugation in the private static String conjugations.
        Code:
        Exact Method Body:
         // Eliminates common words that aren't verbs - but conjugate .. "para" "como"
         // for (int k=0; k < skip.length; k++) if (wtlc.equals(skip[k])) return null;
        
         // GREP through the conjugations data file (stored in String: conjugations)
         int pos = conjugations.indexOf(" " + wordInLowerCase + ",");
         if (pos == -1) if (wordInLowerCase.charAt(wordInLowerCase.length() - 1) == 'r')
             pos = conjugations.indexOf("\n" + wordInLowerCase + ":");
        
         // the post-increment (++) is for the infinitive case match.
         // Specifically, the first character, in this (the infinitive) case, would be a 
         // newline '\n'.. and a '\n' character is exactly what the loop which follows is
         // grep'ing for...
         if (pos == -1) return null; else pos++;
                
         // There *WAS* a match in the conjugations data file. - get infinitive and return
         while ((conjugations.charAt(--pos) != '\n') && (pos > 0));
         return conjugations.substring(pos + 1, conjugations.indexOf(':', pos + 1));
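
        The lookup above can be tried out on a tiny in-memory string. The sketch below is not the library's method: it assumes a data layout inferred only from the search strings in the method body (each line reads "infinitive: form, form," and is preceded by a newline), uses made-up sample data, and swaps the manual backward loop for String.lastIndexOf.

```java
public class InfinitiveLookupSketch {
    // Assumed layout, inferred only from the search strings in the method body:
    // each line reads "infinitive: form, form, form," and is preceded by '\n'.
    // The sample verbs below are illustrative, not taken from the real data file.
    static final String CONJUGATIONS =
        "\nhablar: hablo, hablas, habla, hablamos,\n" +
        "comer: como, comes, come, comemos,\n";

    public static String getInfinitive(String wordInLowerCase) {
        // A conjugated form is preceded by a space and followed by a comma.
        int pos = CONJUGATIONS.indexOf(" " + wordInLowerCase + ",");

        // Otherwise the word may itself be an infinitive at the start of a line.
        if (pos == -1 && wordInLowerCase.endsWith("r"))
            pos = CONJUGATIONS.indexOf("\n" + wordInLowerCase + ":");

        if (pos == -1) return null;

        // Walk back to the start of the matching line, then read up to the colon.
        int lineStart = CONJUGATIONS.lastIndexOf('\n', pos) + 1;
        return CONJUGATIONS.substring(lineStart, CONJUGATIONS.indexOf(':', lineStart));
    }

    public static void main(String[] args) {
        System.out.println(getInfinitive("hablas"));  // hablar
        System.out.println(getInfinitive("comer"));   // comer
        System.out.println(getInfinitive("como"));    // comer
        System.out.println(getInfinitive("casa"));    // null
    }
}
```

        The "como" case shows why a skip list is useful: the word matches a conjugation of "comer" even though it is usually not a verb in running text.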
        
      • isIrregular

        public static boolean isIrregular(java.lang.String infinitiveInLowerCase)
        Checks if a word is an irregular verb.

        EXPECTATIONS:

        • The "irregular infinitives" data file must already be loaded into memory
        • To be precise, loadIrregularInfinitives() needs to have been called!
        • word MUST be in lower-case Spanish - otherwise results will be inaccurate!
        • TRY: ES.toLowerCaseSpanish(String) to make sure
        Parameters:
        infinitiveInLowerCase - This may be any Spanish Verb - as long as it is in the infinitive form. This word must have been converted to lower case; if not, the method will likely return FALSE.
        Returns:
        Will return TRUE if this verb is contained in the list of irregular verbs, and FALSE otherwise.
        See Also:
        ES.toLowerCaseSpanish(String)
        Code:
        Exact Method Body:
         return irregularInfinitives.contains(infinitiveInLowerCase);
        
      • addSpanishVerbSpans

        public static void addSpanishVerbSpans
                    (java.util.Vector<HTMLNode> page,
                     java.util.TreeSet<java.lang.String> regularVerbsFound,
                     java.util.TreeSet<java.lang.String> irregularVerbsFound,
                     java.util.TreeSet<java.lang.String> wordsNotFound)
        
        Calls addSpanishVerbSpans(String, TreeSet, TreeSet, TreeSet) on each TextNode found in the page Vector, and replaces that TextNode in place with the nodes returned.
        Parameters:
        regularVerbsFound - If this parameter isn't null, then any and all regular verbs found within the text will be added to this TreeSet. If this parameter is null, it will be ignored.
        irregularVerbsFound - If this parameter isn't null, then any irregular verbs found in this text will be added to this TreeSet. If this parameter is null, it will be ignored.
        wordsNotFound - If this parameter isn't null, then every word found that isn't a verb will be added to this TreeSet. If this parameter is null, it will be ignored.
        See Also:
        addSpanishVerbSpans(String, TreeSet, TreeSet, TreeSet)
        Code:
        Exact Method Body:
         HTMLNode n;
         for (int i=0; i < page.size(); i++)
             if ((n = page.elementAt(i)) instanceof TextNode)
             {
                Vector<HTMLNode> withSpans = addSpanishVerbSpans
                         (n.str, regularVerbsFound, irregularVerbsFound, wordsNotFound);
        
                page.removeElementAt(i);
                page.addAll(i, withSpans);
                i += withSpans.size() - 1;   
                     // Trust me, this is right!
                     // If "withSpans.size() == 1" (a.k.a. "no-change"), then should do: i += 0;
                     // If "withSpans.size() == 2" (increased by 1), then should do: i += 1;
             }
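
        The index adjustment in the loop above can be checked on a plain Vector of strings. In this sketch, expand() is a hypothetical stand-in for the per-node call: it splits an entry on '|' so that one element may be replaced by several.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

public class SpliceInPlaceSketch {
    // Hypothetical expansion step: one entry may become several entries.
    static List<String> expand(String s) {
        return Arrays.asList(s.split("\\|"));
    }

    // Replaces every element with its expansion, in place, without revisiting
    // the elements that were just inserted.
    public static Vector<String> expandAll(Vector<String> page) {
        for (int i = 0; i < page.size(); i++) {
            List<String> replacement = expand(page.elementAt(i));
            page.removeElementAt(i);
            page.addAll(i, replacement);
            // Jump to the last inserted element; the loop's i++ then moves past it.
            i += replacement.size() - 1;
        }
        return page;
    }

    public static void main(String[] args) {
        Vector<String> page = new Vector<>(Arrays.asList("a", "b|c|d", "e"));
        System.out.println(expandAll(page));   // [a, b, c, d, e]
    }
}
```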
        
      • addSpanishVerbSpans

        public static java.util.Vector<HTMLNode> addSpanishVerbSpans
                    (java.lang.String text,
                     java.util.TreeSet<java.lang.String> regularVerbsFound,
                     java.util.TreeSet<java.lang.String> irregularVerbsFound,
                     java.util.TreeSet<java.lang.String> wordsNotFound)
        
        Scans the given text for Spanish verbs and replaces each verb found with the same word wrapped in an HTML <SPAN> element.
        Parameters:
        regularVerbsFound - If this parameter isn't null, then any and all regular verbs found within the text will be added to this TreeSet. If this parameter is null, it will be ignored.
        irregularVerbsFound - If this parameter isn't null, then any irregular verbs found in this text will be added to this TreeSet. If this parameter is null, it will be ignored.
        wordsNotFound - If this parameter isn't null, then every word found that isn't a verb will be added to this TreeSet. If this parameter is null, it will be ignored.
        Returns:
        An HTML sub-page (as a Vector) where each found Spanish verb has been surrounded by an HTML <SPAN> element that indicates the regularity of the verb and its infinitive form.
        See Also:
        ES.onlyLanguageChars(String), ES.toLowerCaseSpanish(String), HTMLPage.getPageTokens(CharSequence, boolean)
        Code:
        Exact Method Body:
         boolean keepRV  = regularVerbsFound     != null;    // Keep list of found regular-verbs in the tree-set
         boolean keepIV  = irregularVerbsFound   != null;    // Keep list of found irregular-verbs in the tree set
         boolean keepNV  = wordsNotFound         != null;    // Keep list of words that weren't verbs in the tree-set
        
         StringBuilder outSB = new StringBuilder();
        
         // Splits the string by spaces
         String[] words = text.split(" ");
                
         for (int j=0; j < words.length; j++)
         {
             // Sometimes it is the empty string or just white-space
             String trim = words[j].trim();
             if (trim.length() == 0)
                 { outSB.append(" " + words[j]); continue; }
                    
             // Eliminates leading and trailing punctuation & HTML tags
             Matcher m = P1.matcher(trim);
        
             if (! m.find())
                 { outSB.append(" " + words[j]); continue; }
        
             String pre  = m.group(2);
             String word = m.group(3);
             String post = m.group(4);
        
             if (! ES.onlyLanguageChars(word)) System.out.println
                 ("ORIG: [" + words[j] + "], " + pre + ", " + word + ", " + post);
        
             if (word            == null)    { outSB.append(" " + words[j]); continue; }
             if (pre             == null)    pre = "";
             if (post            == null)    post = "";
             if (word.length()   == 0)       { outSB.append(" " + words[j]); continue; }
        
             String lc = ES.toLowerCaseSpanish(word);
        
             // Skip the "ultra-common" non-verbs that look just like verbs.
             for (String w : skip) if (lc.equals(w)) continue;
        
             String infinitive = getInfinitive(lc);
        
             if (infinitive == null)
                 { if (keepNV) wordsNotFound.add(lc); continue; }
             else
                 { if (keepRV) regularVerbsFound.add(infinitive); }
        
             outSB.append(" " + pre + "<SPAN CLASS=\"");
        
             if (isIrregular(infinitive))
                 { outSB.append('I'); if (keepIV) irregularVerbsFound.add(infinitive); }
             else
                 { outSB.append('R'); }
        
             outSB.append("V\" DATA-V=\"" + infinitive + "\">" + word + "</SPAN>" + post);
         }
        
         outSB.append('\n');
        
         return HTMLPage.getPageTokens(outSB, false);
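
        A simplified view of the skip check and the span wrapping is sketched below. It is not the method body above: the lookup tables are hypothetical stand-ins for getInfinitive() and isIrregular(), punctuation handling is left out, and toLowerCase(Locale.ROOT) stands in for ES.toLowerCaseSpanish(String). Because an unlabeled continue in Java applies to the innermost loop, the sketch uses a labeled continue so that a skip-list hit moves on to the next word, and it copies skipped and non-verb words to the output unchanged.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class VerbSpanSketch {
    static final String[] SKIP = { "como", "para" };

    // Hypothetical lookup tables standing in for getInfinitive() and isIrregular().
    static final Map<String, String> INFINITIVE_OF = new TreeMap<>();
    static final Set<String> IRREGULAR = new TreeSet<>();
    static {
        INFINITIVE_OF.put("hablo", "hablar");
        INFINITIVE_OF.put("como", "comer");
        INFINITIVE_OF.put("voy", "ir");
        IRREGULAR.add("ir");
    }

    public static String wrapVerbs(String text) {
        StringBuilder out = new StringBuilder();
        String[] words = text.split(" ");

        words:
        for (int j = 0; j < words.length; j++) {
            if (j > 0) out.append(' ');
            String word = words[j];
            String lc = word.toLowerCase(Locale.ROOT);

            // Skip-list hit: copy the word through and go to the next word.
            for (String s : SKIP)
                if (lc.equals(s)) { out.append(word); continue words; }

            // Not a known verb form: copy the word through unchanged.
            String infinitive = INFINITIVE_OF.get(lc);
            if (infinitive == null) { out.append(word); continue; }

            // Verb: wrap it, using class "IV" for irregular and "RV" for regular.
            String cls = IRREGULAR.contains(infinitive) ? "IV" : "RV";
            out.append("<SPAN CLASS=\"").append(cls).append("\" DATA-V=\"")
               .append(infinitive).append("\">").append(word).append("</SPAN>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrapVerbs("yo hablo como voy"));
    }
}
```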