script Package
indic_scripts Module
-
indicnlp.script.indic_scripts.ALL_PHONETIC_DATA = None
Phonetic data for all languages except Tamil
-
indicnlp.script.indic_scripts.ALL_PHONETIC_VECTORS = None
Phonetic vector for all languages except Tamil
-
indicnlp.script.indic_scripts.PHONETIC_VECTOR_LENGTH = 38
Length of phonetic vector
-
indicnlp.script.indic_scripts.TAMIL_PHONETIC_DATA = None
Phonetic data for Tamil
-
indicnlp.script.indic_scripts.TAMIL_PHONETIC_VECTORS = None
Phonetic vector for Tamil
-
indicnlp.script.indic_scripts.in_coordinated_range_offset(c_offset)
Applicable to Brahmi-derived Indic scripts.
-
indicnlp.script.indic_scripts.init()
To be called by the library loader; do not call it in your program.
-
indicnlp.script.indic_scripts.is_indiclang_char(c, lang)
Applicable to Brahmi-derived Indic scripts. Note that DANDA and DOUBLE_DANDA have the same Unicode codepoint for all Indic scripts.
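The note about DANDA and DOUBLE_DANDA can be illustrated with a minimal sketch. This is not the library's implementation: `SCRIPT_BLOCK_START`, `BLOCK_SIZE`, and `in_script_block` are hypothetical names, and the language codes are assumed for illustration. It relies only on the Unicode layout, in which each of these scripts occupies a 128-codepoint block and DANDA (U+0964) and DOUBLE DANDA (U+0965) are encoded once and shared.

```python
# Hypothetical table of Unicode block starts; the library keeps its own data.
SCRIPT_BLOCK_START = {"hi": 0x0900, "bn": 0x0980, "ta": 0x0B80}
BLOCK_SIZE = 0x80  # each of these script blocks spans 128 codepoints

DANDA = "\u0964"         # shared across Indic scripts
DOUBLE_DANDA = "\u0965"  # shared across Indic scripts


def in_script_block(c, lang):
    """Return True if c belongs to lang's script block or is a shared danda."""
    if c in (DANDA, DOUBLE_DANDA):
        return True
    start = SCRIPT_BLOCK_START[lang]
    return start <= ord(c) < start + BLOCK_SIZE


print(in_script_block("ক", "bn"))    # Bengali letter KA, checked against Bengali
print(in_script_block(DANDA, "ta"))  # danda is accepted for any script
print(in_script_block("क", "bn"))    # Devanagari letter KA, checked against Bengali
```

Because the dandas live in the Devanagari block, a plain range check would reject them for every other script, which is why the sketch tests for them first.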
-
indicnlp.script.indic_scripts.lcsr(srcw, tgtw, slang, tlang)
Compute the Longest Common Subsequence Ratio (LCSR) between two strings at the character level.
srcw: source language string
tgtw: target language string
slang: source language
tlang: target language
-
indicnlp.script.indic_scripts.lcsr_any(srcw, tgtw)
LCSR computation if both languages have the same script.
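For readers unfamiliar with the measure, here is a minimal sketch of a character-level LCSR for two strings in the same script. It assumes the common definition, LCS length divided by the length of the longer string. The names `lcs_length` and `lcsr_sketch` are hypothetical, and the library function's return value may be shaped differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcsr_sketch(a, b):
    """LCS length divided by the longer string's length; 0.0 for two empty strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return lcs_length(a, b) / longest


print(lcsr_sketch("colour", "color"))
```

The ratio lies between 0 and 1, and reaches 1 only when the two strings are identical.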
-
indicnlp.script.indic_scripts.lcsr_indic(srcw, tgtw, slang, tlang)
Compute the Longest Common Subsequence Ratio (LCSR) between two strings at the character level. This works for Indic scripts by mapping both languages to a common script.
srcw: source language string
tgtw: target language string
slang: source language
tlang: target language
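The idea of mapping both languages to a common script can be sketched as follows. This is a simplified illustration, not the library's code: `BLOCK_START`, `to_offsets`, and `lcsr_common_script` are hypothetical names, the language codes are assumed, and the sketch assumes every character lies inside its script's Unicode block. It uses the fact that corresponding letters in these scripts sit at the same offset from the start of their blocks.

```python
# Hypothetical table of Unicode block starts for a few scripts.
BLOCK_START = {"hi": 0x0900, "bn": 0x0980, "gu": 0x0A80}


def to_offsets(word, lang):
    """Map each character to its offset from the start of its script block."""
    start = BLOCK_START[lang]
    return [ord(ch) - start for ch in word]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcsr_common_script(srcw, tgtw, slang, tlang):
    """LCSR computed over script-independent offsets instead of raw characters."""
    src = to_offsets(srcw, slang)
    tgt = to_offsets(tgtw, tlang)
    longest = max(len(src), len(tgt))
    return lcs_length(src, tgt) / longest if longest else 0.0


# Hindi and Gujarati spellings of the same word share all offsets.
print(lcsr_common_script("कमल", "કમલ", "hi", "gu"))
```

Comparing offsets rather than codepoints lets cognate words written in different scripts match letter for letter, which a direct character comparison would score as 0.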
english_script Module
-
indicnlp.script.english_script.ENGLISH_PHONETIC_DATA = None
Phonetic data for English
-
indicnlp.script.english_script.ENGLISH_PHONETIC_VECTORS = None
Phonetic vector for English
-
indicnlp.script.english_script.ID_ARPABET_MAP = {}
Map from internal ID to ARPABET symbol
-
indicnlp.script.english_script.PHONETIC_VECTOR_LENGTH = 38
Length of phonetic vector