script Package
indic_scripts Module
- indicnlp.script.indic_scripts.ALL_PHONETIC_DATA = None
  Phonetic data for all languages except Tamil.
- indicnlp.script.indic_scripts.ALL_PHONETIC_VECTORS = None
  Phonetic vector for all languages except Tamil.
- indicnlp.script.indic_scripts.PHONETIC_VECTOR_LENGTH = 38
  Length of the phonetic vector.
- indicnlp.script.indic_scripts.TAMIL_PHONETIC_DATA = None
  Phonetic data for Tamil.
- indicnlp.script.indic_scripts.TAMIL_PHONETIC_VECTORS = None
  Phonetic vector for Tamil.
- indicnlp.script.indic_scripts.in_coordinated_range_offset(c_offset)
  Applicable to Brahmi-derived Indic scripts.
- indicnlp.script.indic_scripts.init()
  To be called by the library loader; do not call it in your program.
- indicnlp.script.indic_scripts.is_indiclang_char(c, lang)
  Applicable to Brahmi-derived Indic scripts. Note that DANDA and DOUBLE_DANDA have the same Unicode codepoint for all Indic scripts.
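The note about DANDA and DOUBLE_DANDA can be illustrated with a small sketch. This is not the library's implementation: `in_script_block`, the script-name keys, and the block table are hypothetical names chosen for illustration, and the real function takes a language code rather than a script name. The sketch only relies on the Unicode layout, in which each of these scripts occupies a 128-codepoint block and the two danda characters are encoded once, in the Devanagari block.

```python
# Hypothetical sketch; names and structure are assumptions, not indicnlp's code.

# First codepoint of each script's 128-codepoint Unicode block.
SCRIPT_BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gujarati": 0x0A80,
    "tamil": 0x0B80,
}
BLOCK_SIZE = 0x80

# DANDA (U+0964) and DOUBLE DANDA (U+0965) are shared by all of these scripts.
SHARED_DANDAS = {"\u0964", "\u0965"}


def in_script_block(c, script):
    """Return True if character c belongs to the given script's block,
    treating the shared danda characters as members of every script."""
    if c in SHARED_DANDAS:
        return True
    start = SCRIPT_BLOCK_START[script]
    return start <= ord(c) < start + BLOCK_SIZE
```

With this sketch, the Devanagari letter `"\u0915"` is accepted for `"devanagari"` and rejected for `"gujarati"`, while `"\u0964"` is accepted for either script.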
- indicnlp.script.indic_scripts.lcsr(srcw, tgtw, slang, tlang)
  Compute the Longest Common Subsequence Ratio (LCSR) between two strings at the character level.
  srcw: source language string
  tgtw: target language string
  slang: source language
  tlang: target language
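As a rough illustration of the measure itself, here is a minimal sketch of LCSR under its usual definition: the length of the longest common subsequence divided by the length of the longer string. The names `lcs_length` and `lcsr_sketch` are hypothetical, and the library's own function may normalize differently or return additional values.

```python
# Hypothetical sketch of the usual LCSR definition; not indicnlp's code.

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b,
    computed row by row with dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr_sketch(a, b):
    """LCS length divided by the length of the longer string.
    Two empty strings are given a ratio of 0.0 (an assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return lcs_length(a, b) / longest
```

For example, `lcsr_sketch("kamal", "kamala")` shares a subsequence of length 5 with a longer string of length 6, giving 5/6.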
- indicnlp.script.indic_scripts.lcsr_any(srcw, tgtw)
  LCSR computation if both languages have the same script.
- indicnlp.script.indic_scripts.lcsr_indic(srcw, tgtw, slang, tlang)
  Compute the Longest Common Subsequence Ratio (LCSR) between two strings at the character level. This works for Indic scripts by mapping both languages to a common script.
  srcw: source language string
  tgtw: target language string
  slang: source language
  tlang: target language
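One way to picture "mapping both languages to a common script" is to replace each character by its offset from the start of its script's Unicode block, since the Brahmi-derived scripts are laid out in parallel in Unicode. The sketch below assumes that approach; `to_offsets` and the constants are hypothetical names, and the library's actual mapping is not described here.

```python
# Hypothetical sketch; the mapping used by indicnlp may differ.

DEVANAGARI_START = 0x0900  # first codepoint of the Devanagari block
GUJARATI_START = 0x0A80    # first codepoint of the Gujarati block


def to_offsets(word, block_start):
    """Represent each character by its offset within its script's block."""
    return [ord(ch) - block_start for ch in word]


hindi = "\u0915\u092e\u0932"     # Devanagari letters ka, ma, la
gujarati = "\u0a95\u0aae\u0ab2"  # Gujarati letters ka, ma, la

# Both words reduce to the same offset sequence, so a character-level
# comparison such as LCSR can treat them as identical.
same = to_offsets(hindi, DEVANAGARI_START) == to_offsets(gujarati, GUJARATI_START)
```

Once both words are expressed as offsets, they can be compared as if they were written in one script.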
english_script Module
- indicnlp.script.english_script.ENGLISH_PHONETIC_DATA = None
  Phonetic data for English.
- indicnlp.script.english_script.ENGLISH_PHONETIC_VECTORS = None
  Phonetic vector for English.
- indicnlp.script.english_script.ID_ARPABET_MAP = {}
  Map from IDs to ARPABET symbols.
- indicnlp.script.english_script.PHONETIC_VECTOR_LENGTH = 38
  Length of the phonetic vector.