script Package

indic_scripts Module

indicnlp.script.indic_scripts.ALL_PHONETIC_DATA = None

Phonetic data for all languages except Tamil

indicnlp.script.indic_scripts.ALL_PHONETIC_VECTORS = None

Phonetic vectors for all languages except Tamil

indicnlp.script.indic_scripts.PHONETIC_VECTOR_LENGTH = 38

Length of the phonetic vector

indicnlp.script.indic_scripts.TAMIL_PHONETIC_DATA = None

Phonetic data for Tamil

indicnlp.script.indic_scripts.TAMIL_PHONETIC_VECTORS = None

Phonetic vectors for Tamil

indicnlp.script.indic_scripts.get_offset(c, lang)
indicnlp.script.indic_scripts.get_phonetic_feature_vector(c, lang)
indicnlp.script.indic_scripts.get_phonetic_feature_vector_offset(offset, lang)
indicnlp.script.indic_scripts.get_phonetic_info(lang)
indicnlp.script.indic_scripts.get_property_value(v, prop_name)
indicnlp.script.indic_scripts.get_property_vector(v, prop_name)
indicnlp.script.indic_scripts.in_coordinated_range(c, lang)
indicnlp.script.indic_scripts.in_coordinated_range_offset(c_offset)

Applicable to Brahmi-derived Indic scripts.

indicnlp.script.indic_scripts.init()

Called by the library loader; do not call it from your program.

indicnlp.script.indic_scripts.invalid_vector()
indicnlp.script.indic_scripts.is_anusvaar(v)
indicnlp.script.indic_scripts.is_consonant(v)
indicnlp.script.indic_scripts.is_dependent_vowel(v)
indicnlp.script.indic_scripts.is_halant(v)
indicnlp.script.indic_scripts.is_indiclang_char(c, lang)

Applicable to Brahmi-derived Indic scripts. Note that DANDA and DOUBLE_DANDA have the same Unicode codepoint for all Indic scripts.

indicnlp.script.indic_scripts.is_misc(v)
indicnlp.script.indic_scripts.is_nukta(v)
indicnlp.script.indic_scripts.is_plosive(v)
indicnlp.script.indic_scripts.is_supported_language(lang)
indicnlp.script.indic_scripts.is_valid(v)
indicnlp.script.indic_scripts.is_vowel(v)
indicnlp.script.indic_scripts.lcsr(srcw, tgtw, slang, tlang)

Compute the Longest Common Subsequence Ratio (LCSR) between two strings at the character level.

srcw: source language string
tgtw: target language string
slang: source language
tlang: target language
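As an illustration of the quantity being computed, here is a minimal, self-contained sketch of LCSR on plain strings. It assumes the common definition (length of the longest common subsequence divided by the length of the longer string); the library's own normalisation and return value may differ. The names `lcs_length` and `lcsr_sketch` are hypothetical and are not part of indicnlp.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Row-by-row dynamic programming: prev[j] holds the LCS length of the
    characters of `a` consumed so far and the first j characters of `b`.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr_sketch(srcw: str, tgtw: str) -> float:
    """Assumed definition: LCS length / length of the longer string."""
    longest = max(len(srcw), len(tgtw))
    if longest == 0:
        return 0.0  # convention chosen for this sketch only
    return lcs_length(srcw, tgtw) / longest


if __name__ == "__main__":
    print(lcsr_sketch("kitten", "sitting"))
```

The comparison is over Unicode code points, so a dependent vowel sign counts as its own character.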

indicnlp.script.indic_scripts.lcsr_any(srcw, tgtw)

Compute the LCSR when both languages use the same script.

indicnlp.script.indic_scripts.lcsr_indic(srcw, tgtw, slang, tlang)

Compute the Longest Common Subsequence Ratio (LCSR) between two strings at the character level. This works for Indic scripts by mapping both languages to a common script.

srcw: source language string
tgtw: target language string
slang: source language
tlang: target language

indicnlp.script.indic_scripts.offset_to_char(off, lang)

Applicable to Brahmi-derived Indic scripts.
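The offset functions rely on the fact that the Unicode blocks of the Brahmi-derived scripts are laid out in parallel, so a character can be described by its position within its block. The sketch below illustrates that idea only; it is not the library's implementation. The block start values are taken from the Unicode code charts, while the two-letter keys and all function names here are illustrative assumptions.

```python
# Start of each script's Unicode block (from the Unicode code charts).
# The keys are illustrative labels, not necessarily the library's codes.
SCRIPT_BASE = {
    "hi": 0x0900,  # Devanagari
    "bn": 0x0980,  # Bengali
    "pa": 0x0A00,  # Gurmukhi
    "gu": 0x0A80,  # Gujarati
    "or": 0x0B00,  # Oriya
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80

# DANDA and DOUBLE DANDA are shared by all these scripts, so leave them alone.
SHARED_PUNCTUATION = {"\u0964", "\u0965"}


def char_to_offset(c: str, lang: str) -> int:
    """Position of character c relative to the start of lang's block."""
    return ord(c) - SCRIPT_BASE[lang]


def offset_to_char_sketch(off: int, lang: str) -> str:
    """Character at position off within lang's block."""
    return chr(SCRIPT_BASE[lang] + off)


def in_block(c: str, lang: str) -> bool:
    """True if c lies inside lang's 128-codepoint block."""
    return 0 <= char_to_offset(c, lang) < BLOCK_SIZE


def map_script(text: str, slang: str, tlang: str) -> str:
    """Re-express text in tlang's script by preserving block offsets."""
    out = []
    for c in text:
        if c in SHARED_PUNCTUATION or not in_block(c, slang):
            out.append(c)  # pass through anything outside the source block
        else:
            out.append(offset_to_char_sketch(char_to_offset(c, slang), tlang))
    return "".join(out)


if __name__ == "__main__":
    print(map_script("\u0915\u092e\u0932", "hi", "gu"))
```

Not every offset is assigned in every script, so a real implementation needs extra checks (for example, a restricted "coordinated" range) before trusting the mapped character.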

indicnlp.script.indic_scripts.or_vectors(v1, v2)
indicnlp.script.indic_scripts.xor_vectors(v1, v2)

english_script Module

indicnlp.script.english_script.ENGLISH_PHONETIC_DATA = None

Phonetic data for English

indicnlp.script.english_script.ENGLISH_PHONETIC_VECTORS = None

Phonetic vectors for English

indicnlp.script.english_script.ID_ARPABET_MAP = {}

Maps internal phoneme IDs to ARPABET symbols

indicnlp.script.english_script.PHONETIC_VECTOR_LENGTH = 38

Length of the phonetic vector

indicnlp.script.english_script.enc_to_offset(c)
indicnlp.script.english_script.enc_to_phoneme(ph)
indicnlp.script.english_script.get_phonetic_feature_vector(p, lang)
indicnlp.script.english_script.get_phonetic_info(lang)
indicnlp.script.english_script.in_range(offset)
indicnlp.script.english_script.init()

Called by the library loader; do not call it from your program.

indicnlp.script.english_script.invalid_vector()
indicnlp.script.english_script.offset_to_phoneme(ph_id)
indicnlp.script.english_script.phoneme_to_enc(ph)
indicnlp.script.english_script.phoneme_to_offset(ph)

phonetic_sim Module

indicnlp.script.phonetic_sim.cosine(v1, v2)
indicnlp.script.phonetic_sim.create_similarity_matrix(sim_func, slang, tlang, normalize=True)
indicnlp.script.phonetic_sim.dice(v1, v2)
indicnlp.script.phonetic_sim.dotprod(v1, v2)
indicnlp.script.phonetic_sim.equal(v1, v2)
indicnlp.script.phonetic_sim.jaccard(v1, v2)
indicnlp.script.phonetic_sim.sim1(v1, v2, base=5.0)
indicnlp.script.phonetic_sim.softmax(v1, v2)
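The functions in this module are undocumented here, so the following sketch shows only the textbook definitions of dot product, cosine, Jaccard and Dice similarity on binary feature vectors, which is what the names suggest. It assumes plain Python lists of 0/1 values; the library's handling of zero vectors, vector types and normalisation may differ. All names ending in `_sketch` are hypothetical.

```python
import math


def dotprod_sketch(v1, v2):
    """Number of features set in both vectors (for 0/1 inputs)."""
    return sum(a * b for a, b in zip(v1, v2))


def cosine_sketch(v1, v2):
    """Dot product divided by the product of the Euclidean norms."""
    norm = math.sqrt(dotprod_sketch(v1, v1)) * math.sqrt(dotprod_sketch(v2, v2))
    return dotprod_sketch(v1, v2) / norm if norm else 0.0


def jaccard_sketch(v1, v2):
    """Shared features divided by features set in either vector."""
    union = sum(1 for a, b in zip(v1, v2) if a or b)
    return dotprod_sketch(v1, v2) / union if union else 0.0


def dice_sketch(v1, v2):
    """Twice the shared features divided by the total features set."""
    total = sum(v1) + sum(v2)
    return 2 * dotprod_sketch(v1, v2) / total if total else 0.0


if __name__ == "__main__":
    a = [1, 1, 0, 1, 0]
    b = [1, 0, 0, 1, 1]
    print(cosine_sketch(a, b), jaccard_sketch(a, b), dice_sketch(a, b))
```

Returning 0.0 for all-zero vectors is a choice made for this sketch to avoid division by zero.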