transliterate Package

sinhala_transliterator Module

class indicnlp.transliterate.sinhala_transliterator.SinhalaDevanagariTransliterator

Bases: object

A transliterator between Devanagari and Sinhala, based on an explicit Unicode mapping in each direction.

static devanagari_to_sinhala(text)
devnag_sinhala_map = {'ऀ': 'ං', 'ँ': 'ං', 'ं': 'ං', 'ः': 'ඃ', 'ऄ': '\u0d84', 'अ': 'අ', 'आ': 'ආ', 'इ': 'ඉ', 'ई': 'ඊ', 'उ': 'උ', 'ऊ': 'ඌ', 'ऋ': 'ඍ', 'ऌ': 'ඏ', 'ऍ': 'ඈ', 'ऎ': 'එ', 'ए': 'ඒ', 'ऐ': 'ඓ', 'ऒ': 'ඔ', 'ओ': 'ඕ', 'औ': 'ඖ', 'क': 'ක', 'ख': 'ඛ', 'ग': 'ග', 'घ': 'ඝ', 'ङ': 'ඞ', 'च': 'ච', 'छ': 'ඡ', 'ज': 'ජ', 'झ': 'ඣ', 'ञ': 'ඤ', 'ट': 'ට', 'ठ': 'ඨ', 'ड': 'ඩ', 'ढ': 'ඪ', 'ण': 'ණ', 'त': 'ත', 'थ': 'ථ', 'द': 'ද', 'ध': 'ධ', 'न': 'න', 'ऩ': 'න', 'प': 'ප', 'फ': 'ඵ', 'ब': 'බ', 'भ': 'භ', 'म': 'ම', 'य': 'ය', 'र': 'ර', 'ल': 'ල', 'ळ': 'ළ', 'व': 'ව', 'श': 'ශ', 'ष': 'ෂ', 'स': 'ස', 'ह': 'හ', 'ा': 'ා', 'ि': 'ි', 'ी': 'ී', 'ु': 'ු', 'ू': 'ූ', 'ृ': 'ෘ', 'ॆ': 'ෙ', 'े': 'ේ', 'ै': 'ෛ', 'ॉ': 'ෑ', 'ॊ': 'ො', 'ो': 'ෝ', 'ौ': 'ෞ', '्': '්'}
sinhala_devnag_map = {'ං': 'ं', 'ඃ': 'ः', '\u0d84': 'ऄ', 'අ': 'अ', 'ආ': 'आ', 'ඇ': 'ऍ', 'ඈ': 'ऍ', 'ඉ': 'इ', 'ඊ': 'ई', 'උ': 'उ', 'ඌ': 'ऊ', 'ඍ': 'ऋ', 'ඏ': 'ऌ', 'එ': 'ऎ', 'ඒ': 'ए', 'ඓ': 'ऐ', 'ඔ': 'ऒ', 'ඕ': 'ओ', 'ඖ': 'औ', 'ක': 'क', 'ඛ': 'ख', 'ග': 'ग', 'ඝ': 'घ', 'ඞ': 'ङ', 'ඟ': 'ङ', 'ච': 'च', 'ඡ': 'छ', 'ජ': 'ज', 'ඣ': 'झ', 'ඤ': 'ञ', 'ඥ': 'ञ', 'ඦ': 'ञ', 'ට': 'ट', 'ඨ': 'ठ', 'ඩ': 'ड', 'ඪ': 'ढ', 'ණ': 'ण', 'ඬ': 'ण', 'ත': 'त', 'ථ': 'थ', 'ද': 'द', 'ධ': 'ध', 'න': 'न', '\u0db2': 'न', 'ඳ': 'न', 'ප': 'प', 'ඵ': 'फ', 'බ': 'ब', 'භ': 'भ', 'ම': 'म', 'ය': 'य', 'ර': 'र', 'ල': 'ल', 'ව': 'व', 'ශ': 'श', 'ෂ': 'ष', 'ස': 'स', 'හ': 'ह', 'ළ': 'ळ', '්': '्', 'ා': 'ा', 'ැ': 'ॉ', 'ෑ': 'ॉ', 'ි': 'ि', 'ී': 'ी', 'ු': 'ु', 'ූ': 'ू', 'ෘ': 'ृ', 'ෙ': 'ॆ', 'ේ': 'े', 'ෛ': 'ै', 'ො': 'ॊ', 'ෝ': 'ो', 'ෞ': 'ौ'}
static sinhala_to_devanagari(text)
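
The explicit-mapping approach can be illustrated with a small standalone sketch. It copies only a handful of entries from devnag_sinhala_map above and assumes that characters without a mapping pass through unchanged. The names SAMPLE_MAP, REVERSE_SAMPLE_MAP and map_chars are hypothetical and are not part of indicnlp; the library's own methods may treat unmapped characters differently.

```python
# Hypothetical subset of devnag_sinhala_map, copied from the table above.
SAMPLE_MAP = {
    "क": "ක",
    "म": "ම",
    "ल": "ල",
    "ा": "ා",
    "्": "්",
}


def map_chars(text, mapping):
    """Replace each character via an explicit table; unmapped ones are kept (assumption)."""
    return "".join(mapping.get(ch, ch) for ch in text)


# The reverse direction is sketched here by inverting the sample table.
REVERSE_SAMPLE_MAP = {target: source for source, target in SAMPLE_MAP.items()}

sinhala = map_chars("कमल", SAMPLE_MAP)
round_trip = map_chars(sinhala, REVERSE_SAMPLE_MAP)
```

The two full tables above are not exact inverses: for example, both 'ඞ' and 'ඟ' map to 'ङ' in sinhala_devnag_map, so a round trip through the real maps is not guaranteed to be lossless.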

unicode_transliterate Module

class indicnlp.transliterate.unicode_transliterate.ItransTransliterator

Bases: object

Transliterator between Indian scripts and ITRANS

static from_itrans(text, lang)

TODO: Document this method properly.
TODO: A little hack is used to handle schwa; it needs to be documented.
TODO: Check for robustness.

static to_itrans(text, lang_code)

class indicnlp.transliterate.unicode_transliterate.UnicodeIndicTransliterator

Bases: object

Base class for rule-based transliteration among Indian languages.

Script-pair-specific transliterators should derive from this class and override the transliterate() method. They can call the superclass transliterate() method to reuse the common transliteration logic.

static transliterate(text, lang1_code, lang2_code)

Convert text from the source language script (lang1) to the target language script (lang2).

text: text to transliterate
lang1_code: language 1 (source) code
lang2_code: language 2 (target) code
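
One common way to implement rule-based conversion between Indic scripts relies on the fact that the major Indic Unicode blocks are laid out largely in parallel, so a character can be moved to the same offset in another block. The sketch below shows that idea only. It is not taken from the library's source; offset_transliterate, SCRIPT_BLOCK_START and the language-code keys are hypothetical names chosen for illustration.

```python
# Start of each script's Unicode block. The language-code keys are
# illustrative assumptions, not the library's own table.
SCRIPT_BLOCK_START = {
    "hi": 0x0900,  # Devanagari
    "bn": 0x0980,  # Bengali
    "gu": 0x0A80,  # Gujarati
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
}
BLOCK_SIZE = 0x80


def offset_transliterate(text, lang1_code, lang2_code):
    """Shift each character from lang1's block to the same offset in lang2's block."""
    src = SCRIPT_BLOCK_START[lang1_code]
    tgt = SCRIPT_BLOCK_START[lang2_code]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(tgt + offset))
        else:
            out.append(ch)  # characters outside the source block are left untouched
    return "".join(out)


bengali = offset_transliterate("कमल", "hi", "bn")
```

A plain offset is only a starting point: not every code point is assigned in every script, which is one reason script-pair-specific subclasses may need to override the common behaviour. With the library installed, the documented entry point is UnicodeIndicTransliterator.transliterate(text, lang1_code, lang2_code).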

indicnlp.transliterate.unicode_transliterate.init()

To be called by the library loader; do not call it in your own program.

acronym_transliterator Module

class indicnlp.transliterate.acronym_transliterator.LatinToIndicAcronymTransliterator

Bases: object

LATIN_ALPHABET = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
LATIN_TO_DEVANAGARI_TRANSTABLE = {97: 'ए', 98: 'बी', 99: 'सी', 100: 'डी', 101: 'ई', 102: 'एफ', 103: 'जी', 104: 'एच', 105: 'आई', 106: 'जे', 107: 'के', 108: 'एल', 109: 'एम', 110: 'एन', 111: 'ओ', 112: 'पी', 113: 'क्यू', 114: 'आर', 115: 'एस', 116: 'टी', 117: 'यू', 118: 'वी', 119: 'डब्ल्यू', 120: 'एक्स', 121: 'वाय', 122: 'जेड'}
static generate_latin_acronyms(num_acronyms, min_len=2, max_len=6, strategy='random')

Generate Latin acronyms in lower case.
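
As a rough illustration of what random acronym generation with these parameters could look like, here is a standalone sketch. It assumes that strategy='random' means uniform sampling of letters and that both length bounds are inclusive; neither is confirmed by this page. sample_latin_acronyms and its seed parameter are hypothetical.

```python
import random
import string


def sample_latin_acronyms(num_acronyms, min_len=2, max_len=6, seed=None):
    """Return num_acronyms random lower-case strings with lengths in [min_len, max_len]."""
    rng = random.Random(seed)  # seed is a hypothetical extra, added for reproducibility
    acronyms = []
    for _ in range(num_acronyms):
        length = rng.randint(min_len, max_len)  # randint is inclusive on both ends
        letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
        acronyms.append("".join(letters))
    return acronyms


samples = sample_latin_acronyms(5, seed=0)
```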

static get_transtable()
static transliterate(w, lang)
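
The keys of LATIN_TO_DEVANAGARI_TRANSTABLE are code points (97 is 'a'), which is the shape that Python's str.translate expects. The sketch below spells out an acronym letter by letter with a few entries copied from that table. Whether transliterate(w, lang) works exactly this way is an assumption; spell_out_acronym and SAMPLE_TRANSTABLE are hypothetical names.

```python
# Hypothetical subset of LATIN_TO_DEVANAGARI_TRANSTABLE (keys are code points).
SAMPLE_TRANSTABLE = {
    ord("b"): "बी",
    ord("c"): "सी",
    ord("i"): "आई",
    ord("m"): "एम",
}


def spell_out_acronym(word, transtable):
    """Spell a Latin acronym letter by letter, lower-casing it first."""
    return word.lower().translate(transtable)


bbc = spell_out_acronym("bbc", SAMPLE_TRANSTABLE)
ibm = spell_out_acronym("IBM", SAMPLE_TRANSTABLE)
```

Letters missing from the table are left unchanged by str.translate, so a full 26-entry table is needed for arbitrary acronyms.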

script_unifier Module

class indicnlp.transliterate.script_unifier.AggressiveScriptUnifier(common_lang='hi', nasals_mode='to_nasal_consonants')

Bases: object

transform(text, lang)

class indicnlp.transliterate.script_unifier.BasicScriptUnifier(common_lang='hi', nasals_mode='do_nothing')

Bases: object

transform(text, lang)

class indicnlp.transliterate.script_unifier.NaiveScriptUnifier(common_lang='hi')

Bases: object

transform(text, lang)