Document Understanding Modern Projects User Guide
Last updated Nov 20, 2024
OCR
Each OCR engine is tailored to deliver efficient and effective optical character recognition, regardless of your specific needs or deployment. This page provides information on the supported languages for UiPath® OCR engines:
- UiPath Document OCR: the default UiPath OCR engine, which receives regular updates and improvements. It runs on either GPU or CPU and delivers the same level of accuracy in both cases.
- UiPath Document OCR_CPU: specially optimized to run on CPU.
- UiPath Extended Languages OCR: capable of processing documents in over 200 languages, especially Chinese, Korean, Vietnamese, Thai, major Indian languages, and languages that use the Cyrillic or Greek alphabets.
Tip: Choosing the right OCR engine for your documents is simple. By default, use UiPath Document OCR, which benefits from regular updates and improvements. If it doesn't support your document language or doesn't perform as expected, switch to one of the other OCR engines, such as UiPath Extended Languages OCR.
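The selection rule in the tip can be sketched as a small helper. This is only an illustration, not a UiPath API: the function name `choose_engine`, the `accuracy_ok` flag, and the set of language codes passed in are all hypothetical, and the caller must fill the set from the support table on this page.

```python
# Engine names as they appear on this page.
DEFAULT_ENGINE = "UiPath Document OCR"
FALLBACK_ENGINE = "UiPath Extended Languages OCR"


def choose_engine(language_code, default_supported, accuracy_ok=True):
    """Pick an engine name following the tip above.

    language_code     -- code such as "EN" or "TH" (case-insensitive)
    default_supported -- set of codes the default engine supports; this is
                         caller-supplied, hypothetical data
    accuracy_ok       -- set to False if the default engine was tried and
                         did not perform as expected
    """
    if language_code.upper() in default_supported and accuracy_ok:
        return DEFAULT_ENGINE
    return FALLBACK_ENGINE


# Placeholder set for demonstration only; it is NOT the real support list.
example_supported = {"EN", "DE", "FR"}
print(choose_engine("en", example_supported))
print(choose_engine("TH", example_supported))
```

The fallback is a single engine here for simplicity; a real workflow might instead try several alternative engines in turn.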
Language (Language Code) | UiPath Document OCR and UiPath Document OCR_CPU | UiPath Extended Languages OCR | Chinese, Japanese, Korean OCR |
---|---|---|---|
Adyghe (ADY) | |||
Afar (AA) | |||
Afrikaans (AFR) | |||
Akan (AK) | |||
Albanian (SQI) | |||
Algonquin (ALQ) | |||
Angika (Devanagari) (ANP) | |||
Arabic (ARA) | |||
Asturian (AST) | |||
Asu (ASA) | |||
Avaric (AV) | |||
Awadhi-Hindi (Devanagari) (AWA) | |||
Aymara (AYM) | |||
Azerbaijani (Latin) (AZ) | |||
Bafia (KSF) | |||
Bagheli (BFY) | |||
Bambara (BM) | |||
Bashkir (BA) | |||
Basque (EU) | |||
Belarusian (Cyrillic) (BE, BE-CYRL) | |||
Belarusian (Latin) (BE, BE-LATN) | |||
Bemba (BEM) | |||
Bena (BEZ) | |||
Bhojpuri-Hindi (Devanagari) (BHO) | |||
Bikol (BIK) | |||
Bislama (BI) | |||
Bodo (Devanagari) (BRX) | |||
Bosnian (Latin) (BS) | |||
Brajbha (BRA) | |||
Breton (BR) | |||
Bulgarian (BG) | |||
Bundeli (BNS) | |||
Buryat (Cyrillic) (BUA) | |||
Catalan (CA) | |||
Cebuano (CEB) | |||
Chamling (RAB) | |||
Chamorro (CH) | |||
Chechen (CE) | |||
Chhattisgarhi (Devanagari) (HNE) | |||
Chiga (CGG) | |||
Chinese - Simplified (ZH-Hans) | |||
Chinese - Traditional (ZH-Hant) | |||
Choctaw (CHO) | |||
Chukot (CKT) | |||
Chuvash (CV) | |||
Cornish (KW) | |||
Corsican (CO) | |||
Cree (CR) | |||
Creek (MUS) | |||
Crimean Tatar (Latin) (CRH) | |||
Croatian (HR) | |||
Crow (CRO) | |||
Czech (CS) | |||
Danish (DA) | |||
Dargwa (DAR) | |||
Dari (PRS) | |||
Dhimal (Devanagari) (DHI) | |||
Dogri (Devanagari) (DOI) | |||
Duala (DUA) | |||
Dungan (DNG) | |||
Dutch (NL) | |||
Efik (EFI) | |||
English (EN) | |||
Erzya (Cyrillic) (MYV) | |||
Estonian (ET) | |||
Faroese (FO) | |||
Fijian (FJ) | |||
Filipino (FIL) | |||
Finnish (FI) | |||
Fon (FON) | |||
French (FR) | |||
Friulian (FUR) | |||
Ga (GAA) | |||
Gaelic - Irish (GA) | |||
Gaelic - Scottish (GD) | |||
Gagauz (Latin) (GAG) | |||
Galician (GL) | |||
Ganda (LG) | |||
Gayo (GAY) | |||
German (DE) | |||
Gilbertese (GIL) | |||
Gondi (Devanagari) (GON) | |||
Greek (EL) | |||
Greenlandic (KL) | |||
Guarani (GN) | |||
Gurung (Devanagari) | |||
Gusii (GUZ) | |||
Haitian Creole (HT) | |||
Halbi (Devanagari) (HLB) | |||
Hani (HNI) | |||
Haryanvi (BGC) | |||
Hawaiian (HAW) | |||
Hebrew (HE) | |||
Herero (HZ) | |||
Hiligaynon (HIL) | |||
Hindi (HI) | |||
Hmong Daw (Latin) (MWW) | |||
Ho (Devanagari) (HOC) | |||
Hungarian (HU) | |||
Iban (IBA) | |||
Icelandic (IS) | |||
Igbo (IG) | |||
Iloko (ILO) | |||
Inari Sami (SMN) | |||
Indonesian (ID) | |||
Ingush (INH) | |||
Interlingua (IA) | |||
Inuktitut (Latin) (IU) | |||
Italian (IT) | |||
Japanese (JA) | |||
Jaunsari (Devanagari) (JNS) | |||
Javanese (JV) | |||
Jola-Fonyi (DYO) | |||
Kabardian (KBD) | |||
Kabuverdianu (KEA) | |||
Kachin (Latin) (KAC) | |||
Kalenjin (KLN) | |||
Kalmyk (XAL) | |||
Kangri (Devanagari) (XNR) | |||
Kanuri (KR) | |||
Karachay-Balkar (KRC) | |||
Kara-Kalpak (Cyrillic) (KAA-CYR) | |||
Kara-Kalpak (Latin) (KAA) | |||
Kashubian (CSB) | |||
Kazakh (Cyrillic) (KK-CYR) | |||
Kazakh (Latin) (KK-LATN) | |||
Khakas (KJH) | |||
Khaling (KLR) | |||
Khasi (KHA) | |||
K'iche' (QUC) | |||
Kikuyu (KI) | |||
Kildin Sami (SJD) | |||
Kinyarwanda (RW) | |||
Komi (KV) | |||
Kongo (KN) | |||
Korean (KO) | |||
Korku (KFQ) | |||
Koryak (KPY) | |||
Kosraean (KOS) | |||
Kpelle (KPE) | |||
Kuanyama (KJ) | |||
Kumyk (Cyrillic) (KUM) | |||
Kurdish (Arabic) (KU-ARAB) | |||
Kurdish (Latin) (KU-LATN) | |||
Kurukh (Devanagari) (KRU) | |||
Kyrgyz (Cyrillic) (KY) | |||
Lak (LBE) | |||
Lakota (LKT) | |||
Latin (LA) | |||
Latvian (LV) | |||
Lezghian (LEX) | |||
Lingala (LN) | |||
Lithuanian (LT) | |||
Lower Sorbian (DSB) | |||
Lozi (LOZ) | |||
Lule Sami (SMJ) | |||
Luo (Kenya and Tanzania) (LUO) | |||
Luxembourgish (LB) | |||
Luyia (LUY) | |||
Macedonian (MK) | |||
Machame (JMC) | |||
Madurese (MAD) | |||
Mahasu Pahari (Devanagari) (BFZ) | |||
Makhuwa-Meetto (MGH) | |||
Makonde (KDE) | |||
Malagasy (MG) | |||
Malay (Latin) (MS) | |||
Maltese (MT) | |||
Malto (Devanagari) (KMJ) | |||
Mandinka (MNK) | |||
Manx (GV) | |||
Maori (MI) | |||
Mapudungun (ARN) | |||
Marathi (MR) | |||
Mari (Russia) (CHM) | |||
Masai (MAS) | |||
Mende (Sierra Leone) (MEN) | |||
Meru (MER) | |||
Meta' (MGO) | |||
Minangkabau (MIN) | |||
Mohawk (MOH) | |||
Mongolian (Cyrillic) (MN) | |||
Mongondow (MOG) | |||
Montenegrin (Cyrillic) (CNR-CYRL) | |||
Montenegrin (Latin) (CNR-LATN) | |||
Morisyen (MFE) | |||
Mundang (MUA) | |||
Nahuatl (NAH) | |||
Navajo (NV) | |||
Ndonga (NG) | |||
Neapolitan (NAP) | |||
Nepali (NE) | |||
Ngomba (JGO) | |||
Niuean (NIU) | |||
Nogay (NOG) | |||
North Ndebele (ND) | |||
Northern Sami (Latin) (SME) | |||
Norwegian (NO) | |||
Nyanja (NY) | |||
Nyankole (NYN) | |||
Nzima (NZI) | |||
Occitan (OC) | |||
Ojibway (OJ) | |||
Oromo (OM) | |||
Ossetic (OS) | |||
Pampanga (PAM) | |||
Pangasinan (PAG) | |||
Papiamento (PAP) | |||
Pashto (PS) | |||
Pedi (NSO) | |||
Persian (FA) | |||
Polish (PL) | |||
Portuguese (PT) | |||
Punjabi (Arabic) (PA) | |||
Quechua (QU) | |||
Ripuarian (KSH) | |||
Romanian (RO) | |||
Romansh (RM) | |||
Rundi (RN) | |||
Russian (RU) | |||
Rwa (RWK) | |||
Sadri (Devanagari) (SCK) | |||
Sakha (SAH) | |||
Samburu (SAQ) | |||
Samoan (Latin) (SM) | |||
Sango (SG) | |||
Sangu (Gabon) | |||
Sanskrit (Devanagari) (SA) | |||
Santali (Devanagari) (SAT) | |||
Scots (SCO) | |||
Sena (SEH) | |||
Serbian (Cyrillic) (SR-CYRL) | |||
Serbian (Latin) (SR, SR-LATN) | |||
Shambala (KSB) | |||
Shona (SN) | |||
Siksika (BLA) | |||
Sirmauri (Devanagari) (SRX) | |||
Skolt Sami (SMS) | |||
Slovak (SK) | |||
Slovenian (SL) | |||
Soga (XOG) | |||
Somali (Arabic) (SO) | |||
Somali (Latin) (SO-LATN) | |||
Songhai (SON) | |||
South Ndebele (NR) | |||
Southern Altai (ALT) | |||
Southern Sami (SMA) | |||
Southern Sotho (ST) | |||
Spanish (ES) | |||
Sundanese (SU) | |||
Swahili (Latin) (SW) | |||
Swati (SS) | |||
Swedish (SV) | |||
Tabassaran (TAB) | |||
Tachelhit (SHI) | |||
Tahitian (TY) | |||
Taita (DAV) | |||
Tajik (Cyrillic) (TG) | |||
Tamil (TA) | |||
Tatar (Cyrillic) (TT-CYRL) | |||
Tatar (Latin) (TT) | |||
Teso (TEO) | |||
Tetum (TET) | |||
Thai (TH) | |||
Thangmi (THF) | |||
Tok Pisin (TPI) | |||
Tongan (TO) | |||
Tsonga (TS) | |||
Tswana (TN) | |||
Turkish (TR) | |||
Turkmen (Latin) (TK) | |||
Tuvan (TYV) | |||
Udmurt (UDM) | |||
Uyghur (Cyrillic) (UG-CYRL) | |||
Ukrainian (UK) | |||
Upper Sorbian (HSB) | |||
Urdu (UR) | |||
Uyghur (Arabic) (UG) | |||
Uzbek (Arabic) (UZ-ARAB) | |||
Uzbek (Cyrillic) (UZ-CYRL) | |||
Uzbek (Latin) (UZ) | |||
Vietnamese (VI) | |||
Volapük (VO) | |||
Vunjo (VUN) | |||
Walser (WAE) | |||
Welsh (CY) | |||
Western Frisian (FY) | |||
Wolof (WO) | |||
Xhosa (XH) | |||
Yucatec Maya (YUA) | |||
Zapotec (ZAP) | |||
Zarma (DJE) | |||
Zhuang (ZA) | |||
Zulu (ZU) | |||

Alphabet | UiPath Document OCR | |
---|---|---|
Hebrew | א ב ג ד ה ו ז ח ט י ך כ ל ם מ ן נ ס ע ף פ ץ צ ק ר ש ת ₪ | |
Latin | A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý ß à á â ã ä å æ ç è é ê ë ì í î ï ñ ò ó ô õ ö ø ù ú û ü ý Ā ā Ă ă Ą ą Ć ć Ċ ċ Č č Ď ď Đ đ Ē ē Ė ė Ę ę Ě ě Ğ ğ Ġ ġ Ħ ħ Ī ī Ĭ ĭ Į į İ ı Ĺ ĺ Ľ ľ Ł ł Ń ń Ň ň Ŋ ŋ Ō ō Ő ő Œ œ Ŕ ŕ Ř ř Ś ś Š š Ť ť Ŧ ŧ Ū ū Ŭ ŭ Ů ů Ų ų Ź ź Ż ż Ž ž Ə Ǵ ǵ Ș ș Ț ț ə μ | |
Other characters | ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ [ \ ] ^ _ { \| } ~ £ ¥ § © ® ° ¿ € ≤ ≥ | |
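
The character table above can be turned into a quick local pre-check that flags characters in a string that are not listed for UiPath Document OCR. This is a minimal sketch and not part of any UiPath library: the function name `unsupported_chars` is hypothetical, the character lists are copied from the table, and the stray backslashes in the "Other characters" row are assumed to be escaping residue for `'` and `\`.

```python
import unicodedata

# Character lists copied from the table above (space-separated).
HEBREW = "א ב ג ד ה ו ז ח ט י ך כ ל ם מ ן נ ס ע ף פ ץ צ ק ר ש ת ₪"
LATIN = (
    "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z "
    "a b c d e f g h i j k l m n o p q r s t u v w x y z "
    "À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý ß "
    "à á â ã ä å æ ç è é ê ë ì í î ï ñ ò ó ô õ ö ø ù ú û ü ý "
    "Ā ā Ă ă Ą ą Ć ć Ċ ċ Č č Ď ď Đ đ Ē ē Ė ė Ę ę Ě ě Ğ ğ Ġ ġ Ħ ħ "
    "Ī ī Ĭ ĭ Į į İ ı Ĺ ĺ Ľ ľ Ł ł Ń ń Ň ň Ŋ ŋ Ō ō Ő ő Œ œ Ŕ ŕ Ř ř "
    "Ś ś Š š Ť ť Ŧ ŧ Ū ū Ŭ ŭ Ů ů Ų ų Ź ź Ż ż Ž ž Ə Ǵ ǵ Ș ș Ț ț ə μ"
)
OTHER = (
    "! \" # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ "
    "[ \\ ] ^ _ { | } ~ £ ¥ § © ® ° ¿ € ≤ ≥"
)


def _nfc(text):
    # Normalize to composed form so accented letters compare reliably.
    return unicodedata.normalize("NFC", text)


# One set holding every character listed in the table.
SUPPORTED = set(_nfc(" ".join([HEBREW, LATIN, OTHER])).split())


def unsupported_chars(text):
    """Return the non-whitespace characters of text missing from the table."""
    return {ch for ch in _nfc(text) if not ch.isspace() and ch not in SUPPORTED}


print(sorted(unsupported_chars("Grüße aus Köln")))
print(sorted(unsupported_chars("Привет")))
```

An empty result only means every character appears in the table; it says nothing about recognition accuracy for a given document.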