Sketch Engine
Sketch Engine | Multilingual Corpus Support and Natural Language Processing [Unipos, Overseas Product Procurement]
Freely analyze multilingual and large-scale corpora! The perfect tool for language research, translation, and NLP.
Sketch Engine is a professional language processing tool for building, searching, and analyzing corpora. In addition to over 800 existing corpora covering more than 100 languages, including Japanese, users can upload their own corpora to create practical language data that cannot be obtained from commercial dictionaries or textbooks. One of its features, "Word Sketch," automatically extracts and classifies the grammatical and collocational behavior of a given word and presents it visually. Its intuitive GUI allows advanced language analysis without programming knowledge, making it useful across a wide range of fields, from specialized language research to education.
Main Uses:
- Linguistics and Vocabulary Research: Analyzing usage trends and grammatical structures from real data
- Natural Language Processing and AI Development: Extracting training data and improving the accuracy of language models
- Translation and Interpretation: Searching for examples to confirm natural translations and usages
- Language Education and Material Development: Creating vocabulary and grammar materials using practical example sentences
- Dictionary Compilation and Publishing: Editing dictionaries with high accuracy based on the attested meanings and usages of words
Basic information
Supported languages: Over 100 languages including Japanese, enabling multilingual comparisons
Included corpora: Over 800, totaling approximately 1 trillion words (individual corpora up to the 80-billion-word scale)
Main features:
- Word Sketch (visualization of co-occurrence information)
- Concordancer (contextual search)
- Thesaurus (extraction of semantically similar words)
- Word/N-gram frequency list generation
- Corpus construction and automatic tagging
Custom corpus construction: Supports text upload and web collection
Supported formats: TXT, XML, TEI, JSON (UTF-8 recommended)
Search function: Advanced search supporting regular expressions, part-of-speech tags, and CQL
Operating environment: Cloud-based (no installation required), Chrome recommended
License: Annual contract (academic/commercial)
Developer: Lexical Computing CZ s.r.o. (Czech Republic)
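For readers unfamiliar with the terms above, the following is a minimal Python sketch of what an n-gram frequency list and a concordance (keyword-in-context) search compute. It is an illustration only, not Sketch Engine's implementation or API: the function names `tokenize`, `ngram_frequencies`, and `concordance`, the naive regex tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately naive)."""
    return re.findall(r"\w+", text.lower())


def ngram_frequencies(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Hypothetical sample text standing in for an uploaded corpus.
sample = "Strong tea is popular. She drinks strong tea every morning, not strong coffee."
tokens = tokenize(sample)

bigrams = ngram_frequencies(tokens, 2)       # frequency list of two-word sequences
kwic = concordance(tokens, "strong", width=2)  # every occurrence of "strong" in context

for left, word, right in kwic:
    print(f"{left:>15} [{word}] {right}")
```

A real corpus tool adds lemmatization, part-of-speech tagging, and indexing on top of this idea so that such queries stay fast on billions of words.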
Applications/Examples of results
【Expected Uses】
- Academic language research and paper writing using large-scale corpora
- Extraction of training data for natural language processing (NLP) development
- Example search and verification of natural phrasing in translation work
- Creation of teaching materials and analysis of frequently used expressions in language education
- Support for dictionary compilation and vocabulary database construction
Lineup (1)
Model number | Overview |
---|---|
Sketch Engine | Please let us know your preferred license type |
Company information
Tegara Corporation is building a research and development platform that integrates specialized product procurement and sales, information provision, and support services for researchers and developers nationwide. In research and development, where speed is valuable, Tegara's mission is to help customers accelerate their work and thereby contribute to the advancement of research and development in Japan and around the world. To remain a reliable partner for researchers and developers, we continually adopt new technologies and strengthen our support system.