Sketch Engine

Sketch Engine | Corpus, Multilingual Support, Natural Language Processing [Unipos: Overseas Product Procurement]

Last updated: Jul 11, 2025

Tegara Headquarters

Official site

Freely analyze multilingual and large-scale corpora! The perfect tool for language research, translation, and NLP.

Sketch Engine is a professional language-processing tool for building, searching, and analyzing corpora. In addition to more than 800 preloaded corpora covering over 100 languages, including Japanese, users can upload their own texts to build corpora, producing real-world language data that commercial dictionaries and textbooks cannot provide. One of its features, "Word Sketch," automatically extracts the grammatical and collocational behavior of a given word, classifies it, and presents it visually. Its intuitive GUI also allows advanced language analysis without programming knowledge, making the tool useful in fields ranging from specialized linguistic research to classroom teaching.

Main uses:
- Linguistics and vocabulary research: analyzing usage trends and grammatical structures in real data
- Natural language processing and AI development: extracting training data and improving the accuracy of language models
- Translation and interpretation: searching for examples to confirm natural translations and usage
- Language education and material development: creating vocabulary and grammar materials from real-world example sentences
- Dictionary compilation and publishing: editing dictionaries accurately based on the meanings and usages of words
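The "Word Sketch" feature ranks a word's collocates by an association score; Sketch Engine uses the logDice score for this, defined as 14 + log2(2 * f_xy / (f_x + f_y)). The sketch below illustrates that scoring on a toy text. It is a simplified illustration, not Sketch Engine's implementation or API: it assumes whitespace tokenization and treats any word within a fixed window as a collocate, whereas a real Word Sketch groups collocates by grammatical relation using a POS-tagged corpus. The function `window_collocates` is a hypothetical name.

```python
import math
from collections import Counter


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def window_collocates(tokens: list[str], node: str, window: int = 2) -> list[tuple[str, float]]:
    """Rank words near `node` by logDice.

    Simplification (assumption): a collocate is any other word within
    +/- `window` tokens, counted at most once per occurrence of `node`.
    A real Word Sketch instead groups collocates by grammatical relation
    (subject, object, modifier, ...).
    """
    freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        # A set, so each neighbouring word counts once per node occurrence
        pair_freq.update(set(left + right) - {node})
    scored = [(w, log_dice(f, freq[node], freq[w])) for w, f in pair_freq.items()]
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scored, key=lambda item: (-item[1], item[0]))


tokens = "strong tea and strong coffee and weak tea".split()
for word, score in window_collocates(tokens, "tea", window=1):
    print(f"{word:<8}{score:.2f}")
```

Here `weak` ranks above `strong`: both co-occur with `tea` once, but `weak` is rarer overall, and logDice normalizes the co-occurrence count by the frequencies of both words.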

Related link: https://www.unipos.net/products/sketch-engine/

Basic Information

Supported languages: Over 100 languages, including Japanese, enabling multilingual comparison
Included corpora: More than 800, totaling approximately 1 trillion words (individual corpora of up to about 80 billion words)
Main features:
- Word Sketch (visualization of co-occurrence information)
- Concordancer (search in context)
- Thesaurus (extraction of semantically similar words)
- Word and N-gram frequency list generation
- Corpus construction and automatic tagging
Custom corpus construction: Text upload and web collection
Supported formats: TXT, XML, TEI, JSON (UTF-8 recommended)
Search: Advanced search supporting regular expressions, part-of-speech tags, and CQL (Corpus Query Language)
Operating environment: Cloud-based (no installation required); Chrome recommended
License: Annual contract (academic or commercial)
Developer: Lexical Computing CZ s.r.o. (Czech Republic)
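The Concordancer and the word/N-gram frequency lists are standard corpus operations. The sketch below shows, in simplified form, what they compute: a keyword-in-context (KWIC) view of every hit for a search word, and a frequency list of contiguous n-grams. It is a conceptual illustration only, not Sketch Engine's implementation or API; it assumes plain whitespace tokenization, whereas Sketch Engine searches lemmatized, POS-tagged corpora with CQL. The functions `kwic` and `ngram_frequencies` are hypothetical names.

```python
from collections import Counter


def kwic(tokens: list[str], node: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Keyword-in-context: (left context, node, right context) for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def ngram_frequencies(tokens: list[str], n: int) -> Counter:
    """Count every contiguous run of n tokens (n=1 gives a word list)."""
    # Zipping n staggered copies of the token list yields each n-gram once
    grams = zip(*(tokens[k:] for k in range(n)))
    return Counter(" ".join(gram) for gram in grams)


tokens = "the cat sat on the mat and the cat slept".split()
for left, node, right in kwic(tokens, "cat", width=2):
    print(f"{left:>10} [{node}] {right}")
print(ngram_frequencies(tokens, 2).most_common(3))
```

Right-aligning the left context lines up every hit in one column, which is the layout a concordancer uses so that patterns around the search word are easy to scan.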

Price Range

Please contact us for details

Delivery Time

Please contact us for details

Applications / Example Results

Expected uses:
- Academic language research and paper writing using large-scale corpora
- Extraction of training data for natural language processing (NLP) development
- Example search and verification of natural phrasing in translation work
- Creation of teaching materials and analysis of frequently used expressions in language education
- Support for dictionary compilation and vocabulary database construction