German Wiktionary as clean JSON

Next in our series on “making data available” is an extract of the German Wiktionary.

“What is so special about that?” you may ask. Well, you can of course always download a snapshot of Wiktionary. But the dump format (XML pages containing raw wiki markup) is of little use for anything other than Wiktionary itself.

So we have written a tool that converts it into a machine-processable JSON format.

The data structure looks like this:

{
   'word': {
      'is_verb': [true | false],
      'is_toponym': [true | false],
      'word_type': {
         'lang': 'Deutsch',
         'type': '...', # e.g. "Deklinierte Form" or "Konjugierte Form" or [Substantiv, Nominativ, ...]
         'gender': [m | f]
      },
      'base_form': <base-string>,
      'parents': [...],
      'definitions': [...],
      'sub_terms': [...],
      'grammar_attributes': [...] OR "...",
      'declinations': [
         {
            'genus': [m | f | n]
         },
         {
            'case': [Stamm | Nominativ ...],
            'sing_plur': [singular | plural],
            'decl': <decl of this case>
         },
         ...
      ],
      'synonyms': [...],
      'antonyms': [...],
      'word_connections': [...],
      'similars': [...],
      'examples': [...]
   },
   ...
}
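To illustrate how the `declinations` list can be used, here is a minimal Python sketch. It assumes that, once parsed, each entry is a dict whose keys match the outline above, and that the gender sits in its own `{'genus': ...}` item while every other item carries `case`, `sing_plur` and `decl`. The function name `declension_table` and the `haus_entry` sample are hypothetical and hand-written, not taken from the actual file.

```python
from typing import Any, Dict, Optional, Tuple

def declension_table(
    entry: Dict[str, Any],
) -> Tuple[Optional[str], Dict[Tuple[str, str], str]]:
    """Split an entry's 'declinations' list (hypothetical helper) into the
    genus and a lookup table mapping (case, sing_plur) -> declined form.

    Assumes the field layout from the outline; entries without
    'declinations' yield (None, {}).
    """
    genus: Optional[str] = None
    table: Dict[Tuple[str, str], str] = {}
    for item in entry.get("declinations", []):
        if "genus" in item:
            # Separate item that only carries the grammatical gender.
            genus = item["genus"]
        elif "case" in item:
            # One item per case/number combination, e.g. ("Nominativ", "plural").
            table[(item["case"], item.get("sing_plur", ""))] = item.get("decl", "")
    return genus, table

# Hand-written sample entry, shaped like the outline (not real file content).
haus_entry = {
    "declinations": [
        {"genus": "n"},
        {"case": "Nominativ", "sing_plur": "singular", "decl": "Haus"},
        {"case": "Nominativ", "sing_plur": "plural", "decl": "Häuser"},
    ]
}
genus, table = declension_table(haus_entry)
print(genus, table[("Nominativ", "plural")])
```

Keying the table on `(case, sing_plur)` makes lookups such as "nominative plural" a single dictionary access instead of a scan over the list.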

Unpacked, the JSON file is about 465 MB; compressed, it is 49 MB. Please download it from our data server.
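Which compression format the download uses is not stated here. A minimal sketch for loading it, assuming the archive is gzip, bzip2 or xz (all readable with Python's standard library) and that the file extension tells which; any other extension is read as plain JSON. The function name `load_wiktionary_json` and the file name in the comment are hypothetical. Note that `json.load` parses the whole file into memory at once, so expect memory use considerably above the unpacked size.

```python
import bz2
import gzip
import json
import lzma
from typing import Any, Dict

# Map file extensions to a standard-library opener that decompresses on the fly.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}

def load_wiktionary_json(path: str) -> Dict[str, Any]:
    """Load the dictionary, decompressing first if the extension is known."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            # "rt" opens the compressed stream in text mode.
            with opener(path, "rt", encoding="utf-8") as fh:
                return json.load(fh)
    # Unknown or no compression extension: treat as uncompressed JSON.
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)

# Hypothetical file name; replace it with the file you downloaded.
# words = load_wiktionary_json("wiktionary_de.json.gz")
```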

Overall, the JSON file contains over 600,000 keywords (terms).
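Once the file is loaded, counting the terms, or how many of them are flagged as verbs or toponyms, takes only a few lines. A minimal sketch, assuming the top-level object maps each term to its entry as in the outline and that `is_verb` and `is_toponym` are booleans; the helper `summarize` and the `demo_words` sample are hypothetical.

```python
from collections import Counter
from typing import Any, Dict

def summarize(words: Dict[str, Dict[str, Any]]) -> Counter:
    """Count all terms plus those flagged as verbs or toponyms (hypothetical helper)."""
    stats = Counter(terms=len(words))
    for entry in words.values():
        # A missing flag is treated as False.
        stats["verbs"] += int(bool(entry.get("is_verb", False)))
        stats["toponyms"] += int(bool(entry.get("is_toponym", False)))
    return stats

# Hand-written sample shaped like the outline (not real file content).
demo_words = {
    "gehen": {"is_verb": True, "is_toponym": False},
    "Berlin": {"is_verb": False, "is_toponym": True},
    "Haus": {"is_verb": False, "is_toponym": False},
}
print(summarize(demo_words))
```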

Please read the LICENSE file first to check whether its terms apply to your use.

NB: Over time, we will check whether we can extract even more information from Wiktionary. We will also update the data from time to time with newer Wiktionary snapshots.