Making Data Available (cont’d series)

In our series on making data available for machine learning, we have some really good news.

Over the last couple of weeks we have been very, very busy creating training data. This is just a heads-up to let you know that we haven’t forgotten you guys.

So far, we have generated training data for speech recognition and speech synthesis for multiple languages. See the list below.

We are currently in final QA, testing and documentation. After that… you know what’s awaiting you… You’ll find everything on our data-server. We should have everything available by the beginning of March.

  • German:
    • Female: 23h 58m
    • Male: 36h 33m
    • Special: ~15h
  • English (US):
    • Female: 36h 43m
    • Male: 38h 23m
  • Spanish (Spain):
    • Female: 10h 36m
    • Male 1: 55h 05m
    • Male 2: 17h 19m
    • Mixed: 25h 33m
  • Italian:
    • Female: 08h 23m
    • Male: 31h 45m
    • Mixed: 87h 53m
  • Russian:
    • Female: 16h 03m
    • Male 1: 20h 45m
    • Male 2: 09h 58m
  • Ukrainian:
    • Female: 10h 28m
    • Male 1: 11h 26m
    • Male 2: 02h 50m
    • Male 3: 18h 53m

Total size: ~60 GB so far…
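If you want a quick feel for how much audio that is per language, the durations in the list can be added up with a few lines of Python. This is just a rough sketch: the numbers are copied from the list, the German "Special" set is only given as "~15h" and is counted here as a flat 15 hours (an assumption), and the names `DURATIONS`, `to_minutes`, `format_hm` and `totals_per_language` are made up for this example and are not part of anything on the data-server.

```python
# Durations copied from the list above as (hours, minutes) per voice.
# Assumption: the German "Special" set is only listed as "~15h",
# so it is counted here as exactly 15h 00m.
DURATIONS = {
    "German": {"Female": (23, 58), "Male": (36, 33), "Special": (15, 0)},
    "English (US)": {"Female": (36, 43), "Male": (38, 23)},
    "Spanish (Spain)": {
        "Female": (10, 36), "Male 1": (55, 5), "Male 2": (17, 19), "Mixed": (25, 33),
    },
    "Italian": {"Female": (8, 23), "Male": (31, 45), "Mixed": (87, 53)},
    "Russian": {"Female": (16, 3), "Male 1": (20, 45), "Male 2": (9, 58)},
    "Ukrainian": {
        "Female": (10, 28), "Male 1": (11, 26), "Male 2": (2, 50), "Male 3": (18, 53),
    },
}


def to_minutes(hours, minutes):
    """Convert an (hours, minutes) pair into total minutes."""
    return hours * 60 + minutes


def format_hm(total_minutes):
    """Format total minutes in the same 'Hh MMm' style used in the list."""
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


def totals_per_language(durations):
    """Sum the minutes of all voices for each language."""
    return {
        language: sum(to_minutes(h, m) for h, m in voices.values())
        for language, voices in durations.items()
    }


per_language = totals_per_language(DURATIONS)
grand_total = sum(per_language.values())

for language, minutes in per_language.items():
    print(f"{language}: {format_hm(minutes)}")
print(f"All languages: {format_hm(grand_total)}")
```

With the figures as listed, and the "Special" set rounded to 15 hours, this adds up to roughly 477 hours of audio in total.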

There will be detailed documentation, and we’ll present some really interesting insights at a MeetUp in Munich in March. Stay tuned…