We still have a long way to go (and we also need to figure out why we hit a deadlock during multi-GPU training). Because of that deadlock, we are currently training on a single GPU while we analyze the problem.
Regardless, we have generated a few samples.
The first sample is a 3-second clip after 9 epochs. As you can hear, it does sound like something, but we are not there yet.
The second sample is a 30-second clip after 11 epochs. It is getting a little better, but there are too many pauses; this is also due to a bug.
The third sample, after 12 epochs, is already 60 seconds long. This one sounds even more like something, but we are still not there yet (by the way: keep listening until the end).
The fourth sample is also 60 seconds long, after 13 epochs. Hmm, it is getting worse. Well, let’s keep training.
Actually, ‘max_epoch’ is set to 1,000, but we’ll never get there (at the current rate, that would take 6,000 hours = 250 days!).
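For those curious where the 250 days come from, here is the back-of-the-envelope math as a tiny sketch (the ~6 hours per epoch is an assumption derived from the numbers above, i.e. 6,000 hours / 1,000 epochs):

```python
# Rough training-time estimate on a single GPU.
# hours_per_epoch is an assumed value, inferred from 6,000 h / 1,000 epochs.
max_epoch = 1_000
hours_per_epoch = 6

total_hours = max_epoch * hours_per_epoch  # 6,000 hours
total_days = total_hours / 24              # 250 days

print(f"{total_hours} hours = {total_days:.0f} days")
```

Distributing across four GPUs would, in the ideal case, cut this by roughly a factor of four, which is why fixing the multi-GPU deadlock matters so much.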
Over the next few weeks, we’ll analyze the deadlock problem and see whether we can speed up training by distributing it across four GPUs on two servers…
In any case, the results are already promising. Stay tuned…
(BTW: we will publish training data, samples, and more at http://data.m-ailabs.bayern/)…