rf5

Machine Learning • rf5 • 1 year ago

Voice Conversion With Just Nearest Neighbors https://arxiv.org/abs/2305.18975

TL;DR: Want to convert your voice to another person's voice? Or even to a whisper? Or a dog barking? Or to any other random speech clip? Give our new method a try: [https://bshall.github.io/knn-vc](https://bshall.github.io/knn-vc)

Longer version: our research team kept seeing new voice conversion methods become more complex and harder to reproduce. So we tried to see whether we could build a top-tier voice conversion model that was extremely simple. The result is kNN-VC, where the entire conversion model is just k-nearest neighbors on WavLM features. It turns out that this performs as well as, if not better than, much more complex any-to-any voice conversion methods.

What's more, since k-nearest neighbors has no learned parameters, we can use anything as the reference: clips of dogs barking, music, or speech in other languages.

I hope you enjoy our research! We provide a quick-start notebook, code, audio samples, and vocoder checkpoints: [https://bshall.github.io/knn-vc/](https://bshall.github.io/knn-vc/)
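To make "the entire conversion model is just k-nearest neighbors" concrete, here is a minimal sketch of kNN regression over feature frames. It is not the project's code. It assumes the source and reference clips have already been turned into per-frame feature vectors (WavLM in the post; toy 2-D vectors here), uses cosine distance, and replaces each source frame with the plain average of its k closest reference frames. The names `knn_convert` and `cosine_distance` and the default `k=4` are illustrative choices. Extracting WavLM features and vocoding the result back to audio are out of scope.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denom == 0.0:
        return 1.0
    return 1.0 - dot / denom


def knn_convert(source: List[Vector], reference: List[Vector], k: int = 4) -> List[List[float]]:
    """Hypothetical helper: map each source frame to the mean of its k nearest reference frames.

    Nothing here is trained: the reference frames themselves act as the "model",
    which is why any audio (speech, whispers, barking, music) can be the reference.
    """
    if not reference:
        raise ValueError("reference must contain at least one frame")
    k = min(k, len(reference))  # cannot average more frames than exist
    converted = []
    for frame in source:
        # Rank every reference frame by its distance to this source frame.
        nearest = sorted(reference, key=lambda r: cosine_distance(frame, r))[:k]
        # Average the selected reference frames dimension by dimension.
        mean = [sum(r[d] for r in nearest) / k for d in range(len(frame))]
        converted.append(mean)
    return converted


if __name__ == "__main__":
    source = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[0.9, 0.1], [1.0, 0.2], [0.1, 0.9], [0.0, 1.1]]
    print(knn_convert(source, reference, k=2))
```

With `k=2`, the first source frame is rebuilt from the two reference frames pointing mostly along the first axis, and the second from the two pointing mostly along the second axis, giving approximately `[[0.95, 0.15], [0.05, 1.0]]`. Every output frame is built purely from reference material, which is the intuition behind the output taking on the reference's voice.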