Adversarial one-shot voice conversion using disentangled representations

dc.contributor: Graduate Program in Computer Engineering.
dc.contributor.advisor: Gürgen, Fikret.
dc.contributor.author: Yeşilkanat, Ali.
dc.date.accessioned: 2023-03-16T10:04:42Z
dc.date.available: 2023-03-16T10:04:42Z
dc.date.issued: 2020.
dc.description.abstract: In this thesis, a new adversarial one-shot voice conversion (VC) method is introduced by enhancing one of the latest variational autoencoder based one-shot VC methods. The proposed method uses Mel-spectrograms as acoustic features and relies on disentangled representations, separating the speaker and content representations of the spoken utterance. An adversarial loss and a perceptual loss are combined to improve the quality of the generated Mel-spectrograms. We train a speaker classifier, built on the architecture of a well-known computer vision model, so that a perceptual loss can be applied during training of the VC model. We conduct experiments on the Voice Cloning Toolkit (VCTK) dataset and evaluate the proposed approach in terms of Global Variance and MOSNet, a neural model that predicts human opinion scores. Experimental results indicate that our approach improves VC quality markedly.
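The combined training objective described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the thesis code: the loss weights (`lambda_adv`, `lambda_perc`), the least-squares GAN formulation, and the use of L1 distances are all assumptions, and the speaker-classifier features are represented abstractly as arrays.

```python
# Illustrative sketch of a reconstruction + adversarial + perceptual
# objective for Mel-spectrogram voice conversion. Names and weights are
# assumptions, not taken from the thesis.
import numpy as np


def reconstruction_loss(mel_pred, mel_true):
    # L1 distance between generated and target Mel-spectrograms.
    return np.mean(np.abs(mel_pred - mel_true))


def adversarial_loss(disc_score_fake):
    # Least-squares GAN generator term: push discriminator scores
    # on generated spectrograms toward the "real" label 1.
    return np.mean((disc_score_fake - 1.0) ** 2)


def perceptual_loss(feats_pred, feats_true):
    # L1 distance between intermediate activations of a pretrained
    # speaker classifier, averaged over the compared layers.
    layer_losses = [np.mean(np.abs(p - t))
                    for p, t in zip(feats_pred, feats_true)]
    return sum(layer_losses) / len(layer_losses)


def total_vc_loss(mel_pred, mel_true, disc_score_fake,
                  feats_pred, feats_true,
                  lambda_adv=1.0, lambda_perc=0.1):
    # Weighted sum of the three terms; lambda_* balance the losses.
    return (reconstruction_loss(mel_pred, mel_true)
            + lambda_adv * adversarial_loss(disc_score_fake)
            + lambda_perc * perceptual_loss(feats_pred, feats_true))
```

In this formulation the adversarial term sharpens the generated spectrograms, while the perceptual term compares classifier activations rather than raw spectrogram bins.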
dc.format.extent: 30 cm.
dc.format.pages: xiv, 64 leaves ;
dc.identifier.other: CMPE 2020 Y47
dc.identifier.uri: https://digitalarchive.library.bogazici.edu.tr/handle/123456789/12430
dc.publisher: Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020.
dc.subject.lcsh: Voice output communication aids.
dc.subject.lcsh: Speech processing systems.
dc.title: Adversarial one-shot voice conversion using disentangled representations

Files

Original bundle
Name: b2714181.035534.001.PDF
Size: 3.49 MB
Format: Adobe Portable Document Format