These source-target pairs are drawn at random from the validation set, and the Griffin-Lim algorithm is used to invert the mel spectrograms to audio. v-swap translates instrument timbre and preserves pitch, while z-swap preserves timbre and translates pitch.
For each source-target pair, the baseline TS-DSAE comes first, followed by the proposed J-DSAE.
NOTE: Start at a low volume when listening on headphones.
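
The two swap operations can be illustrated with a toy sketch. It is not the TS-DSAE or J-DSAE implementation: `encode`, `decode`, `v_swap`, and `z_swap` are hypothetical stand-ins, and the sketch assumes that `v` is a single per-clip latent carrying timbre, that `z` is a per-frame sequence carrying pitch, and that a swap takes the exchanged latent from the target clip.

```python
# Toy illustration only: a "clip" is a dict holding a timbre label and a
# list of MIDI pitches, so the effect of each swap can be read off directly.

def encode(clip):
    """Hypothetical encoder: split a clip into (v, z).

    v: one global value per clip (assumed to carry timbre).
    z: one value per frame (assumed to carry pitch).
    """
    return clip["timbre"], list(clip["pitches"])


def decode(v, z):
    """Hypothetical decoder: rebuild a clip from a (v, z) pair."""
    return {"timbre": v, "pitches": list(z)}


def v_swap(source, target):
    """Keep the source's z (pitch) and take v (timbre) from the target."""
    _, z_src = encode(source)
    v_tgt, _ = encode(target)
    return decode(v_tgt, z_src)


def z_swap(source, target):
    """Keep the source's v (timbre) and take z (pitch) from the target."""
    v_src, _ = encode(source)
    _, z_tgt = encode(target)
    return decode(v_src, z_tgt)


source = {"timbre": "piano", "pitches": [60, 62, 64]}
target = {"timbre": "violin", "pitches": [67, 65]}

print(v_swap(source, target))  # target timbre, source pitches
print(z_swap(source, target))  # source timbre, target pitches
```
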
source | recon. | v-swap | z-swap | recon. | target |
---|---|---|---|---|---|