All the models seem to fail to preserve the shape of the objects in most cases. Color and size are preserved slightly better by dis-VAE, SVG-LP and MTC-VAE, while β-TCVAE gives the worst results. SVG-LP preserves shape best but, on the other hand, fails to render the motion of the driving video.