Iterative Text-based Editing of Talking-heads Using Neural Retargeting

Supplemental Materials


Comparison to Text-based Editing of Talking-head Video

We compare to Text-based Editing of Talking-head Video [Fried et al. 2019] in two settings: using the same amount of target video as our method (< 5 minutes), and using over 12 times as much target video (1 hour).
Notice the jumpiness and erroneous mouth motions in the Fried et al. results.

[Five video comparisons, each shown in three columns: Fried et al. (< 5 min) | Fried et al. (> 1 hr) | Ours (< 5 min)]