Video Diffusion Alignment via Reward Gradient

Mihir Prabhudesai*         Zheyang Qin*         Russell Mendonca*         Katerina Fragkiadaki         Deepak Pathak
Carnegie Mellon University

Abstract

We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt them to specific downstream tasks, such as video-text alignment or ethical video generation. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we instead utilize pre-trained reward models that are learned via preferences on top of powerful discriminative models. These models provide dense gradient information with respect to the generated RGB pixels, which is critical for efficient learning in complex search spaces such as videos. We show that our approach enables alignment of video diffusion models for aesthetic generation, text-video alignment, and long-horizon video generation 3X longer than the training sequence length. We also show that our approach learns far more efficiently, in terms of reward queries and compute, than prior gradient-free approaches for video generation.
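A minimal sketch of the reward-gradient idea, assuming a PyTorch-style latent video diffusion model and a differentiable reward model; the names `denoiser`, `decoder`, and `reward_model` are placeholder interfaces, not the released implementation:

```python
import torch

def vader_style_update(latents, text_emb, denoiser, decoder, reward_model,
                       optimizer, num_steps=25, backprop_steps=10):
    """One reward-gradient update: denoise, decode, score, backpropagate.

    Assumed placeholder interfaces:
      denoiser(latents, t, text_emb) -> less-noisy latents
      decoder(latents)               -> RGB frames, shape (B, T, 3, H, W)
      reward_model(frames, text_emb) -> scalar reward per video, shape (B,)
    Only the last `backprop_steps` denoising steps keep gradients
    (truncated backpropagation), which bounds activation memory.
    """
    for i, t in enumerate(torch.linspace(1.0, 0.0, num_steps)):
        keep_grad = i >= num_steps - backprop_steps
        with torch.set_grad_enabled(keep_grad):
            latents = denoiser(latents, t, text_emb)

    frames = decoder(latents)                      # differentiable decode to RGB
    loss = -reward_model(frames, text_emb).mean()  # ascend the reward
    optimizer.zero_grad()
    loss.backward()                                # dense pixel-level gradients
    optimizer.step()
    return loss.detach()
```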

Aesthetic and HPS Reward

PickScore Reward

HPS Reward

Object Removal Reward

Removing books with the YOLOS object detection model.

V-JEPA Reward

Improve temporal consistency for Stable Video Diffusion, an image-to-video model.

Aesthetic and ViCLIP Reward

Improve text-video alignment for VideoCrafter2.

Aesthetic and ViCLIP Reward Curve

Training curve for VideoCrafter-based VADER fine-tuned using the Aesthetic and ViCLIP rewards.

Reward Curve

Training Efficiency Comparison

Reward Curve

Training efficiency comparison against various baselines when trained for longer. The base model here is ModelScope. We compare VADER against DPO, DDPO, on-policy DPO, and on-policy DDPO. To implement the on-policy versions of the baselines, we reduce the UTD (update-to-data) ratio to 1, i.e., we perform only a single gradient update for each sampled datapoint. We observe that VADER significantly outperforms all of them in terms of compute efficiency.
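A rough illustration of the UTD knob (the helpers `sample_batch` and `update_on` are hypothetical): an off-policy baseline reuses each sampled batch for several gradient updates, while the on-policy variants set the ratio to 1:

```python
def train_with_utd(sample_batch, update_on, num_iters=1000, utd_ratio=1):
    """Generic loop performing `utd_ratio` gradient updates per sampled batch.

    `sample_batch()` generates videos with the current model and queries the
    reward; `update_on(batch)` performs one gradient step. Setting
    utd_ratio=1 corresponds to the on-policy DPO/DDPO baselines in the plot.
    """
    for _ in range(num_iters):
        batch = sample_batch()      # expensive: sample videos, query reward
        for _ in range(utd_ratio):  # cheaper: reuse the same batch
            update_on(batch)
```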

Diversity Test

Metric VideoCrafter2 VADER-PickScore VADER-Aesthetic and HPS VADER-Aesthetic and ViCLIP
Average Variance 0.0037 0.0026 0.0023 0.0031

Diversity of generated videos for VADER. The base model is VideoCrafter2. We generate 500 videos for each model and prompt combination, using 5 prompts for a total of 2500 videos per model. Diversity is measured as the variance of VideoMAE latent-space embeddings across the 500 videos for each prompt, averaged over all prompts. We find that the VADER variants exhibit reduced diversity compared to the base model VideoCrafter2. Prior works (Robert Kirk et al.; Sonia K. Murthy et al.) report similar findings, where aligning a model for a specific use case often reduces diversity. Some visualizations are exhibited in the Diversity Gallery.
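A sketch of how this diversity number can be computed, assuming per-prompt VideoMAE embeddings have already been extracted (the exact reduction over embedding dimensions is an assumption, not taken from the released code):

```python
import torch

def average_embedding_variance(embeddings_per_prompt):
    """Diversity as the variance of video embeddings, averaged over prompts.

    `embeddings_per_prompt`: list of tensors, one per prompt, each of shape
    (num_videos, embed_dim), e.g. 5 prompts x (500, D) VideoMAE embeddings.
    """
    per_prompt_variance = [
        emb.var(dim=0, unbiased=True).mean()  # variance across videos, averaged over dims
        for emb in embeddings_per_prompt
    ]
    return torch.stack(per_prompt_variance).mean().item()
```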

Memory Usage Comparison

Method VRAM System RAM Total RAM
LoRA + Mixed Precision 12.1 GB 264.2 GB 276.3 GB
+ Subsampling Frames 12.1 GB 216.8 GB 228.9 GB
+ Truncated Backpropagation 12.1 GB 57.3 GB 69.4 GB
+ Gradient Checkpointing 12.1 GB 20.4 GB 32.5 GB

Ablation of memory usage for different components in ModelScope-based VADER. For this experiment, we offload memory to CPU main memory to prevent GPU out-of-memory errors. Starting from standard LoRA + Mixed Precision, each row adds one component (Subsampling Frames, Truncated Backpropagation, Gradient Checkpointing) on top of the previous row. Total RAM usage drops by roughly 244 GB (from 276.3 GB to 32.5 GB) once all components are applied.
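A minimal sketch of the frame-subsampling component, assuming video latents of shape (B, T, C, H, W); scoring only a random subset of frames shrinks the activations kept for the reward backward pass (the function name and sampling scheme are illustrative, not the released implementation):

```python
import torch

def subsample_frames(latents, num_keep=4):
    """Randomly keep `num_keep` of the T frames before decoding and scoring.

    `latents` has shape (B, T, C, H, W). Only the kept frames are decoded to
    RGB and passed to the reward model, so the backward pass stores far fewer
    activations. Truncated backpropagation and gradient checkpointing
    (torch.utils.checkpoint) reduce memory further, as the table shows.
    """
    T = latents.shape[1]
    idx = torch.randperm(T)[:num_keep].sort().values  # keep temporal order
    return latents[:, idx]
```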

Reward Correlation

Model HPS Score PickScore Aesthetic Score ViCLIP Score
VideoCrafter2 0.2564 20.9231 5.2219 0.2643
VADER-Aesthetic and HPS 0.2651 21.1345 5.7965 0.2622
VADER-PickScore 0.2669 21.4911 5.5757 0.2640
VADER-Aesthetic and ViCLIP 0.2511 20.8927 5.6241 0.2628
Reward Curve

The base model is VideoCrafter2. In this table, we study how optimizing for a specific reward function via VADER affects scores on the other reward functions. We observe that the HPS score increases significantly after fine-tuning the base model with the PickScore reward, indicating a strong positive correlation between PickScore and HPS. In contrast, we find a strong negative correlation between the ViCLIP and Aesthetic reward functions.

EvalCrafter Evaluation

Evaluated on VideoCrafter-based VADER.

Model Temporal Coherence Motion Quality
VideoCrafter2 55.90 52.89
T2V Turbo (4 Steps) 57.10 54.93
T2V Turbo (8 Steps) 57.05 55.34
VADER-Aesthetic and HPS 59.65 55.46
VADER-PickScore 60.75 54.65
VADER-Aesthetic and ViCLIP 57.08 54.25

EvalCrafter evaluation results for VADER. EvalCrafter computes Temporal Coherence from Warping Error, Semantic Consistency (cosine similarity of the embeddings of consecutive frames), and Face Consistency, which assess frame-wise pixel and semantic consistency. Motion Quality is evaluated through Action-Score (action classification accuracy), Flow-Score (average optical flow between frames obtained from RAFT), and Motion AC-Score (amplitude classification consistency with the text prompt). We generate 700 videos from each model for this comparison. The results show that all VADER variants outperform the base model (VideoCrafter2).
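As an illustration of the semantic-consistency component, the score can be approximated as the mean cosine similarity between embeddings of consecutive frames (a sketch assuming per-frame embeddings are already available; this is not the exact EvalCrafter code):

```python
import torch
import torch.nn.functional as F

def semantic_consistency(frame_embeddings):
    """Mean cosine similarity between consecutive frame embeddings.

    `frame_embeddings`: tensor of shape (T, D), one embedding per frame,
    e.g. CLIP image embeddings. Higher values indicate smoother semantics.
    """
    sims = F.cosine_similarity(frame_embeddings[:-1], frame_embeddings[1:], dim=-1)
    return sims.mean().item()
```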

VBench Evaluation (EvalCrafter Prompts)

Model Subject Consistency Background Consistency Motion Smoothness Dynamic Degree Aesthetic Quality Imaging Quality Weighted Average
VideoCrafter2 0.9544 0.9652 0.9688 0.5346 0.5752 0.6677 0.7997
T2V Turbo (4 Steps) 0.9639 0.9656 0.9562 0.4771 0.6183 0.7266 0.8126
T2V Turbo (8 Steps) 0.9735 0.9736 0.9572 0.3686 0.6265 0.7168 0.8058
VADER-Aesthetic and HPS 0.9659 0.9713 0.9734 0.4741 0.6295 0.7145 0.8167
VADER-PickScore 0.9668 0.9727 0.9726 0.3732 0.6094 0.6762 0.7971
VADER-Aesthetic and ViCLIP 0.9564 0.9662 0.9714 0.5519 0.6008 0.6566 0.8050

VBench evaluation results for VADER using EvalCrafter prompts. The base model is VideoCrafter2. The metrics used in VBench include: Subject Consistency (consistency of the main subject across frames, evaluated using DINO feature similarity), Background Consistency (using CLIP feature similarity), Motion Smoothness (fluidity of motion, based on motion priors from a frame interpolation model), Dynamic Degree (extent of motion in the video, estimated with RAFT), Aesthetic Quality (assessed via the LAION aesthetic predictor), and Imaging Quality (using MUSIQ). The weighted average assigns a weight of 1 to every metric except Dynamic Degree, which is weighted 0.5. We generate 700 videos for each model using EvalCrafter prompts that are not seen during training. Among the VADER variants, VADER-PickScore achieves the best consistency scores, while VADER-Aesthetic and HPS achieves the best aesthetic and imaging quality. Overall, VADER-Aesthetic and HPS performs best.
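The weighted average in the last column can be reproduced directly from the row entries; for example, the VideoCrafter2 row works out as follows (a small sanity check, not part of VBench itself):

```python
# VideoCrafter2 row from the table above
metrics = {
    "subject_consistency": 0.9544,
    "background_consistency": 0.9652,
    "motion_smoothness": 0.9688,
    "dynamic_degree": 0.5346,
    "aesthetic_quality": 0.5752,
    "imaging_quality": 0.6677,
}
weights = {name: 1.0 for name in metrics}
weights["dynamic_degree"] = 0.5  # Dynamic Degree is down-weighted

weighted_avg = sum(weights[n] * v for n, v in metrics.items()) / sum(weights.values())
print(round(weighted_avg, 4))  # 0.7997, matching the table
```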

VBench Evaluation (Standard Prompt Suite)

Model Subject Consistency Background Consistency Motion Smoothness Dynamic Degree Aesthetic Quality Imaging Quality Temporal Flickering Quality Score
VideoCrafter2 96.85 98.22 97.73 42.50 63.13 67.22 98.41 82.20
Pika 96.76 98.95 99.51 37.22 63.15 62.33 99.77 82.68
Gen-2 97.61 97.61 99.58 18.89 66.96 67.42 99.56 82.47
T2V Turbo (VC2) 96.28 97.02 97.34 49.17 63.04 72.49 97.48 82.57
VADER-Aesthetic and HPS 95.79 96.71 97.06 66.94 67.04 69.93 98.19 84.15

VBench evaluation results for VADER using the standard prompt suite. The base model is VideoCrafter2; we compare against Pika (2023-09), Gen-2 (2023-12), and T2V Turbo (VC2). The metrics used in VBench include: Subject Consistency (consistency of the main subject across frames, evaluated using DINO feature similarity), Background Consistency (using CLIP feature similarity), Motion Smoothness (fluidity of motion, based on motion priors from a frame interpolation model), Dynamic Degree (extent of motion in the video, estimated with RAFT), Aesthetic Quality (assessed via the LAION aesthetic predictor), Imaging Quality (using MUSIQ), and Temporal Flickering (mean absolute difference across frames). Following VBench, the Quality Score assigns a weight of 1 to every normalized metric except Dynamic Degree, which is weighted 0.5. We find that VADER-HPS surpasses all baselines in terms of Quality Score, Aesthetic Quality, and Dynamic Degree.

VADER-V-JEPA Evaluation

Model Subject Consistency Background Consistency Motion Smoothness Dynamic Degree Aesthetic Quality Imaging Quality
Stable Video Diffusion 0.9042 0.9469 0.9634 0.8333 0.6782 0.6228
VADER-V-JEPA 0.9401 0.9551 0.9669 0.8333 0.6807 0.6384

VBench evaluation results for image-to-video diffusion models. The base model is Stable Video Diffusion. We compare Stable Video Diffusion against VADER-V-JEPA. VADER-V-JEPA improves across most metrics, particularly subject and background consistency and imaging quality.

Truncated Backpropagation Ablation

Training Step Reward Value (K=1) Reward Value (K=10)
1 5.047 5.0946
100 5.3342 5.2523
200 5.4977 5.2072
300 5.6479 5.1906

From left to right, we show videos at training steps 1, 100, 200, and 300.

K=1
K=10

We ablate the number of truncated backpropagation steps (K) in VADER. For this experiment, we use VADER trained with the Aesthetic and HPS rewards; the base model is VideoCrafter2. We find that higher values of K produce more semantic-level changes, while K=1 produces more fine-grained changes, especially in the earlier steps of training. As training progresses, both models begin to exhibit semantic-level changes. We also find that it is easier to optimize with a smaller value of K, as seen in the reward values above.
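In terms of the sketch given after the abstract, this ablation corresponds only to changing the (hypothetical) `backprop_steps` argument, i.e., how many of the final denoising steps receive reward gradients:

```python
# K = 1: gradients flow only through the final denoising step
loss_k1 = vader_style_update(latents, text_emb, denoiser, decoder,
                             reward_model, optimizer, backprop_steps=1)

# K = 10: gradients flow through the last 10 denoising steps
loss_k10 = vader_style_update(latents, text_emb, denoiser, decoder,
                              reward_model, optimizer, backprop_steps=10)
```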

DOODL vs. VADER

We use the Aesthetic and HPS reward functions to optimize the models. The base model for VADER is VideoCrafter2.

DOODL (2 GPU minutes per sample) DOODL (20 GPU minutes per sample) VADER (12 GPU hours of training)
Reward 4.9583 4.9687 5.2810

VBench Distilled Reward Model

Model Subject Consistency Background Consistency Motion Smoothness Dynamic Degree Aesthetic Quality Imaging Quality Weighted Average
VideoCrafter2 0.9544 0.9652 0.9688 0.5346 0.5752 0.6677 0.7997
T2V Turbo (4 Steps) 0.9639 0.9656 0.9562 0.4771 0.6183 0.7266 0.8126
T2V Turbo (8 Steps) 0.9735 0.9736 0.9572 0.3686 0.6265 0.7168 0.8058
VADER-Aesthetic and HPS 0.9659 0.9713 0.9734 0.4741 0.6295 0.7145 0.8167
VADER-PickScore 0.9668 0.9727 0.9726 0.3732 0.6094 0.6762 0.7971
VADER-Aesthetic and ViCLIP 0.9564 0.9662 0.9714 0.5519 0.6008 0.6566 0.8050
VADER-VBench 0.9638 0.9678 0.9691 0.5361 0.6393 0.7231 0.8238

Training Plots of VBench Distilled Reward Model

Training Curve of Reward Model
Training Curve of Reward Model

Training curves for the VBench Distilled Reward Model. The plot on the left shows the average training and validation loss over epochs, showcasing the model's convergence behavior. The plot on the right shows the average validation accuracy per epoch, i.e., how well the model ranks the preferred video in each pair for every metric, including background consistency, dynamic degree, imaging quality, motion smoothness, aesthetic quality, PickScore, HPS, and subject consistency.
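The pairwise ranking objective described here can be written as a standard Bradley-Terry style loss (a generic sketch assuming the distilled reward model outputs a scalar score per video; not necessarily the exact training code):

```python
import torch.nn.functional as F

def pairwise_ranking_loss(score_preferred, score_rejected):
    """Bradley-Terry loss: push the preferred video's score above the rejected one's.

    Both inputs are tensors of shape (B,), the reward model's scalar scores for
    the preferred and rejected video of each pair under a given VBench metric.
    """
    return -F.logsigmoid(score_preferred - score_rejected).mean()
```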

Reward Training Curve of VADER-VBench

Training Curve of Reward Model

The reward curve when training VADER using the VBench Distilled reward model.

More Videos from VADER are exhibited in Video Gallery.