There are many open-source video diffusion models on the market, but when it comes to generating audio you do not have many options. MMAudio solves this by generating audio that is synchronized with your reference video, guided by a text prompt.
The model is trained on a large dataset of audio-text and audio-visual pairs, which gives you better synchronized output for your input video.
The model was released by a team from the University of Illinois Urbana-Champaign, Sony AI, and Sony Group Corporation under the MIT license, which means it can be used for both research and commercial purposes. For a detailed understanding, you can go through their research paper.
Installing custom nodes
1. Install ComfyUI on your machine. Existing users should update it from the Manager.
2. Inside the "ComfyUI/custom_nodes" folder, open a command prompt by typing "cmd" in the folder's address bar, then clone the repository with the command below:
git clone https://github.com/kijai/ComfyUI-MMAudio.git
3. Next, install the required dependencies. For regular (non-portable) ComfyUI users, type this into the command prompt:
pip install -r requirements.txt
For ComfyUI portable users, go inside the ComfyUI_windows_portable folder, open the command prompt as before, and type this command:
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-MMAudio\requirements.txt
4. Now download the Synchformer, CLIP, VAE, and base models from the Hugging Face repository.
Get either the fp16 or fp32 variant. If you use the fp16 base model, make sure to use the fp16-based VAE and CLIP models as well.
Go to the "ComfyUI/models" folder and create a new "mmaudio" folder. Save the downloaded models inside it.
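If you prefer the command line, here is a minimal sketch of the folder setup and downloads using huggingface-cli (which ships with the huggingface_hub package). The repo id and filenames below are assumptions based on Kijai's model repackages; verify the exact names on the Hugging Face page before running:
REM Run this from inside your ComfyUI folder; create the target directory first
mkdir models\mmaudio
REM Repo id and filenames are assumptions - check the Hugging Face repo for the exact names
huggingface-cli download Kijai/MMAudio_safetensors mmaudio_large_44k_v2_fp16.safetensors --local-dir models\mmaudio
huggingface-cli download Kijai/MMAudio_safetensors mmaudio_vae_44k_fp16.safetensors --local-dir models\mmaudio
huggingface-cli download Kijai/MMAudio_safetensors mmaudio_synchformer_fp16.safetensors --local-dir models\mmaudio
huggingface-cli download Kijai/MMAudio_safetensors apple_DFN5B-CLIP-ViT-H-14-384_fp16.safetensors --local-dir models\mmaudio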
5. You also need NVIDIA's BigVGAN v2 (used with the 44k mode), which is downloaded automatically from its Hugging Face repository the first time you run the workflow. The files are saved into the "ComfyUI/models/mmaudio/nvidia" folder. To watch the download progress in real time, switch to ComfyUI's backend terminal.
You can also download all the files manually from the respective repository, but it contains many files, which is tedious for a newbie, so it's not recommended.
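That said, if you do want to pre-fetch it, huggingface-cli can pull an entire repository in one command. This is only a sketch: the repo id and target folder below are assumptions, so confirm both against the download messages in ComfyUI's backend terminal on first run:
REM Repo id and folder layout are assumptions - confirm them in ComfyUI's terminal log
huggingface-cli download nvidia/bigvgan_v2_44khz_128band_512x --local-dir models\mmaudio\nvidia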
6. Restart and refresh ComfyUI for the changes to take effect.
Workflow
1. Get the workflow (the mmaudio_test.json file) from your "ComfyUI/custom_nodes/ComfyUI-MMAudio/examples" folder.
2. Drag and drop it onto the ComfyUI canvas.
3. Follow the steps given below:
(a) Load the MMAudio model into the MMAudio node.
(b) Load the required VAE, Synchformer, and CLIP models into the MMAudio feature node.
(c) Load your AI video (without audio). The video should be shorter than one minute; longer videos will give an out-of-memory error if you are using 12GB of VRAM or less. We uploaded an 8-second AI-generated video (made with Google Veo) with no audio track.
(d) Now enter positive and negative prompts relevant to your video.
We used the prompt: "a human skating on moon surface".
CFG: 4.5
Steps: 25
(e) Generate the video by clicking the "Queue" button.
The generated output is shown below; we have also posted it on our X (Twitter) account.
[Embedded X post: "MMAudio - Generate Background Audio for your AI videos" by Stable Diffusion Tutorials (@SD_Tutorial), December 24, 2024, with the install/workflow link and the output video.]