Thursday, January 23, 2025

How to Train and Use Hunyuan Video LoRA Models

This article will show you how to install and use Windows-based software that can train Hunyuan video LoRA models, allowing the user to generate custom personalities in the Hunyuan Video foundation model.

Video: Examples from the recent explosion of celebrity Hunyuan LoRAs from the civit.ai community.

At the moment the two most popular ways of generating Hunyuan LoRA models locally are:

1) The diffusion-pipe-ui Docker-based framework, which relies on Windows Subsystem for Linux (WSL) to handle some of the processes.

2) Musubi Tuner, a new addition to the popular Kohya ss diffusion training architecture. Musubi Tuner does not require Docker and does not depend on WSL or other Linux-based proxies, but it can be difficult to get running on Windows.

Therefore this run-through will focus on Musubi Tuner, and on providing a completely local solution for Hunyuan LoRA training and generation, without the use of API-driven websites or commercial GPU-renting services such as Runpod.

Video: Samples from LoRA training on Musubi Tuner for this article. All permissions granted by the person depicted, for the purposes of illustrating this article.

REQUIREMENTS

The installation will require at minimum a Windows 10 PC with a 30+/40+ series NVIDIA card that has at least 12GB of VRAM (though 16GB is recommended). The installation used for this article was tested on a machine with 64GB of system RAM and an NVIDIA 3090 graphics card with 24GB of VRAM. It was tested on a dedicated test-bed system using a fresh install of Windows 10 Professional, on a partition with 600+GB of spare disk space.

WARNING

Installing Musubi Tuner and its prerequisites also entails the installation of developer-focused software and packages directly onto the main Windows installation of a PC. Taking the installation of ComfyUI into account, for the end stages, this project will require around 400-500 gigabytes of disk space. Though I have tested the procedure without incident several times in newly-installed test bed Windows 10 environments, neither I nor unite.ai are liable for any damage to systems from following these instructions. I advise you to back up any important data before attempting this kind of installation procedure.

Considerations

Is This Method Still Valid?

The generative AI scene is moving very fast, and we can expect better and more streamlined Hunyuan Video LoRA training frameworks this year.

…or even this week! While I was writing this article, the developer of Kohya/Musubi produced musubi-tuner-gui, a sophisticated Gradio GUI for Musubi Tuner.

Obviously a user-friendly GUI is preferable to the BAT files that I use in this feature, at least once musubi-tuner-gui is working. As I write, it only went online five days ago, and I can find no account of anyone successfully using it.

According to posts in the repository, the new GUI is intended to be rolled directly into the Musubi Tuner project as soon as possible, which will end its current existence as a standalone GitHub repository.

Based on the present installation instructions, the new GUI gets cloned directly into the existing Musubi virtual environment; and, despite many efforts, I cannot get it to associate with the existing Musubi installation. This means that when it runs, it will find that it has no engine!

Once the GUI is integrated into Musubi Tuner, issues of this kind will surely be resolved. Though the author concedes that the new project is 'really rough', he is optimistic for its development and integration directly into Musubi Tuner.

Given these issues (also concerning default paths at install-time, and the use of the UV Python package, which complicates certain procedures in the new release), we will probably have to wait a little for a smoother Hunyuan Video LoRA training experience. That said, it looks very promising!

But if you can't wait, and are willing to roll your sleeves up a bit, you can get Hunyuan video LoRA training running locally right now.

Let's get started.

Why Install Anything on Bare Metal?

(Skip this paragraph if you're not an advanced user) Advanced users will wonder why I have chosen to install so much of the software on the bare metal Windows 10 installation instead of in a virtual environment. The reason is that the essential Windows port of the Linux-based Triton package is far more difficult to get working in a virtual environment. All the other bare-metal installations in the tutorial could not be installed in a virtual environment, as they must interface directly with local hardware.

Installing Prerequisite Packages and Programs

For the programs and packages that must be initially installed, the order of installation matters.

1: Download Microsoft Redistributable

Download and install the Microsoft Redistributable package from https://aka.ms/vs/17/release/vc_redist.x64.exe.

This is a straightforward and rapid installation.

2: Install Visual Studio 2022

Download the Microsoft Visual Studio 2022 Community edition from https://visualstudio.microsoft.com/downloads/?cid=learn-onpage-download-install-visual-studio-page-cta

Start the downloaded installer.

We don't need every available package, which would be a heavy and lengthy installation. On the initial Workloads page that opens, tick Desktop Development with C++.

Now click the Individual Components tab at the top-left of the interface and use the search box to find 'Windows SDK'.

By default, only the Windows 11 SDK is ticked. If you are on Windows 10 (this installation procedure has not been tested by me on Windows 11), also tick the latest Windows 10 SDK version in the list.

Search for 'C++ CMake' and check that C++ CMake tools for Windows is checked.

This installation will take at least 13 GB of space.

Once Visual Studio has installed, it will attempt to run on your computer. Let it open fully. When Visual Studio's full-screen interface is finally visible, close the program.

3: Install Visual Studio 2019

Some of the subsequent packages for Musubi expect an older version of Microsoft Visual Studio, while others need a more recent one.

Therefore also download the free Community edition of Visual Studio 2019 either from Microsoft (https://visualstudio.microsoft.com/vs/older-downloads/ – account required) or Techspot (https://www.techspot.com/downloads/7241-visual-studio-2019.html).

Install it with the same options as for Visual Studio 2022 (see procedure above, except that Windows SDK is already ticked in the Visual Studio 2019 installer).

You'll see that the Visual Studio 2019 installer is already aware of the newer version as it installs.

When installation is complete, and you have opened and closed the installed Visual Studio 2019 application, open a Windows command prompt (type CMD in Start Search) and type in and enter:

where cl

The result should be the known locations of the two installed Visual Studio editions.

If you instead get INFO: Could not find files for the given pattern(s), see the Check Paths section of this article below, and use those instructions to add the relevant Visual Studio paths to the Windows environment.

Save any changes made according to the Check Paths section below, and then try the where cl command again.

4: Install CUDA 11 + 12 Toolkits

The various packages installed in Musubi need different versions of NVIDIA CUDA, which accelerates and optimizes training on NVIDIA graphics cards.

The reason we installed the Visual Studio versions first is that the NVIDIA CUDA installers search for and integrate with any existing Visual Studio installations.

Download an 11+ series CUDA installation package from:

https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local (download 'exe (local)')

Download a 12+ series CUDA Toolkit installation package from:

https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64

The installation process is identical for both installers. Ignore any warnings about the existence or non-existence of installation paths in Windows Environment variables; we are going to attend to this manually later.

Install NVIDIA CUDA Toolkit V11+

Start the installer for the 11+ series CUDA Toolkit.

At Installation Options, choose Custom (Advanced) and proceed.

Uncheck the NVIDIA GeForce Experience option and click Next.

Leave Select Installation Location at defaults (this is important).

Click Next and let the installation conclude.

Ignore any warning or notes that the installer gives about Nsight Visual Studio integration, which is not needed for our use case.

Install NVIDIA CUDA Toolkit V12+

Repeat the entire process for the separate 12+ NVIDIA Toolkit installer that you downloaded.

The installation process for this version is identical to the one listed above (the 11+ version), apart from one warning about environment paths, which you can ignore.

When the 12+ CUDA version installation is completed, open a command prompt in Windows and type and enter:

nvcc --version

This should report the version of the installed CUDA compiler (nvcc).

To check that your card is recognized, type and enter:

nvidia-smi

5: Install GIT

GIT will be handling the installation of the Musubi repository on your local machine. Download the GIT installer at:

https://git-scm.com/downloads/win ('64-bit Git for Windows Setup')

Run the installer.

Use default settings for Select Components.

Leave the default editor at Vim.

Let GIT decide about branch names.

Use recommended settings for the Path Environment.

Use recommended settings for SSH.

Use recommended settings for HTTPS Transport backend.

Use recommended settings for line-ending conversions.

Choose Windows default console as the Terminal Emulator.

Use default settings (Fast-forward or merge) for Git Pull.

Use Git-Credential Manager (the default setting) for Credential Helper.

In Configuring extra options, leave Enable file system caching ticked, and Enable symbolic links unticked (unless you are an advanced user who is using hard links for a centralized model repository).

Conclude the installation and test that Git is installed properly by opening a CMD window and typing and entering:

git --version

GitHub Login

Later, when you attempt to clone GitHub repositories, you may be challenged for your GitHub credentials. To anticipate this, log into your GitHub account (create one, if necessary) on any browsers installed on your Windows system. In this way, the OAuth authentication method (a pop-up window) should take as little time as possible.

After that initial challenge, you should stay authenticated automatically.

6: Install CMake

CMake 3.21 or newer is required for parts of the Musubi installation process. CMake is a cross-platform development architecture capable of orchestrating diverse compilers, and of compiling software from source code.

Download it at:

https://cmake.org/download/ ('Windows x64 Installer')

Launch the installer.

Ensure Add CMake to the PATH environment variable is checked.

Press Next.

Type and enter this command in a Windows Command prompt:

cmake --version

If CMake installed successfully, it will display something like:

cmake version 3.31.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).

7: Install Python 3.10

The Python interpreter is central to this project. Download the 3.10 version (the best compromise between the different demands of Musubi packages) at:

https://www.python.org/downloads/release/python-3100/ ('Windows installer (64-bit)')

Run the downloaded installer, and leave it at default settings.

At the end of the installation process, click Disable path length limit (requires UAC admin confirmation).

In a Windows Command prompt type and enter:

python --version

This should result in Python 3.10.0
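The individual version checks in steps 3 to 7 can also be run in one pass. The following is a minimal sketch, not part of the original procedure: it assumes only the command names used above (cl, nvcc, nvidia-smi, git, cmake, python) and uses the standard-library shutil.which to report where each one resolves on the current PATH.

```python
import shutil

# Command names taken from the steps above; "cl" is the Visual Studio compiler.
TOOLS = ["cl", "nvcc", "nvidia-smi", "git", "cmake", "python"]


def locate_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools(TOOLS).items():
        print(f"{name:12} {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND is a candidate for the manual path checks that follow.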

Check Paths

The cloning and installation of the Musubi frameworks, as well as its normal operation after installation, requires that its components know the path to several important external components in Windows, particularly CUDA.

So we need to open the path environment and check that all the requisites are in there.

A quick way to get to the controls for Windows Environment is to type Edit the system environment variables into the Windows search bar.

Clicking this will open the System Properties control panel. In the lower right of System Properties, click the Environment Variables button, and a window called Environment Variables opens up. In the System Variables panel in the bottom half of this window, scroll down to Path and double-click it. This opens a window called Edit environment variables. Drag the width of this window wider so you can see the full path of the variables.

Here the important entries are:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64
C:\Program Files\Git\cmd
C:\Program Files\CMake\bin

In most cases, the correct path variables should already be present.

Add any paths that are missing by clicking New in the Edit environment variable window and pasting in the correct path.

Do NOT just copy and paste from the paths listed above; check that each equivalent path exists in your own Windows installation.

If there are minor path variations (particularly with Visual Studio installations), use the paths listed above to find the correct target folders (i.e., x64 in Hostx64) in your own installation. Then paste those paths into the Edit environment variable window.

After this, restart the computer.
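After the restart, the manual comparison described above can be repeated programmatically. This is a minimal sketch, not part of the original procedure: the directory list is a hypothetical subset of the entries listed earlier (the version numbers will differ on your machine), and the comparison ignores case, surrounding quotes and trailing slashes.

```python
import os

# Hypothetical subset of the entries listed above; adjust versions to your install.
REQUIRED_DIRS = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin",
    r"C:\Program Files\Git\cmd",
    r"C:\Program Files\CMake\bin",
]


def normalize(entry):
    """Lower-case and strip quotes/trailing slashes so entries compare reliably."""
    return entry.strip().strip('"').rstrip("\\/").lower()


def missing_from_path(required, path_value, sep=";"):
    """Return the required directories that do not appear in path_value."""
    present = {normalize(p) for p in path_value.split(sep) if p.strip()}
    return [d for d in required if normalize(d) not in present]


if __name__ == "__main__":
    for d in missing_from_path(REQUIRED_DIRS, os.environ.get("PATH", "")):
        print("Not on PATH:", d)
```

The script only reads the PATH variable; any entry it reports still has to be added by hand in the Edit environment variable window.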

Installing Musubi

Upgrade PIP

Using the latest version of the PIP installer can smooth some of the installation stages. In a Windows Command prompt with administrator privileges (see Elevation, below), type and enter:

pip install --upgrade pip

Elevation

Some commands may require elevated privileges (i.e., to be run as an administrator). If you receive error messages about permissions in the following stages, close the command prompt window and reopen it in administrator mode by typing CMD into the Windows search box, right-clicking on Command Prompt and selecting Run as administrator.

For the next stages, we are going to use Windows Powershell instead of the Windows Command prompt. You can find this by entering Powershell into the Windows search box, and (as necessary) right-clicking on it to Run as administrator.

Install Torch

In Powershell, type and enter:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Be patient while the many packages install.

When completed, you can verify a GPU-enabled PyTorch installation by typing and entering:

python -c "import torch; print(torch.cuda.is_available())"

This should result in:

C:\WINDOWS\system32>python -c "import torch;print(torch.cuda.is_available())"
True
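If the one-line check prints False, a slightly fuller report can help narrow down the cause. The sketch below is an optional addition, not part of the original procedure: it assumes nothing beyond the PyTorch install above, and returns a small dictionary (the key names are my own) with the PyTorch version, the CUDA version it was built against, and the name of the first GPU.

```python
import importlib.util


def torch_report():
    """Return a small dict describing the local PyTorch install, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    cuda_ok = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": cuda_ok,
        "device": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


if __name__ == "__main__":
    print(torch_report())
```

A cuda_build of None would suggest that a CPU-only wheel was installed, in which case the pip command above (with the cu118 index URL) is worth re-running.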

Install Triton for Windows

Next, the installation of the Triton for Windows component. In elevated Powershell, enter (on a single line):

pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.1.0-windows.post8/triton-3.1.0-cp310-cp310-win_amd64.whl

(The installer triton-3.1.0-cp310-cp310-win_amd64.whl works for both Intel and AMD CPUs as long as the architecture is 64-bit and the installed Python version matches the wheel; cp310 denotes Python 3.10)

After running, this should result in:

Successfully installed triton-3.1.0

We can check if Triton is working by importing it in Python. Enter this command:

python -c "import triton; print('Triton is working')"

This should output:

Triton is working

To confirm that the GPU is still visible to PyTorch after the Triton installation, enter:

python -c "import torch; print(torch.cuda.is_available())"

This should result in True.

Create the Virtual Environment for Musubi

From now on, we will install any further software into a Python virtual environment (or venv). This means that all you will need to do to uninstall all the following software is to drag the venv's installation folder to the trash.

Let's create that installation folder: make a folder called Musubi on your desktop. The following examples assume that this folder exists: C:\Users\[Your Profile Name]\Desktop\Musubi\.

In Powershell, navigate to that folder by entering:

cd C:\Users\[Your Profile Name]\Desktop\Musubi

We want the virtual environment to have access to what we have installed already (especially Triton), so we will use the --system-site-packages flag. Enter this:

python -m venv --system-site-packages musubi

Wait for the environment to be created, and then activate it by entering:

.\musubi\Scripts\activate

From this point on, you can tell that you are in the activated virtual environment by the fact that (musubi) appears at the beginning of all your prompts.
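Besides the (musubi) prefix in the prompt, the state of the environment can be confirmed from inside Python. The following is a minimal sketch under two assumptions that hold for environments created with the standard venv module: the interpreter's sys.prefix differs from sys.base_prefix when a venv is active, and the venv folder contains a pyvenv.cfg file with an include-system-site-packages line. The function names are my own.

```python
import sys
from pathlib import Path


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def system_site_packages_enabled(cfg_text):
    """Read the include-system-site-packages flag from pyvenv.cfg contents."""
    for line in cfg_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    print("Inside a venv:", in_virtualenv())
    cfg = Path(sys.prefix) / "pyvenv.cfg"
    if cfg.is_file():
        print("System site-packages visible:",
              system_site_packages_enabled(cfg.read_text()))
```

Run with the venv active, both lines should report True if the environment was created with the --system-site-packages flag as described above.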

Clone the Repository

Navigate to the newly-created musubi folder (which is inside the Musubi folder on your desktop):

cd musubi

Now that we are in the right place, enter the following command:

git clone https://github.com/kohya-ss/musubi-tuner.git

Wait for the cloning to complete (it will not take long).

Installing Requirements

Navigate to the installation folder:

cd musubi-tuner

Enter:

pip install -r requirements.txt

Wait for the many installations to finish (this will take longer).

Automating Access to the Hunyuan Video Venv

To easily activate and access the new venv for future sessions, paste the following into Notepad and save it with the name activate.bat, choosing the All files option in the Save dialogue.

call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate
cd C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner
cmd

(Replace [Your Profile Name] with the real name of your Windows user profile)

It does not matter into which location you save this file.

From now on you can double-click activate.bat and start work immediately.

Using Musubi Tuner

Downloading the Models

The Hunyuan Video LoRA training process requires the downloading of at least seven models in order to support all the possible optimization options for pre-caching and training a Hunyuan video LoRA. Together, these models weigh more than 60GB.

Current instructions for downloading them can be found at https://github.com/kohya-ss/musubi-tuner?tab=readme-ov-file#model-download

However, these are the download instructions at the time of writing:

clip_l.safetensors, llava_llama3_fp16.safetensors and llava_llama3_fp8_scaled.safetensors can be downloaded at:
https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files/text_encoders

mp_rank_00_model_states.pt, mp_rank_00_model_states_fp8.pt and mp_rank_00_model_states_fp8_map.pt can be downloaded at:
https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/transformers

pytorch_model.pt can be downloaded at:
https://huggingface.co/tencent/HunyuanVideo/tree/main/hunyuan-video-t2v-720p/vae

Though you can place these in any directory you choose, for consistency with later scripting, let's put them in:

C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\

This is consistent with the directory arrangement prior to this point. Any commands or instructions hereafter will assume that this is where the models are situated; and don't forget to replace [Your Profile Name] with your real Windows profile folder name.
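Since the later scripts fail if a model file is missing or misnamed, it may be worth checking the folder before moving on. The sketch below is an optional helper, not part of Musubi Tuner: it assumes the seven filenames listed above and a models folder at the location just described, and simply reports which files are not yet present.

```python
from pathlib import Path

# The seven filenames listed in the download instructions above.
MODEL_FILES = [
    "clip_l.safetensors",
    "llava_llama3_fp16.safetensors",
    "llava_llama3_fp8_scaled.safetensors",
    "mp_rank_00_model_states.pt",
    "mp_rank_00_model_states_fp8.pt",
    "mp_rank_00_model_states_fp8_map.pt",
    "pytorch_model.pt",
]


def missing_models(models_dir, names=MODEL_FILES):
    """Return the expected model filenames not found in models_dir."""
    folder = Path(models_dir)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    # Placeholder path: replace [Your Profile Name] with your own profile folder.
    target = r"C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models"
    for name in missing_models(target):
        print("Missing:", name)
```

The check is by filename only; it does not verify that a download completed, so a truncated file would still pass.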

Dataset Preparation

Ignoring community controversy on the point, it's fair to say that you will need somewhere between 10-100 photos for a training dataset for your Hunyuan LoRA. Very good results can be obtained even with 15 images, so long as the images are well-balanced and of good quality.

A Hunyuan LoRA can be trained on images, on very short and low-res video clips, or even on a mixture of the two, although using video clips as training data is challenging, even for a 24GB card.

However, video clips are only really useful if your character moves in such an unusual way that the Hunyuan Video foundation model might not know about it, or be able to guess.

Examples would include Roger Rabbit, a xenomorph, The Mask, Spider-Man, or other personalities that possess unique characteristic movement.

Since Hunyuan Video already knows how ordinary men and women move, video clips are not necessary to obtain a convincing Hunyuan Video LoRA human-type character. So we'll use static images.

Image Preparation

The Bucket List

The TLDR version:

It's best to either use images that are all the same size for your dataset, or use a 50/50 split between two different sizes, i.e., 10 images that are 512x768px and 10 that are 768x512px.

The training might go well even if you don't do this; Hunyuan Video LoRAs can be surprisingly forgiving.

The Longer Version

As with Kohya-ss LoRAs for static generative systems such as Stable Diffusion, bucketing is used to distribute the workload across differently-sized images, allowing larger images to be used without causing out-of-memory errors at training time (i.e., bucketing 'cuts up' the images into chunks that the GPU can handle, while maintaining the semantic integrity of the whole image).

For each size of image you include in your training dataset (i.e., 512x768px), a bucket, or 'sub-task', will be created for that size. So if you have the following distribution of images, the bucket attention becomes unbalanced, with the risk that some photos will be given greater consideration in training than others:

2x 512x768px images
7x 768x512px images
1x 1000x600px image
3x 400x800px images

Bucket attention is divided unequally among these images.

Therefore either stick to one format size, or try to keep the distribution of different sizes relatively equal.

In either case, avoid very large images, as this is likely to slow down training, to negligible benefit.

For simplicity, I have used 512x768px for all the photos in my dataset.
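To see how uneven a dataset is before training, the share of images per size can be tallied. This is a minimal sketch under the assumption made in the text above, namely one bucket per distinct image size; it does not reproduce Musubi Tuner's actual bucketing logic, and the function name is my own. The sizes are passed in as (width, height) pairs.

```python
from collections import Counter


def bucket_shares(sizes):
    """Return the fraction of the dataset that falls into each (width, height) size."""
    counts = Counter(sizes)
    total = sum(counts.values())
    return {size: count / total for size, count in counts.items()}


if __name__ == "__main__":
    # The unbalanced example distribution described above (13 images in total).
    example = (
        [(512, 768)] * 2 + [(768, 512)] * 7 + [(1000, 600)] * 1 + [(400, 800)] * 3
    )
    for (width, height), share in bucket_shares(example).items():
        print(f"{width}x{height}px: {share:.1%}")
```

For the example distribution, the 768x512px images account for 7 of 13 images (about 54%), while the single 1000x600px image accounts for under 8%, which illustrates the imbalance the text warns about.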

Disclaimer: The model (person) used in the dataset gave me full permission to use these pictures for this purpose, and exercised approval of all AI-based output depicting her likeness featured in this article.

My dataset consists of 40 images, in PNG format (though JPG is fine too). My images were stored at C:\Users\Martin\Desktop\DATASETS_HUNYUAN\examplewoman

You should create a folder named cache inside the training image folder.

Now let's create a special file that will configure the training.

TOML Files

The training and pre-caching processes of Hunyuan Video LoRAs obtain the file paths from a flat text file with the .toml extension.

For my test, the TOML is located at C:\Users\Martin\Desktop\DATASETS_HUNYUAN\training.toml

The contents of my training TOML look like this:

[general]
resolution = [512, 768]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
image_directory = "C:\\Users\\Martin\\Desktop\\DATASETS_HUNYUAN\\examplewoman"
cache_directory = "C:\\Users\\Martin\\Desktop\\DATASETS_HUNYUAN\\examplewoman\\cache"
num_repeats = 1

(The double back-slashes for image and cache directories are not always necessary, but they can help to avoid errors in cases where there is a space in the path. I have trained models with .toml files that used single-forward and single-backward slashes)

The resolution entry holds two values, 512px and 768px; as far as I can tell from the Musubi Tuner dataset configuration, these are read as the target width and height rather than as two alternative resolutions. You can also leave this at 512, and still obtain good results.
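Writing the TOML by hand makes it easy to mistype a path. In TOML, a backslash inside a double-quoted string starts an escape sequence, which is why Windows paths are normally written with doubled backslashes (or with forward slashes). The sketch below generates the same file contents as shown above from two ordinary Windows paths; the function names are hypothetical, and the keys and values are simply those from my own TOML.

```python
def toml_string(path):
    """Quote a Windows path as a TOML basic string (backslashes doubled)."""
    return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_dataset_toml(image_dir, cache_dir, resolution=(512, 768)):
    """Return dataset-config text with the same keys as the example above."""
    lines = [
        "[general]",
        f"resolution = [{resolution[0]}, {resolution[1]}]",
        'caption_extension = ".txt"',
        "batch_size = 1",
        "enable_bucket = true",
        "bucket_no_upscale = false",
        "",
        "[[datasets]]",
        f"image_directory = {toml_string(image_dir)}",
        f"cache_directory = {toml_string(cache_dir)}",
        "num_repeats = 1",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    dataset = r"C:\Users\Martin\Desktop\DATASETS_HUNYUAN\examplewoman"
    print(build_dataset_toml(dataset, dataset + r"\cache"))
```

The printed text can be pasted into Notepad and saved as training.toml in the same way as the hand-written version.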

Captions

Hunyuan Video is a text+vision foundation model, so we need descriptive captions for these images, which will be considered during training. The training process will fail without captions.

There are a multitude of open source captioning systems we could use for this task, but let's keep it simple and use the taggui system. Though it is stored at GitHub, and though it does download some very heavy deep learning models on first run, it comes in the form of a simple Windows executable that loads Python libraries and a straightforward GUI.

After starting Taggui, use File > Load Directory to navigate to your image dataset, and optionally put a token identifier (in this case, examplewoman) that will be added to all the captions.

(Be sure to turn off Load in 4-bit when Taggui first opens; it will throw errors during captioning if this is left on)

Select an image in the left-hand preview column and press CTRL+A to select all the images. Then press the Start Auto-Captioning button on the right.

You will see Taggui downloading models in the small CLI in the right-hand column, but only if this is the first time you have run the captioner. Otherwise you will see a preview of the captions.

Now, each photo has a corresponding .txt caption with a description of its image contents.

You can click Advanced Options in Taggui to increase the length and style of captions, but that is beyond the scope of this run-through.
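Because training fails without captions, it can be useful to confirm that every image received one. The following is a minimal sketch, independent of Taggui: it assumes that captions share the image's filename with a .txt extension (as in the TOML's caption_extension setting) and that the dataset uses PNG or JPG files; the function name is my own.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def images_without_captions(image_dir, caption_ext=".txt"):
    """List image filenames in image_dir that have no matching caption file."""
    folder = Path(image_dir)
    images = [
        p for p in sorted(folder.iterdir())
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    ]
    return [p.name for p in images if not p.with_suffix(caption_ext).is_file()]
```

Calling images_without_captions with the path of the dataset folder should return an empty list once auto-captioning has covered every image.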

Quit Taggui and let's move on to…

Latent Pre-Caching

To avoid excessive GPU load at training time, it is necessary to create two types of pre-cached files: one to represent the latent image derived from the images themselves, and another to hold a text encoding relating to caption content.

To simplify all three processes (2x cache + training), you can use interactive .BAT files that will ask you questions and undertake the processes when you have given the necessary information.

For the latent pre-caching, copy the following text into Notepad and save it as a .BAT file (i.e., name it something like latent-precache.bat), as earlier, ensuring that the file type in the drop down menu in the Save As dialogue is All Files:

REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat

REM Get user input
set /p IMAGE_PATH=Enter the path to the image directory:
set /p CACHE_PATH=Enter the path to the cache directory:
set /p TOML_PATH=Enter the path to the TOML file:

echo You entered:
echo Image path: %IMAGE_PATH%
echo Cache path: %CACHE_PATH%
echo TOML file path: %TOML_PATH%

set /p CONFIRM=Do you want to proceed with latent pre-caching (y/n)?
if /i "%CONFIRM%"=="y" (
REM Run the latent pre-caching script
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\cache_latents.py --dataset_config %TOML_PATH% --vae C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\pytorch_model.pt --vae_chunk_size 32 --vae_tiling
) else (
echo Operation canceled.
)

REM Keep the window open
pause

(Make sure that you replace [Your Profile Name] with your real Windows profile folder name)

Now you can run the .BAT file for automated latent caching.

When prompted by the various questions from the BAT file, paste or type in the paths to your dataset folder, cache folder and TOML file.

Text Pre-Caching

We'll create a second BAT file, this time for the text pre-caching.

REM Activate the virtual environment
call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat

REM Get user input
set /p IMAGE_PATH=Enter the path to the image directory:
set /p CACHE_PATH=Enter the path to the cache directory:
set /p TOML_PATH=Enter the path to the TOML file:

echo You entered:
echo Image path: %IMAGE_PATH%
echo Cache path: %CACHE_PATH%
echo TOML file path: %TOML_PATH%

set /p CONFIRM=Do you want to proceed with text encoder output pre-caching (y/n)?
if /i "%CONFIRM%"=="y" (
REM Use the python executable from the virtual environment
python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\cache_text_encoder_outputs.py --dataset_config %TOML_PATH% --text_encoder1 C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\llava_llama3_fp16.safetensors --text_encoder2 C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\clip_l.safetensors --batch_size 16
) else (
echo Operation canceled.
)

REM Keep the window open
pause

Replace [Your Profile Name] with your Windows profile name and save this as text-cache.bat (or any other name you like), in any convenient location, as per the procedure for the previous BAT file.

Run this new BAT file, follow the instructions, and the necessary text-encoded files will appear in the cache folder.

Training the Hunyuan Video LoRA

Training the actual LoRA will take considerably longer than these two preparatory processes.

Though there are also several variables that we could worry about (such as batch size, repeats, epochs, and whether to use full or quantized models, among others), we'll save these considerations for another day, and a deeper look at the intricacies of LoRA creation.

For now, let's minimize the choices a little and train a LoRA on 'median' settings.
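When choosing a number of epochs, and an interval in steps for preview images, it helps to have a rough idea of how many steps one epoch contains. The sketch below is a back-of-envelope estimate only, under stated assumptions: a single bucket, a single GPU and no gradient accumulation, so that steps per epoch is simply images multiplied by repeats, divided by batch size and rounded up. The function names are my own, and the real step count reported by the trainer may differ.

```python
import math


def steps_per_epoch(num_images, num_repeats=1, batch_size=1):
    """Estimate optimizer steps in one epoch (single bucket, no accumulation)."""
    return math.ceil(num_images * num_repeats / batch_size)


def total_steps(num_images, epochs, num_repeats=1, batch_size=1):
    """Estimate the total number of steps across all epochs."""
    return steps_per_epoch(num_images, num_repeats, batch_size) * epochs


if __name__ == "__main__":
    # Values from this article's dataset and TOML: 40 images, 1 repeat, batch size 1.
    print("Steps per epoch:", steps_per_epoch(40))
    print("Steps for 10 epochs:", total_steps(40, epochs=10))
```

With the 40-image dataset and the TOML settings used here, one epoch comes to roughly 40 steps, so a preview interval of 40 steps would produce about one set of previews per epoch.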

We’ll create a 3rd BAT file, this time to provoke coaching. Paste this into Notepad and put it aside as a BAT file, like earlier than, as coaching.bat (or any identify you please):

REM Activate the virtual environment

call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat

REM Get user input

set /p DATASET_CONFIG=Enter the path to the dataset configuration file:

set /p EPOCHS=Enter the number of epochs to train:

set /p OUTPUT_NAME=Enter the output model name (e.g., example0001):

set /p LEARNING_RATE=Choose learning rate (1 for 1e-3, 2 for 5e-3, default 1e-3):

if "%LEARNING_RATE%"=="1" set LR=1e-3

if "%LEARNING_RATE%"=="2" set LR=5e-3

if "%LEARNING_RATE%"=="" set LR=1e-3

set /p SAVE_STEPS=How often (in steps) to save preview images:

set /p SAMPLE_PROMPTS=What is the location of the text-prompt file for training previews?

echo You entered:

echo Dataset configuration file: %DATASET_CONFIG%

echo Number of epochs: %EPOCHS%

echo Output name: %OUTPUT_NAME%

echo Learning rate: %LR%

echo Save preview images every %SAVE_STEPS% steps.

echo Text-prompt file: %SAMPLE_PROMPTS%

REM Prepare the command (the ^ continuation lines must not be separated by blank lines)

set CMD=accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 ^
C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\hv_train_network.py ^
--dit C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\models\mp_rank_00_model_states.pt ^
--dataset_config %DATASET_CONFIG% ^
--sdpa ^
--mixed_precision bf16 ^
--fp8_base ^
--optimizer_type adamw8bit ^
--learning_rate %LR% ^
--gradient_checkpointing ^
--max_data_loader_n_workers 2 ^
--persistent_data_loader_workers ^
--network_module=networks.lora ^
--network_dim=32 ^
--timestep_sampling sigmoid ^
--discrete_flow_shift 1.0 ^
--max_train_epochs %EPOCHS% ^
--save_every_n_epochs=1 ^
--seed 42 ^
--output_dir "C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models" ^
--output_name %OUTPUT_NAME% ^
--vae C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/pytorch_model.pt ^
--vae_chunk_size 32 ^
--vae_spatial_tile_sample_min_size 128 ^
--text_encoder1 C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/llava_llama3_fp16.safetensors ^
--text_encoder2 C:/Users/[Your Profile Name]/Desktop/Musubi/musubi/musubi-tuner/models/clip_l.safetensors ^
--sample_prompts %SAMPLE_PROMPTS% ^
--sample_every_n_steps %SAVE_STEPS% ^
--sample_at_first

echo The following command will be executed:

echo %CMD%

set /p CONFIRM=Do you want to proceed with training (y/n)?

if /i "%CONFIRM%"=="y" (

%CMD%

) else (

echo Operation canceled.

)

REM Keep the window open

cmd /k

As usual, be sure to replace all instances of [Your Profile Name] with your correct Windows profile name.

Ensure that the directory C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models exists, and create it at that location if not.
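If you would rather not create folders by hand, a short Python sketch can do it. The folder names follow the paths used in the BAT files in this article (including the CONVERTED subfolder that the conversion script writes to); the ensure_output_dirs helper and its base argument are my own illustrative names, not part of Musubi Tuner.

```python
from pathlib import Path


def ensure_output_dirs(base):
    """Create the folders that the BAT files in this article write to.

    `base` is the Musubi folder, for example
    C:/Users/[Your Profile Name]/Desktop/Musubi (an assumption taken
    from the paths used in this article).
    """
    base = Path(base)
    wanted = [
        base / "Output Models",
        base / "Output Models" / "CONVERTED",
    ]
    for folder in wanted:
        # parents=True builds any missing parent folders;
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
    return wanted
```

Calling ensure_output_dirs(Path.home() / "Desktop" / "Musubi") would create both folders under the current profile's Desktop, if that is where you installed Musubi.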

Training Previews

There is a very basic training preview feature recently enabled for Musubi Tuner, which allows you to force the training model to pause and generate images based on prompts you have saved. These are saved in an automatically created folder called Sample, in the same directory where the trained models are saved.

To enable this, you will need to save at least one prompt in a text file. The training BAT we created will ask you to input the location of this file; therefore you can name the prompt file anything you like, and save it anywhere.

A single prompt file can hold several prompts; a file containing three prompts, for instance, will output three different images when requested by the training routine.

You can put flags at the end of each prompt that will affect the images:

--w is width (defaults to 256px if not set, according to the docs)
--h is height (defaults to 256px if not set)
--f is the number of frames. If set to 1, an image is produced; more than one, a video.
--d is the seed. If not set, it is random; but you should set it to see one prompt evolving.
--s is the number of steps in generation, defaulting to 20.

See the official documentation for more flags.
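As a minimal sketch of how such a prompt file might be put together, the Python snippet below builds prompt lines using only the five flags listed above. The prompts themselves, the one-prompt-per-line layout, and the build_prompt_line and write_prompt_file helpers are assumptions for illustration; check the official documentation for the exact file format.

```python
def build_prompt_line(text, width=256, height=256, frames=1, seed=None, steps=20):
    """Append the preview flags described above to a prompt.

    --w / --h : width and height in pixels
    --f       : number of frames (1 = still image, more = video)
    --d       : seed (left out when seed is None, so it stays random)
    --s       : number of generation steps
    """
    parts = [text.strip(), f"--w {width}", f"--h {height}", f"--f {frames}"]
    if seed is not None:
        parts.append(f"--d {seed}")
    parts.append(f"--s {steps}")
    return " ".join(parts)


def write_prompt_file(path, lines):
    """Write one prompt per line (an assumed layout) as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# Hypothetical prompts; 'example-person' stands in for your own trigger token.
prompts = [
    build_prompt_line("A photo of example-person smiling", 384, 384, 1, seed=1),
    build_prompt_line("example-person walking in a park", 256, 256, 25, seed=1),
    build_prompt_line("A close-up of example-person", seed=1),
]
```

Keeping the seed fixed across training runs, as here, is what lets you watch a single prompt evolve from one preview to the next.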

Though training previews can quickly reveal some issues that might cause you to cancel the training and reconsider the data or the setup, thus saving time, do remember that every extra prompt slows down the training a little more.

Also, the bigger the training preview image’s width and height (as set in the flags listed above), the more it will slow training down.

Launch your training BAT file.

Question #1 is ‘Enter the path to the dataset configuration file’. Paste or type in the correct path to your TOML file.

Question #2 is ‘Enter the number of epochs to train’. This is a trial-and-error variable, since it’s affected by the amount and quality of images, as well as the captions, and other factors. In general, it’s better to set it too high than too low, since you can always stop the training with Ctrl+C in the training window if you feel the model has advanced enough. Set it to 100 in the first instance, and see how it goes.

Question #3 is ‘Enter the output model name’. Name your model! It may be best to keep the name reasonably short and simple.

Question #4 is ‘Choose learning rate’, which defaults to 1e-3 (option 1). This is a good place to start, pending further experience.

Question #5 is ‘How often (in steps) to save preview images’. If you set this too low, you will see little progress between preview image saves, and this will slow down the training.

Question #6 is ‘What is the location of the text-prompt file for training previews?’. Paste or type in the path to your prompts text file.
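To get a feel for a sensible answer to Question #5, it can help to estimate how many steps the run will take. The sketch below assumes one training step per batch, and that every image is seen `repeats` times per epoch; these are common conventions rather than something I have confirmed for Musubi Tuner, and the function names are my own.

```python
import math


def estimate_total_steps(num_items, epochs, batch_size=1, repeats=1):
    """Rough step count, assuming one step per batch (an assumption)."""
    steps_per_epoch = math.ceil(num_items * repeats / batch_size)
    return steps_per_epoch * epochs


def estimate_preview_rounds(total_steps, sample_every_n_steps, sample_at_first=True):
    """How many preview rounds a given step interval would trigger."""
    rounds = total_steps // sample_every_n_steps
    # The training BAT passes --sample_at_first, which adds one
    # preview round before training starts.
    return rounds + (1 if sample_at_first else 0)


# Example: 40 images, 100 epochs, batch size 1, previews every 500 steps.
total = estimate_total_steps(40, 100)
print(total, estimate_preview_rounds(total, 500))
```

Each preview round renders every prompt in your prompt file, so multiplying the number of rounds by the number of prompts gives a rough idea of how many preview files to expect.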

The BAT then shows you the command it will send to the Hunyuan Model, and asks you if you want to proceed, y/n.

Go ahead and begin training.

During this time, if you check the GPU section of the Performance tab of Windows Task Manager, you’ll see the process is taking around 16GB of VRAM.

This may not be an arbitrary figure, as this is the amount of VRAM available on quite a few NVIDIA graphics cards, and the upstream code may have been optimized to fit the tasks into 16GB for the benefit of those who own such cards.

That said, it is very easy to raise this usage, by sending more exorbitant flags to the training command.

During training, you’ll see in the lower-right side of the CMD window a figure for how much time has passed since training began, and an estimate of total training time (which will vary heavily depending on flags set, number of training images, number of training preview images, and several other factors).

A typical training time is around 3-4 hours on median settings, depending on the available hardware, number of images, flag settings, and other factors.

Using Your Trained LoRA Models in Hunyuan Video

Choosing Checkpoints

When training is concluded, you will have a model checkpoint for each epoch of training.

This saving frequency can be changed by the user to save more or less frequently, as desired, by amending the --save_every_n_epochs [N] number in the training BAT file. Note that the steps figure you entered when setting up training with the BAT is passed to --sample_every_n_steps, and so governs preview images rather than checkpoints; a low figure there produces a high number of preview files, not extra checkpoint files.

Which Checkpoint to Choose?

As mentioned earlier, the earliest-trained models will be most flexible, while the later checkpoints may offer the most detail. The only way to test for these factors is to run some of the LoRAs and generate a few videos. In this way you can get to know which checkpoints are most productive, and represent the best balance between flexibility and fidelity.
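Since testing every checkpoint is time-consuming, one practical approach is to sample a handful spread evenly across the run. The sketch below picks evenly spaced files from the output folder; it assumes the checkpoints are .safetensors files whose names sort in training order, which you should verify against your own output folder. Both function names are illustrative.

```python
from pathlib import Path


def pick_evenly_spaced(items, how_many):
    """Return up to `how_many` items spread evenly across the list.

    With how_many >= 2 the first and last entries are always included;
    with how_many == 1 only the last (most trained) entry is returned.
    """
    if how_many <= 0 or not items:
        return []
    if how_many == 1:
        return [items[-1]]
    if how_many >= len(items):
        return list(items)
    last = len(items) - 1
    indices = sorted({round(i * last / (how_many - 1)) for i in range(how_many)})
    return [items[i] for i in indices]


def list_checkpoints(output_dir):
    """List .safetensors files in name order (assumed to match epoch order)."""
    return sorted(Path(output_dir).glob("*.safetensors"))
```

For a 100-epoch run, pick_evenly_spaced(list_checkpoints(folder), 5) would give you five checkpoints to compare, from the earliest to the latest.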

ComfyUI

The most popular (though not the only) environment for using Hunyuan Video LoRAs, at the moment, is ComfyUI, a node-based editor with an elaborate interface that runs in your web browser.

Source: https://github.com/comfyanonymous/ComfyUI

Installation instructions are straightforward and available at the official GitHub repository (additional models will have to be downloaded).

Converting Models for ComfyUI

Your trained models are saved in Musubi Tuner’s own format, which is not compatible with most implementations of ComfyUI. Musubi is able to convert a model to a ComfyUI-compatible format. Let’s set up a BAT file to implement this.

Before running this BAT, create the C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models\CONVERTED folder that the script is expecting.

REM Activate the virtual environment

call C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\Scripts\activate.bat

:START

REM Get user input

set /p INPUT_PATH=Enter the path to the input Musubi safetensors file (or type "exit" to quit):

REM Exit if the user types "exit"

if /i "%INPUT_PATH%"=="exit" goto END

REM Extract the file name from the input path and append 'converted' to it

for %%F in ("%INPUT_PATH%") do set FILENAME=%%~nF

set OUTPUT_PATH=C:\Users\[Your Profile Name]\Desktop\Musubi\Output Models\CONVERTED\%FILENAME%_converted.safetensors

set TARGET=other

echo You entered:

echo Input file: %INPUT_PATH%

echo Output file: %OUTPUT_PATH%

echo Target format: %TARGET%

set /p CONFIRM=Do you want to proceed with the conversion (y/n)?

if /i "%CONFIRM%"=="y" (

REM Run the conversion script with correctly quoted paths

python C:\Users\[Your Profile Name]\Desktop\Musubi\musubi\musubi-tuner\convert_lora.py --input "%INPUT_PATH%" --output "%OUTPUT_PATH%" --target %TARGET%

echo Conversion complete.

) else (

echo Operation canceled.

)

REM Return to start for another file

goto START

:END

REM Keep the window open

echo Exiting the script.

pause

As with the previous BAT files, save the script as ‘All files’ from Notepad, naming it convert.bat (or whatever you like).

Once saved, double-click the new BAT file, which will ask for the location of a file to convert.

Paste in or type the path to the trained file you want to convert, type y, and press Enter.

After saving the converted LoRA to the CONVERTED folder, the script will ask if you would like to convert another file. If you want to test multiple checkpoints in ComfyUI, convert a selection of the models.

When you have converted enough checkpoints, close the BAT command window.

You can now copy your converted models into the models\loras folder in your ComfyUI installation.

Typically the correct location is something like:

C:\Users\[Your Profile Name]\Desktop\ComfyUI\models\loras
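Copying can of course be done in File Explorer, but if you convert checkpoints often, a small Python sketch like the one below can do it for you. It assumes the converted files end in _converted.safetensors, as named by the conversion BAT; the function name and both folder arguments are illustrative.

```python
import shutil
from pathlib import Path


def copy_converted_loras(converted_dir, comfy_loras_dir):
    """Copy *_converted.safetensors files into ComfyUI's loras folder.

    Files that already exist at the destination are skipped.
    Returns the list of destination paths that were written.
    """
    source = Path(converted_dir)
    target = Path(comfy_loras_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for lora in sorted(source.glob("*_converted.safetensors")):
        destination = target / lora.name
        if destination.exists():
            continue  # do not overwrite an earlier copy
        shutil.copy2(lora, destination)
        copied.append(destination)
    return copied
```

Because existing files are skipped, the function can be run again after each new batch of conversions without touching LoRAs that are already in place.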

Creating Hunyuan Video LoRAs in ComfyUI

Though the node-based workflows of ComfyUI seem complex initially, the settings of other more expert users can be loaded by dragging an image (made with the other user’s ComfyUI) directly into the ComfyUI window. Workflows can also be exported as JSON files, which can be imported manually, or dragged into a ComfyUI window.

Some imported workflows will have dependencies that may not exist in your installation. Therefore install ComfyUI-Manager, which can fetch missing modules automatically.

Source: https://github.com/ltdrdata/ComfyUI-Manager

To load one of the workflows used to generate videos from the models in this tutorial, download this JSON file and drag it into your ComfyUI window (though there are far better workflow examples available at the various Reddit and Discord communities that have adopted Hunyuan Video, and my own is adapted from one of these).

This is not the place for an extended tutorial in the use of ComfyUI, but it is worth mentioning a few of the crucial parameters that will affect your output if you download and use the JSON layout that I linked to above.

1) Width and Height

The larger your image, the longer the generation will take, and the higher the risk of an out-of-memory (OOM) error.

2) Length

This is the numerical value for the number of frames. How many seconds it adds up to depends on the frame rate (set to 30fps in this layout). You can convert seconds to frames based on fps at Omnicalculator.

3) Batch size

The higher you set the batch size, the quicker the result may come, but the greater the burden on VRAM. Set this too high and you may get an OOM.

4) Control After Generate

This controls the random seed. The options for this sub-node are fixed, increment, decrement and randomize. If you leave it at fixed and do not change the text prompt, you will get the same image every time. If you amend the text prompt, the image will change to a limited extent. The increment and decrement settings allow you to explore nearby seed values, while randomize gives you a totally new interpretation of the prompt.

5) Lora Name

You will need to select your own installed model here, before attempting to generate.

6) Token

If you have trained your model to trigger the concept with a token (such as ‘example-person’), put that trigger word in your prompt.

7) Steps

This represents how many steps the system will apply to the diffusion process. Higher steps may obtain better detail, but there is a ceiling on how effective this approach is, and that threshold can be hard to find. The common range of steps is around 20-30.

8) Tile Size

This defines how much information is handled at one time during generation. It’s set to 256 by default. Raising it can speed up generation, but raising it too high can lead to a particularly frustrating OOM experience, since it comes at the very end of a long process.

9) Temporal Overlap

Hunyuan Video generation of people can lead to ‘ghosting’, or unconvincing movement, if this is set too low. In general, the current wisdom is that this should be set to a higher value than the number of frames, to produce better movement.
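The Length setting (item 2 above) is a frame count, so the arithmetic behind a seconds-to-frames calculator is simply seconds multiplied by frames per second. A minimal sketch, assuming the 30fps used in this layout (the function names are my own):

```python
def seconds_to_frames(seconds, fps=30):
    """Number of frames needed for a clip of the given duration."""
    return round(seconds * fps)


def frames_to_seconds(frames, fps=30):
    """Duration in seconds of a clip with the given frame count."""
    return frames / fps


print(seconds_to_frames(3))    # 90
print(frames_to_seconds(45))   # 1.5
```

If you change the frame rate in the workflow, pass the new value as fps so that the Length you enter still matches the duration you want.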

Conclusion

Though further exploration of ComfyUI usage is beyond the scope of this article, community experience on Reddit and Discord can ease the learning curve, and there are several online guides that introduce the basics.

First published Thursday, January 23, 2025
