

Edit 3D scenes with text instructions using Iterative Dataset Update and diffusion models.

InstructNeRF2NeRF is a state-of-the-art framework for editing Neural Radiance Fields (NeRFs) using text-based instructions. Unlike traditional 3D editing which requires manual geometry or texture manipulation, this tool utilizes a 2D diffusion model (InstructPix2Pix) to iteratively refine a 3D scene. The architecture employs a unique 'Iterative Dataset Update' (IDU) method, which alternates between updating the underlying dataset of images using the diffusion model and retraining the NeRF to maintain multi-view consistency. This ensures that edits—such as changing a person's clothes, turning a landscape from summer to winter, or stylizing a room—remain spatially coherent from any camera angle. As of 2026, it remains the industry standard for researchers and VFX artists looking to bridge the gap between text-to-image generative AI and consistent 3D world-building. It is primarily built upon the Nerfstudio ecosystem, offering high modularity and support for various NeRF backbones like Nerfacto.
The system replaces original training images with diffusion-edited versions incrementally during NeRF training.
Leverages a diffusion model specifically trained to follow human instructions for image editing.
Uses camera pose information to ensure the diffusion model applies edits consistently relative to the viewer.
Fully integrated as a plug-in for Nerfstudio, utilizing its optimized rendering backends.
Adjusts the strength of the diffusion guidance dynamically throughout the training process.
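The Iterative Dataset Update loop described above can be sketched as a toy Python program. This is a minimal illustration under stated assumptions, not the InstructNeRF2NeRF implementation: the diffusion edit, the renderer, and the optimizer are stand-in functions operating on lists of numbers, and every name here (edit_image, train_step, idu, and so on) is hypothetical.

```python
# Toy sketch of the Iterative Dataset Update (IDU) loop.
# The diffusion edit, NeRF render, and training step are stubs;
# nothing here is the real InstructNeRF2NeRF API.
import random

def edit_image(rendered, guidance):
    """Stand-in for an InstructPix2Pix edit; guidance scales the edit strength."""
    return [pixel + guidance for pixel in rendered]

def render_view(nerf, pose):
    """Stand-in for rendering the NeRF at a camera pose (a tiny fake image)."""
    return [nerf["value"]] * 4

def train_step(nerf, dataset):
    """Stand-in for one NeRF optimization step on the current dataset."""
    target = sum(sum(img) / len(img) for img in dataset.values()) / len(dataset)
    nerf["value"] += 0.1 * (target - nerf["value"])

def idu(num_iters=100, edit_every=10):
    poses = list(range(8))
    nerf = {"value": 0.0}
    # Start from the original (unedited) training images.
    dataset = {p: render_view(nerf, p) for p in poses}
    for it in range(num_iters):
        if it % edit_every == 0:
            # Replace ONE training image at a time with a diffusion-edited
            # render; leaving the rest untouched anchors multi-view consistency.
            pose = random.choice(poses)
            guidance = max(0.2, 1.0 - it / num_iters)  # anneal edit strength
            dataset[pose] = edit_image(render_view(nerf, pose), guidance)
        train_step(nerf, dataset)
    return nerf

if __name__ == "__main__":
    print("final NeRF state:", idu())
```

The key structural point the sketch preserves is the alternation: edited images flow into the dataset incrementally while NeRF training continues, rather than re-editing the whole dataset at once.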
Install CUDA-capable environment and Python 3.8+.
Install Nerfstudio (the base framework) via pip or conda.
Clone the InstructNeRF2NeRF repository from GitHub.
Prepare a set of images or a video of a real-world scene.
Run 'ns-process-data' to generate camera poses and point clouds.
Train a base NeRF model using 'ns-train nerfacto'.
Launch the InstructNeRF2NeRF training pipeline with 'ns-train in2n'.
Connect to the Nerfstudio web viewer to watch edits appear in real time.
Supply a text instruction (e.g., 'Make it look like a desert') as the edit prompt that drives the IDU loop.
Export the final edited scene as a video or mesh.
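Assuming a CUDA-capable machine with Python 3.8+, the walkthrough above condenses to roughly the following commands. Paths, the prompt, and the run directory are placeholders, and the flags follow the public repository's README, but treat this as a sketch and confirm against the current documentation:

```shell
# Install Nerfstudio, then InstructNeRF2NeRF from source
pip install nerfstudio
git clone https://github.com/ayaanzhaque/instruct-nerf2nerf
cd instruct-nerf2nerf && pip install -e .

# Compute camera poses from captured images of the scene
ns-process-data images --data ./my_scene --output-dir ./processed

# Train a base NeRF first, then launch the editing pipeline on top of it
ns-train nerfacto --data ./processed
ns-train in2n --data ./processed \
    --load-dir ./outputs/processed/nerfacto/<run>/nerfstudio_models \
    --pipeline.prompt "Make it look like a desert" \
    --pipeline.guidance-scale 7.5 \
    --pipeline.image-guidance-scale 1.5
```

The two guidance scales mirror InstructPix2Pix's split between text guidance and image guidance; raising the image-guidance value keeps edits closer to the original scene.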
"Highly praised by the research community for its consistency, though noted for high VRAM requirements."
