Linear Fusion MultiDiffusion for Fast Training-Free Spherical Panorama Generation

ECCV 2026

Akio Hayakawa¹, Yusuke Mukuta¹,², Tatsuya Harada¹,²

¹The University of Tokyo ²RIKEN Center for Advanced Intelligence Project

Abstract

We propose LF-MultiDiffusion, a training-free panorama generation method that extends MultiDiffusion to support linear projections between target and reference image spaces. Our key idea is to reformulate latent aggregation as a regularized least-squares problem and solve it efficiently with a Krylov-based iterative solver inside the denoising loop. This formulation enables denser and more natural mappings than prior training-free methods, yielding more stable generation with far fewer perspective views. As a result, LF-MultiDiffusion reduces the number of image generator evaluations during denoising and significantly improves inference efficiency. Experiments show that LF-MultiDiffusion achieves better visual quality, text alignment, and panoramic consistency than the strongest training-free baseline, while providing a 15.36× speedup.
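To make the aggregation step concrete, here is a minimal sketch of regularized least-squares fusion solved with conjugate gradient, one common Krylov method. It is an illustration under stated assumptions, not the authors' implementation. The function name `fuse_views_cg`, the stacked projection matrix `A`, the weight `lam`, the prior `x_prior`, and the specific objective ||A x - y||² + lam ||x - x_prior||² are all hypothetical choices made for this example. The abstract does not specify the regularizer or which Krylov solver is used.

```python
import numpy as np

def fuse_views_cg(A, y, lam=1e-2, x_prior=None, iters=50, tol=1e-8):
    """Illustrative fusion: minimize ||A x - y||^2 + lam * ||x - x_prior||^2.

    A       : (m, n) stacked linear projections from panorama latents to view latents
              (hypothetical; dense here, though sparse operators would be typical)
    y       : (m,) stacked per-view latents to be aggregated
    lam     : assumed Tikhonov weight, must be > 0 for a unique solution
    x_prior : (n,) assumed prior for the panorama latents (defaults to zeros)
    """
    n = A.shape[1]
    if x_prior is None:
        x_prior = np.zeros(n)
    x_prior = np.asarray(x_prior, dtype=float)

    # Normal equations: (A^T A + lam I) x = A^T y + lam x_prior.
    # The matrix is applied implicitly, so A^T A is never formed.
    def apply_normal(v):
        return A.T @ (A @ v) + lam * v

    b = A.T @ y + lam * x_prior
    x = x_prior.copy()
    r = b - apply_normal(x)   # initial residual
    p = r.copy()              # initial search direction
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:  # stop once the residual norm is small
            break
        Ap = apply_normal(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy usage: a 1-D "panorama" of 6 latents seen by two overlapping 4-latent views.
A1 = np.eye(6)[0:4]           # view 1 covers latents 0..3
A2 = np.eye(6)[2:6]           # view 2 covers latents 2..5
A = np.vstack([A1, A2])
y = np.concatenate([np.ones(4), 3.0 * np.ones(4)])
fused = fuse_views_cg(A, y, lam=1e-3)
```

With pure selection matrices as in the toy usage, the solution in the overlapping region is close to the average of the two views, which corresponds to the simple averaging case. More general linear maps such as interpolation weights can be handled by the same solver without changing the code.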

Teaser figure: LF-MultiDiffusion supports denser linear mappings for fast training-free panorama generation.

Demo

Each method is shown with its generated panorama image and a perspective-view video that looks around the scene. The time next to each method name denotes the wall-clock generation time reported in the paper. See Additional Demo Examples below for more details.

LF-MultiDiffusion (Ours, 2m 6s)

Panorama

Look-around perspective video

DynamicScaler (7m 31s)

Panorama

Look-around perspective video

SphereDiff (32m 15s)

Panorama

Look-around perspective video