LIST3R: Long-sequence Instance-aware 3D Reconstruction

Using persistent object instances as anchors to organize long-horizon 3D reconstruction.

Jing Gao  ·  Wei Wang  ·  Feiran Wang  ·  Yan Yan
Beijing Jiaotong University  ·  University of Illinois Chicago

Overview

We present LIST3R, an instance-aware framework for long-sequence 3D reconstruction, inspired by the way humans organize spatial memory around stable, recognizable objects. LIST3R treats persistent object instances as anchors: it uses them to reconnect fragmented subsequences and to consolidate local observations into a coherent global 3D scene. Given a long video, our approach partitions it into overlapping subsequences and builds a local instance library for each partial reconstruction, maintaining trackable anchors backed by semantic and geometric evidence. These anchors are matched across subsequences to recover revisited regions and to provide object-aware constraints for fragment alignment, producing a global reconstruction. During this process, the evolving geometric evidence updates the local instance libraries and progressively merges them into a unified global 3D instance library. Experiments on long-sequence benchmarks (TUM, ETH3D, BONN, and NRGBD) show that our method produces more accurate trajectories and higher-quality 3D reconstructions than the compared methods on most metrics, highlighting the effectiveness of persistent instance anchors for organizing long-horizon 3D reconstruction.

LIST3R leverages instance guidance to discover revisits more effectively and to align subsequences more smoothly, producing more accurate and stable camera trajectories than the baseline.

Method

Our core idea is to use recognizable object instances as persistent anchors throughout the reconstruction process. Our method follows a three-stage pipeline. First, we build a local instance library for each subsequence, organizing partial reconstructions around trackable object-centric cues. Second, these local instance libraries are used to establish cross-subsequence associations, including long-range revisit discovery and instance-aware subsequence merging, which are further refined by confidence-weighted optimization for global consistency. Finally, local instance observations are consolidated into a unified 3D instance library.
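As described in the overview, the pipeline starts by partitioning the video into overlapping subsequences. The following is a minimal sketch of one way to do this, assuming fixed-size windows with a fixed overlap. The function name `partition_overlapping` and the `window` and `overlap` parameters are illustrative assumptions, not part of the LIST3R implementation, which may choose subsequence boundaries differently.

```python
def partition_overlapping(num_frames, window, overlap):
    """Split frame indices [0, num_frames) into overlapping windows.

    Hypothetical helper for illustration only. Returns a list of
    (start, end) pairs with `end` exclusive. Consecutive windows share
    `overlap` frames; the last window may be shorter than `window`.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if num_frames <= 0:
        return []

    stride = window - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        chunks.append((start, end))
        if end >= num_frames:  # this window reached the last frame
            break
        start += stride
    return chunks


if __name__ == "__main__":
    # Ten frames, windows of four frames, one shared frame between windows.
    print(partition_overlapping(10, 4, 1))
```

The shared frames give neighbouring fragments common observations, which is what makes it possible to relate their local instance libraries afterwards.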

Method overview. LIST3R first builds a local instance library for each subsequence to capture recognizable object cues. These cues connect subsequences and assemble fragmented local reconstructions into a coherent global scene. Instance evidence is continuously updated and finally consolidated into a global instance library.
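To make the cross-subsequence association step more concrete, the sketch below matches anchors from two local instance libraries. It is an illustration under stated assumptions rather than the matching procedure used by LIST3R: each anchor is assumed to carry a semantic label and an appearance descriptor, candidates must share a label, similarity is plain cosine similarity, and only mutual best matches above a threshold are kept. The `Anchor` class, `match_anchors`, and `min_sim` are hypothetical names. Geometric evidence, which the method also uses, is left out here.

```python
import math
from dataclasses import dataclass


@dataclass
class Anchor:
    """Hypothetical anchor record: an ID, a semantic label, a descriptor."""
    instance_id: int
    label: str
    descriptor: list


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def _best_matches(src, dst, min_sim):
    """For each source anchor, its most similar same-label target anchor."""
    out = {}
    for s in src:
        cands = [(cosine(s.descriptor, d.descriptor), d.instance_id)
                 for d in dst if d.label == s.label]
        if cands:
            sim, dst_id = max(cands)
            if sim >= min_sim:
                out[s.instance_id] = (dst_id, sim)
    return out


def match_anchors(lib_a, lib_b, min_sim=0.8):
    """Return (id_a, id_b, similarity) for mutual best matches only."""
    a2b = _best_matches(lib_a, lib_b, min_sim)
    b2a = _best_matches(lib_b, lib_a, min_sim)
    return [(a, b, sim) for a, (b, sim) in sorted(a2b.items())
            if b in b2a and b2a[b][0] == a]


if __name__ == "__main__":
    lib_a = [Anchor(0, "chair", [1.0, 0.0]), Anchor(1, "table", [0.0, 1.0])]
    lib_b = [Anchor(7, "chair", [0.9, 0.1]), Anchor(8, "table", [0.1, 0.9]),
             Anchor(9, "chair", [0.0, 1.0])]
    for a, b, sim in match_anchors(lib_a, lib_b):
        print(a, b, round(sim, 3))
```

Requiring mutual best matches is a conservative choice: a wrong association between two fragments would introduce a wrong alignment constraint, so it is usually safer to drop ambiguous pairs than to keep them.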

Visualizations

[Interactive point-cloud turntables]

Quantitative Analysis

Method          TUM                     ETH3D                   BONN
                ATE↓    RTE↓    RRE↓    ATE↓    RTE↓    RRE↓    ATE↓    RTE↓    RRE↓
CUT3R           0.866   0.963   40.19   2.895   2.537   43.04   0.319   0.561*  58.13
TTT3R           0.317   0.385   9.92    1.317   0.939   10.33   0.149   0.759   47.51
VGGT-Long       0.325   0.489   25.21   1.292   1.701   32.92   0.123   0.787   47.43
π-Long          0.208   0.279   7.81    0.562   0.455   13.65   0.094   0.770   48.01
Scal3R          0.267   0.329   5.72*   0.807   0.590   7.00*   0.117   0.779   49.09
LIST3R (Ours)   0.150*  0.211*  6.97    0.516*  0.444*  9.32    0.085*  0.779   45.89*

Camera pose estimation on long sequences. ATE and RTE are in meters, RRE in degrees; lower is better for all metrics. * = best.
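For readers unfamiliar with these metrics, the sketch below shows the common definitions of two of them. This page does not spell out the exact evaluation protocol, so the code makes several assumptions: ATE is taken as the root-mean-square position error between estimated and ground-truth camera positions that have already been aligned and time-synchronized, and the rotation error between two poses is the angle of their relative rotation. The function names are illustrative.

```python
import math


def ate_rmse(est_positions, gt_positions):
    """RMSE of per-frame position error, in the unit of the inputs.

    Assumes both trajectories are already expressed in the same frame
    (alignment is not performed here) and have one position per frame.
    """
    if len(est_positions) != len(gt_positions) or not est_positions:
        raise ValueError("trajectories must be non-empty and equal length")
    sq_errors = [sum((e - g) ** 2 for e, g in zip(p, q))
                 for p, q in zip(est_positions, gt_positions)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def rotation_angle_deg(r_a, r_b):
    """Angle in degrees of the relative rotation between two 3x3 matrices.

    Uses trace(r_a^T r_b), which equals the sum of elementwise products.
    """
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard rounding
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2)]
    print(ate_rmse(est, gt))

    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    quarter_turn_z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
    print(rotation_angle_deg(identity, quarter_turn_z))
```

In practice the estimated trajectory is usually aligned to the ground truth before ATE is computed, and relative errors such as RTE and RRE are evaluated on pose differences between frames rather than on absolute poses.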

Estimated long-sequence camera trajectories.
Method          ETH3D                                             NRGBD
                Chamfer↓  Acc↓      Comp↓     NC↑       F@5↑      Chamfer↓  Acc↓      Comp↓     NC↑       F@5↑
CUT3R           140.0     62.8      217.3     0.536     4.0       73.2      50.1      96.4      0.575     9.0
TTT3R           102.6     36.6      168.6     0.610     7.3       41.3      26.2      56.4      0.647     22.2
VGGT-Long       50.6      56.7      44.4      0.618     19.8      6.1       5.3       6.9       0.857     68.9
π-Long          41.5      37.8      45.3      0.686     32.7      5.0       4.4       5.5       0.876*    68.9
Scal3R          33.8      36.5      31.1      0.658     26.2      7.7       4.2       11.1      0.829     71.2
LIST3R (Ours)   27.4*     31.1*     23.6*     0.709*    36.5*     4.7*      4.1*      5.3*      0.875     73.4*

Point cloud reconstruction quality. Chamfer, Acc, and Comp are in cm; ↓ means lower is better, ↑ means higher is better. * = best.
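The sketch below illustrates how point-cloud metrics of this kind are commonly computed. It assumes the usual definitions, which this page does not state explicitly: Acc is the mean distance from each predicted point to its nearest ground-truth point, Comp is the mean distance in the other direction, Chamfer is the average of the two (consistent with the values in the table), and the F-score combines precision and recall at a distance threshold `tau`. Reading F@5 as a 5 cm threshold is an assumption. NC (normal consistency) is omitted because it requires surface normals. The brute-force nearest-neighbour search is only practical for small clouds.

```python
import math


def nearest_distances(src, dst):
    """Distance from each point in `src` to its nearest point in `dst`."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def point_cloud_metrics(pred, gt, tau):
    """Accuracy, completeness, Chamfer distance, and F-score at `tau`.

    `pred` and `gt` are non-empty lists of 3D points in the same frame
    and unit; `tau` is the F-score distance threshold in that unit.
    """
    d_pred = nearest_distances(pred, gt)  # predicted -> ground truth
    d_gt = nearest_distances(gt, pred)    # ground truth -> predicted

    acc = sum(d_pred) / len(d_pred)
    comp = sum(d_gt) / len(d_gt)

    precision = sum(d <= tau for d in d_pred) / len(d_pred)
    recall = sum(d <= tau for d in d_gt) / len(d_gt)
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom > 0 else 0.0

    return {"acc": acc, "comp": comp,
            "chamfer": 0.5 * (acc + comp), "f_score": f_score}


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(point_cloud_metrics(pred, gt, tau=0.5))
```

In the toy example every predicted point lies on the ground truth, so accuracy is perfect, while the ground-truth point that the prediction misses lowers completeness and recall. This is why Acc and Comp are reported separately.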

Qualitative results for long-sequence 3D reconstruction.