ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

The first feedforward framework that jointly infers 3D Gaussian Splatting assets and physical parameters from a monocular dynamic sequence.

Boyuan Wang1,2, Xiaofeng Wang1,3, Yongkang Li1, Zheng Zhu1*, Yifan Chang2, Angen Ye2, Guosheng Zhao1,2, Chaojun Ni1, Guan Huang1, Yijie Ren2, Yueqi Duan3, Xingang Wang2*

1GigaAI  |  2Institute of Automation, Chinese Academy of Sciences  |  3Tsinghua University

Teaser

ReconPhys teaser
ReconPhys predicts simulation-ready 3DGS assets with physical attributes from a single monocular video.

Abstract

Reconstructing non-rigid objects with physical plausibility remains challenging due to expensive per-scene optimization and the lack of physical supervision. ReconPhys is a feedforward framework that jointly learns physical attribute estimation and 3D Gaussian Splatting reconstruction from a single monocular video. A dual-branch architecture with a differentiable simulation-rendering loop enables self-supervised learning without ground-truth physics labels. On a large-scale synthetic benchmark, ReconPhys reaches 21.64 PSNR in future prediction versus 13.27 from optimization baselines, and reduces Chamfer Distance from 0.349 to 0.004 while running in under one second.

Method

ReconPhys combines a frozen 3DGS predictor and a learnable physics predictor. The 3DGS branch outputs canonical Gaussian primitives, while the physics branch estimates mass, stiffness, damping, and friction from video dynamics. A differentiable spring-mass simulator drives anchor motion, and anchor-to-Gaussian interpolation updates Gaussian centers over time.
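The simulation step above can be sketched as follows. All names here are illustrative assumptions rather than the paper's implementation: the simulator is a generic semi-implicit Euler spring-mass step, and the anchor-to-Gaussian update is a simple linear blend of each Gaussian's nearest anchors' displacements.

```python
import numpy as np

def spring_mass_step(x, v, rest_len, edges, mass, k, damping, dt=1e-3, g=-9.8):
    """One semi-implicit Euler step of a spring-mass system.

    x, v     : (N, 3) anchor positions and velocities
    edges    : (E, 2) index pairs of connected anchors
    rest_len : (E,) spring rest lengths
    mass, k, damping : scalar physical attributes (as predicted per object).
    """
    d = x[edges[:, 1]] - x[edges[:, 0]]            # spring vectors
    length = np.linalg.norm(d, axis=1, keepdims=True)
    u = d / np.maximum(length, 1e-8)               # unit directions
    fs = k * (length - rest_len[:, None]) * u      # Hooke's law
    f = np.zeros_like(x)
    np.add.at(f, edges[:, 0], fs)                  # equal and opposite
    np.add.at(f, edges[:, 1], -fs)                 #   spring forces
    f += mass * np.array([0.0, g, 0.0])            # gravity
    f -= damping * v                               # velocity damping
    v_next = v + dt * f / mass
    x_next = x + dt * v_next                       # semi-implicit update
    return x_next, v_next

def interpolate_gaussians(anchors, centers0, weights, anchor_idx):
    """Move Gaussian centers by a weighted blend of their K nearest anchors'
    displacements (a linear-blend scheme assumed for illustration).

    anchors    : dict with current ("x") and canonical ("x0") anchor positions
    centers0   : (G, 3) canonical Gaussian centers
    weights    : (G, K) blend weights;  anchor_idx : (G, K) anchor indices
    """
    disp = anchors["x"] - anchors["x0"]            # (N, 3) anchor displacement
    return centers0 + (weights[..., None] * disp[anchor_idx]).sum(axis=1)
```

Because every operation here is differentiable, gradients of a rendering loss on the updated Gaussian centers can flow back through the interpolation and the simulator into the physical attributes.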

Training uses a fully differentiable simulation-rendering loop with self-forcing. Reconstruction loss from rendered frames propagates through the simulator to optimize physical attributes, enabling physically meaningful predictions without explicit physics labels.
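The core idea, a reconstruction loss that backpropagates through the simulator into the physical parameters, can be illustrated on a toy 1-D damped spring. This is a hypothetical minimal example: finite-difference gradients stand in for the end-to-end automatic differentiation, and a trajectory loss stands in for the photometric loss on rendered frames.

```python
import numpy as np

def rollout(k, x0=1.0, v0=0.0, m=1.0, c=0.1, dt=0.01, steps=50):
    """Trajectory of a 1-D damped spring with stiffness k (the 'simulator')."""
    x, v, traj = x0, v0, []
    for _ in range(steps):
        v += dt * (-k * x - c * v) / m     # semi-implicit Euler
        x += dt * v
        traj.append(x)
    return np.array(traj)

def loss(k, target):
    """Trajectory reconstruction loss, standing in for the photometric loss."""
    return np.mean((rollout(k) - target) ** 2)

# Pseudo ground-truth observation generated with a hidden stiffness k* = 5.0
target = rollout(5.0)

# Gradient descent on stiffness *through* the simulator; finite differences
# stand in for automatic differentiation in an end-to-end pipeline.
k, lr, eps = 2.0, 5.0, 1e-4
for _ in range(200):
    grad = (loss(k + eps, target) - loss(k - eps, target)) / (2 * eps)
    k -= lr * grad
# k now approaches the hidden stiffness without any direct physics label
```

The same principle scales up in the full pipeline: only rendered frames are supervised, yet the gradient path through the simulator makes the predicted mass, stiffness, damping, and friction physically meaningful.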

Framework overview
Figure 2: Framework overview.
Training pipeline
Figure 3: Differentiable training pipeline with self-forcing.

Synthetic Data Pipeline

Training data is generated from Objaverse-XL assets through semantic filtering, per-object 3DGS reconstruction, consistent anchor sampling, and physically parameterized spring-mass simulation. The final dataset comprises 496 objects, each paired with a 30-frame monocular video of its dynamics and the physical parameters used to generate it.
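Two of these steps can be sketched concretely. For anchor sampling, greedy farthest-point sampling is a common way to pick spatially uniform anchors from a point cloud, and simulation parameters are often drawn log-uniformly to cover several orders of magnitude. Both routines below are illustrative assumptions (including the parameter ranges), not the paper's exact configuration.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy farthest-point sampling: pick k spatially uniform anchor
    indices from an (N, 3) point cloud."""
    rng = np.random.default_rng(seed)
    idx = np.empty(k, dtype=int)
    idx[0] = rng.integers(len(points))
    dmin = np.linalg.norm(points - points[idx[0]], axis=1)
    for i in range(1, k):
        idx[i] = int(dmin.argmax())        # farthest from the chosen set
        dmin = np.minimum(dmin, np.linalg.norm(points - points[idx[i]], axis=1))
    return idx

def sample_spring_params(rng):
    """Log-uniform draw of spring-mass parameters (illustrative ranges)."""
    logu = lambda lo, hi: float(np.exp(rng.uniform(np.log(lo), np.log(hi))))
    return {"mass": logu(0.1, 10.0), "stiffness": logu(10.0, 1e4),
            "damping": logu(0.01, 1.0), "friction": float(rng.uniform(0.1, 1.0))}
```

Sampling the same anchor indices across all frames of an object is what makes the anchors "consistent": the spring topology and rest lengths stay fixed while only positions evolve.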

Experiments

Cross-Object Generalization

We evaluate the model's generalization capability on unseen geometries. Our method consistently outperforms baselines on synthesized objects, achieving superior fidelity in both dynamic reconstruction and future prediction.

Object 1
Object 2
Object 3
Object 4
Table: cross-object generalization
Table 1: Cross-object generalization (dynamic reconstruction and future prediction).

Real-world Non-rigid Assets

We validate practical applicability on real-world non-rigid assets. Our model faithfully captures complex, physics-consistent deformations without per-scene optimization.

Real-World Drop (Object 1)
Simulation (Object 1)
Real-World Drop (Object 2)
Simulation (Object 2)

Physical Disentanglement

Our method disentangles physical attributes from geometric structure: given identical underlying geometry assigned different physical states, it recovers the distinct attributes for each while maintaining high reconstruction fidelity.

Trajectory A (e.g. Higher Stiffness)
Trajectory B (e.g. Lower Stiffness)
Table: physical disentanglement
Table 2: Physical disentanglement on two attribute sets per object.
Table: physical attributes error
Table 3: Comparison of physical-attribute estimation errors.

Application in Robot Non-rigid Object Manipulation

These results show that video-based non-rigid object reconstruction carries over to robotic deformable-object manipulation: the reconstructed high-fidelity digital assets are ready for photorealistic physical simulation in virtual environments such as PhysTwin.

Scenario (a): Stretching
Scenario (b): Squeezing
Scenario (c): Pinching
Scenario (d): Squashing and Stretching

More Interactive Manipulation Demos

Additional demonstrations of interactive manipulation on diverse deformable objects.

Bun Toy 1
Bun Toy 2
Orange Toy
Pillow
Sponge

More Synthetic Dataset Results

More free-fall dynamic trajectories sampled from our synthesized dataset.

Result 1
Result 2
Result 3
Result 4
Result 5
Result 6

Citation

If you find this work useful, please cite:

@article{wang2026reconphys,
  title   = {ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video},
  author  = {Wang, Boyuan and Wang, Xiaofeng and Li, Yongkang and Zhu, Zheng and Chang, Yifan and Ye, Angen and Zhao, Guosheng and Ni, Chaojun and Huang, Guan and Ren, Yijie and Duan, Yueqi and Wang, Xingang},
  journal = {arXiv preprint arXiv:xxxx.xxxxx},
  year    = {2026}
}