Abstract
Reconstructing non-rigid objects with physical plausibility remains challenging due to expensive per-scene optimization and the lack of physical supervision. ReconPhys is a feedforward framework that jointly learns physical-attribute estimation and 3D Gaussian Splatting (3DGS) reconstruction from a single monocular video. A dual-branch architecture with a differentiable simulation-rendering loop enables self-supervised learning without ground-truth physics labels. On a large-scale synthetic benchmark, ReconPhys reaches 21.64 dB PSNR in future prediction versus 13.27 dB for optimization-based baselines, reduces Chamfer Distance from 0.349 to 0.004, and runs in under one second per scene.
Contributions
- A first-of-its-kind feedforward framework for jointly recovering deformable object appearance, geometry, and physical attributes from monocular videos.
- A dual-branch architecture with self-supervised physics learning, plus an automated synthesis pipeline for large-scale deformable object data.
- State-of-the-art performance on cross-object generalization and future prediction, with orders-of-magnitude faster inference than per-scene optimization methods.
Method
ReconPhys combines a frozen 3DGS predictor and a learnable physics predictor. The 3DGS branch outputs canonical Gaussian primitives, while the physics branch estimates mass, stiffness, damping, and friction from video dynamics. A differentiable spring-mass simulator drives anchor motion, and anchor-to-Gaussian interpolation updates Gaussian centers over time.
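The simulator and the anchor-to-Gaussian update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the semi-implicit Euler integrator, and the k-nearest-anchor inverse-distance weighting are all assumptions made for the sketch.

```python
import numpy as np

def spring_mass_step(pos, vel, rest_len, edges, mass, stiffness, damping,
                     dt=1e-2, gravity=-9.8):
    """One semi-implicit Euler step of a spring-mass system over anchor points.

    pos, vel : (N, 3) anchor positions and velocities
    edges    : (E, 2) index pairs of connected anchors
    rest_len : (E,)  rest lengths of the springs
    """
    force = np.zeros_like(pos)
    force[:, 2] += mass * gravity                       # gravity on every anchor
    d = pos[edges[:, 1]] - pos[edges[:, 0]]             # spring vectors
    length = np.linalg.norm(d, axis=1, keepdims=True)
    f = stiffness * (length - rest_len[:, None]) * d / np.maximum(length, 1e-8)
    np.add.at(force, edges[:, 0], f)                    # Hooke's law on both endpoints
    np.add.at(force, edges[:, 1], -f)
    force -= damping * vel                              # viscous damping
    vel = vel + dt * force / mass
    pos = pos + dt * vel
    return pos, vel

def interpolate_gaussians(gauss_centers, anchors0, anchors_t, k=4):
    """Move canonical Gaussian centers by a distance-weighted blend of the
    displacements of their k nearest anchors."""
    disp = anchors_t - anchors0                         # anchor motion since canonical frame
    d2 = ((gauss_centers[:, None] - anchors0[None]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]                 # k nearest anchors per Gaussian
    w = 1.0 / (np.take_along_axis(d2, idx, 1) + 1e-8)
    w /= w.sum(1, keepdims=True)                        # normalized inverse-distance weights
    return gauss_centers + (w[..., None] * disp[idx]).sum(1)
```

Stepping the simulator moves the anchors, and the interpolation carries the canonical Gaussians along with them, so only the sparse anchor set needs to be simulated.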
Training uses a fully differentiable simulation-rendering loop with self-forcing. Reconstruction loss from rendered frames propagates through the simulator to optimize physical attributes, enabling physically meaningful predictions without explicit physics labels.
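The idea of optimizing physical attributes through the simulator can be illustrated with a toy one-parameter example. Everything here is a stand-in: the overdamped spring model, the squared-error loss in place of the rendering loss, and finite differences in place of the framework's end-to-end autodiff are all simplifications for the sketch.

```python
import numpy as np

def simulate_sag(stiffness, steps=400, dt=0.01, g=9.8):
    """Overdamped relaxation of a unit mass on a spring; settles near g / stiffness."""
    x = 0.0
    for _ in range(steps):
        x += dt * (g - stiffness * x)       # velocity proportional to net force
    return x

def recon_loss(stiffness, target):
    """Stand-in for the rendering loss: squared error on the simulated state."""
    return (simulate_sag(stiffness) - target) ** 2

target = simulate_sag(40.0)                 # "observed" dynamics from an unknown stiffness
k = 10.0                                    # initial physical-attribute estimate
for _ in range(300):                        # gradient descent through the simulator
    eps = 1e-4
    grad = (recon_loss(k + eps, target) - recon_loss(k - eps, target)) / (2 * eps)
    k -= 50.0 * grad
```

Because the loss is computed only on the simulated outcome, the stiffness estimate is pulled toward the value that reproduces the observed motion, with no physics label ever supplied; this is the mechanism the differentiable simulation-rendering loop exploits at scale.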
Synthetic Data Pipeline
The training data is generated from Objaverse-XL assets with semantic filtering, 3DGS reconstruction, consistent anchor sampling, and physically parameterized spring-mass simulation. The final dataset includes 496 objects with dynamic 30-frame monocular videos and associated physical parameters.
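Consistent anchor sampling can be sketched with farthest point sampling over an asset's point cloud; a fixed seed keeps the anchor set reproducible across frames of the same asset. The choice of farthest point sampling and the seeding scheme are assumptions for illustration, not a description of the pipeline's actual sampler.

```python
import numpy as np

def farthest_point_sampling(points, n_anchors, seed=0):
    """Pick n_anchors well-spread anchor indices from an (N, 3) point cloud.

    A deterministic seed makes the sampling repeatable for a given asset.
    """
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(points)))]              # random first anchor
    dist = np.linalg.norm(points - points[idx[0]], axis=1)
    for _ in range(n_anchors - 1):
        nxt = int(dist.argmax())                        # farthest from current anchor set
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(idx)
```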
Experiments
Cross-Object Generalization
We evaluate the model's generalization on held-out object geometries never seen during training. Our method consistently outperforms baselines on these objects, achieving superior fidelity in both dynamic reconstruction and future prediction.
Real-world Non-rigid Assets
We validate practical applicability on real-world non-rigid assets. Our model faithfully captures complex, physics-consistent deformations without per-scene optimization.
Physical Disentanglement
Our method disentangles physical attributes from geometric structure, accurately distinguishing between varying physical states assigned to identical underlying geometry, while maintaining high reconstruction fidelity.
Application in Robot Non-rigid Object Manipulation
We demonstrate the applicability of video-based non-rigid object reconstruction to robotic deformable-object manipulation. The reconstructed high-fidelity digital assets support photorealistic, physics-consistent simulation in virtual environments such as PhysTwin.
More Interactive Manipulation Demos
Additional demonstrations of interactive manipulation on diverse deformable objects.
More Synthetic Dataset Results
Additional free-fall dynamic trajectories sampled from our synthetic dataset.
Citation
If you find this work useful, please cite:
@article{wang2026reconphys,
  title   = {ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video},
  author  = {Wang, Boyuan and Wang, Xiaofeng and Li, Yongkang and Zhu, Zheng and Chang, Yifan and Ye, Angen and Zhao, Guosheng and Ni, Chaojun and Huang, Guan and Ren, Yijie and Duan, Yueqi and Wang, Xingang},
  journal = {arXiv preprint arXiv:xxxx.xxxxx},
  year    = {2026}
}