Haoyu Wei1
Xiuwei Xu1*
Ziyang Cheng1
Hang Yin1
Angyuan Ma1
Bingyao Yu1
Jie Zhou1
Jiwen Lu1
1Department of Automation, Tsinghua University
Paper (arXiv)
Code (GitHub)
Asynchronous inference has emerged as a prevalent paradigm in robotic manipulation, achieving significant progress in ensuring trajectory smoothness and efficiency. However, a systemic challenge remains unresolved, as inherent latency causes generated actions to inevitably lag behind the real-time environment. This issue is particularly exacerbated in dynamic scenarios, where such temporal misalignment severely compromises the policy's ability to interpret and react to rapidly evolving surroundings. In this paper, we propose a novel framework that leverages predicted object flow to synthesize future observations, incorporating a flow-based contrastive learning objective to align the visual feature representations of predicted observations with ground-truth future states. Empowered by this anticipated visual context, our asynchronous policy gains the capacity for proactive planning and motion, enabling it to explicitly compensate for latency and robustly execute manipulation tasks involving actively moving objects. Experimental results demonstrate that our approach significantly enhances responsiveness and success rates in complex dynamic manipulation tasks.
F2F-AP aims to resolve the latency problem inherent in asynchronous policies by utilizing temporally aligned future proprioceptive states and visual observations, which can be formulated as follows:
π(a_{t+H : t+H+n} | s_{t+H}, ô_{t+H})
First, F2F-AP employs predicted object flow as a bridge to synthesize future observations, which are aligned with the ground truth in feature space. Furthermore, the action chunks generated by F2F-AP are strictly aligned with the exact moment of execution, eliminating the need to discard the initial H action steps or to apply post-hoc action-fusion algorithms. Through this design, F2F-AP substantially improves performance on dynamic tasks.
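The latency-compensated inference loop described above can be sketched as follows. This is a minimal toy illustration, not the actual F2F-AP implementation: the scalar state, the drift-only forward model, and the dummy policy and flow components (`predict_future_state`, `synthesize_future_obs`, `policy`) are all hypothetical stand-ins. The key point it demonstrates is that the policy is queried with a state and observation H steps ahead, so the returned chunk covers t+H .. t+H+N and no initial actions need to be discarded.

```python
from collections import deque

H, N = 3, 4  # assumed inference latency (control steps) and chunk length

class ToyEnv:
    """Minimal stand-in environment: the state is a scalar position."""
    def __init__(self):
        self.t = 0
        self.state = 0.0
    def observe(self):
        return self.state  # observation coincides with state in this toy
    def step(self, action):
        self.state += action
        self.t += 1

def predict_future_state(state, horizon):
    # placeholder forward model: assume the state drifts +0.1 per step
    return state + 0.1 * horizon

def synthesize_future_obs(obs, horizon):
    # stand-in for flow-based future-observation synthesis
    return obs + 0.1 * horizon

def policy(future_state, future_obs, n):
    # dummy policy: spread the remaining error to a goal of 1.0 over n steps
    err = 1.0 - future_state
    return [err / n] * n

def run(steps=12):
    env, buf, trace = ToyEnv(), deque(), []
    while env.t < steps:
        if not buf:
            # condition the policy on the *future* state/observation,
            # so the chunk is aligned with its actual execution time
            s_future = predict_future_state(env.state, H)
            o_future = synthesize_future_obs(env.observe(), H)
            buf.extend(policy(s_future, o_future, N))  # covers t+H .. t+H+N
        env.step(buf.popleft())
        trace.append(round(env.state, 3))
    return trace
```

Because each chunk is planned for the moment it begins executing, none of its steps are stale on arrival, which is the property the synchronous baseline with discarded steps lacks.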
Left: Illustration of the asynchronous inference achieved by F2F-AP. At timestamp t_1, the model plans from a future state s_{t_3} towards the anticipated position of the interacting object at t_6, enabling advance planning and motion despite real-world system latency. Middle: The model takes robot states and multi-frame RGB images as input. A Flow Predictor extracts object flow to synthesize augmented future observations, which the Policy then consumes as the future observation to generate action chunks. Right: We introduce contrastive learning to minimize the feature distance between predicted and real future observations. The ★ indicates that these features share the same encoder.
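The contrastive objective can be sketched as an InfoNCE-style loss over a batch of features from the shared encoder; the following is a minimal NumPy illustration, assuming that a predicted observation and the real future observation from the same timestep form a positive pair while the other batch entries serve as negatives (the exact loss and temperature used in the paper may differ):

```python
import numpy as np

def info_nce(pred_feats, real_feats, temperature=0.1):
    """InfoNCE-style contrastive loss between predicted- and real-future-
    observation features (both assumed to come from the same encoder).
    pred_feats, real_feats: arrays of shape (B, D); row i of each is a
    positive pair, all other rows are negatives."""
    # L2-normalize so the dot product is cosine similarity
    p = pred_feats / np.linalg.norm(pred_feats, axis=1, keepdims=True)
    r = real_feats / np.linalg.norm(real_feats, axis=1, keepdims=True)
    logits = p @ r.T / temperature                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal; minimize their negative log-probability
    return -np.mean(np.diag(log_prob))
```

When the predicted features match the real future features, the diagonal dominates each row and the loss approaches zero; mismatched features yield a loss near log(B), which is what drives the predicted observations toward the ground-truth future states in feature space.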
To assess the capability of F2F-AP in real-time dynamic scenarios, we designed five tasks for grasping moving objects, executed on two different robots.
Through comparative experiments involving different inference modalities for imitation learning policies, we demonstrate that F2F-AP significantly enhances policy performance in real-time dynamic tasks with system latency.
An experiment with more stochastic, non-linear object motion.