HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots

Abstract

Humanoid whole-body control requires adapting to diverse tasks such as navigation, loco-manipulation, and tabletop manipulation, each demanding a different mode of control. For example, navigation relies on root velocity tracking, while tabletop manipulation prioritizes upper-body joint angle tracking. Existing approaches typically train individual policies tailored to a specific command space, limiting their transferability across modes. We present the key insight that full-body kinematic motion imitation can serve as a common abstraction for all these tasks and provide general-purpose motor skills for learning multiple modes of whole-body control. Building on this, we propose HOVER (Humanoid Versatile Controller), a multi-mode policy distillation framework that consolidates diverse control modes into a unified policy. HOVER enables seamless transitions between control modes while preserving the distinct advantages of each, offering a robust and scalable solution for humanoid control across a wide range of modes. By eliminating the need for policy retraining for each control mode, our approach improves efficiency and flexibility for future humanoid applications.

Right-Hand Mode | Two-Hand Mode | Left-Hand Mode
(the same HOVER policy under different control modes)

Left-Hand Mode | Right-Hand Mode | Two-Hand Mode
(the same HOVER policy under different control modes)

Head Mode | H2O Mode | OmniH2O Mode
(the same HOVER policy under different control modes)

H2O Mode | OmniH2O Mode | ExBody Mode | HumanPlus Mode
(the same HOVER policy under different control modes)

HumanPlus Mode | ExBody Mode | OmniH2O Mode | H2O Mode
(the same HOVER policy under different control modes)

ExBody→H2O Switch | HumanPlus→OmniH2O Switch | RootVel→RootPitch Switch
(the same HOVER policy under different control modes)

Head Mode | Left-Hand Mode | Right-Hand Mode | Two-Hand Mode | Root-Velocity Mode
(the same HOVER policy under different control modes)

Method

Training Framework

The HOVER policy is distilled from the Oracle policy through proprioception and command masking. The task commands for the student are determined via mode-specific and sparsity-based masks, applied to both upper and lower body motions independently. These masks generate diverse task command modes, refining the student's inputs. The distillation employs DAgger to align the student’s actions with those of the oracle, optimizing through supervised learning on the oracle’s actions.
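The masking and distillation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the command group names, their sizes, the `keep_prob` value, and the mean-squared-error form of the supervised loss are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical command layout; group names and sizes are illustrative only.
UPPER = {"upper_kinematic_pos": 6, "upper_joint_angle": 8}
LOWER = {"lower_kinematic_pos": 6, "lower_joint_angle": 10, "root": 4}


def sample_body_mask(groups, rng, keep_prob=0.5):
    """Sample a binary mask for one body half.

    Mode-specific mask: exactly one command group is selected as active.
    Sparsity-based mask: entries inside the active group are kept at random.
    """
    names = list(groups)
    active = names[rng.integers(len(names))]
    parts = []
    for name in names:
        size = groups[name]
        if name == active:
            parts.append((rng.random(size) < keep_prob).astype(np.float32))
        else:
            parts.append(np.zeros(size, dtype=np.float32))
    return np.concatenate(parts)


def sample_command_mask(rng):
    """Upper- and lower-body masks are sampled independently."""
    return np.concatenate(
        [sample_body_mask(UPPER, rng), sample_body_mask(LOWER, rng)]
    )


def distillation_loss(student_action, oracle_action):
    """Supervised loss on the oracle's actions (MSE is an assumption)."""
    return float(np.mean((student_action - oracle_action) ** 2))


rng = np.random.default_rng(0)
command_dim = sum(UPPER.values()) + sum(LOWER.values())
full_command = rng.standard_normal(command_dim).astype(np.float32)
mask = sample_command_mask(rng)
student_command = full_command * mask  # the student only sees unmasked entries
```

In a DAgger-style loop, the student would act in the environment using `student_command`, the oracle would label the visited states with its own actions, and `distillation_loss` would be minimized on those labels.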

Deployment Framework

HOVER enables versatile humanoid control through a unified multi-mode command space that supports kinematic position tracking (blue), local joint angle tracking (yellow), and root tracking (purple). Highlighted boxes indicate the commands actively being tracked, while the masks (dashed boxes on the right) selectively activate different command spaces to suit different tasks.
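As a rough illustration of how masks might select a control mode at deployment time, the sketch below builds a policy input from one shared command vector. Everything specific here is an assumption for the example: the field names in `LAYOUT`, their sizes, the contents of `MODES`, and the choice to concatenate proprioception with the masked command.

```python
import numpy as np

# Hypothetical layout of the unified command vector (sizes are illustrative).
LAYOUT = {
    "head_pos": slice(0, 3),
    "left_hand_pos": slice(3, 6),
    "right_hand_pos": slice(6, 9),
    "joint_angles": slice(9, 28),
    "root_velocity": slice(28, 31),
}
COMMAND_DIM = 31

# Hypothetical mapping from a control mode to the command fields it activates.
MODES = {
    "head": ["head_pos"],
    "left_hand": ["left_hand_pos"],
    "right_hand": ["right_hand_pos"],
    "two_hand": ["left_hand_pos", "right_hand_pos"],
    "root_velocity": ["root_velocity"],
}


def mode_mask(mode):
    """Binary mask with ones on the command entries tracked in this mode."""
    mask = np.zeros(COMMAND_DIM, dtype=np.float32)
    for field in MODES[mode]:
        mask[LAYOUT[field]] = 1.0
    return mask


def build_policy_input(proprio, command, mode):
    """Concatenate proprioception with the masked command (an assumption)."""
    return np.concatenate([proprio, command * mode_mask(mode)])


proprio = np.zeros(10, dtype=np.float32)      # placeholder proprioception
command = np.ones(COMMAND_DIM, dtype=np.float32)

# Switching modes only swaps the mask; the input size stays the same.
left_input = build_policy_input(proprio, command, "left_hand")
two_hand_input = build_policy_input(proprio, command, "two_hand")
```

Because only the mask changes between modes, the same policy can be queried with a fixed input size, which is what allows switching modes without retraining or swapping networks.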

BibTeX

@article{he2024hover,
  title={HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots},
  author={He, Tairan and Xiao, Wenli and Lin, Toru and Luo, Zhengyi and Xu, Zhenjia and Jiang, Zhenyu and Liu, Changliu and Shi, Guanya and Wang, Xiaolong and Fan, Linxi and Zhu, Yuke},
  journal={arXiv preprint arXiv:2410.21229},
  year={2024}
}
