Learning a Distributed Hierarchical Locomotion Controller for Embodied Cooperation

¹Tsinghua University    *Equal contribution

Embodied Cooperation Environments

Abstract

In this work, we propose a distributed hierarchical locomotion control strategy for whole-body cooperation and demonstrate its potential to scale to large numbers of agents. Our method uses a hierarchical structure to decompose complex tasks into smaller, manageable sub-tasks. By incorporating spatiotemporal continuity features, we establish the sequential logic necessary for causal inference and cooperative behaviour in sequential tasks, enabling efficient and coordinated control strategies. Training within this framework yields enhanced adaptability and cooperation, leading to superior task-completion performance compared to baseline methods. Moreover, we construct a set of environments as a benchmark for embodied cooperation.

Video

Method


D-HRL: the information flow within the distributed hierarchical reinforcement learning framework, illustrated on the Cooperative Transport scenario. We adopt centralized training with distributed control, decomposing complex behaviours into three levels: Upper Layer (UL), Middle Layer (ML), and Lower Layer (LL). The UL module processes the external information $e$, including environmental perception and positions relative to teammates, extracts features, and sends them to the ML. The ML module is a recurrent neural network that maintains a recurrent state $h$ capturing temporal and spatial correlations, and outputs a locomotion command to the LL. The LL module is a pre-trained locomotion control layer that generates an action $a$ from the proprioceptive observation $p$ and applies it to the agent; it supports two command modes, position and velocity. The goal of Cooperative Transport is for a group of Ant robots to collaboratively move a cylindrical object to the red target zone.
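The per-agent information flow described above (UL features → ML recurrent command → LL pre-trained action) can be sketched as a minimal skeleton. This is an illustrative sketch only: the layer widths, the use of a single GRU cell for the ML, and the linear maps are hypothetical choices, not the architecture or weights used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


class UpperLayer:
    """Maps external info e (perception + relative teammate positions)
    to a feature vector. Sizes are hypothetical."""
    def __init__(self, e_dim, feat_dim):
        self.W = rng.normal(0, 0.1, (feat_dim, e_dim))

    def __call__(self, e):
        return np.tanh(self.W @ e)


class MiddleLayer:
    """Recurrent layer keeping a state h over time; emits a locomotion
    command. Sketched here as one hand-rolled GRU cell."""
    def __init__(self, feat_dim, h_dim, cmd_dim):
        d = feat_dim + h_dim
        self.Wz = rng.normal(0, 0.1, (h_dim, d))
        self.Wr = rng.normal(0, 0.1, (h_dim, d))
        self.Wh = rng.normal(0, 0.1, (h_dim, d))
        self.Wc = rng.normal(0, 0.1, (cmd_dim, h_dim))
        self.h = np.zeros(h_dim)

    def __call__(self, feat):
        x = np.concatenate([feat, self.h])
        z = 1.0 / (1.0 + np.exp(-self.Wz @ x))   # update gate
        r = 1.0 / (1.0 + np.exp(-self.Wr @ x))   # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([feat, r * self.h]))
        self.h = (1.0 - z) * self.h + z * h_tilde
        return self.Wc @ self.h                  # locomotion command for LL


class LowerLayer:
    """Stand-in for the pre-trained locomotion controller: maps the
    (command, proprioception p) pair to a joint action a. The real LL
    is a trained policy; 'position' / 'velocity' select the command mode."""
    def __init__(self, cmd_dim, p_dim, a_dim, mode="velocity"):
        assert mode in ("position", "velocity")
        self.mode = mode
        self.W = rng.normal(0, 0.1, (a_dim, cmd_dim + p_dim))

    def __call__(self, cmd, p):
        return np.tanh(self.W @ np.concatenate([cmd, p]))


# Distributed control: each agent runs its own UL/ML/LL copy every step.
ul = UpperLayer(e_dim=8, feat_dim=16)
ml = MiddleLayer(feat_dim=16, h_dim=32, cmd_dim=3)
ll = LowerLayer(cmd_dim=3, p_dim=29, a_dim=8)

e = rng.normal(size=8)    # external observation of one agent
p = rng.normal(size=29)   # proprioceptive observation of one agent
a = ll(ml(ul(e)), p)      # action applied to the agent
print(a.shape)
```

Because the ML state `h` persists across calls, repeated steps give each agent the temporal context the paper uses for sequential, cooperative behaviour, while keeping control fully distributed (no agent reads another's hidden state).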

Cooperative Transport

No Hierarchy

D-HRL

Crossing Corridor

No Hierarchy

D-HRL

Ravine Bridging

No Hierarchy

D-HRL

Scalability

3 agents

4 agents

5 agents

6 agents