Optimizing human motion for the control of a humanoid robot

合集下载

基于虚拟人设计与实现机器人舞蹈动作的一种方法

文章编号：1007-757X(2012)5-0011-04基于虚拟人设计与实现机器人舞蹈动作的一种方法郭瑾，高伟，刘德山，徐本强摘要：基于三维虚拟人的骨骼模型仿真实现了机器人的舞蹈动作。

分析了机器人舞蹈动作运动的基本规律和实现动作的基本方法，基于此建立了三维虚拟人实现机器人舞蹈动作的关键帧数据库，在此基础上利用中间帧的过渡函数实现了虚拟人骨骼模型仿真机器人舞蹈动作的方法。

实验中利用NOBODY机器人舞蹈视频实现了三维虚拟人骨骼模型的仿机器人的舞蹈动画，结果证明了方法的正确性和可行性，对于机器人的三维虚拟化和仿真研究具有重要意义。

关键词：舞蹈；机器人；三维虚拟人；仿真中图分类号：TP311 文献标志码：ADesign and Realization Method of Robot Dance Motion Based On Virtual HumanGuo Jin, Gao Wei, Liu Deshan, Xu Benqiang（Computer and Information Technology, Liaoning Normal University, Dalian 116081,China）Abstract: This paper introduces realization of robot dance motion based on the simulation of 3D virtual human bone model. First, the basic regular pattern and basic realization method of robot dance motion are analyzed, and then build key frame database of robot dance motion implemented by 3D virtual human. Based on this, the transition function of middle frame is adopted to realize the si-mulation of robot dance motion by virtual bone model. In the experiment, NOBODY robot dance video is used to realize robot dance animation simulated by 3D virtual human bone model. The result of experiment shows that this method is correct and feasible, and it has important sense to 3D virtualization and simulation research.Key words: Dance; Robot; 3D Virtual Human; Simulation0 引言随着虚拟现实技术和机器人技术的日益成熟，音视频相结合的虚拟人模拟实现仿真机器人舞蹈动画作为一种崭新的艺术形式受到越来越广泛的关注，在包括计算机游戏在内的各种娱乐应用领域有着广泛的应用前景。

基于Leap motion的手势移动机器人控制

基于Leap motion的手势移动机器人控制简介(Introduction)设计题目：基于Leap motion的手势移动机器人控制。

开发环境（Software Development Environment）：Arduino，Visual Studio 2013。

硬件平台：Leap motion，HCR移动机器人（控制器：Arduino Mega 2560，电机：12V 直流减速电机146rpm）。

1 本周内容(Section 1)功能实现分为以下五部分：1.Leap motion识别手势信息。

利用Leap motion 现有的例程，识别几个简单的手势信息（点击、滑动、顺时针或逆时针旋转）。

2.自定义头文件，将Leap motion识别到的几个手势信息定义为我们所需要的指令。

（这里我们是将点击定义为“W”即小车向前移动命令，将滑动定义为“S”即小车向后移动命令，将顺时针或逆时针旋转定义为“A”或“D”即小车左转或右转命令。

）3.将我们所使用的Leap motion例程和自定义头文件程序发送到控制小车的串口。

我们这时可以通过串口调试助手来观察串口收发指令的情况。

4.将下位机程序烧入小车控制器，这里我们所使用的是Arduino Mega 2560集成开发板（基于A VR单片机）。

5.利用手势来控制小车的移动，达到简单的人机交互的目的。

2 总结(Conclusion)3.1所遇到的问题：1. 手势控制只能进行一次。

即只有第一次的手势信息有效，能够达到控制小车的目的。

但后面的手势虽能通过串口助手监测到，但无法实现控制小车的目的？可能的原因：1）串口只能执行接受到的第一条指令。

2）缺少中断和循环程序。

2. Leap motion手势识别程序编写难度大，所牵扯到的工程项目多，目前还不能自己编写Leap motion手势识别程序，只能找相关的例程（C++程序）。

3. 不要忽略了Leap motion环境配置这里步骤，未配置好环境会导致在程序调试的过程中出现诸多的错误。

Research on Locomotion Control based on Humanoid Soccer Robot

Research on Locomotion Control based on Humanoid Soccer RobotWang Hao, Bao Dun-qiao, Fang Bao-fu, Li LongSchool of Computer and Information, Hefei University of Technologyjsjxwangh@ , dunqiaobao@Abstract: In order to control a humanoid soccer robot with a high degree of freedom to get faster and more stable movement, locomotion control should be designed well which includes gait planning, motor planning and state detection. The gait planning plans accelerated & decelerate walking period and offset angle in each cycle. Motor planning tries to get current stance by programming smooth joint motion. The state detection is taken as events trigger which driving the gait planning or handling abnormal state just like falling down, joint mistake etc. The humanoid robot system simulation RoboCup 3D league is a novel experiment platform for researchers of humanoid robot behavior. It is very cheap and easy to optimize the humanoid algorithms and apply results in real humanoid robot. The proposed control has been tested in RoboCup 3D server platform. The result on RoboCup simulation team –HfutEngine3D shows this control method’s validity.Key words: Locomotion Control, Humanoid Soccer Robot, Gait planning, Simulation1. IntroductionCompared with wheeled mobile robot, humanoid robot put up more superiority on many aspects such as adaptability of complex unstructured landform. Since complex ontic structure, numerical model of humanoid robot request more accurate and efficient locomotion planning, which provides a corking experiment platform for cybernetics, optimization algorithm, kinematics and dynamics simulation.The humanoid robot simulation was used for the first time in RoboCup-2007 (See Fig.1).The HOAP-2 (Humanoid for Open Architecture Platform) is the prototype of the robot which is used in the RoboCup simulation league which has 20 degrees of freedom with one motor in each joint. Therefore it can behave like a real humanoid robot.This paper proposes a locomotion control for humanoid robot which has been implemented in our RoboCup simulation team – HfutEngine3D. Section II briefly describes robot’s locomotion controller which includes walking planning and state detection controller. Section III gives the experimental results, followed by conclusion and future work in Section IV.Fig.1 RoboCup Soccer 3D league2. Locomotion ControlLocomotion control contains three parts: gait planning controller, motor planning controller and state detection controller. Here two locomotion states are defined. The state which robot is standing steadily or walking evenly without any other object obstructing is called E(execution)-State. The other state is called T(trigger)-State. In this state, robot falls down or it’s walking target changes. E-State executes planned gait and walk alone planned path. T-State layouts new walking path and gait during all walking period. Fig.2 shows locomotion control’s operation.Fig.2 Locomotion Control Operation 2.1 Gait PlanningIn the field, our robot can walk directly to a random point which needs to control the accelerated and decelerated walking period. The following describes what the gait controller in T-State planning should do:(1).Dynamic acceleration calculated by the distance to target and target angle.In the accelerated period, robot should accelerate it’s velocity with the accelerate value decreasing gradually. A log10 function is used for that its differential coefficient function is a descending function.(101.67607log (1)ValueofAcceleration DistToGoal =×) (2).Predict three period lengths: acceleration speed period, uniform speed period, deceleration speed period. Calculating not only acceleration time but also deceleration time, robot’s max walkvelocity must be surveyed. Most use linear multinomial to calculate the velocity. However, this method is not very accurate.In the aspect of description of the relationship between the distance to target and desired robot’s max velocity, quadratic multinomial is more accurate than linear multinomial just like Newton interpolation polynomial. ()[]()[]()()[]()()()0100120101011,,,,,,(2)n n n p x f x x x x f x x x x x x x f x x x x x x x x x −=−+−−++−−−"""3d Supposed that robot’s reachable max velocity is, then we choose randomly avalue namedwhich is just small thanV-max. After increasing to, the robot’swalking speed will reduce gradually to zerowhen robot approaching the target. The whole scene will be tested many times for calculating the average distance the robot has walkedby. By using Newton interpolation formulas, three group date: (,), (,) and(,) should be imported to the maxvelocity function. max V 1m V 1m V 1d 1d 1m V 2d 2m V 3m V ()()()()()()()211121322132211221(3)m m m d m m m m m V V V V d d d d V V V V d d d d d d d d d d −=+⋅−+−−−⎛⎞−⎜⎟−−⎝⎠⋅−⋅−−md V is just lied on the value d, distance totarget.Similarly, the slowdown point function iscreated. The only difference from max velocity function is what should be calculated is the distance of robot’s slow down way.()()()211121322132211231(4)m m m m m m m m m d d DistToBrake d v v v v d d d dv v v v v v v v d d −=+⋅−+−−−−−−⋅−⋅−−2.2 Motor Planning Almost in every cycle, the simulator gets motor commands from 20 joints of robot. Based on gait planning, joint should swing to the planned stance in E-State time. Three parameters will be considered for calculating the velocity, which are joint initial angle, joint ultimate angle and stated time. Generally,there are two velocity strategies are proposed.One takes uniform velocity strategy. It is()()()()__;____(5)0;___ultimate angle initial angle stated time if ultimate angle initial angle threshold angle Velocity if ultimate angle initial angle threshold angle −⎧⎪⎪⎪−>==⎨⎪⎪⎪−<⎩ The parameter of threshold angle is an experiential value which considers the inertia of joint’s swing. In a very short distance to ultimate angle condition, low of inertia should be considered so that the inertia will drive the joint keep on swing even when joint’s motor have stop then. When desired angle is largerthen threshold angle, the velocity is calculated by desired angle dividing the stated time. Unfortunately, the threshold angle is hard to get the exact value even after a large mount of experiments.The other strategy is using variational velocity, it can describe as the following.A perfect circumstance is that the velocity of joint’s swing is changing. The velocity is firstly accelerated and then decelerated. Lastly, joint just stops at the ultimate angle. The sinusoid function’s differential coefficient is a cosine function which is known as degressive function.Fig.3 Sinusoid Angle FunctionFig.4 Cosine Velocity FunctionRobot’s take sinusoid function as angle function while taking cosine function as velocity function Angle function:()()__sin 2t Angle Ultimate angle Initial angle Time π⎛⎞=−⋅⋅⎜⎟⎝⎠6 Velocity function:()()()__cos 22dAngle t Ultimate angle Initial angle Time Time dt ππ⎛⎞=⋅−⋅⋅⋅⎜⎟⎝⎠7condition handle map.le map3. E Result ree points to which respe Value t is current planned cycle which istaken as input value. Time represents total planned cycles. 2.3 State DetectionState Detection’s mechanism is just like burst mode which works both in E-State and T-State. In E-State, state detection controller detects robot current velocity, direction, whether falling down and planned path. At same time, some conditions of transferring E-State to T-State must been established, which can be considered as trigger conditions. Following is our robot’s trigger conditions which have been mapped to various handle process.●Condition: Falling down handle process: Rise up program●Condition: Joint mistake handle process: Revert Zero-Stance program●Condition: Target point shift handle process: New walking path planning●Condition: Walking path conflict handle process: Brake walking programMeanwhile one process will be mappedto others process. All process is towards to new planning handle. Fig.5 shows the Fig.5 Condition hand xperiment on Controller &3.1 Gait ComparisonIn the field, th ctive distance from start point are 10, 30 and 50. Robot Ra’s gait strategy is turning firstly, and then takes uniform velocity for walking. Since having no decelerated walking period, Ra has to use special action to get stop. Robot Rb’s strategy is the one mentioned in Section II. The parameters for comparison are the walking time and error distance to the correct point, which are showed in Fig.6 andFig.7. Here Robot A (represented as Ra) takes original controller while Robot B (representedas Rb) takes new controller.Fig6. Using Time for walking Compariso the same n In a shorter journey, Ra takes almost time with Rb to finish it. As the journey become more longer, Rb becomes more faster than Ra. At the same time, Accuracy about Rb also shows advantage in Fig.7.Fig.7 Accuracy after Stop Comparison3.2 M ts of Ra & Rb with angl otor ComparisonMotor drives the join e displacement in an same time: 20, 40, 80 and 160. Each angle displacement will be tired for 10 times. Ra uses uniform velocity and Rb uses variational velocity. Fig8. logs the offset of angle. With angle displacement’s increasing, Rb’s offset of angle is much less than Ra. They are not in one order ofmagnitude.Fig8. Motor Accuracy Comparison.4 Con locomotion cont . Experimental study of biped clusion and Future WorksThis paper proposed a roller for humanoid soccer robot, which is adopted by HfutEngine3D which just take the 3rd place of RoboCup IranOpen 2008. This achievement and above experiment all proved the controller’s validity.The future work for us is to design a motion stabilization controller which enable robot just by using some kinetics calculations to change motion stance to be steady-going when it meets some abnormal occasion just like collision with other object. REFERENCE ：[1]. Kajita. S, Tani, K dynamic walking’ IEEE Control Systems. 1996 V ol.16, No.1:13-19[2]. Kajita. S, Humanoid Robots. Tsinghua University Press, 2007: 20-67[3]. R.Tedrake, T.W.Zhang, H.S.Seung. Stochastic Policy Gradient Reinforcement Learning on a Simple 3D Biped. Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2004): 2849-2854[4].Guestrin, C., Venkataraman, S., Koller, D Context-specific multiagent coordination and planning with factored MDPs. In: Proc. 8th Nation. Conf. on Artificial Intelligence, Edmonton, Canada (2002) [5].Kok, J.R., Spaan, M.T.J., Vlassis, N.: Non-communicative multi-robot coordination in dynamic environments. Robotics and Autonomous Systems 50 (2005): 99-114[6] 6 AN Zhu-lin, YU Jing-jing, WANG Hao. Robocup Simulation League Goalie Design. Proceedings of 1st Austria Open of RoboCup. Melbourne,pp,77-87,Jan, 2003.[7]. Fang Bao-fu Wang Hao Liu Jia, and Su Chen-wen. Team Strategy of HfutAgent. Proceedings of 2003 Master China RoboCup, Beijing, pp.1-5, Aug, 2003.。

基于扰动观测器的轮式移动机器人滚动时域路径跟踪控制

第51卷第3期2021年5月吉林大学学报（工学版）Journal of Jilin University (Engineering and Technology Edition)V o l. 51 N o. 3M a y 2021基于扰动观测器的轮式移动机器人滚动时域路径跟踪控制于树友“2,常欢2,孟凌宇2,郭洋2,曲婷1(1.吉林大学汽车仿真与控制国家重点实验室，长春130022;2.吉林大学通信工程学院，长春130022)摘要：轮式移动机器人路径跟踪控制问题中通常存在状态约束和输入约束，并且系统运行时容易受到外部扰动的影响。

本文基于非线性扰动观测器提出了一种轮式移动机器人滚动时域路径跟踪控制策略。

当没有外部扰动作用于系统时，滚动时域控制算法可以满足控制约束和状态约束，并且使得轮式移动机器人跟踪期望的轨迹；当存在外部干扰，尤其是慢变扰动时，非线性扰动观测器能够估计扰动，并通过反馈补偿扰动对轮式移动机器人移动轨迹的影响。

仿真结果表明，在外部干扰存在的情况下该控制策略能够保证移动机器人渐近跟踪期望路径。

关键词：自动控制技术；轮式移动机器人；路径跟踪问题；扰动观测器；滚动时域控制中图分类号：T P273 文献标志码：A文章编号：1671-5497(2021)03-1097-09D O I：10. 13229/ki.j d x b g x b20200065Disturbance observer based moving horizon control for pathfollowing problems of wheeled mobile robotsY U S h u-y o u12,C H A N G H u a n2,M E N G L i n g-y u2,G U O Y a n g z,Q U T i n g1(1. S ta te K e y L a b o r a to r y o f A u to m o tiv e S im u la tio n a n d C o n tro l ^J ilin U n i v e r s i ty C h a n g c h u n130022, C h in a；2. C o lle g e o f C o m m u n ic a tio n E n g in e e r in g, J ilin U n iv e r sity y C h a n g ch u n130022, C h in a)Abstract：State constraints,input constraints and external disturbances usually exist in the path following problem of w h e e l e d mobile robots.Ba s e d o n nonlinear disturbance observer,a m o v i n g horizon control strategy for path following p r o blem of wheeled mobile robots is proposed in this paper.W h i l e there is n o disturbance at all,the m o v i n g horizon control can satisfy the input and state constraints,and drive the w h e eled mobile robot to the desired path.W h i l e there are disturbances,in particular,slow varying and “big”disturbances,the proposed nonlinear disturbance observer can estimate the disturbances,and c o m p ensate the influence of the disturbances o n the w h e e l e d mobile robot through a feedback.Simulation results s h o w that the proposed control strategy can guarantee the convergence of the mobile robot to the desired path under the external disturbance.收稿日期：2020-02-10.基金项目：国家自然科学基金项目（U1964202,61703178);江苏省新能源汽车动力系统重点实验室开放课题项目(JKLNEVPS201901).作者简介：于树友（1974-)，男，教授，博士 .研究方向：预测控制，鲁棒控制.********************.cn通信作者：曲婷（1982-)，女，副教授，博士 .研究方向：汽车动力系统控制及驾驶员行为建模.E-mail ：**************.cn•1098 .吉林大学学报（工学版）第51卷Key words：automatic control technology；w h e eled mobile robot；path following p r o b l e m；disturbance observer；m o d e l predictive control〇引言轮式移动机器人（W h e e l e d mobile robot，W M R>是典型的非完整约束系统由于Brockett 条件11的存在，不能获得连续可微、线性时不变的反馈控制律镇定轮式移动机器人系统。

基于 ADAMS 的仿人机器人行走仿真

基于 ADAMS 的仿人机器人行走仿真肖乐;常晋义;殷晨波【摘要】A simulation method combining ADAMS and Matlab was adopted. 3D entity model, kinematics and dy⁃namic model were built in ADAMS. The design and simulation of the control system were realized in Matlab. The data communication between Matlab and ADAMS was realized by the interface module ADAMS/Controls. This sim⁃ulation method lays a foundation for on-line control of humanoid robot.% 采用机械系统动力学仿真分析软件 ADAMS 进行建模和动力学仿真，提供机器人三维实体模型、运动学和动力学模型以及动画仿真。

采用控制系统专业软件 Matlab 进行机器人控制系统设计，提供控制关节目标轨迹、稳定控制算法并输出驱动力矩。

通过ADAMS/Controls 接口模块建立起 Matlab 与 ADAMS 之间的数据接口。

联合仿真方法为实现仿人机器人在线控制奠定了基础【期刊名称】《常熟理工学院学报》【年(卷),期】2012(000)004【总页数】6页(P73-78)【关键词】仿人机器人;虚拟样机;行走稳定性;联合仿真【作者】肖乐;常晋义;殷晨波【作者单位】常熟理工学院计算机科学与工程学院,江苏常熟215500;常熟理工学院计算机科学与工程学院,江苏常熟215500;南京工业大学机械与动力工程学院,南京210009【正文语种】中文【中图分类】TP24由于仿人机器人研制的复杂性，有必要在物理样机制造之前先建立一个虚拟样机系统，在虚拟环境中模拟机器人双足行走的状态，通过模型计算出各个关节的驱动力矩、ZMP点的变化轨迹等，并对设计方案进行优化，提高物理样机研制成功的概率[1-4].为了准确地建立仿人机器人的虚拟样机模型，发挥各类专业软件的优势，本文采用ADAMS软件进行建模和动力学仿真，并在Matlab中建立控制系统，Matlab将机器人关节力矩控制指令传送给ADAMS，ADAMS将通过建立虚拟位置传感器将关节角实时反馈给Matlab，实现联合仿真.ADAMS（Automatic Dynamic Analysis of Mechanical System）是世界上应用最广泛的机械系统动力学仿真分析软件[5-6].它由几十个模块组成，其中最主要的模块为ADAMS/View用户界面模块和ADAMS/Solver求解器，通过这两个模块可以对大部分的系统进行仿真分析.ADAMS/View提供一个直接面向用户的基本操作环境，包括样机的建模和各种建模工具、样机模型数据的输入与编辑、与求解器和后处理等程序的自动连接、虚拟样机分析参数的设置、各种数据的输入和输出、同其他应用程序的接口等.ADAMS/Solver是求解机械系统运动和动力学问题的程序.完成样机分析的准备工作以后，ADAMS/View自动调用ADAMS/Solver模块，求解样机模型的静力学、运动学、或动力学问题，完成仿真分析以后再自动地返回ADAMS/View操作界面.在ADAMS中建立的仿人机器人虚拟样机模型及基本组成部件与主要关节如图1所示，其简化条件为：①腿和地面都是刚性的，不考虑其弹性特征；②忽略纵向平面和横向平面的动力学耦合；③足与地面有足够大的摩擦，在行走过程中，支撑腿脚与地面没有滑动；④忽略关节摩擦.已知所有连杆长度li（i＝1，2，3，4，5，6，7）和关节角度，计算仿人机器人摆动腿的位姿.假设各关节在参考坐标系中的坐标为（xi,zi），则运动学方程为仿人机器人重心在固定坐标系中的坐标使用ADAMS/Controls控制模块，将ADAMS/View程序和MATAB控制分析软件有机的连接起来，实现将ADAMS机械系统虚拟样机作为一个机械系统模型引入控制分析软件，ADAMS模型中的输入变量相当于要求的控制量，即关节驱动力矩；输出变量相当于传感器的测量值，即系统的状态信息，主要包括：各个关节的角位移、角速度和角加速度以及整体信息如重心、ZMP等.联合仿真分析包括下面4个基本步骤.（1）构造ADAMS/View样机模型.首先构造ADAMS/View机械系统样机模型，包括几何模型、各种约束和作用力等.（2）确定ADAMS的输入和输出.输出是指进入控制程序的变量，表示从ADAMS/Controls输出到控制程序的变量.输入是指从控制程序返回到ADAMS的变量，表示控制程序的输出.通过定义输入和输出，实现ADAMS和Matlab控制程序之间的信息封闭循环，即从ADAMS输出的信号进入控制程序，同时从控制程序输出的信号进入ADAMS程序.这里所有程序的输入都应该是设置的变量，而输出可以是变量或是测量值.（3）构造控制系统方框图.用Matlab控制程序编写的整个系统的控制图，ADAMS/View的机械系统样机模型设置为控制图中的一个模块.（4）联合仿真.根据各关节自由度实际运动角度，在线计算摆动脚位置（xf，zf）值.控制系统建模的目的是建立一个机械与控制一体化的样机模型，通过向ADAMS 方框图中添加控制系统，实现控制系统的建模，基本步骤如下：（1）启动控制程序Matlab；（2）在Matlab程序中输入ADAMS模块；（3）在Simulink中设置仿真参数；（4）运用Simulink工具，进行控制系统建模.用Matlab/simulink软件建立控制系统框图如图2所示.双击joint control模块，显示关节控制模块的子系统，如图3所示，完成各个控制图标以及下一级模块设置.双击adams_sub模块，显示adams_sub模块的子系统，如图4所示，对其中各个模块参数进行设置.为了准确地建立仿人机器人的虚拟样机模型，发挥各类专业软件的优势，可采用ADAMS软件进行建模和动力学仿真，提供机器人三维实体模型、运动学和动力学模型以及动画仿真.采用应用控制系统专业软件Matlab进行机器人控制系统设计，提供控制关节目标轨迹、稳定控制算法并输出驱动力矩.通过ADAMS/ Controls接口模块建立起Matlab与ADAMS的实时数据管道，Matlab将机器人关节力矩控制指令传送给ADAMS，ADAMS通过建立虚拟位置传感器将关节角实时反馈给Matlab，形成一个完整的闭环控制系统，实现联合仿真，如图5所示.联合仿真程序中，步态通过上身位置与时间的对应关系进行调整，输入PD控制器预先规划好的步态轨迹，同时与实际步态值进行比较，通过计算实际值与规划值之间的差值，得出控制关节所需的扭矩，从而实现对仿人机器人稳定行走步态的控制.但是仅仅对步态的控制还不能满足稳定行走的要求，还需要在行走过程中实时计算ZMP点的轨迹，通过ZMP轨迹点的位置来确定机器人与之对应的姿态，实现仿人机器人的稳定步行.仿人机器人完整动作过程包括：摆动腿抬起，离开地面、向前摆动、到达最高点、继续向前、最后落到地面，仿真步行周期设置为1.0 s，步行速度2 km/h，步幅为520 mm，图6-图8的仿真结果图都能清楚的反映出仿人机器人摆动腿各个关节运动曲线参数的变化情况.可以看出在整个过程中，各个关节曲线连续无突变，ZMP曲线始终位于支持多边形稳定区域内.通过仿人机器人运动过程的计算机仿真，可以直接观察到机器人实际的姿态变化，为实时控制提供了依据.【相关文献】[1]马宏绪.两足步行机器人动态行走研究[D].长沙：国防科技大学，1995.[2]C B Yin,A Albert.Stability Maintenance of a Humanoid Robot under Disturbance with Fictitious Zero-Moment Point[A].IEEE/RSJ International Conference on Intelligent Robots and Systems[C].Edmonton Alberta Canada August,2005:1780-1787.[3]Li Zhaohui,Huang Qiang,Li Kejie,et al.Stability Criterion and Pattern Planning for Humanoid Running[C].Proceedings of the 2004 IEEE International Conference on Robotics and Automation.New Orleans,France:IEEE Press,2004:1059-1064.[4]肖乐，常晋义.仿人机器人下楼梯的自适应模糊控制方法[J].计算机工程，2009，35（13）：193-195.[5]徐燕华.复杂产品的虚拟样机仿真研究[D].天津：天津大学机械工程学院，2004.[6]余朝举.基于虚拟样机技术的双足步行机器人设计与研究[D].北京：北京邮电大学，2009.。

人体下肢运动力学分析与建模

论文作者签名：
日期：
年
月
日
指导教师签名：
日期：
年
月
日
杭州电子科技大学硕士学位论文
第1章
1.1 课题背景及研究意义
绪论
从古至今，因为战争、工伤、疾病、交通事故和意外伤害等原因而产生的下肢截肢者随着工业、交通的迅速发展而迅速增加。一项调查显示仅美国每年就大约有 11 万人失去下肢，而我国目前下肢残疾者更是高达 600 万人，其中下肢截肢者约 137 万多人[1]。这些下肢截肢者由于失去了人类最基本的功能之一——行走，生活难以自理，被安置在一些脱离社会的特定角落，致使他们在身体心理都充满着常人无法体会的痛苦。但是现在医疗水平还不能使肢体再生，为这些截肢者安装人工假肢就成了恢复其一些日常活动的唯一手段。随着科学的进步，人们生活水平的提高，不仅要求假肢要具有很好的装饰性，而且对其运动性能的要求也越来越高。智能下肢假肢通过检测穿戴者的运动状态，来控制假肢运动，从而提高步态的灵活性、协调性和安全性。现代运动生物力学对人体腿部运动信息的采集与分析在机器人和假肢研究方面有着重大的作用。二十世纪中期以来，把生物力学同体育科学理论研究相结合条件日趋成熟，运动生物力学逐渐形成为了一门独立的学科。运动生物力学研究的内容是人体运动中的机械运动规律，以生物学和力学的理论、方法研究人体从事各种活动、运动和劳动的动作技术，使复杂的人体动作技术奠基于最基本的生物学和力学规律之上，并以数学、力学、生物学以及动作技术原理的形式加以定量描述。随着计算机、传感器、测速器、高速摄影、测力台和电子解析系统技术的应用，使准确地测量与分析人体运动的参数成为现实，科学技术的发展为运动生物力学的研究奠定了坚实的物质基础，而生物学、力学理论的发展与完善则为它建立了坚实的理论基础。人体运动、活动和劳动中的各种动作技术，可以通过生物力学方法进行测试研究，提高动作技术效率，提高运动技术水平[2]。本论文依据获取的下肢的运动力学信息，通过建立人体下肢动力学模型和进行运动力学分析，研究表面肌电信号与运动参数，关节力矩相互之间的关系，从而对下肢运动进行建模。为深入地研究假肢的设计和控制建立基础，使假肢的运动自然协调，快速灵活。为国内下肢假肢事业的发展打下基础，缩小和国外智能假肢发展水平的差距，改善残疾人的生活质量，提高他们的社会活动参与能力，促进社会的和谐发展，对康复医学也具有重要的意义。

下肢外骨骼机器人结构设计和动力学仿真_杨宗林

下肢外骨骼机器人结构设计和动力学仿真杨宗林，曾亿山，王善杰（合肥工业大学机械与汽车工程学院，安徽合肥230009）来稿日期：2015-07-09作者简介：杨宗林，（1991-），女，安徽六安人，硕士研究生，主要研究方向：机器人。

1引言随着现代军事的发展和对单兵作战能力要求的提高，步兵正在按照机动力、防护力、进攻力和信息力分别提高的路径发展，单个士兵所携带的装置也日趋复杂、沉重[2]。

外骨骼机器人是一种并联在人体上机电一体化装置，随人体一起运动[3]，能够提供助力，减少人体疲劳。

目前国内外的外骨骼机器人驱动方式主要有液压、气动、电动三中常用方式[4]。

液压驱动系统虽然体积小、推力大，但可靠性差、维护麻烦[5]。

气动驱动系统推力偏小，不能实现精确的中间位置调节[6]。

本设计将采用电缸驱动的方式，其主要优点有：精确度高，调速方便，噪音小等。

通过对下肢外骨骼机器人（EXO-P ）结构的设计和动力学仿真，可以为电缸的选型以及控制提供依据。

为了做出一种能够快速响应的外骨骼机器人，EXO-P 选用伺服电缸的方式。

但是仿人步态规划建模和动力学仿真是个问题。

现国内外的外骨骼机器人（像美国加州大学伯克利分校的BLEEX 系统[7]，中科院的外骨骼机器人）在结构设计阶段的动力学仿真所用的运动轨迹，是将机构简化计算出来的。

这种方法太过理论化，没有实际应用意义。

为此，EXO-P 采用了NDI 三维动态测量方法。

进行仿人步态规划建模，在此基础上，对规划的步态进行动力学仿真，并得到了结果，切实解决了伺服电缸选型的实际问题。

2结构设计2.1自由度的分配在人体骨骼系统研究的基础上，设计出可穿戴外骨骼机器人的自由度分配，如图1所示。

其中，髋关节和踝关节具有屈曲/伸展、转动和外展/内收三个自由度，膝关节具有屈/伸自由度，脚踝有屈曲/伸展，屈曲/伸展两自由度。

脚底为橡胶，可弯曲。

摘要：为增加单兵作战能力，以人为核心，同时具有机械的高负载能力、耐力、长时间运动能力，设计出一款结合人工智能与机械的助力机器人。

机器人学导论分析解析

Forward kinematics of manipulators (Chapter 3) Kinematics is the science of motion that treats motion without regard to the forces which cause it. e.g. position, velocity, acceleration and all higher order derivatives of the position variables.
Cartesian Spherical
Cylindrical
Articulated
SCARA
Topics： • Lectures: Description of position and orientation (Chapter 2) Forward kinematics of manipulators (Chapter 3) Inverse kinematics of manipulators (Chapter 4) Velocities, static forces, singularities (Chapter 5) Dynamics (Chapter 6) Trajectory generation (Chapter 7) Manipulator design and sensors (Chapter 8) Linear position control (Chapter 9) Nonlinear position control (Chapter 10) • Experiments: Programming robots and Off-line programming and simulation (Chapter 12 & 13)

人机协作机器人的碰撞检测识别及安全控制

分类号：T P249密级：公开UDC：62-5编号：201531204032河北工业大学硕士学位论文人机协作机器人的碰撞检测识别及安全控制论文作者：郑晨晨学生类别：全日制专业学位类别：工程硕士领域名称：机械工程指导教师：陈贵亮职称：高级工程师资助基金项目：D i s s e r t a t i o n S u b m i t t e d t oH e b e i U n i v e r s i t y o f T e c h n o l o g yf o rT h e M a s t e r D e g r e e o fM e c h a n i c a l E n g i n e e r i n gC O L I S S I O NDE T E C T I O NA N DI D E N T IF I C A T I O NA N D S E C U R I T Y C O N T R O L O F C O L L A B O R A T I V E R O B O Tb yZ h e n g C h e n c h e nS u p e r v i s o r:C h e n G u i l i a n gM a r c h2018原创性声明本本人郑重声明：所呈交的学位论文，是本人在导师指导下，进行研究工作所取得的成果。

除文中已经注明引用的内容外，本学位论文不包含任何他人或集体已经发表的作品内容，也不包含本人为获得其他学位而使用过的材料。

对本论文所涉及的研究工作做出贡献的其他个人或集体，均已在文中以明确方式标明。

本学位论文原创性声明的法律责任由本人承担。

学位论文作者签名：日期：关于学位论文版权使用授权的说明本人完全了解河北工业大学关于收集、保存、使用学位论文的以下规定：学校有权采用影印、缩印、扫描、数字化或其它手段保存论文；学校有权提供本学位论文全文或者部分内容的阅览服务；学校有权将学位论文的全部或部分内容编入有关数据库进行检索、交流；学校有权向国家有关部门或者机构送交论文的复印件和电子版。

融合元学习和PPO算法的四足机器人运动技能学习方法

第41卷第1期2024年1月控制理论与应用Control Theory&ApplicationsV ol.41No.1Jan.2024融合元学习和PPO算法的四足机器人运动技能学习方法朱晓庆†,刘鑫源,阮晓钢,张思远,李春阳,李鹏(北京工业大学信息学部,北京100020;计算智能与智能系统北京市重点实验室，北京100020)摘要:具备学习能力是高等动物智能的典型表现特征,为探明四足动物运动技能学习机理,本文对四足机器人步态学习任务进行研究,复现了四足动物的节律步态学习过程.近年来,近端策略优化(PPO)算法作为深度强化学习的典型代表,普遍被用于四足机器人步态学习任务,实验效果较好且仅需较少的超参数.然而,在多维输入输出场景下,其容易收敛到局部最优点,表现为四足机器人学习到步态节律信号杂乱且重心震荡严重.为解决上述问题,在元学习启发下,基于元学习具有刻画学习过程高维抽象表征优势,本文提出了一种融合元学习和PPO思想的元近端策略优化(MPPO)算法,该算法可以让四足机器人进化学习到更优步态.在PyBullet仿真平台上的仿真实验结果表明,本文提出的算法可以使四足机器人学会行走运动技能,且与柔性行动者评价器(SAC)和PPO算法的对比实验显示,本文提出的MPPO算法具有步态节律信号更规律、行走速度更快等优势.关键词:四足机器人;步态学习;强化学习;元学习引用格式:朱晓庆,刘鑫源,阮晓钢,等.融合元学习和PPO算法的四足机器人运动技能学习方法.控制理论与应用,2024,41(1):155–162DOI:10.7641/CTA.2023.20847A quadruped robot kinematic skill learning method integratingmeta-learning and PPO algorithmsZHU Xiao-qing†,LIU Xin-yuan,RUAN Xiao-gang,ZHANG Si-yuan,LI Chun-yang,LI Peng(Faulty of Information Technology,Beijing University of Technology,Beijing100020,China;Beijing Key Laboratory of Computational Intelligence and Intelligent System,Beijing100020,China) Abstract:Learning ability is a typical characteristic of higher animal intelligence.In order to explore the learning mechanism of quadruped motor skills,this paper studies the gait learning task of quadruped robots,and reproduces the rhythmic gait learning process of quadruped animals from scratch.In recent years,proximal policy optimization(PPO) algorithm,as a typical representative algorithm of deep reinforcement learning,has been widely used in gait learning tasks for quadruped robots,with good experimental results and fewer hyperparameters required.However,in the multi-dimensional input and output scenario,it is easy to converge to the local optimum point,in the experimental environment of this study,the gait rhythm signals of the trained quadruped robot were irregular,and the center of gravity oscillates.To solve the above problems,inspired by meta-learning,based on the advantage of meta-learning in characterizing the high-dimensional abstract representation of learning processes,this paper proposes an meta proximal policy optimization (MPPO)algorithm that combines meta-learning and PPO algorithms.This algorithm can enable quadruped robots to learn better gait.The simulation results on the PyBullet simulation platform show that the algorithm proposed in this paper can enable quadruped robots to learn walking pared with soft actor-critic(SAC)and PPO algorithms,the MPPO algorithm proposed in this paper has advantages such as more regular gait rhythm signals and faster walking speed.Key words:quadruped robot;gait learning;reinforcement learning;meta-learningCitation:ZHU Xiaoqing,LIU Xinyuan,RUAN Xiaogang,et al.A quadruped robot kinematic skill learning method integrating meta-learning and PPO algorithms.Control Theory&Applications,2024,41(1):155–162收稿日期:2022–09–27;录用日期:2023–05–06.†通信作者.E-mail:*******************.cn.本文责任编委:吴敏.国家自然科学基金项目(62103009),北京市自然科学基金项目(4202005)资助.Supported by the National Natural Science Foundation of China(62103009)and the Natural Science Foundation Project of Beijing(4202005).156控制理论与应用第41卷1引言近些年,移动机器人由于可在多种场景下完成任务受到广泛关注,移动机器人被分类为:轮式机器人、履带式机器人和足式机器人[1];由于结构问题,传统轮式机器人和履带式机器人可应用的场景范围较小,在非平坦地形上,存在越障能力差以及由于地形多变而造成重心不稳等问题[2].相反,足式机器人由于其仿生的腿部结构而拥有更高的灵活性和适应性,可以像动物一样在各种地形完成行走任务[3–4].根据足数量的不同可以将足式机器人进行分类,相较于双足机器人和六足机器人,四足机器人在结构上具备更稳定和更易控制的优点.正因此四足机器人在医疗、物资运输、环境勘探和资源采集等场景中具备很大的应用前景.通过人为指定运动速度的方式,使机器人具备像动物一样的行走姿态[5],这种方法需要人为设置控制器.然而,这种控制器的设计流程更为繁琐,特别是针对机器人的系统和腿部的控制器,不仅对设计者的专业知识水平有较高的要求,而且,在实际应用过程中,控制器的具体参数也需要反复调整和试验[6–7].强化学习[8]具备让智能体自己学习的能力,近年来该技术在各应用领域蓬勃发展[9–11].在机器人设计、操作、导航和控制等方面，也以此为基础不断拓展[12–13].为了解决传统的基于人工设计的控制器存在的问题,越来越多的研究使用强化学习来代替原有设计方式[14],Kohl和Stone[15]在底层控制器设计完善的基础上,通过强化学习调整控制器参数,训练出了四足机器人Aibo的行走控制策略,代替原有的手动设置参数的过程,并且由强化学习训练的步态行走速度快于当时手动调参可达到的最快步态;Peng等人[6]先对动物的动作进行捕捉,得到四足机器人目标动作,之后利用强化学习来让四足机器人完成目标动作,实现了让机器人通过模仿动物来学习运动技能;ETH团队[16]引用策略调整轨迹生成器(policies modulating trajectory generators,PMTG)架构,即设计一个轨迹生成器作为先验,再通过强化学习对前面轨迹生成器进行调整,其中应用了课程学习[17],教师策略中通过编码器将视觉信息和本体感知信息融合编码成隐变量,再让学生策略通过机器人的历史状态来提取隐变量得到机器人的控制策略,并通过在机器人建模时使用随机化处理和训练神经网络电机模型实现了将强化学习算法实际部署在物理机器人上,完成了多种地形上的鲁棒性行走[18–20].上述研究或通过良好的运动先验知识或通过精心设置的奖励函数实现了机器人步态学习和在物理机器人上的部署,然而如何在少量运动先验前提下,通过简单奖励函数快速有效地学会四足机器人步态仍是一个正在面临的挑战.本文在Shi等人的研究[21]基础上,提出一种基于生物启发的元近端策略优化(meta proximal policy opti-mization,MPPO)算法,在有运动先验知识的情况下,仅通过简单奖励函数来快速学习机器人行走技能,和现有强化学习算法柔性行动者评价器(soft actor-critic, SAC)和近端策略优化(proximal policy optimization, PPO)的对比试验结果来看,本文的算法具备一定程度的有效性.2算法设计2.1近端策略优化算法对于机器人系统,策略梯度(policy gradient)算法针对连续动作空间的任务,可取得显著成效,传统策略梯度算法的基本思想是通过最大化状态价值(value)函数Vπ(s)来更新策略(policy)函数,即直接去优化策略网络,更新策略函数的损失函数被定义为L(θ)=∑logπ(a|s;θ)f(s,a),(1)其中f(s,a)为在状态s下采取动作a的评估,策略的梯度表示为g(θ)=E[∞∑t=0φt∇θlogπθ(a t|s t)],(2)最后通过式(3)更新参数θθ←θ+αg(θ).(3)其中φt有多种计算方法,通常被表示为优势函数(adv-antage function),如式(4)所示,作为一个衡量当前动作好坏的标准,如结果为正,证明当前动作a在此策略下要更好,并且之后提高此动作的概率以优化策略,即Aπ(s t,a t)=Qπ(s t,a t)−Vπ(s t),(4)将式(4)代入式(2)得到g(θ)=E[∞∑t=0∇θlogπθ(a t|s t)Aπ(s t,a t)].(5)然而,传统的策略梯度算法在计算式(4)–(5)时会因偏差和方差的问题导致无法准确估计优势函数.以下考虑两种极端情况,以整条轨迹来估计优势函数时虽然不会有偏差,但是会有高方差的问题,即过拟合;假如以单步更新,虽然不会有方差,但会有很大的偏差,即欠拟合.因此,需要在这两种极端情况中间找到一个平衡,但是优势函数估计的偏差是无法避免的,导致算法性能对更新步长十分敏感,假如一次更新步长过大,将可能会导致下次采样完全偏离,也就导致策略偏离到新的位置,从而形成恶性循环.为了解决这个问题,PPO算法[22]在信任域策略优化(trust region pol-icy optimization,TRPO)[23]的基础上被提出.PPO算法在策略梯度这个同策略(on-policy)算法的基础上引入了重要性采样(importance sampling),将其变成了一个近–异策略(near off-policy)的算法.第1期朱晓庆等:融合元学习和PPO 算法的四足机器人运动技能学习方法157E x ∼p (x )[f (x )]=p (x )f (x )d x =q (x )p (x )q (x )f (x )d x =E x ∼q (x )[p (x )q (x )f (x )].(6)上式为重要性采样过程,将式(6)代入式(5)得到ˆg t =ˆEt [πθ(a t |s t )πθold (a t |s t )πθ(a t +1|s t +1)πθold (a t +1|s t +1)...πθ(a T |s T )πθold (a T |s T )∇θlog πθ(a t |s t )log ˆA t ],(7)只计算t 时刻的梯度公式就变为了ˆg t ≈ˆE t [πθ(a t |s t )πθold (a t |s t )∇θlog πθ(a t |s t )log ˆA t]=ˆEt [πθ(a t |s t )πθold (a t |s t )∇θπθ(a t |s t )πθ(a t |s t )ˆA t ]=ˆE t [∇θπθ(a t |s t )πθold (a t |s t )ˆA t ],(8)于是损失函数由式(1)变成了L (θ)=ˆEt [πθ(a t |s t )πθold (a t |s t )ˆA t ].(9)作为一个策略梯度算法的优化算法,PPO 算法核心思想是通过对式(9)引入截断(CLIP)限制更新幅度,让每次策略更新后落在一个信任域(trust region)里,以此解决了更新步长的问题,PPO 算法的损失函数如式(10)所示:L CLIP (θ)=ˆEt [min(πθ(a t |s t )πθold (a t |s t )ˆA t ,CLIP(πθ(a t |s t )πθold (a t |s t ),1−ϵ,1+ϵ)ˆA t )],(10)其中:ϵ是超参数,用来限制更新的大小;ˆAt 是优势函数在t 时刻的估计值.最终训练时的损失函数为L (θ)=ˆEt [L CLIP (θ)+c 1L VF (θ)−c 2S [πθ](s t )],(11)其中:c 1,c 2为系数,S 为熵,L VF (θ)是平方误差损失(V θ(s t )−V targ t )2.与TRPO 算法相比,PPO 的超参数较少,在计算上更易实施,且效果更好,因而受到广泛的关注和应用.2.2MPPOPPO 具有在大部分实验环境中有很好效果且具有较少超参数的优势,但是在多维输入输出场景下,如本文涉及的四足机器人步态学习任务实验环境中的呈现效果一般,且容易收敛到局部最优,表现为步态节律信号杂乱、重心震荡严重.为此,本文在融合元学习的基础上提出了MPPO 算法.不可知模型元学习(Model-agnostic meta-learning,MAML)[24–25]由元学习(meta learning)引出,元学习优势体现在可快速适应新任务且减轻过拟合现象.从特征学习角度看,元学习训练的模型参数通过少量梯度更新能在新任务上产生良好结果,这样的过程可被视为构建广泛适用于许多任务的模型本质表示.从系统角度看,元学习是最大化了新任务损失函数对参数的敏感性:灵敏度较高时,参数微小局部变化可导致任务损失的大幅改善.本文提出的MPPO 算法,融合元学习和强化学习思想,即不是找到当前观测下的最佳参数,而是希望找到可以广泛适应在不同观测上采样到的数据上的参数,让四足机器人遵循奖励函数学习到运动技能的本质,获得更高奖励值并达到更优步态.其算法伪代码和框图如表1和图1所示.表1算法1:MPPO 算法伪代码Table 1Pseudocode输入:49维观测向量.输出:12维动作向量.1初始化θ,θ′,设置α;2while 不收敛do3使用πθ策略和环境交互T 步,收集D =(s,a π(θ),r );4通过策略梯度下降方法更新参数得到预更新参数θ′,即θ′=θ−α∇θL (πθ),其中的L (πθ)使用式(2)计算;5使用πθ′策略和环境交互T 步,收集D =(s,a πθ′,r );6通过梯度策略下降法更新参数θ,即θ=θ−α∇θ′L (πθ′),其中的L (πθ′)使用式(2)计算.7end图1算法框图Fig.1Algorithmic ﬂowchart首先,使用全局策略πθ0和环境交互,并通过PPO算法流程更新参数得到预更新的参数θ′.由于PPO 策略的输出动作是一个高斯分布的采样结果,其训练方向会很多样.通过πθ′0和环境的交互收集到的D 0[s,a (πθ′0),r,s ′]作为数据用于更新策略πθ0,这步可以看作一个有预见性和兼顾性的预更新,对应了元学习中通过单个任务的训练误差更新一个中间策略,通过中间策略把当前任务上的测试误差作为元学习训练误差的更新方法,即不追求当前状态下的最优方向,转为追求下一状态下的优势方向,以此更新全局策略.在本文中,PPO 算法的演员(actor)和评判家(critic)网158控制理论与应用第41卷络都采用了长短期记忆网络(long short-term memory,LSTM)[26]网络,相比多层感知器(multilayer perce-ptron,MLP)网络,其对像机器人系统这样序列型的输入有更好效果.3实验3.1实验对象本文使用pybullet物理引擎[27]在gym环境中进行模拟实验,机器人为Laikago,它是一个12自由度的四足机器人,每条腿有3个自由度,分别为髋关节、摆动关节和膝关节,控制频率为38Hz,其身体参数即各关节活动范围如表2所示.表2四足机器人身体参数Table2Body parameters of quadruped robot身体参数数值正常站立尺寸/cm370×270×295整机质量/kg11关节自由度12髋关节活动范围/(◦)−46∼46摆动关节活动范围/(◦)−60∼240膝关节活动范围/(◦)−154.5∼−52.53.2实验设计为了验证本文MPPO算法有效性,将此算法和传统SAC[28]和PPO算法进行对比实验,实验设置如下.3.2.1奖励函数在强化学习问题中,奖励函数可以被视为智能体的目标,在智能体与环境交互的每一个时刻中,当遇到状态时,智能体会根据策略选择一个动作(action),同时由设定的奖励函数给出一个奖励值(reward),而智能体的目标就是学习到一个策略来最大化这个奖励,因此这个奖励函数的设置应与想达到的训练目标相匹配,通过奖励函数来教会智能体一个期望的技能或者说引导智能体的策略向期望的目标上发展.在本文使用的策略梯度算法中,首先通过设计奖励函数引导价值函数向其靠拢,再通过价值函数来引导策略进行学习,以此让机器人学习到期望的行走技能.为了让机器人学会平稳快速节能的行走,奖励函数被设计为r t=c1r vt +c2r et+c3r st,(12)其中c1,c2和c3为各部分奖励函数之间的系数.r v t =X t−X t−1t,(13)其中:X t代表机器人所处位置,t为根据控制频率计算的时间,这部分的奖励函数是为了鼓励机器人朝着期望的方向前进并且速度越快越好.r e t =E t,(14)其中E t代表能耗,这部分的奖励函数是通过负的系数来促使机器人兼顾能耗问题.r st=[1−tan(roll2+pitch2)],(15)其中:roll和pitch为机器人惯性运动单元(Inertial mo-tion unit,IMU)测得的俯仰角和横滚角,这部分的奖励通过一个双曲正切函数来引导机器人以更平稳的姿态行走.3个系数的大小影响了3个部分在奖励函数中的重要程度,通过这3部分奖励之间相互制约,机器人学会了一种兼顾速度稳定和低能耗的行走技能.3.2.2训练流程本文实验训练流程图如图2,为了加速训练过程,采用类似文献[14]的实验框架,受残差控制[29]启发,机器人输出由a cpg(t)和a mppo(t)两部分构成,a cpg(t)作为运动先验知识输出一个周期性的节律信号[30];a mppo(t)作为残差信号,其中a mppo(t)策略的输入即状态是49维的由3部分构成:1)12维的运动先验中枢模式发生器(Central pattern generators,CPG)参数;2)34维的机器人状态信息,其中24维是机器人的电机信息,包含4条腿的髋关节和2个膝关节的角度和速度,6维机器人质心姿态信息包含位姿(俯仰,横滚,偏航)与加速度(x,y,z3个方向)各3项,4维的足端接触检测信息(检测每条腿是否触地);3)3维机器人的位置信息(包括x,y,z3方向的位移).输出即动作是12维的期望位置,即每条腿髋关节和2个膝关节电机的期望位置,叠加成为机器人的动作输出.动作合成后,由比例微分控制(Proportion Differential,PD)控制器来跟踪实现对机器人腿部的控制.为了维持训练过程稳定性,两部分的动作网络采取分时训练[21]方法,即训练过程中先固定a cpg(t)网络的参数,并根据MPPO的算法流程更新a mppo(t)的参数,当交互够N步后,固定a mppo(t)的网络参数,通过进化算法对a cpg(t)进行更新,当一个网络参数进行训练的时候另一个网络就维持参数停止训练.图2训练流程图Fig.2Trainingﬂowchart4实验结果本实验使用笔记本电脑,训练过程中每2048步作为一个回合(episode),并在每个回合结束时进行一次第1期朱晓庆等:融合元学习和PPO 算法的四足机器人运动技能学习方法159评估.按照第3节设计的实验流程进行训练1,训练过程中四足机器人的行走步态和身体数据如图3–4所示.图3四足机器人行走步态学习过程截图,其中(a)–(c)为训练初期,(d)–(f)为训练末期Fig.3Screenshot of quadruped robot walking gait learningprocess.At the beginning of training,it appears as (a)–(c),and at the end of training,it appears as(d)–(f)图4训练前后机器人行走质心高度变化曲线Fig.4Height variation curve of robot walking centroid beforeand after training从图3(a)–(c)和图4可看出,在训练初期,四足机器人步态非常不稳定,身体上下晃动幅度在0.21到0.26之间,呈现较大且不规律的晃动,无法以正常姿态前进.随着训练进行300个回合时,机器人步态及身体数据如图3(d)–(f)和图4所示.可看出,在训练完成时,机器人的步态呈现出对角接触步态,行走过程中,身体的上下晃动幅度较小,在0.24到0.265之间,高于刚开始训练质心高度并呈现稳定的节律行走步态.该训练结果是由奖励函数中的r st 引导而来.对比上图训练开始时和训练结束时机器人行走质心高度的变化可看出,训练完成后机器人行走时更稳定且质心高度更高.图5是训练过程中的奖励值,从中可以看出,本文提出的MPPO 算法在四足机器人步态学习的任务中,奖励值可以从零开始迅速上升,大约到75个回合时,收敛于4000左右,75回合之间的过程是四足机器人从初始状态开始学习到可以初步完成行走任务的过程,训练回合超过75回合之后,奖励值缓慢上升至4500左右,并且策略趋于收敛,代表完成行走步态的学习.图5训练过程奖励值Fig.5Rewards during training图6和图7分别是机器人行走距离和行走偏移变化曲线,横坐标为回合数量,纵坐标分别为训练过程中机器人行走距离(每一次评估最多进行500步)和行走偏移量,虚线为期望前进方向即0偏移.图6中看出,本文提出的MPPO 算法在行走距离上从零开始迅速上升,在20个回合左右行走距离达到12左右,并在后续逐渐波动,最终在250个回合左右收敛至11.偏移量在100到150回合中逐渐变大,在150到200步逐渐变小,最终在300步时收敛到–0.1左右.这种笔直前进步态是由奖励函数中r vt 部分引导而来的.图6机器人行走距离变化曲线Fig.6Variation curve of robot walking distance1训练过程视频:https:///video/BV1Q24y117EQ/?vd source=4b5a3b89f1ef3731413895e73cb00b7e.160控制理论与应用第41卷图7机器人行走偏移变化曲线Fig.7Variation curve of robot walking deviation5讨论与总结5.1讨论为了验证本文算法的优势,本文将传统的PPO 算法和传统的SAC 算法与MPPO 算法进行了对比试验.在四足机器人步态的全部训练过程中,每2048步设置为一个回合,并且在每一个回合进行一次数据评估,实验结果如图8所示.图8机器人质心高度变化曲线Fig.8Height variation curve of robot centroid训练完成后分别对3种算法训练出的四足机器人步态进行行走测试2,图8为机器人行走测试中质心高度的变化,可看到本文提出的MPPO 算法训练结果相比较于传统PPO 算法和SAC 算法,行走步态质心起伏幅度更小,说明本文提出的MPPO 算法训练出的四足机器人步态有更好的稳定性;并且MPPO 算法训练的行走步态的质心高度整体高于传统的SAC 和PPO 算法,说明其行走步态使机器人在行走过程中拥有腿伸展程度更高,因此可以迈出更大的步伐.图9为训练过程中机器人行走的距离曲线,横坐标为回合数量,纵坐标为评估一次机器人行走的距离(每一次评估最多进行500步).从图中可看出,本文提出的MPPO 算法在行走距离上从零开始迅速上升,在20个回合左右行走距离达到12左右,并在后续逐渐波动,最终在250个回合左右收敛至11.相较于传统SAC 算法在20–50个回合行走距离从6上升至10,最终收敛至9.和传统PPO 算法在前100个回合有起伏最终收敛于6左右的实验结果.在相同上限步数的情况下,行走更远的距离就相当于拥有更大的步长和更快的速度,此实验结论和对比机器人行走时质心高度变化曲线得到结论相同,即本文提出的MPPO 算法训练出的行走步态有更大的行走步长和更好的稳定性.图9训练过程机器人行走距离曲线Fig.9Walking distance curve of robot during training从图10中可看出,本文提出的MPPO 算法在四足机器人步态学习的任务中,奖励上升幅度大,且最终收敛时的奖励值最高.图10训练过程奖励值Fig.10Rewards during training23种算法训练出的机器人步态对比视频:https:///video/BV17M411r7Ri/?vd source=4b5a3b89f1ef3731413895e 73cb00b7e.第1期朱晓庆等:融合元学习和PPO算法的四足机器人运动技能学习方法161而传统的SAC算法在有先验知识的基础上奖励值先降低后缓慢升高,大约到100回合的时候,奖励值收敛到3000左右,训练效果略低于本文提出算法;传统PPO算法在训练过程中奖励值略微升高后就急剧下降,收敛至2000左右陷入局部最优,训练效果远小于传统SAC算法和本文提出的MPPO算法,由上可知,本文所提出的算法在四足机器人步态学习的任务中效果优于传统的强化学习算法.5.2总结为阐明复现高等生物的运动技能学习机理,本文以四足机器人为研究对象,基于强化学习框架探究四足机器人技能学习算法,鉴于元学习具有刻画学习过程高维抽象表征优势,本文将元学习引入了PPO算法,提出了一种MPPO算法.以四足机器人行走步态作为任务进行学习训练,仿真实验结果验证了该算法可行性.本文提出的算法,仅需简单设计的奖励函数即可使得四足机器人学会走起来,并且相比较于SAC算法和PPO算法,本文的MPPO算法不仅在训练速度上有一定的优势,并且训练出的步态也呈现更佳效果,如其质心姿态更加平稳,震荡小.对比实验结果表明本文提出的MPPO算法具有兼顾性和预见性优势,可以解决四足机器人步态学习过程中出现的局部最优问题,学习到更优步态.后续将继续在MPPO算法中引入动态奖励函数,以获得更快的收敛速度以及更稳健的更新方向,并且在多地形任务上进行实验.在完成仿真实验后,将进行实物实验.参考文献:[1]MENG X G,WANG S,CAO Z Q,et al.A review of quadruped robot-s and environment perception.The35th Chinese Control Conference (CCC).Chengdu,China:IEEE,2016:6350–6356.[2]RUBIO F,V ALERO F,LLOPIS A C.A review of mobile robots:Concepts,methods,theoretical framework,and applications.Inter-national Journal of Advanced Robotic Systems,2019,16(2):1–22.[3]ZHONG Y H,WANG R X,FENG H S,et al.Analysis and researchof quadruped robot’s legs:A comprehensive review.International Journal of Advanced Robotic Systems,2019,16(3):1–15.[4]ZHUANG H C,GAO H B,DENG Z Q,et al.A review of heavy-dutylegged robots.Science China Technological Sciences,2014,57(2): 298–314.[5]GONC¸ALVES R S,CARV ALHO J C M.Review and latest trends inmobile robots used on power transmission lines.International Jour-nal of Advanced Robotic Systems,2013,10(12):408.[6]PENG X B,COUMANS E,ZHANG T N,et al.Learning agile robot-ic locomotion skills by imitating animals.ArXiv Preprint,2020:arX-iv:2004.00784.[7]CHEN Guangrong,GUO Sheng,HOU Bowen,et al.Motion controlof redundant hydraulic driven quadruped robot based on extended Ja-cobian matrix.Control Theory&Applications,2021,38(2):213–223.(陈光荣,郭盛,侯博文,等.基于扩展雅可比矩阵的冗余液压驱动四足机器人运动控制.控制理论与应用,2021,38(2):213–223.)[8]SUTTON R S.Introduction:The challenge of reinforcement learn-ing.Reinforcement Learning.Boston,MA:Springer,1992:1–3.[9]WEN Jianwei,ZHANG Li,DUAN Yanduo,et al.Activeﬂoor controlin data center based on model deep reinforcement learning.Control Theory&Applications,2022,39(6):1051–1056.(温建伟,张立,段彦夺,等.基于模型深度强化学习的数据中心主动地板控制.控制理论与应用,2022,39(6):1051–1056.)[10]QIN Rui,ZENG Shuai,LI Juanjuan,et al.Parallel enterprise re-source planning based on deep reinforcement learning.Acta Auto-matica Sinica,2017,43(9):1588–1596.(秦蕊,曾帅,李娟娟,等.基于深度强化学习的平行企业资源计划.自动化学报,2017,43(9):1588–1596.)[11]YU Shengping,HAN Xinchen,YUAN Zhiming,et al.Dynamic trainscheduling method for high speed rail based on strategy gradient rein-forcement learning.Control and Decision,2022,37(9):2407–2417.(俞胜平,韩忻辰,袁志明,等.基于策略梯度强化学习的高铁列车动态调度方法.控制与决策,2022,37(9):2407–2417.)[12]KHAN M A,KHAN M R J,TOOSHIL A,et al.A systematic re-view on reinforcement learning-based robotics within the last decade.IEEE Access,2020,8:176598–176623.[13]ZOU Qijie,LIU Shihui,ZHANG Yue,et al.Rapid exploration ofrandom tree path reprogramming algorithm based on reinforcemen-t learning in special environments.Control Theory&Applications, 2020,37(8):1737–1748.(邹启杰,刘世慧,张跃,等.基于强化学习的快速探索随机树特殊环境中路径重规划算法.控制理论与应用,2020,37(8):1737–1748.)[14]WANG J Y,HU C X,ZHU Y.CPG-based hierarchical locomotioncontrol for modular quadrupedal robots using deep reinforcement learning.IEEE Robotics and Automation Letters,2021,6(4):7193–7200.[15]KOHL N,STONE P.Policy gradient reinforcement learning for fastquadrupedal locomotion.Proceedings of the IEEE International Con-ference on Robotics and Automation.New Orleans,LA:IEEE,2004: 2619–2624.[16]ISCEN A,CALUWAERTS K,TAN J,et al.Policies modulating tra-jectory generators.ArXiv Preprint,2019:arXiv:1910.02812.[17]CHEN D,ZHOU B,KOLTUN V,et al.Learning by cheating.ArXivPreprint,2019:arXiv:1912.12294.[18]HWANGBO J,LEE J,DOSOVITSKIY A,et al.Learning agile anddynamic motor skills for legged robots.Science Robotics,2019, 4(26):eaau5872.[19]LEE J,HWANGBO J,WELLHAUSEN L,et al.Learning quadrupe-dal locomotion over challenging terrain.Science Robotics,2020, 5(47):eabc5986.[20]MIKI T,LEE J,HWANGBO J,et al.Learning robust perceptive lo-comotion for quadrupedal robots in the wild.Science Robotics,2022, 7(62):eabk2822.[21]SHI H J,ZHOU B,ZENG H S,et al.Reinforcement learning withevolutionary trajectory generator:A general approach for quadru-pedal locomotion.IEEE Robotics and Automation Letters,2022,7(2): 3085–3092.[22]SCHULMAN J,WOLSKI F,DHARIWAL P,et al.Proximal policyoptimization algorithms.ArXiv Preprint,2017:arXiv:1707.06347.。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

OptimizingHumanMotionfortheControlofaHumanoidRobotAllaSafonova1,NancyS.Pollard2andJessicaK.Hodgins31ComputerScience,CarnegieMellonUniversity,alla@cs.cmu.edu

2ComputerScience,BrownUniversityandATRHumanInformationScienceLaboratories,Dept3,nsp@cs.brown.edu

3ComputerScience,CarnegieMellonUniversityandATRHumanInformationScienceLaboratories,Dept3,jkh@cs.cmu.edu

AbstractUsingpre-recordedhumanmotionandtrajectorytrack-ing,wecancontrolthemotionofahumanoidrobotforfree-space,upperbodygestures.However,thenumberofdegreesoffreedom,rangeofjointmotion,andachievablejointve-locitiesoftoday’shumanoidrobotsarefarmorelimitedthanthoseoftheaveragehumansubject.Inthispaper,weexploreasetoftechniquesforlimitinghumanmotionofupperbodygesturestothatachievablebyaSarcoshumanoidrobotlo-catedatATR.Inparticularwehavefoundthatitisimportanttopreservetheconﬁgurationofthehandsandheadforupperbodymotion.Weassessthequalityoftheresultsbycompar-ingthemotionofthehumanactortothatoftherobot,bothvisuallyandquantitatively.

1.IntroductionHumanoidrobotsarealreadycommoninthemeparkswheretheinvestmentforanewattractionissubstantial.Tomakehumanoidentertainmentrobotsaviableal-ternativeforsmallerscaleattractionssuchaslocation-basedentertainmentvenues,museums,orrestaurants,weneedeasierwaysofprogrammingtheserobots.En-tertainmentrobotsmusthaveanaturalandentertainingstyleofmotionandoftenrequiresubstantialmotiondatabasestoensurealargevarietyofbehaviors.Forahumanoidrobot,suchastheSarcosrobotatATR(DB)[1]showninFigure1,oneobviousap-proachistodrivethemotionoftherobotwithmotioncapturedatarecordedfromaprofessionalactor.Suchdatawouldcontainthetimingandmanyoftheothersubtleelementsoftheactor’sperformance.However,thecurrentmechanicallimitationsofhumanoidrobotspreventtherecordedmotionfrombeingdirectlyap-plied,unlessthehumanactorsuseonlyafractionoftheirnaturaljointrangeandmovewithslowerveloci-tiesthanthosecommonlyseeninhumanmotion.Toaddresstheselimitations,thelocationofthemarkersinthemotioncapturedataisﬁrstmappedtothedegreesoffreedomoftherobotbyinversekine-

Figure1:TheSarcoshumanoidrobotatATR(DB)trackingmotioncapturedataofahumanactor.

maticsonindividuallimbs.Aconstrainedoptimiza-tiontechniqueisthenusedtoimposejointanglesandvelocitylimitsonthemotionwhileavoidingself-collisions.Optimizationtechniquesallowustotrans-formthemotiontothecapabilitiesofthehumanoidrobotbyspecifyinganobjectivefunctionandasetofconstraintsthatpreservethesalientcharacteristicsoftheoriginalmotion.Therobottracksthetrajectoriesofthetransformeddatausingapositionandvelocitytrackingsystemwithfeedforwardtrajectorylearning.Wetestedthesetechniqueswithfourteenmotionse-quencesfromsevenprofessionalactors.Eachsubjectperformedtothesameaudiotrackofthechildren’ssong,“I’malittleteapot.”Wechosethisselectionbe-causeitwasfamiliarenoughthatmostactorswouldperformthemotioninasimilarbutnotidenticalway.Itwasourhopethatanindividual’sstylewouldbepre-servedthroughthetransformationsnecessarytoallowtherobottoperformthemotion.

2.RelatedWorkTwobodiesofworkarerelevanttothisresearch:roboticsresearchershaveexploredalgorithmsforadaptinghumanmotiontohumanoidandleggedrobotsandresearchersincomputeranimationhaveexploredalgorithmsforadaptinghumanmotiontoanimatedcharacters.Becauseofthedifﬁcultyofanimatingcomplexchar-acters,theanimationcommunityhasdevotedasig-niﬁcantefforttoadaptinghumanmotiontoanimatedcharacters.Inhismotioneditingandretargetingwork,Gleicherchosetoperformatrajectoryoptimizationthatdidnotmaintainphysicalrealisminordertoal-lowinteractiveresponsefortheuser[2,3].Inper-formanceanimationorcomputerpuppetry,incontrast,themotionisapplieddirectlytothecharacterwithouttheopportunityforuserintervention.Shinetal.im-plementedanimportance-basedapproachinwhichtherelativeimportanceofjointanglesforfreespacemove-mentsandendeffectorpositionandorientationwereevaluatedbasedontheproximityofobjectsandonapriorinotationsintheperformer’sscript[4].Theirsys-temwasusedtoanimateacharacterinrealtimeforbroadcast.Zordanuseddatarecordedfromamagneticmotioncapturesetuptodrivetheupperbodymotionofdynamicallysimulatedhumanﬁguresthatboxandplaypingpong[5].Inrobotics,motiondatahasbeenproposedbyDas-guptaandNakamuraasawaytomodifythewalkingpatternofleggedrobotstoappearmorehuman-like[6].Becausebalanceisofprimaryconcerninwalking,theyusedthemotiondatatospecifyatrajectoryforthezeromomentpointinthecontrolsystemratherthanusingitexplicitlyasdesiredjointangles.RileyandhercolleaguesadaptedmotiondatarecordedfromanOptotrakmotioncapturesystemtothesamerobotusedinthiswork[7].Ifthemotionwasoutsideoftherobot’sjointanglelimits,theirsystemtranslatedandscaledthejointtrajectoryglobally.Thisapproachkeptthemagnitudeofthemotionsaslargeaspossiblebutatthecostofperhapsusingapartofthejointspacethatthehumanhadnotused.Inanearlierversionofthiswork[8],weexploredalocalscalingapproachdesignedtopreserveasmuchaspossiblelocalvariationsinthemotion,suchasoscilla-tions.Althoughthisapproachdidproducereasonableresults,weobservedthatlargeerrorsinhandconﬁgu-rationweredistractingandthatscalingintroducedself-collisions.Thecurrentpaperaddressesbothoftheseproblems.