Partial Local FriendQ Multiagent Learning: Application to Team Automobile Coordination Problem


Machine Learning: A Dynamics-Coupled Car-Following Model

Engineering Center of Anhui Province, Suzhou 234000, Anhui, China)
Abstract: So far, car-following models have mostly been built with either dynamics-based or machine learning algorithms, but there has been no research on building a car-following model by coupling the two methods. On the basis of the linear combination forecast, this paper improves the objective function of the optimal weighting method and couples the Gipps model and the BP model, which is based on a back-propagation neural network, to establish

Identification and Learning in Autonomous Vehicle Control Systems

ISSN 1674-8484, CN 11-5904/U
J Automotive Safety and Energy (汽车安全与节能学报), Vol. 9 No. 2, 2018

WANG Leyi 1, George Yin 2, ZHAO Guangliang 3, LI Shengbo 4, XU Biao 4, LI Keqiang 4
(1. Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, USA; 2. Department of Mathematics, Wayne State University, Detroit, MI 48202, USA; 3. GE Global Research, Niskayuna, NY 12309, USA; 4. Department of Automotive Engineering, Tsinghua University, Beijing 100084, China)

Abstract: Autonomous vehicles encounter many uncertainties that change with time, operating conditions, and environments, so their system parameters need to be identified and learned during operation. By capturing system behavior in a closed-loop setting and using data to learn the related parameters, system reliability and robustness can be quantitatively established. This paper focuses on a basic scenario in which an autonomous vehicle follows the vehicle in front of it. By integrating control actions with vehicle dynamics, a learning algorithm using operational data and confidence ellipsoids was employed to support robustness and reliability. A simulation case study was used to illustrate the strategies. The results show that the proposed method can estimate the vehicle's parameters accurately.

Key words: vehicle control; autonomous vehicle; identification of parameters; learning; robustness

Topology Optimization of Multi-material Structures Based on Deep Learning

Journal of Tongji University (Natural Science) (同济大学学报（自然科学版）), Vol. 50 No. 7, July 2022


XIANG Cheng, CHEN Airong
(College of Civil Engineering, Tongji University, Shanghai 200092, China)

Abstract: A topology optimization method for multi-material structures based on a deep convolutional neural network (CNN) is proposed, which can predict the optimized multi-material structure in a very short time without any iterative analysis. The popular U-Net architecture is adopted to improve the boundary extraction ability of the neural network. To train the network, the ordered multi-material SIMP (solid isotropic material with penalization) interpolation method (Ordered SIMP) is used to generate datasets of optimized multi-material structures under random loading conditions, mass fractions, and cost fractions. The efficiency and accuracy of the proposed method are compared with those of traditional algorithms to evaluate its performance. The results show that the method significantly reduces computational cost while sacrificing almost none of the performance of the design. The method has great potential and broad application prospects for topology optimization in future multi-material structural design practice.

Key words: deep learning; convolutional neural network; multi-material design; topology optimization
CLC number: TP181; O342. Document code: A

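Application of Swarm Intelligence Algorithms in the Optimal Design of Automotive Components

This article introduces the basic concepts and principles of swarm intelligence algorithms and discusses their specific applications in the optimal design of automotive components.

I. Overview of swarm intelligence algorithms
Swarm intelligence algorithms are computational methods that simulate the collective behavior of biological groups in nature. Their basic principle is to solve a problem by simulating the interactions and information exchange among the individuals of a group.

They include ant colony optimization and particle swarm optimization. Genetic algorithms, which are strictly speaking evolutionary algorithms, are treated alongside them here.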

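II. Applications in the optimal design of automotive components
1. Ant colony optimization for component layout optimization
Ant colony optimization simulates the behavior of ants searching for food. The ants in the colony pass information to each other and cooperate, and the colony as a whole converges on a good solution.

In component layout optimization, ant colony optimization can be used to determine a layout scheme that minimizes vehicle weight, reduces energy consumption, and improves overall vehicle performance.

2. Particle swarm optimization for component parameter optimization
Particle swarm optimization simulates the behavior of a flock of birds foraging. Each particle adjusts its position and velocity, and the swarm as a whole moves toward the best solution found.

In component parameter optimization, particle swarm optimization can be used to determine a parameter configuration that improves vehicle performance, reduces energy consumption, and cuts emissions.

3. Genetic algorithms for component topology optimization
Genetic algorithms simulate biological evolution and search for a solution through operations such as selection, crossover, and mutation.

In component topology optimization, a genetic algorithm can be used to determine a topology that increases component strength, reduces weight, and improves vehicle safety.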
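The particle swarm procedure described in item 2 can be sketched as follows. This is a minimal illustration, not a method taken from the article: the function `pso`, the toy objective `bracket_mass`, its two parameters, and the inertia and acceleration coefficients are all assumptions chosen for the example.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                # personal best positions
    best_val = [objective(p) for p in pos]        # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def bracket_mass(x):
    """Hypothetical stand-in for a component objective (thickness, width)."""
    thickness, width = x
    return (thickness - 2.0) ** 2 + (width - 5.0) ** 2


best_x, best_f = pso(bracket_mass, bounds=[(0.0, 10.0), (0.0, 10.0)])
```

In a real design study the objective would come from a simulation or a surrogate model of the component, and constraints would need explicit handling. The clamping to `bounds` used here is only the simplest option.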

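III. Advantages in the optimal design of automotive components
1. Strong global search capability: these algorithms explore many regions of the solution space at the same time, which helps in finding the global optimum.

2. Strong adaptability: these algorithms can adjust their search strategy adaptively to suit different problems.

3. Good parallelism: these algorithms parallelize well and can exploit multi-core processors to speed up the search.

4. High robustness: these algorithms search the solution space repeatedly, which makes them more robust and able to cope with complex and changing problems.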

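IV. Summary
Swarm intelligence algorithms have significant application value in the optimal design of automotive components.

By simulating the interactions and information exchange among the individuals of a group, they can search for the best component layout, parameter configuration, and topology, with the aim of improving vehicle performance, reducing energy consumption, and improving safety.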

Optimization of multi-vehicle green vehicle routing problem considering dynamic congestion

Computer Engineering and Design (计算机工程与设计), Vol. 42 No. 9, Sept. 2021

DI Wei-min, DU Hui-li+, ZHANG Peng-ge
(School of Management Engineering, Zhengzhou University, Zhengzhou 450001, China)

Abstract: To reduce logistics distribution cost and promote carbon reduction, an optimization method for the multi-vehicle green vehicle routing problem considering dynamic congestion was proposed. Aiming at recurrent road congestion, the delivery time was divided into several time periods, and a road congestion coefficient was used to reflect the congestion conditions in the different periods. Considering the impact of carbon emission, multiple vehicle types, and customer time windows simultaneously, a green vehicle routing optimization model with the goal of minimizing the total system cost was established, and a brain storm optimization algorithm was designed to solve it. The problem was simulated with a calculation example, and the result was compared with that of a genetic algorithm to demonstrate the feasibility of the model and the effectiveness of the algorithm. The results indicate that considering multi-vehicle distribution and dynamic congestion together can reduce system cost effectively.

Key words: green vehicle routing problem; dynamic congestion; carbon emission; multi-vehicle; brain storm optimization algorithm
CLC number: TP391.9. Document code: A. Article ID: 1000-7024 (2021) 09-2614-07
doi: 10.16208/j.issn1000-7024.2021.09.028

0 Introduction
When planning delivery vehicle routes, one must pursue economic objectives while also paying attention to environmental impact. The green vehicle routing problem (GVRP) has therefore attracted the attention of scholars.

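Traffic Signal Control Based on Multi-agent Team Reinforcement Learning

Journal of Guangxi University of Technology (广西工学院学报), Vol. 22 No. 2, June 2011

LI Chungui, ZHOU Jianhe, SUN Ziguang, WANG Meng, ZHANG Zengfang
(Department of Computer Engineering, Guangxi University of Technology, Liuzhou 545006, Guangxi, China)

Abstract: The regional traffic signal coordination system of a city is a very complex system for which it is difficult to build an accurate mathematical model. By introducing master-slave team reinforcement
outperforms control strategies based on a single intersection.

conceptual model. Li Lingxi et al. [2] built a hierarchical fuzzy control model for two adjacent intersections. Ou Haitao et al. solved the traffic control problem of a simple traffic network with a method that combines recursive modeling and improved Bayesian learning. Wiering et al. [9] used a cooperative multi-agent reinforcement learning algorithm for

1 Reinforcement learning
1.1 Single-agent reinforcement learning
Q-learning, the most widely applied reinforcement learning method, is a learning method under the Markov decision process model.
Definition 1. A Markov decision process (MDP) is a tuple whose elements are the discrete state space S, the discrete action space A, the reward function of the agent R: S × A → ℝ, and a state transition distribution function that maps S × A × S to [0, 1].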
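To make the single-agent Q-learning setting of Section 1.1 concrete, here is a minimal tabular sketch on a toy MDP. Everything in it is an assumption made for illustration: the five-state chain, the `chain_step` transition and reward function, and the learning-rate, discount, and exploration values. None of it comes from the paper.

```python
import random

STATES = range(5)              # toy chain 0..4; state 4 is the goal
ACTIONS = ("left", "right")


def chain_step(s, a):
    """Deterministic toy transition: returns (next state, reward, done)."""
    s_next = min(s + 1, 4) if a == "right" else max(s - 1, 0)
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done


def q_learning(step, start=0, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                    # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)     # exploit,
                a = rng.choice([b for b in ACTIONS         # ties broken
                                if q[(s, b)] == best])     # at random
            s_next, r, done = step(s, a)
            # One-step target: reward plus discounted best next value.
            target = r if done else r + gamma * max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
            if done:
                break
    return q


q = q_learning(chain_step)
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(4)}
```

The table `q` is indexed by (state, action) pairs, which matches the discrete S and A of Definition 1. The transition distribution is never given to the learner, which only samples it through `step`.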

Asynchronous Mutual Reinforcement Learning for Multi-robot Systems Based on Locally Weighted k-Nearest Neighbors

ＹａｇＹｕｑａＨａｉｎｅｕｎｎＦｅＪｎＬｕｉＮｉＣｈｎｕｂｏＣａｉｉｎｔＺｈｑａｇＺｈａｇＴｉｎｉｇｎａｐｎ
(College of Information Engineering, Yangzhou University, Yangzhou 225009, China)
(State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Ａｂｓｒｃｔａｔ：Ｔｏａｃｌｒｔｈｅｒｉｇｓｅｄｏｏｔｏｕｌ．ｏｏｙｔｍｓａｄｍａｅｆ１ｕｅｏｘｃｅｅａｅｔｅｌａｎｎｐｅｆｒｂｏｓｆｒｍｔｒｂｔｓｓｅｎｋｕ１ｓｆｅ — ｉｐｒｅｃｎｅｕｔｆｏｈｒｒｂｔｎｔｅｃｍｍｕｉａｉｎｄｍａｎ．ｔｏｋｎｆｍｕｔ—ｏｔ１ａｎｉｅｉｎｅａｄｒｓｌｓｏｔｅｏｏｓｉｈｏｎｃｔｏｏｉｗｉｄｓｏｌｉｒｂｏｅｒｎｇ
ＳｐＩｕ（）
Ｓｐ．２２ｅｔ０１
ｄｉ１．９９ｊｉｎ１０ — ５５２１．１０２ｏ：０３６／．ｓ．０１００．０２Ｓ．４ｓ

Convolutional network-based vehicle re-identification combining wavelet features and attention mechanism

Journal of Computer Applications (计算机应用), 2022, 42(6): 1876-1883, published 2022-06-10
ISSN 1001-9081, CODEN JYIIDU, http://

LIAO Guangkai 1, ZHANG Zheng 1, SONG Zhiguo 2*
(1. College of Information Science and Engineering, Jishou University, Jishou Hunan 416000, China; 2. College of Physics and Mechanical and Electrical Engineering, Jishou University, Jishou Hunan 416000, China)
(* Corresponding author, e-mail: zhiguos@)

Abstract: Aiming at the insufficient representation ability of the features extracted by existing vehicle re-identification methods based on Convolutional Neural Network (CNN), a vehicle re-identification method combining wavelet features and an attention mechanism was proposed. Firstly, a single-layer wavelet module was embedded in the convolution module to replace the pooling layer for subsampling, thereby reducing the loss of fine-grained features. Secondly, a new local attention module named Feature Extraction Module (FEM) was put forward by combining the Channel Attention (CA) mechanism and the Pixel Attention (PA) mechanism, and was embedded into the CNN to weight and strengthen the key information. Comparison experiments with the benchmark residual networks ResNet-50 and ResNet-101 were conducted on the VeRi dataset. Experimental results show that increasing the number of wavelet decomposition layers in ResNet-50 can improve the mean Average Precision (mAP). In the ablation experiment, although ResNet-50 + Discrete Wavelet Transform (DWT) has an mAP 0.25 percentage points lower than that of ResNet-101, its number of parameters and computational complexity are lower than those of ResNet-101, and its mAP, Rank-1, and Rank-5 are all higher than those of ResNet-50 without DWT, verifying that the proposed model can effectively improve the accuracy of vehicle retrieval in vehicle re-identification.

Key words: vehicle re-identification; Channel Attention (CA); Pixel Attention (PA); wavelet transform; Convolutional Neural Network (CNN)
CLC number: TP 391.41. Document code: A

0 Introduction
In recent years, with the rapid development of urban intelligent transportation systems and public security systems, video surveillance has played an increasingly important role in traffic control and security.

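Vehicle Recognition Algorithm Based on 3D Attention Mechanism

Computer Measurement & Control (计算机测量与控制), 2022, 30(7), p. 194
Article ID: 1671-4598(2022)07-0194-07. DOI: 10.16526/j.cnki.11-4762/tp.2022.07.029. CLC number: TP3

FANG Yance 1, ZHANG Hongjiang 2, XIE Yucheng 1, LIU Peishun 1
(1. College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China; 2. R&D Department, China Academy of Launch Vehicle Technology, Beijing 100076, China)

Abstract: To address the difficulty of identifying vehicles with cloned license plates, a three-dimensional attention mechanism was designed with deep learning techniques. It is based on ResNet-50, combines a channel attention mechanism with a position attention mechanism, and is used to accurately distinguish near-identical vehicles. Most current attention algorithms focus on one-dimensional channel attention and two-dimensional position attention, whereas the image data being processed are three-dimensional, so attention cannot be concentrated on all the regions that need it and some key information is lost. The proposed mechanism addresses this problem. It performs well on a variety of vision tasks: on the Cifar100 dataset it improves on SENet by 1.12%, and on the PKU VehicleID dataset it improves on SENet by 2% on average.

Key words: attention mechanism; deep learning; cloned-plate vehicle identification; traffic management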


Partial Local FriendQ Multiagent Learning: Application to Team Automobile Coordination Problem

Julien Laumonier and Brahim Chaib-draa

1 Introduction
In real world cooperative multiagent problems, each agent often has a partial view of the environment. If communication has a cost, the multiagent system designer has to find a compromise between increasing the observability and the total cost of the multiagent system. To choose a compromise, we propose to take into account the degree of observability, defined as the agent's vision distance, for a cooperative multiagent system by measuring the performance of the associated learned policy. Obviously, decreasing observability reduces the number of accessible states for agents and therefore decreases the performance of the policy. We restrict our application to team games, a subclass of coordination problems where all agents have the same utility function. We consider problems where the agents' designer does not know the model of the world. Thus, we can use learning algorithms which have been proven to converge to a Pareto-optimal equilibrium, such as Friend Q-learning [4]. One can take an optimal algorithm to find the policy for the observable problem. The following assumptions are defined: (1) Mutually exclusive observations: each agent sees a partial view of the real state, but all agents together see the real state. (2) Possible communication between agents, but not considered as an explicit part of the decision making. (3) Only negative interactions between agents. One problem which meets these assumptions is the lane-choosing decision problem related to Intelligent Transportation Systems, which aim to reduce congestion, pollution, and stress and to increase the safety of the traffic.

2 Formal Model and Algorithms
Reinforcement learning allows an agent to learn by interacting with its environment. For a mono-agent system, the basic formal model for reinforcement learning is a Markov decision process. Using this model, the Q-learning algorithm calculates the optimal values of the expected reward for the agent in a state $s$ if the action $a$ is executed. On the other hand, game theory formally studies the interactions of rational agents. In a one-stage game, each agent has to choose an action to maximize its own utility, which depends on the others' actions. In game theory, the main solution concept is the Nash equilibrium, in which every agent plays a best response to the others. A solution is Pareto optimal if there does not exist any other solution such that one agent can improve its reward without decreasing the reward of another. The model which combines reinforcement learning and game theory is ...

... the number of joint actions which has to be tested during the learning. This partial local observability allows us to consider a variable number of agents in the multiagent system. Formally, we define a function $g_i^d$ which transforms the joint action $\vec{a}$ into a partial joint action $g_i^d(\vec{a}, s)$. This partial joint action contains all actions of the agents which are within distance $d$ of agent $i$. The PJA Q-value update function is:

\[
Q\big(f_i^d(s),\, g_i^d(\vec{a}, s)\big) = (1-\alpha)\, Q\big(f_i^d(s),\, g_i^d(\vec{a}, s)\big) + \alpha \Big[ r + \gamma \max_{\vec{a}_d \in G_i^d(\vec{A}, S)} Q\big(f_i^d(s'),\, \vec{a}_d\big) \Big]
\]

5 Results
We compare empirically the performance of the totally observable problem (FriendQ) and the performance of the approximated policy (PJA) on a small problem defined by size $X = 3$, $Y = 7$ with 3 agents. Figure 1 shows that for $d = 0 \ldots 2$, PJA converges to a local maximum, which increases with $d$. In these cases, the approximated values are respectively about 76%, 86% and 97% of the optimal value given by Friend Q-learning. When $d = 3$, that is, when the local view is equivalent to the totally observable view, the average sum of rewards converges to the total sum of rewards of Friend Q-learning.

Figure 1. Rewards for Partial Joint Action Q-learning (x-axis: Episodes, 0 to 50000; y-axis: Sum Reward)

The generalization of these results can be done by applying PJA to larger problems. Calculating the near-optimal policy, using an approximated optimal distance $d_{app}$, can be intractable if we need to compare with the results of Friend Q-learning. We calculate the ratio $DS = XY/N$, which represents the degree of space for each agent. As we study a problem where the team of agents has to handle only negative interactions, the higher the ratio, the more space agents have. We compare the performance of the PJA algorithm for different ratios. To discover a relation between the ratio $DS$ and the value of $d_{app}$, we compare the link between $DS$ and $d_{app}$.
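
The PJA update and the ratio DS can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the state is assumed to be a tuple of (x, y) cell positions, the vision distance is assumed to be a Manhattan distance, and the action set `ACTIONS` and all function names (`neighbours`, `f`, `g`, `pja_update`, `degree_of_space`) are hypothetical.

```python
from collections import defaultdict
from itertools import product

ACTIONS = ("left", "stay", "right")   # hypothetical lane-change actions


def neighbours(i, state, d):
    """Indices of agents within (assumed Manhattan) distance d of agent i."""
    xi, yi = state[i]
    return [j for j, (x, y) in enumerate(state)
            if abs(x - xi) + abs(y - yi) <= d]


def f(i, state, d):
    """Partial state f_i^d(s): positions of the agents that agent i can see."""
    return tuple(state[j] for j in neighbours(i, state, d))


def g(i, joint_action, state, d):
    """Partial joint action g_i^d(a, s): actions of the agents in view."""
    return tuple(joint_action[j] for j in neighbours(i, state, d))


def partial_joint_actions(i, state, d):
    """All partial joint actions available to the agents in view."""
    return list(product(ACTIONS, repeat=len(neighbours(i, state, d))))


def pja_update(q, i, d, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One PJA Q-value update for agent i with vision distance d."""
    key = (f(i, s, d), g(i, a, s, d))
    fs_next = f(i, s_next, d)
    best_next = max(q[(fs_next, ad)]
                    for ad in partial_joint_actions(i, s_next, d))
    q[key] = (1 - alpha) * q[key] + alpha * (r + gamma * best_next)
    return q[key]


def degree_of_space(x_size, y_size, n_agents):
    """Ratio DS = XY / N from the Results section."""
    return x_size * y_size / n_agents


q = defaultdict(float)
s = ((0, 0), (1, 0), (2, 5))          # three agents on a 3 x 7 grid
a = ("stay", "left", "right")
first = pja_update(q, 0, 1, s, a, r=1.0, s_next=s)
```

With d = 0 each agent sees only itself and the update reduces to independent Q-learning on its own observation. As d grows, the table is indexed by more agents' positions and actions, which is the trade-off between observability and cost discussed in the Introduction.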