On the Non-Existence of a Universal Learning Algorithm for Recurrent Neural Networks

Herbert Wiklicky
Centrum voor Wiskunde en Informatica
P.O. Box 4079, NL-1009 AB Amsterdam, The Netherlands
e-mail: herbert@cwi.nl

Abstract

We prove that the so-called “loading problem” for (recurrent) neural networks is unsolvable. This extends several results which already demonstrated that training and related design problems for neural networks are (at least) NP-complete. Our result also implies that it is impossible to find or to formulate a universal training algorithm which, for any neural network architecture, could determine a correct set of weights. For the simple proof of this, we will just show that the loading problem is equivalent to “Hilbert’s tenth problem”, which is known to be unsolvable.

1 THE NEURAL NETWORK MODEL

It seems that there are relatively few commonly accepted general formal definitions of the notion of a “neural network”. Although our results also hold if based on other formal definitions, we will try to stay here very close to the original setting in which Judd’s NP-completeness result was given [Judd, 1990]. But in contrast to [Judd, 1990] we will deal here with simple recurrent networks instead of feed-forward architectures.

Our networks are constructed from three different types of units: $\Sigma$-units compute just the sum of all incoming signals; for $\Pi$-units the activation (node) function is given by the product of the incoming signals; and with $\Theta$-units, depending on whether the input signal is smaller or larger than a certain threshold parameter $\theta$, the output is zero or one. Our units are connected or linked by real weighted connections and operate synchronously.
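The three unit types can be illustrated with a minimal Python sketch. The function names, the way a connection weight scales each incoming signal, and the treatment of a signal exactly equal to the threshold (mapped to zero here) are assumptions made for illustration; the text only fixes the sum, product, and threshold behavior.

```python
from math import prod


def sigma_unit(inputs, weights):
    """Sum unit: adds up all (weighted) incoming signals."""
    return sum(w * x for w, x in zip(weights, inputs))


def pi_unit(inputs, weights):
    """Product unit: multiplies all (weighted) incoming signals."""
    return prod(w * x for w, x in zip(weights, inputs))


def theta_unit(signal, threshold):
    """Threshold unit: outputs 1 if the signal exceeds the threshold, else 0.

    Assumption: a signal exactly equal to the threshold yields 0.
    """
    return 1 if signal > threshold else 0
```

For example, `theta_unit(sigma_unit([1, 2], [1, 1]), 2.5)` feeds the sum 3 into a threshold unit and yields 1.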

Note that we could base our construction also just on one general type of units, namely what is usually called $\Sigma\Pi$-units. Furthermore, one could replace the $\Pi$-units in the construction below by (recurrent) modules of simple linear threshold units which had to perform unary integer multiplication. Thus, no higher-order elements are actually needed.

As we deal with recurrent networks, the behavior of a network is now not just given by a simple mapping from input space to output space (as with feed-forward architectures). In general, an input pattern is now mapped to an (infinite) output sequence. But note that if we consider as the output of a recurrent network a certain final, stable output pattern, we could return to a more static setting.

2 THE MAIN RESULT

The question we will look at is how difficult it is to construct or train a neural network of the described type so that it actually exhibits a certain desired behavior, i.e. solves a given learning task. We will investigate this by the following decision problem:

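Decision 1 Loading Problem
INSTANCE: A neural network architecture $\mathcal{N}$ and a learning task $\mathcal{T}$.
QUESTION: Is there a configuration $\mathcal{C}$ for $\mathcal{N}$ such that $\mathcal{T}$ is realized by $\mathcal{C}$?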

By a network configuration we just think of a certain setting of the weights in a neural network. Our main result concerning this problem now just states that it is undecidable or unsolvable.

Theorem 1 There exists no algorithm which could decide for any learning task $\mathcal{T}$ and any (recurrent) neural network $\mathcal{N}$ (consisting of $\Sigma$-, $\Pi$-, and $\Theta$-units) if the given architecture can perform $\mathcal{T}$.

The decision problem (as usual) gives a “lower bound” on the hardness of the related constructive problem [Garey and Johnson, 1979]. If we could construct a correct configuration for all instances, it would be trivial to decide instantly if a correct configuration exists at all. Thus we have:

Corollary 2 There exists no universal learning algorithm for (recurrent) neural networks.
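The step from the theorem to the corollary can be sketched in a few lines of Python. Everything named here is hypothetical: `universal_learner` stands for an assumed procedure that always halts, returning a correct weight configuration when one exists and `None` otherwise. The sketch only shows that such a learner would immediately yield a decision procedure for the loading problem, which the theorem rules out.

```python
from typing import Callable, Optional, Sequence

Weights = Sequence[float]


def decide_loading(
    universal_learner: Callable[[object, object], Optional[Weights]],
    architecture: object,
    task: object,
) -> bool:
    """Decide the loading problem given a (hypothetical) universal learner.

    Assumption: the learner halts on every instance and returns None
    exactly when no correct configuration exists.
    """
    return universal_learner(architecture, task) is not None
```

Since no algorithm can play the role of `decide_loading` on all instances, no algorithm can play the role of `universal_learner` either.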

3 THE PROOF

The proof of the above theorem is by constructing a class of neural networks for which it is impossible to decide (for all instances) if a certain learning task can be satisfied. We will refer for this to “Hilbert’s tenth problem” and show that for each of its instances we can construct a neural network, so that solutions to the loading problem would lead to solutions to the original problem (and vice versa). But as we know that Hilbert’s tenth problem is unsolvable, we also have to conclude that the loading problem we consider is unsolvable.

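3.1 HILBERT’S TENTH PROBLEM

Our reference problem, of which we know it is unsolvable, is closely related to several famous and classical mathematical problems including, for example, Fermat’s last theorem.

Definition 1 A diophantine equation is a polynomial $D$ in $n$ variables with integer coefficients, that is

$$D(x_1, x_2, \ldots, x_n) = \sum_i d_i(x_1, x_2, \ldots, x_n)$$

with each term $d_i$ of the form $d_i(x_1, x_2, \ldots, x_n) = c_i \cdot x_{i_1} \cdot x_{i_2} \cdots x_{i_m}$, where the indices $i_1, i_2, \ldots, i_m$ are taken from $\{1, 2, \ldots, n\}$ and the coefficient $c_i \in \mathbb{Z}$.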

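The concrete problem, first formulated in [Hilbert, 1900], is to develop a universal algorithm for finding the integer solutions for all $D$, i.e. a vector $(x_1, x_2, \ldots, x_n)$ with $x_i \in \mathbb{Z}$ (or $\mathbb{N}$), such that $D(x_1, x_2, \ldots, x_n) = 0$. The corresponding decision problem therefore is the following: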

Decision 2 Hilbert’s Tenth Problem
INSTANCE: Given a diophantine equation $D$.
QUESTION: Is there an integer solution for $D$?

Although this problem might seem to be quite simple (its formulation is actually the shortest among D. Hilbert’s famous 23 problems), it was not until 1970 that Y. Matijasevich could prove that it is unsolvable or undecidable [Matijasevich, 1970]. There is no recursive computable predicate for diophantine equations which holds if a solution in $\mathbb{Z}$ (or $\mathbb{N}$) exists and fails otherwise [Davis, 1973, Theorem 7.4].
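To make the definition concrete, the following Python sketch represents a diophantine polynomial as a list of terms and searches for an integer root inside a finite box. The term encoding, the function names, and the `bound` parameter are illustrative assumptions, not part of the original text. A hit proves that a solution exists, whereas a miss only says that none lies within the bound; this asymmetry is what the undecidability result leaves intact.

```python
from itertools import product
from math import prod

# A term is (coefficient, indices) and stands for c * x[i1] * x[i2] * ...
# Hypothetical example: x0^2 + x1^2 - 25
CIRCLE = [(1, (0, 0)), (1, (1, 1)), (-25, ())]


def evaluate(terms, x):
    """Evaluate the polynomial given by `terms` at the integer vector x."""
    return sum(c * prod(x[i] for i in idx) for c, idx in terms)


def search_solution(terms, n_vars, bound):
    """Return the first integer root with |x_i| <= bound, or None.

    None means "no root inside the box", not "no root at all".
    """
    for x in product(range(-bound, bound + 1), repeat=n_vars):
        if evaluate(terms, x) == 0:
            return x
    return None
```

Calling `search_solution(CIRCLE, 2, 5)` finds a root of the example polynomial, while a polynomial such as $x_0^2 - 2$ returns `None` for every bound, and no finite search can certify that on its own.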

3.2 THE NETWORK ARCHITECTURE

The construction of a neural network for each diophantine equation is now straightforward (see Fig. 1). It is just a three-step construction.