Generalization by neural networks

The definition of artificial intelligence

After yesterday's artificial neural networks lecture, a student came up with a question: she is also taking a "machine learning" course this semester, and the content of the "artificial neural networks" course feels largely the same as that of the machine learning course.

What exactly is the difference between these courses? Not being able to tell them apart made her quite worried about her courses this semester.

Such doubts arise from the similarities between the two courses, and as the disciplines develop, their overlap keeps growing.

But where do they differ? Beyond the different theoretical and technical histories and paths along which each has developed, and their different current research hotspots and implementation approaches, today it is perhaps more important to recognize the connections between them.

In a short video, "What's the difference between ML and NN?", DJ Patil summarized several points about the relationship between machine learning and artificial neural networks. A basic view of the relationship between artificial neural networks, machine learning, and artificial intelligence is this: artificial neural networks are one of many problem-solving approaches; most of the artificial neural networks you see today are multi-layer deep learning networks trained on large amounts of data, with many additional features built on top of the traditional error back-propagation (BP) technique; and improvements to neural network algorithms have given them much in common with machine learning methods such as supervised learning, unsupervised learning, logistic regression, and random forests.

What these methods have in common is that they all use training data sets to find function mappings that satisfy certain constraints.

Recently, a short piece from Stanford University, "Artificial Intelligence Definitions", sorted out the concepts related to intelligence in some detail from one particular angle; reading it may help you clarify the relationships among the many disciplines in this field.

Intelligence can be defined as the ability, in an uncertain and ever-changing environment, to solve the problems one encounters or to achieve set goals by learning and applying suitable techniques.

By contrast, a factory robot that works flexibly, accurately, and reliably entirely through pre-programming is not intelligent.

"Intelligence might be defined as the ability to learn and perform suitable techniques to solve problems and achieve goals, appropriate to the context in an uncertain, ever-varying world. A fully pre-programmed factory robot is flexible, accurate, and consistent but not intelligent." The term "artificial intelligence" was coined in 1955 by McCarthy, an emeritus professor of Stanford University, and refers to "the science and engineering of making intelligent machines."

Peter Sollich
Abstract
We present a new method for obtaining the response function G and its average ⟨G⟩ from which most of the properties of learning and generalization in linear perceptrons can be derived. We first rederive the known results for the 'thermodynamic limit' of infinite perceptron size N and show explicitly that G is self-averaging in this limit. We then discuss extensions of our method to more general learning scenarios with anisotropic teacher space priors, input distributions, and weight decay terms. Finally, we use our method to calculate the finite N corrections of order 1/N to G and discuss the corresponding finite size effects on generalization and learning dynamics. An important spin-off is the observation that results obtained in the thermodynamic limit are often directly relevant to systems of fairly modest, 'real-world' sizes.

Deep graph neural network (GNN) papers

Part 1: Classic papers
1. KDD 2016, node2vec — classic must-read first paper; balances homophily and structural equivalence: "node2vec: Scalable Feature Learning for Networks"
2. WWW 2015, LINE — first- and second-order proximity: "LINE: Large-scale Information Network Embedding"
3. KDD 2016, SDNE — multi-layer autoencoder: "Structural Deep Network Embedding"
4. KDD 2017, metapath2vec — heterogeneous graphs: "metapath2vec: Scalable Representation Learning for Heterogeneous Networks"
5. NIPS 2013, TransE — foundational knowledge-graph embedding work: "Translating Embeddings for Modeling Multi-relational Data"
6. ICLR 2018, GAT — attention mechanism: "Graph Attention Networks"
7. NIPS 2017, GraphSAGE — inductive learning framework: "Inductive Representation Learning on Large Graphs"
8. ICLR 2017, GCN — the pioneering graph neural network paper: "Semi-Supervised Classification with Graph Convolutional Networks"
9. ICLR 2016, GGNN — gated graph neural networks: "Gated Graph Sequence Neural Networks"
10. ICML 2017, MPNN — message-passing framework for spatial graph convolution: "Neural Message Passing for Quantum Chemistry"

Part 2: Popular papers
Before 2020:
11. [arXiv 2019] Revisiting Graph Neural Networks: All We Have is Low-Pass Filters [paper]
12. [NeurIPS 2019] Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks [paper]
13. [ICLR 2019] Predict then Propagate: Graph Neural Networks meet Personalized PageRank [paper] [code]
14. [ICCV 2019] DeepGCNs: Can GCNs Go as Deep as CNNs? [paper] [code (PyTorch)] [code (TensorFlow)]
15. [ICML 2018] Representation Learning on Graphs with Jumping Knowledge Networks [paper]
16. [AAAI 2018] Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning [paper]
2020:
17. [arXiv 2020] Deep Graph Neural Networks with Shallow Subgraph Samplers [paper]
18. [arXiv 2020] Revisiting Graph Convolutional Network on Semi-Supervised Node Classification from an Optimization Perspective [paper]
19. [arXiv 2020] Tackling Over-Smoothing for General Graph Convolutional Networks [paper]
20. [arXiv 2020] DeeperGCN: All You Need to Train Deeper GCNs [paper] [code]
21. [arXiv 2020] Effective Training Strategies for Deep Graph Neural Networks [paper] [code]
22. [arXiv 2020] Revisiting Over-smoothing in Deep GCNs [paper]
23. [NeurIPS 2020] Graph Random Neural Networks for Semi-Supervised Learning on Graphs [paper] [code]
24. [NeurIPS 2020] Scattering GCN: Overcoming Oversmoothness in Graph Convolutional Networks [paper] [code]
25. [NeurIPS 2020] Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks [paper] [code]
26. [NeurIPS 2020] Towards Deeper Graph Neural Networks with Differentiable Group Normalization [paper]
27. [ICML 2020 Workshop GRL+] A Note on Over-Smoothing for Graph Neural Networks [paper]
28. [ICML 2020] Bayesian Graph Neural Networks with Adaptive Connection Sampling [paper]
29. [ICML 2020] Continuous Graph Neural Networks [paper]
30. [ICML 2020] Simple and Deep Graph Convolutional Networks [paper] [code]
31. [KDD 2020] Towards Deeper Graph Neural Networks [paper] [code]
32. [ICLR 2020] Graph Neural Networks Exponentially Lose Expressive Power for Node Classification [paper] [code]
33. [ICLR 2020] DropEdge: Towards Deep Graph Convolutional Networks on Node Classification [paper] [code]
34. [ICLR 2020] PairNorm: Tackling Oversmoothing in GNNs [paper] [code]
35. [ICLR 2020] Measuring and Improving the Use of Graph Information in Graph Neural Networks [paper] [code]
36. [AAAI 2020] Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View [paper]

Have you noticed that some of the papers come with code and some do not? My suggestion is to read the ones without code first to learn the concepts, and then the ones with code; the reasons were given in last week's article on the 【学姐带你玩AI】 public account, 《图像识别深度学习研究方向没有导师带该怎么学习》.

Part 3: Latest papers
37. [arXiv 2021] Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks [paper]
38. [arXiv 2021] Graph Neural Networks Inspired by Classical Iterative Algorithms [paper]
39. [ICML 2021] Training Graph Neural Networks with 1000 Layers [paper] [code]
40. [ICML 2021] Directional Graph Networks [paper] [code]
41. [ICLR 2021] On the Bottleneck of Graph Neural Networks and its Practical Implications [paper]
42. [ICLR 2021] Adaptive Universal Generalized PageRank Graph Neural Network [paper] [code]
43. [ICLR 2021] Simple Spectral Graph Convolution [paper]
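Several of the papers above (for example DropEdge, entry 33, and PairNorm, entry 34) target the over-smoothing problem in deep GCNs. As a concrete illustration, here is a minimal numpy sketch of the PairNorm idea of centring node features across the graph and rescaling them to a fixed average norm; the variable names and the scale hyperparameter are illustrative and not taken from the paper's released code.

import numpy as np

def pairnorm(x, scale=1.0, eps=1e-6):
    # PairNorm-style normalization of node features x with shape (num_nodes, dim):
    # center features across nodes, then rescale so the mean squared row norm is scale**2.
    x_centered = x - x.mean(axis=0, keepdims=True)
    mean_sq_norm = np.mean(np.sum(x_centered ** 2, axis=1)) + eps
    return scale * x_centered / np.sqrt(mean_sq_norm)

# Toy usage: apply after each (hypothetical) GCN propagation step to keep
# node representations from collapsing toward a single point.
h = np.random.randn(5, 8)
h = pairnorm(h, scale=1.0)
print(np.mean(np.sum(h ** 2, axis=1)))  # approximately 1.0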

AI terminology

人工智能专业重要词汇表1、A开头的词汇:Artificial General Intelligence/AGI通用人工智能Artificial Intelligence/AI人工智能Association analysis关联分析Attention mechanism注意力机制Attribute conditional independence assumption属性条件独立性假设Attribute space属性空间Attribute value属性值Autoencoder自编码器Automatic speech recognition自动语音识别Automatic summarization自动摘要Average gradient平均梯度Average-Pooling平均池化Accumulated error backpropagation累积误差逆传播Activation Function激活函数Adaptive Resonance Theory/ART自适应谐振理论Addictive model加性学习Adversarial Networks对抗网络Affine Layer仿射层Affinity matrix亲和矩阵Agent代理/ 智能体Algorithm算法Alpha-beta pruningα-β剪枝Anomaly detection异常检测Approximation近似Area Under ROC Curve/AUC R oc 曲线下面积2、B开头的词汇Backpropagation Through Time通过时间的反向传播Backpropagation/BP反向传播Base learner基学习器Base learning algorithm基学习算法Batch Normalization/BN批量归一化Bayes decision rule贝叶斯判定准则Bayes Model Averaging/BMA贝叶斯模型平均Bayes optimal classifier贝叶斯最优分类器Bayesian decision theory贝叶斯决策论Bayesian network贝叶斯网络Between-class scatter matrix类间散度矩阵Bias偏置/ 偏差Bias-variance decomposition偏差-方差分解Bias-Variance Dilemma偏差–方差困境Bi-directional Long-Short Term Memory/Bi-LSTM双向长短期记忆Binary classification二分类Binomial test二项检验Bi-partition二分法Boltzmann machine玻尔兹曼机Bootstrap sampling自助采样法/可重复采样/有放回采样Bootstrapping自助法Break-Event Point/BEP平衡点3、C开头的词汇Calibration校准Cascade-Correlation级联相关Categorical attribute离散属性Class-conditional probability类条件概率Classification and regression tree/CART分类与回归树Classifier分类器Class-imbalance类别不平衡Closed -form闭式Cluster簇/类/集群Cluster analysis聚类分析Clustering聚类Clustering ensemble聚类集成Co-adapting共适应Coding matrix编码矩阵COLT国际学习理论会议Committee-based learning基于委员会的学习Competitive learning竞争型学习Component learner组件学习器Comprehensibility可解释性Computation Cost计算成本Computational Linguistics计算语言学Computer vision计算机视觉Concept drift概念漂移Concept Learning System /CLS概念学习系统Conditional entropy条件熵Conditional mutual information条件互信息Conditional Probability Table/CPT条件概率表Conditional random field/CRF条件随机场Conditional risk条件风险Confidence置信度Confusion matrix混淆矩阵Connection weight连接权Connectionism连结主义Consistency一致性/相合性Contingency table列联表Continuous attribute连续属性Convergence收敛Conversational agent会话智能体Convex quadratic programming凸二次规划Convexity凸性Convolutional neural network/CNN卷积神经网络Co-occurrence同现Correlation coefficient相关系数Cosine similarity余弦相似度Cost curve成本曲线Cost Function成本函数Cost matrix成本矩阵Cost-sensitive成本敏感Cross entropy交叉熵Cross validation交叉验证Crowdsourcing众包Curse of dimensionality维数灾难Cut point截断点Cutting plane algorithm割平面法4、D开头的词汇Data mining数据挖掘Data set数据集Decision Boundary决策边界Decision stump决策树桩Decision tree决策树/判定树Deduction演绎Deep Belief Network深度信念网络Deep Convolutional Generative Adversarial Network/DCGAN深度卷积生成对抗网络Deep learning深度学习Deep neural network/DNN深度神经网络Deep Q-Learning深度Q 学习Deep Q-Network深度Q 网络Density estimation密度估计Density-based clustering密度聚类Differentiable neural computer可微分神经计算机Dimensionality reduction algorithm降维算法Directed edge有向边Disagreement measure不合度量Discriminative model判别模型Discriminator判别器Distance measure距离度量Distance metric learning距离度量学习Distribution分布Divergence散度Diversity measure多样性度量/差异性度量Domain adaption领域自适应Downsampling下采样D-separation (Directed separation)有向分离Dual problem对偶问题Dummy node哑结点Dynamic Fusion动态融合Dynamic programming动态规划5、E开头的词汇Eigenvalue decomposition特征值分解Embedding嵌入Emotional analysis情绪分析Empirical conditional entropy经验条件熵Empirical entropy经验熵Empirical error经验误差Empirical risk经验风险End-to-End端到端Energy-based model基于能量的模型Ensemble learning集成学习Ensemble pruning集成修剪Error Correcting Output Codes/ECOC纠错输出码Error rate错误率Error-ambiguity decomposition误差-分歧分解Euclidean distance欧氏距离Evolutionary computation演化计算Expectation-Maximization期望最大化Expected 
loss期望损失Exploding Gradient Problem梯度爆炸问题Exponential loss function指数损失函数Extreme Learning Machine/ELM超限学习机6、F开头的词汇Factorization因子分解False negative假负类False positive假正类False Positive Rate/FPR假正例率Feature engineering特征工程Feature selection特征选择Feature vector特征向量Featured Learning特征学习Feedforward Neural Networks/FNN前馈神经网络Fine-tuning微调Flipping output翻转法Fluctuation震荡Forward stagewise algorithm前向分步算法Frequentist频率主义学派Full-rank matrix满秩矩阵Functional neuron功能神经元7、G开头的词汇Gain ratio增益率Game theory博弈论Gaussian kernel function高斯核函数Gaussian Mixture Model高斯混合模型General Problem Solving通用问题求解Generalization泛化Generalization error泛化误差Generalization error bound泛化误差上界Generalized Lagrange function广义拉格朗日函数Generalized linear model广义线性模型Generalized Rayleigh quotient广义瑞利商Generative Adversarial Networks/GAN生成对抗网络Generative Model生成模型Generator生成器Genetic Algorithm/GA遗传算法Gibbs sampling吉布斯采样Gini index基尼指数Global minimum全局最小Global Optimization全局优化Gradient boosting梯度提升Gradient Descent梯度下降Graph theory图论Ground-truth真相/真实8、H开头的词汇Hard margin硬间隔Hard voting硬投票Harmonic mean调和平均Hesse matrix海塞矩阵Hidden dynamic model隐动态模型Hidden layer隐藏层Hidden Markov Model/HMM隐马尔可夫模型Hierarchical clustering层次聚类Hilbert space希尔伯特空间Hinge loss function合页损失函数Hold-out留出法Homogeneous同质Hybrid computing混合计算Hyperparameter超参数Hypothesis假设Hypothesis test假设验证9、I开头的词汇ICML国际机器学习会议Improved iterative scaling/IIS改进的迭代尺度法Incremental learning增量学习Independent and identically distributed/i.i.d.独立同分布Independent Component Analysis/ICA独立成分分析Indicator function指示函数Individual learner个体学习器Induction归纳Inductive bias归纳偏好Inductive learning归纳学习Inductive Logic Programming/ILP归纳逻辑程序设计Information entropy信息熵Information gain信息增益Input layer输入层Insensitive loss不敏感损失Inter-cluster similarity簇间相似度International Conference for Machine Learning/ICML国际机器学习大会Intra-cluster similarity簇内相似度Intrinsic value固有值Isometric Mapping/Isomap等度量映射Isotonic regression等分回归Iterative Dichotomiser迭代二分器10、K开头的词汇Kernel method核方法Kernel trick核技巧Kernelized Linear Discriminant Analysis/KLDA核线性判别分析K-fold cross validation k 折交叉验证/k 倍交叉验证K-Means Clustering K –均值聚类K-Nearest Neighbours Algorithm/KNN K近邻算法Knowledge base知识库Knowledge Representation知识表征11、L开头的词汇Label space标记空间Lagrange duality拉格朗日对偶性Lagrange multiplier拉格朗日乘子Laplace smoothing拉普拉斯平滑Laplacian correction拉普拉斯修正Latent Dirichlet Allocation隐狄利克雷分布Latent semantic analysis潜在语义分析Latent variable隐变量Lazy learning懒惰学习Learner学习器Learning by analogy类比学习Learning rate学习率Learning Vector Quantization/LVQ学习向量量化Least squares regression tree最小二乘回归树Leave-One-Out/LOO留一法linear chain conditional random field线性链条件随机场Linear Discriminant Analysis/LDA线性判别分析Linear model线性模型Linear Regression线性回归Link function联系函数Local Markov property局部马尔可夫性Local minimum局部最小Log likelihood对数似然Log odds/logit对数几率Logistic Regression Logistic 回归Log-likelihood对数似然Log-linear regression对数线性回归Long-Short Term Memory/LSTM长短期记忆Loss function损失函数12、M开头的词汇Machine translation/MT机器翻译Macron-P宏查准率Macron-R宏查全率Majority voting绝对多数投票法Manifold assumption流形假设Manifold learning流形学习Margin theory间隔理论Marginal distribution边际分布Marginal independence边际独立性Marginalization边际化Markov Chain Monte Carlo/MCMC马尔可夫链蒙特卡罗方法Markov Random Field马尔可夫随机场Maximal clique最大团Maximum Likelihood Estimation/MLE极大似然估计/极大似然法Maximum margin最大间隔Maximum weighted spanning tree最大带权生成树Max-Pooling最大池化Mean squared error均方误差Meta-learner元学习器Metric learning度量学习Micro-P微查准率Micro-R微查全率Minimal Description Length/MDL最小描述长度Minimax game极小极大博弈Misclassification cost误分类成本Mixture of experts混合专家Momentum动量Moral graph道德图/端正图Multi-class classification多分类Multi-document summarization多文档摘要Multi-layer feedforward neural 
networks多层前馈神经网络Multilayer Perceptron/MLP多层感知器Multimodal learning多模态学习Multiple Dimensional Scaling多维缩放Multiple linear regression多元线性回归Multi-response Linear Regression /MLR多响应线性回归Mutual information互信息13、N开头的词汇Naive bayes朴素贝叶斯Naive Bayes Classifier朴素贝叶斯分类器Named entity recognition命名实体识别Nash equilibrium纳什均衡Natural language generation/NLG自然语言生成Natural language processing自然语言处理Negative class负类Negative correlation负相关法Negative Log Likelihood负对数似然Neighbourhood Component Analysis/NCA近邻成分分析Neural Machine Translation神经机器翻译Neural Turing Machine神经图灵机Newton method牛顿法NIPS国际神经信息处理系统会议No Free Lunch Theorem/NFL没有免费的午餐定理Noise-contrastive estimation噪音对比估计Nominal attribute列名属性Non-convex optimization非凸优化Nonlinear model非线性模型Non-metric distance非度量距离Non-negative matrix factorization非负矩阵分解Non-ordinal attribute无序属性Non-Saturating Game非饱和博弈Norm范数Normalization归一化Nuclear norm核范数Numerical attribute数值属性14、O开头的词汇Objective function目标函数Oblique decision tree斜决策树Occam’s razor奥卡姆剃刀Odds几率Off-Policy离策略One shot learning一次性学习One-Dependent Estimator/ODE独依赖估计On-Policy在策略Ordinal attribute有序属性Out-of-bag estimate包外估计Output layer输出层Output smearing输出调制法Overfitting过拟合/过配Oversampling过采样15、P开头的词汇Paired t-test成对t 检验Pairwise成对型Pairwise Markov property成对马尔可夫性Parameter参数Parameter estimation参数估计Parameter tuning调参Parse tree解析树Particle Swarm Optimization/PSO粒子群优化算法Part-of-speech tagging词性标注Perceptron感知机Performance measure性能度量Plug and Play Generative Network即插即用生成网络Plurality voting相对多数投票法Polarity detection极性检测Polynomial kernel function多项式核函数Pooling池化Positive class正类Positive definite matrix正定矩阵Post-hoc test后续检验Post-pruning后剪枝potential function势函数Precision查准率/准确率Prepruning预剪枝Principal component analysis/PCA主成分分析Principle of multiple explanations多释原则Prior先验Probability Graphical Model概率图模型Proximal Gradient Descent/PGD近端梯度下降Pruning剪枝Pseudo-label伪标记16、Q开头的词汇Quantized Neural Network量子化神经网络Quantum computer量子计算机Quantum Computing量子计算Quasi Newton method拟牛顿法17、R开头的词汇Radial Basis Function/RBF径向基函数Random Forest Algorithm随机森林算法Random walk随机漫步Recall查全率/召回率Receiver Operating Characteristic/ROC受试者工作特征Rectified Linear Unit/ReLU线性修正单元Recurrent Neural Network循环神经网络Recursive neural network递归神经网络Reference model参考模型Regression回归Regularization正则化Reinforcement learning/RL强化学习Representation learning表征学习Representer theorem表示定理reproducing kernel Hilbert space/RKHS再生核希尔伯特空间Re-sampling重采样法Rescaling再缩放Residual Mapping残差映射Residual Network残差网络Restricted Boltzmann Machine/RBM受限玻尔兹曼机Restricted Isometry Property/RIP限定等距性Re-weighting重赋权法Robustness稳健性/鲁棒性Root node根结点Rule Engine规则引擎Rule learning规则学习18、S开头的词汇Saddle point鞍点Sample space样本空间Sampling采样Score function评分函数Self-Driving自动驾驶Self-Organizing Map/SOM自组织映射Semi-naive Bayes classifiers半朴素贝叶斯分类器Semi-Supervised Learning半监督学习semi-Supervised Support Vector Machine半监督支持向量机Sentiment analysis情感分析Separating hyperplane分离超平面Sigmoid function Sigmoid 函数Similarity measure相似度度量Simulated annealing模拟退火Simultaneous localization and mapping同步定位与地图构建Singular Value Decomposition奇异值分解Slack variables松弛变量Smoothing平滑Soft margin软间隔Soft margin maximization软间隔最大化Soft voting软投票Sparse representation稀疏表征Sparsity稀疏性Specialization特化Spectral Clustering谱聚类Speech Recognition语音识别Splitting variable切分变量Squashing function挤压函数Stability-plasticity dilemma可塑性-稳定性困境Statistical learning统计学习Status feature function状态特征函Stochastic gradient descent随机梯度下降Stratified sampling分层采样Structural risk结构风险Structural risk minimization/SRM结构风险最小化Subspace子空间Supervised learning监督学习/有导师学习support vector expansion支持向量展式Support Vector Machine/SVM支持向量机Surrogat loss替代损失Surrogate function替代函数Symbolic 
learning符号学习Symbolism符号主义Synset同义词集19、T开头的词汇T-Distribution Stochastic Neighbour Embedding/t-SNE T–分布随机近邻嵌入Tensor张量Tensor Processing Units/TPU张量处理单元The least square method最小二乘法Threshold阈值Threshold logic unit阈值逻辑单元Threshold-moving阈值移动Time Step时间步骤Tokenization标记化Training error训练误差Training instance训练示例/训练例Transductive learning直推学习Transfer learning迁移学习Treebank树库Tria-by-error试错法True negative真负类True positive真正类True Positive Rate/TPR真正例率Turing Machine图灵机Twice-learning二次学习20、U开头的词汇Underfitting欠拟合/欠配Undersampling欠采样Understandability可理解性Unequal cost非均等代价Unit-step function单位阶跃函数Univariate decision tree单变量决策树Unsupervised learning无监督学习/无导师学习Unsupervised layer-wise training无监督逐层训练Upsampling上采样21、V开头的词汇Vanishing Gradient Problem梯度消失问题Variational inference变分推断VC Theory VC维理论Version space版本空间Viterbi algorithm维特比算法Von Neumann architecture冯·诺伊曼架构22、W开头的词汇Wasserstein GAN/WGAN Wasserstein生成对抗网络Weak learner弱学习器Weight权重Weight sharing权共享Weighted voting加权投票法Within-class scatter matrix类内散度矩阵Word embedding词嵌入Word sense disambiguation词义消歧23、Z开头的词汇Zero-data learning零数据学习Zero-shot learning零次学习。

Research on complex-variable weight function neural networks and their applications

Keywords: neural network; weight function; complex Lagrange interpolation; Fejér basis points; FIR; BP; RBF
Master's degree thesis, Nanjing University of Posts and Telecommunications
Abstract
In both the real and complex number fields, traditional network learning algorithms (such as the BP and RBF algorithms) suffer from many defects: local minima, slow convergence, difficulty in reaching the global optimum, and constant weights that poorly reflect the sample information; moreover, in practical applications the structure of a traditional neural network model is difficult to determine. To address these defects, a new model and algorithm, named the complex-variable weight function neural network, was proposed in the monograph "The New Neural Networks Theory and Method". The new algorithm overcomes the flaws of traditional neural network learning algorithms and simplifies the network architecture. Moreover, as the number of samples increases, the network's generalization ability is also enhanced. The complex weight function neural network inherits the characteristics of weight function neural networks and is their implementation in the complex field. In the theoretical part, the approximation problem of complex Lagrange interpolation based on the Fejér points is studied; the model and the weight functions of the complex weight function neural network are then determined, and the error of the network is analysed. Finally, simulation experiments comparing it with conventional BP and RBF neural network algorithms show that the complex weight function neural network achieves high accuracy and fast convergence. In the application part, an FIR filter design based on the complex weight function neural network is given. The design model is presented first; on this basis, simulation experiments are carried out with both the complex weight function neural network algorithm and the BP algorithm; finally, a comparison of the simulation results shows that the new algorithm achieves good results. Key words: neural network; complex Lagrange interpolation; Fejér basis points; FIR
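To make the interpolation machinery concrete, here is a minimal numpy sketch of Lagrange interpolation at complex nodes. As a stand-in for the Fejér points discussed in the thesis, it uses equally spaced points on the unit circle (roots of unity); that node choice, and the test function, are assumptions for illustration rather than the thesis's exact construction.

import numpy as np

def lagrange_interp(nodes, values, z):
    # Evaluate the Lagrange interpolating polynomial with complex nodes at points z.
    z = np.atleast_1d(z).astype(complex)
    result = np.zeros_like(z)
    for k, (xk, yk) in enumerate(zip(nodes, values)):
        # basis polynomial l_k(z) = prod_{j != k} (z - x_j) / (x_k - x_j)
        others = np.delete(nodes, k)
        lk = np.prod((z[:, None] - others[None, :]) / (xk - others[None, :]), axis=1)
        result += yk * lk
    return result

# Example: interpolate f(z) = exp(z) at the 8th roots of unity and compare with the true value.
n = 8
nodes = np.exp(2j * np.pi * np.arange(n) / n)
values = np.exp(nodes)
test_point = 0.3 + 0.4j
print(lagrange_interp(nodes, values, test_point), np.exp(test_point))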

Research on surface defect detection technology based on deep learning

Abstract: With the rapid development of industry, people pay more and more attention to product quality.

Surface defect detection is one of the most important steps in the production process; it directly affects product quality and user experience.

Defects often appear during production; they are somewhat random, and their types, shapes, and sizes vary.

Traditional manual inspection is simple, but some defect features are not distinct enough to be recognized by the human eye, the detection error is large, and the efficiency is low; existing machine vision methods can achieve automatic detection, but their core algorithms rely on hand-crafted features and suffer from problems such as unsuitable feature selection and poor generality.

Against this background, and taking the characteristics of the images into account, this thesis studies the application of deep convolutional neural networks to surface defect detection on lithium battery panels.

To address the shortage of data samples, this thesis uses data augmentation to enlarge the lithium battery panel dataset, and builds datasets of different sizes to verify the generalization performance of the convolutional neural network models.
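For illustration, the following is a minimal numpy sketch of the kind of simple geometric augmentations typically used to enlarge a small image dataset; the specific operations and parameters are assumptions for illustration, not the augmentation set actually used in the thesis.

import numpy as np

def augment(image, rng=np.random.default_rng()):
    # Apply one randomly chosen geometric transform to a square grayscale image.
    ops = [
        lambda im: im,                                   # identity (keep original)
        lambda im: np.fliplr(im),                        # horizontal flip
        lambda im: np.flipud(im),                        # vertical flip
        lambda im: np.rot90(im, k=int(rng.integers(1, 4)))  # random 90-degree rotation
    ]
    return ops[int(rng.integers(len(ops)))](image)

panel = np.zeros((64, 64), dtype=np.uint8)         # placeholder for a panel image
augmented = [augment(panel) for _ in range(8)]      # several new samples per original image
print(len(augmented), augmented[0].shape)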

In addition, this thesis proposes an algorithm that combines CycleGAN to expand the dataset, making full use of the existing defect and normal samples: a generative adversarial network is trained to learn the feature distributions of normal and defect samples and to perform cross-domain image translation.

The network can transfer the features of defect samples onto normal samples to generate new defect samples, and it can also generate the normal and defect samples it has learned.

Experimental results show that the generated images are realistic and can effectively improve the recognition accuracy of the algorithm.

To address the low accuracy of traditional surface defect detection algorithms and their reliance on hand-crafted features, this thesis studies the application of convolutional neural networks to the classification of lithium battery panels.

Convolutional neural networks can extract image features automatically without human intervention; techniques such as local connectivity and weight sharing effectively reduce the number of model parameters and give the models strong generalization ability.

The key factor affecting defect classification accuracy is the design of the convolutional neural network model. Considering both model complexity and construction, this thesis explores mainly the depth and width of the network, and uses batch normalization, residual structures, Inception branches, SENet blocks, and other components to design different types of convolutional neural network models for lithium battery panel defect detection.
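As an illustration of one of the components listed above, here is a minimal numpy sketch of a squeeze-and-excitation (SENet-style) block that reweights the channels of a convolutional feature map; the reduction ratio, parameter shapes, and random parameters are illustrative assumptions, not the thesis's exact configuration.

import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    # Squeeze-and-Excitation channel reweighting for a feature map of shape (C, H, W).
    # w1: (C, C//r), w2: (C//r, C) are the two fully connected layers of the excitation step.
    squeeze = feature_map.mean(axis=(1, 2))                 # global average pooling -> (C,)
    hidden = np.maximum(squeeze @ w1 + b1, 0.0)             # FC + ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))     # FC + sigmoid -> per-channel weights
    return feature_map * weights[:, None, None]             # rescale each channel

# Toy usage with reduction ratio r = 4 and random parameters.
C, H, W, r = 16, 8, 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
out = se_block(x,
               rng.standard_normal((C, C // r)) * 0.1, np.zeros(C // r),
               rng.standard_normal((C // r, C)) * 0.1, np.zeros(C))
print(out.shape)  # (16, 8, 8)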

Experiments are conducted to verify the recognition performance of network models of different complexity and to compare the influence of different datasets on the results.

The results show that the best deep convolutional neural network model designed in this thesis reaches a recognition accuracy of 99.44% with a moderate number of parameters.

He parameter initialization

When it comes to initializing parameters, it is essential to consider all the factors that can affect the performance of the system. Parameter initialization is a very important step in machine learning; it directly affects the performance and accuracy of the system.

Proper parameter initialization can lead to faster convergence and better generalization of the model, while improper initialization can cause the model to get stuck in local minima or diverge altogether.

One of the most common methods for parameter initialization is to use small random values.

This can help prevent neurons from becoming highly correlated, as well as reduce the chances of getting stuck in local minima.

Another approach is to use initialization schemes such as Xavier or He initialization, which take into account the number of input and output connections for each neuron.
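As a concrete illustration of the two schemes just mentioned, here is a minimal numpy sketch; the layer sizes are arbitrary examples, and deep learning frameworks ship their own built-in initializers that would normally be used instead.

import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random.default_rng()):
    # Xavier/Glorot initialization: variance 2 / (fan_in + fan_out).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out, rng=np.random.default_rng()):
    # He initialization (suited to ReLU units): variance 2 / fan_in.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_init(784, 256)   # e.g. input layer -> hidden layer
W2 = he_init(256, 10)        # e.g. hidden layer -> output layer
print(W1.std(), W2.std())    # roughly 0.044 and 0.088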

NEURAL NETWORKS (English article)

NEURAL NETWORKS
by Christos Stergiou and Dimitrios Siganos

Abstract
This report is an introduction to Artificial Neural Networks. The various types of neural networks are explained and demonstrated, applications of neural networks like ANNs in medicine are described, and a detailed historical background is provided. The connection between the artificial and the real thing is also investigated and explained. Finally, the mathematical models involved are presented and demonstrated.

1. Introduction to neural networks
1.1 What is a Neural Network?
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurones) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurones. This is true of ANNs as well.

1.2 Historical background
Neural network simulations appear to be a recent development. However, this field was established before the advent of computers, and has survived at least one major setback and several eras. Many important advances have been boosted by the use of inexpensive computer emulations. Following an initial period of enthusiasm, the field survived a period of frustration and disrepute. During this period when funding and professional support was minimal, important advances were made by relatively few researchers. These pioneers were able to develop convincing technology which surpassed the limitations identified by Minsky and Papert. Minsky and Papert published a book (in 1969) in which they summed up a general feeling of frustration (against neural networks) among researchers, and it was thus accepted by most without further analysis. Currently, the neural network field enjoys a resurgence of interest and a corresponding increase in funding. The first artificial neuron was produced in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts. But the technology available at that time did not allow them to do too much.

1.3 Why use neural networks?
Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse.
This expert can then be used to provide projections given new situations of interest and answer "what if" questions.
Other advantages include:
1. Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
2. Self-Organisation: An ANN can create its own organisation or representation of the information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
4. Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

1.4 Neural networks versus conventional computers
Neural networks take a different approach to problem solving than that of conventional computers. Conventional computers use an algorithmic approach, i.e. the computer follows a set of instructions in order to solve a problem. Unless the specific steps that the computer needs to follow are known the computer cannot solve the problem. That restricts the problem solving capability of conventional computers to problems that we already understand and know how to solve. But computers would be so much more useful if they could do things that we don't exactly know how to do.
Neural networks process information in a similar way the human brain does. The network is composed of a large number of highly interconnected processing elements (neurones) working in parallel to solve a specific problem. Neural networks learn by example. They cannot be programmed to perform a specific task. The examples must be selected carefully otherwise useful time is wasted or, even worse, the network might be functioning incorrectly. The disadvantage is that because the network finds out how to solve the problem by itself, its operation can be unpredictable.
On the other hand, conventional computers use a cognitive approach to problem solving; the way the problem is to be solved must be known and stated in small unambiguous instructions. These instructions are then converted to a high level language program and then into machine code that the computer can understand. These machines are totally predictable; if anything goes wrong it is due to a software or hardware fault.
Neural networks and conventional algorithmic computers are not in competition but complement each other. There are tasks that are more suited to an algorithmic approach, like arithmetic operations, and tasks that are more suited to neural networks. Even more, a large number of tasks require systems that use a combination of the two approaches (normally a conventional computer is used to supervise the neural network) in order to perform at maximum efficiency. Neural networks do not perform miracles. But if used sensibly they can produce some amazing results.

2. Human and Artificial Neurones - investigating the similarities
2.1 How the Human Brain Learns?
Much is still unknown about how the brain trains itself to process information, so theories abound. In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurones. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
[Figure: Components of a neuron]
[Figure: The synapse]

2.2 From Human Neurones to Artificial Neurones
We construct these neural networks by first trying to deduce the essential features of neurones and their interconnections. We then typically program a computer to simulate these features. However, because our knowledge of neurones is incomplete and our computing power is limited, our models are necessarily gross idealisations of real networks of neurones.
[Figure: The neuron model]

3. An engineering approach
3.1 A simple neuron
An artificial neuron is a device with many inputs and one output. The neuron has two modes of operation: the training mode and the using mode. In the training mode, the neuron can be trained to fire (or not) for particular input patterns. In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong in the taught list of input patterns, the firing rule is used to determine whether to fire or not.
[Figure: A simple neuron]

3.2 Firing rules
The firing rule is an important concept in neural networks and accounts for their high flexibility. A firing rule determines how one calculates whether a neuron should fire for any input pattern. It relates to all the input patterns, not only the ones on which the node was trained.
A simple firing rule can be implemented by using the Hamming distance technique. The rule goes as follows: take a collection of training patterns for a node, some of which cause it to fire (the 1-taught set of patterns) and others which prevent it from doing so (the 0-taught set). Then the patterns not in the collection cause the node to fire if, on comparison, they have more input elements in common with the 'nearest' pattern in the 1-taught set than with the 'nearest' pattern in the 0-taught set. If there is a tie, then the pattern remains in the undefined state.
For example, a 3-input neuron is taught to output 1 when the input (X1, X2 and X3) is 111 or 101 and to output 0 when the input is 000 or 001. Then, before generalisation, the truth table is as follows. [truth table not reproduced]
As an example of the way the firing rule is applied, take the pattern 010. It differs from 000 in 1 element, from 001 in 2 elements, from 101 in 3 elements and from 111 in 2 elements. Therefore, the 'nearest' pattern is 000, which belongs in the 0-taught set. Thus the firing rule requires that the neuron should not fire when the input is 010. On the other hand, 011 is equally distant from two taught patterns that have different outputs and thus the output stays undefined (0/1).
By applying the firing rule in every column, the following truth table is obtained. [truth table not reproduced]
The difference between the two truth tables is called the generalisation of the neuron. Therefore the firing rule gives the neuron a sense of similarity and enables it to respond 'sensibly' to patterns not seen during training.

3.3 Pattern Recognition - an example
An important application of neural networks is pattern recognition.
Pattern recognition can be implemented by using a feed-forward (figure 1) neural network that has been trained accordingly. During training, the network is trained to associate outputs with input patterns. When the network is used, it identifies the input pattern and tries to output the associated output pattern. The power of neural networks comes to life when a pattern that has no output associated with it is given as an input. In this case, the network gives the output that corresponds to a taught input pattern that is least different from the given pattern.
[Figure 1]
For example: the network of figure 1 is trained to recognise the patterns T and H. The associated patterns are all black and all white respectively, as shown below. If we represent black squares with 0 and white squares with 1, then the truth tables for the top and bottom neurons can be written down. [truth tables not reproduced]
From the tables it can be seen that the following associations can be extracted. In the first case, it is obvious that the output should be all blacks since the input pattern is almost the same as the 'T' pattern. Here also, it is obvious that the output should be all whites since the input pattern is almost the same as the 'H' pattern. Here, the top row is 2 errors away from a T and 3 from an H. So the top output is black. The middle row is 1 error away from both T and H, so the output is random. The bottom row is 1 error away from T and 2 away from H. Therefore the output is black. The total output of the network is still in favour of the T shape.

3.4 A more complicated neuron
The previous neuron doesn't do anything that conventional computers don't do already. A more sophisticated neuron (figure 2) is the McCulloch and Pitts model (MCP). The difference from the previous model is that the inputs are 'weighted': the effect that each input has at decision making is dependent on the weight of the particular input. The weight of an input is a number which, when multiplied with the input, gives the weighted input. These weighted inputs are then added together and if they exceed a pre-set threshold value, the neuron fires. In any other case the neuron does not fire.
[Figure 2. An MCP neuron]
In mathematical terms, the neuron fires if and only if
X1W1 + X2W2 + X3W3 + ... > T
The addition of input weights and of the threshold makes this neuron a very flexible and powerful one. The MCP neuron has the ability to adapt to a particular situation by changing its weights and/or threshold. Various algorithms exist that cause the neuron to 'adapt'; the most used ones are the Delta rule and the back error propagation. The former is used in feed-forward networks and the latter in feedback networks.

4. Architecture of neural networks
4.1 Feed-forward networks
Feed-forward ANNs (figure 1) allow signals to travel one way only: from input to output. There is no feedback (loops), i.e. the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organisation is also referred to as bottom-up or top-down.

4.2 Feedback networks
Feedback networks (figure 1) can have signals travelling in both directions by introducing loops in the network. Feedback networks are very powerful and can get extremely complicated. Feedback networks are dynamic; their 'state' is changing continuously until they reach an equilibrium point.
They remain at the equilibrium point until the input changes and a new equilibrium needs to be found. Feedback architectures are also referred to as interactive or recurrent, although the latter term is often used to denote feedback connections in single-layer organisations.
[Figure 4.1 An example of a simple feedforward network]
[Figure 4.2 An example of a complicated network]

4.3 Network layers
The commonest type of artificial neural network consists of three groups, or layers, of units: a layer of "input" units is connected to a layer of "hidden" units, which is connected to a layer of "output" units (see Figure 4.1).
The activity of the input units represents the raw information that is fed into the network.
The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.
The behaviour of the output units depends on the activity of the hidden units and the weights between the hidden and output units.
This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.
We also distinguish single-layer and multi-layer architectures. The single-layer organisation, in which all units are connected to one another, constitutes the most general case and is of more potential computational power than hierarchically structured multi-layer organisations. In multi-layer networks, units are often numbered by layer, instead of following a global numbering.

4.4 Perceptrons
The most influential work on neural nets in the 60's went under the heading of 'perceptrons', a term coined by Frank Rosenblatt. The perceptron (figure 4.4) turns out to be an MCP model (neuron with weighted inputs) with some additional, fixed, pre-processing. Units labelled A1, A2, Aj, Ap are called association units and their task is to extract specific, localised features from the input images. Perceptrons mimic the basic idea behind the mammalian visual system. They were mainly used in pattern recognition even though their capabilities extended a lot more.
[Figure 4.4]
In 1969 Minsky and Papert wrote a book in which they described the limitations of single layer perceptrons. The impact that the book had was tremendous and caused a lot of neural network researchers to lose their interest. The book was very well written and showed mathematically that single layer perceptrons could not do some basic pattern recognition operations like determining the parity of a shape or determining whether a shape is connected or not. What they did not realise, until the 80's, is that given the appropriate training, multilevel perceptrons can do these operations.

5. The Learning Process
The memorisation of patterns and the subsequent response of the network can be categorised into two general paradigms:
associative mapping, in which the network learns to produce a particular pattern on the set of output units whenever another particular pattern is applied on the set of input units. The associative mapping can generally be broken down into two mechanisms:
auto-association: an input pattern is associated with itself and the states of input and output units coincide.
This is used to provide pattern completion, i.e. to produce a pattern whenever a portion of it or a distorted pattern is presented. In the second case, the network actually stores pairs of patterns, building an association between two sets of patterns.
hetero-association: is related to two recall mechanisms: nearest-neighbour recall, where the output pattern produced corresponds to the stored input pattern which is closest to the pattern presented, and interpolative recall, where the output pattern is a similarity dependent interpolation of the patterns stored corresponding to the pattern presented. Yet another paradigm, which is a variant of associative mapping, is classification, i.e. when there is a fixed set of categories into which the input patterns are to be classified.
regularity detection, in which units learn to respond to particular properties of the input patterns. Whereas in associative mapping the network stores the relationships among patterns, in regularity detection the response of each unit has a particular 'meaning'. This type of learning mechanism is essential for feature discovery and knowledge representation.
Every neural network possesses knowledge which is contained in the values of the connection weights. Modifying the knowledge stored in the network as a function of experience implies a learning rule for changing the values of the weights. Information is stored in the weight matrix W of a neural network. Learning is the determination of the weights. Following the way learning is performed, we can distinguish two major categories of neural networks:
fixed networks, in which the weights cannot be changed, i.e. dW/dt = 0. In such networks, the weights are fixed a priori according to the problem to solve.
adaptive networks, which are able to change their weights, i.e. dW/dt ≠ 0.
All learning methods used for adaptive neural networks can be classified into two major categories:
Supervised learning, which incorporates an external teacher, so that each output unit is told what its desired response to input signals ought to be. During the learning process global information may be required. Paradigms of supervised learning include error-correction learning, reinforcement learning and stochastic learning. An important issue concerning supervised learning is the problem of error convergence, i.e. the minimisation of error between the desired and computed unit values. The aim is to determine a set of weights which minimises the error. One well-known method, which is common to many learning paradigms, is least mean square (LMS) convergence.
Unsupervised learning uses no external teacher and is based upon only local information. It is also referred to as self-organisation, in the sense that it self-organises data presented to the network and detects their emergent collective properties. Paradigms of unsupervised learning are Hebbian learning and competitive learning.
Another aspect of learning concerns the distinction or not of a separate phase, during which the network is trained, and a subsequent operation phase.
We say that a neural network learns off-line if the learning phase and the operation phase are distinct. A neural network learns on-line if it learns and operates at the same time. Usually, supervised learning is performed off-line, whereas unsupervised learning is performed on-line.

5.1 Transfer Function
The behaviour of an ANN (Artificial Neural Network) depends on both the weights and the input-output function (transfer function) that is specified for the units. This function typically falls into one of three categories: linear (or ramp), threshold, or sigmoid.
For linear units, the output activity is proportional to the total weighted output.
For threshold units, the output is set at one of two levels, depending on whether the total input is greater than or less than some threshold value.
For sigmoid units, the output varies continuously but not linearly as the input changes. Sigmoid units bear a greater resemblance to real neurones than do linear or threshold units, but all three must be considered rough approximations.
To make a neural network that performs some specific task, we must choose how the units are connected to one another (see figure 4.1), and we must set the weights on the connections appropriately. The connections determine whether it is possible for one unit to influence another. The weights specify the strength of the influence.
We can teach a three-layer network to perform a particular task by using the following procedure:
We present the network with training examples, which consist of a pattern of activities for the input units together with the desired pattern of activities for the output units.
We determine how closely the actual output of the network matches the desired output.
We change the weight of each connection so that the network produces a better approximation of the desired output.

5.2 An Example to illustrate the above teaching procedure
Assume that we want a network to recognise hand-written digits. We might use an array of, say, 256 sensors, each recording the presence or absence of ink in a small area of a single digit. The network would therefore need 256 input units (one for each sensor), 10 output units (one for each kind of digit) and a number of hidden units. For each kind of digit recorded by the sensors, the network should produce high activity in the appropriate output unit and low activity in the other output units.
To train the network, we present an image of a digit and compare the actual activity of the 10 output units with the desired activity. We then calculate the error, which is defined as the square of the difference between the actual and the desired activities. Next we change the weight of each connection so as to reduce the error. We repeat this training process for many different images of each kind of digit until the network classifies every image correctly.
To implement this procedure we need to calculate the error derivative for the weight (EW) in order to change the weight by an amount that is proportional to the rate at which the error changes as the weight is changed. One way to calculate the EW is to perturb a weight slightly and observe how the error changes. But that method is inefficient because it requires a separate perturbation for each of the many weights.
Another way to calculate the EW is to use the back-propagation algorithm, which is described below and has become nowadays one of the most important tools for training neural networks.
It was developed independently by two teams, one (Fogelman-Soulie, Gallinari and Le Cun) in France, the other (Rumelhart, Hinton and Williams) in the U.S.

5.3 The Back-Propagation Algorithm
In order to train a neural network to perform some task, we must adjust the weights of each unit in such a way that the error between the desired output and the actual output is reduced. This process requires that the neural network compute the error derivative of the weights (EW). In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back-propagation algorithm is the most widely used method for determining the EW.
The back-propagation algorithm is easiest to understand if all the units in the network are linear. The algorithm computes each EW by first computing the EA, the rate at which the error changes as the activity level of a unit is changed. For output units, the EA is simply the difference between the actual and the desired output. To compute the EA for a hidden unit in the layer just before the output layer, we first identify all the weights between that hidden unit and the output units to which it is connected. We then multiply those weights by the EAs of those output units and add the products. This sum equals the EA for the chosen hidden unit. After calculating all the EAs in the hidden layer just before the output layer, we can compute in like fashion the EAs for other layers, moving from layer to layer in a direction opposite to the way activities propagate through the network. This is what gives back-propagation its name. Once the EA has been computed for a unit, it is straightforward to compute the EW for each incoming connection of the unit. The EW is the product of the EA and the activity through the incoming connection.
Note that for non-linear units (see Appendix C) the back-propagation algorithm includes an extra step. Before back-propagating, the EA must be converted into the EI, the rate at which the error changes as the total input received by a unit is changed.

6. Applications of neural networks
6.1 Neural Networks in Practice
Given this description of neural networks and how they work, what real world applications are they suited for? Neural networks have broad applicability to real world business problems. In fact, they have already been successfully applied in many industries.
Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including: sales forecasting, industrial process control, customer research, data validation, risk management, and target marketing.
But to give you some more specific examples, ANNs are also used in the following specific paradigms: recognition of speakers in communications; diagnosis of hepatitis; recovery of telecommunications from faulty software; interpretation of multimeaning Chinese words; undersea mine detection; texture analysis; three-dimensional object recognition; hand-written word recognition; and facial recognition.

6.2 Neural networks in medicine
Artificial Neural Networks (ANN) are currently a 'hot' research area in medicine and it is believed that they will receive extensive application to biomedical systems in the next few years. At the moment, the research is mostly on modelling parts of the human body and recognising diseases from various scans (e.g.
cardiograms, CAT scans, ultrasonic scans, etc.). Neural networks are ideal in recognising diseases using scans since there is no need to provide a specific algorithm on how to identify the disease. Neural networks learn by example, so the details of how to recognise the disease are not needed. What is needed is a set of examples that are representative of all the variations of the disease. The quantity of examples is not as important as the 'quality'. The examples need to be selected very carefully if the system is to perform reliably and efficiently.

6.2.1 Modelling and Diagnosing the Cardiovascular System
Neural networks are used experimentally to model the human cardiovascular system. Diagnosis can be achieved by building a model of the cardiovascular system of an individual and comparing it with the real time physiological measurements taken from the patient. If this routine is carried out regularly, potentially harmful medical conditions can be detected at an early stage, which makes the process of combating the disease much easier.
A model of an individual's cardiovascular system must mimic the relationship among physiological variables (i.e., heart rate, systolic and diastolic blood pressures, and breathing rate) at different physical activity levels. If a model is adapted to an individual, then it becomes a model of the physical condition of that individual. The simulator will have to be able to adapt to the features of any individual without the supervision of an expert. This calls for a neural network.
Another reason that justifies the use of ANN technology is the ability of ANNs to provide sensor fusion, which is the combining of values from several different sensors. Sensor fusion enables the ANNs to learn complex relationships among the individual sensor values, which would otherwise be lost if the values were individually analysed. In medical modelling and diagnosis, this implies that even though each sensor in a set may be sensitive only to a specific physiological variable, ANNs are capable of detecting complex medical conditions by fusing the data from the individual biomedical sensors.

6.2.2 Electronic noses
ANNs are used experimentally to implement electronic noses. Electronic noses have several potential applications in telemedicine. Telemedicine is the practice of medicine over long distances via a communication link. The electronic nose would identify odours in the remote surgical environment. These identified odours would then be electronically transmitted to another site where an odour generation system would recreate them. Because the sense…
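Referring back to the EA/EW bookkeeping described in section 5.3 above, the following minimal numpy sketch traces one back-propagation step for a purely linear two-layer network; the shapes, learning rate, and variable names are illustrative assumptions rather than code from the article.

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)               # input activities
W1 = rng.standard_normal((4, 3)) * 0.1   # input -> hidden weights
W2 = rng.standard_normal((3, 2)) * 0.1   # hidden -> output weights
target = np.array([1.0, 0.0])            # desired output activities

hidden = x @ W1                          # linear hidden activities
output = hidden @ W2                     # linear output activities

ea_output = output - target              # EA of an output unit: actual minus desired
ea_hidden = W2 @ ea_output               # EA of a hidden unit: weighted sum of output EAs
ew2 = np.outer(hidden, ea_output)        # EW = EA times the activity through the incoming connection
ew1 = np.outer(x, ea_hidden)

lr = 0.1
W2 -= lr * ew2                           # gradient-descent style weight updates
W1 -= lr * ew1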

A CTR prediction method based on SENET-DEEP

2021, Issue 03 (No. 219 overall)
A CTR prediction method based on SENET-DEEP
Yan Wuwei, Ma Ning, Fu Wei (College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150500, China)
Abstract: Deep neural networks (DNN) are widely used in the field of click-through rate (CTR) prediction.

These models optimize the CTR prediction model by modelling interactions between features and by changing the structure of the deep network.

However, existing methods ignore the influence of the importance of the features themselves on the deep network, which limits the model's learning ability.

To better predict which objects a user is likely to click, this paper proposes a model based on SENET and a deep network (Squeeze-and-Excitation Deep Network, SENET-Deep).

The model uses Squeeze-and-Excitation Networks (SENET) to learn feature importance dynamically, and introduces a deep neural network to improve the model's ability to learn implicit interactions; it thus both emphasizes the ability to learn feature importance in the shallow network and improves generalization through the deep network.

Experiments on two real datasets show that the proposed model clearly improves click-through rate prediction performance.

Keywords: click-through rate prediction (CTR); deep learning; dynamic weighting; neural network. CLC number: TP18; Document code: A; Article ID: 2096-9759(2021)03-0055-04
CTR Prediction Method Based on SENET-DEEP
Yan Wuwei, Ma Ning, Fu Wei (College of Computer Science and Information Engineering, Harbin Normal University, Harbin 150500, China)
Abstract: Deep Neural Networks (DNN) are widely used in click-through rate (CTR) prediction. These models optimize the CTR prediction model by modelling interactions between features and changing the deep network structure. However, the existing methods ignore the importance of the features themselves to the deep network, which limits the learning ability of the model. In order to better predict which objects the user may click on, we present a Squeeze-and-Excitation Deep Network (SENET-Deep) based model. The model learns feature importance dynamically using Squeeze-and-Excitation Networks (SENET). The ability to learn implicit interactions is improved by introducing a deep neural network, so the ability to learn feature importance in the shallow network is emphasized and the generalization ability is improved by the deep network. Experiments on two real data sets show that the model proposed in this paper achieves a significant improvement in click-through rate prediction performance.
Key words: click-through rate prediction (CTR); deep learning; dynamic weight; neural network
0 Introduction
In recent years, with the growth of the big data era, advertising click-through rate has become increasingly important to Internet companies, for example in e-commerce systems and movie sites such as Douban: accurately targeted content can bring considerable revenue to a company or an individual, and the technology behind these product recommendation tasks is click-through rate prediction.
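To make the SENET-style feature-importance idea concrete, here is a minimal numpy sketch of a squeeze-and-excitation step over CTR embedding fields; the pooling choice, bottleneck size, activations, and random parameters are illustrative assumptions, not the exact architecture of the SENET-Deep model.

import numpy as np

def senet_field_weights(field_emb, w1, b1, w2, b2):
    # field_emb: (num_fields, emb_dim). Squeeze each field embedding to a scalar,
    # pass the scalars through a two-layer bottleneck, and reweight the fields.
    squeeze = field_emb.mean(axis=1)                        # (num_fields,)
    hidden = np.maximum(squeeze @ w1 + b1, 0.0)             # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))     # sigmoid field-importance weights
    return field_emb * weights[:, None]                     # rescale each field embedding

F, D, r = 10, 8, 2
rng = np.random.default_rng(0)
emb = rng.standard_normal((F, D))
out = senet_field_weights(emb,
                          rng.standard_normal((F, F // r)) * 0.1, np.zeros(F // r),
                          rng.standard_normal((F // r, F)) * 0.1, np.zeros(F))
print(out.shape)  # (10, 8); the reweighted embeddings would then feed the deep network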

A multimodal remote sensing image matching network based on a cross-attention mechanism

第13卷㊀第8期Vol.13No.8㊀㊀智㊀能㊀计㊀算㊀机㊀与㊀应㊀用IntelligentComputerandApplications㊀㊀2023年8月㊀Aug.2023㊀㊀㊀㊀㊀㊀文章编号:2095-2163(2023)08-0107-07中图分类号:TP391文献标志码:A一种基于交叉注意力机制多模态遥感图像匹配网络石添鑫,曹帆之,韩开杨,邓新蒲,汪㊀璞(国防科技大学电子科学学院,长沙410073)摘㊀要:多模态遥感图像之间复杂的非线性辐射失真和几何形变,给图像匹配带来了巨大的挑战㊂由于传统方法普遍采用人工设计的特征描述子,难以表达更深层次和更抽象的特征,在差异较大的图像间匹配结果较差㊂而现有的深度学习描述符不适合直接用于多模态遥感图像配准且普遍存在正确特征点提取较少,配准不稳定的问题㊂针对上述问题,本文提出一种基于交叉注意力机制的多模态遥感图像匹配网络㊂该网络利用相位一致抑制遥感图像之间的巨大差异,同时通过交叉注意力机制学习多模态图像匹配的描述符在小容量数据集上实现了神经网络的泛化训练㊂实验结果表明,所提算法在公开多模态遥感数据集上性能优异,且在其他领域的多模态数据上仍然有效㊂关键词:相位一致性;多模态;交叉注意力机制Amultimodalremotesensingimagematchingnetworkbasedoncross-attentionmechanismSHITianxin,CAOFanzhi,HANKaiyang,DENGXinpu,WANGPu(CollegeofElectronicScienceandTechnology,NationalUniversityofDefenseTechnology,Changsha410073,China)ʌAbstractɔComplexnon-linearradiometricdistortionsandgeometricdeformationsbetweenmultimodalremotesensingimagesposeasignificantchallengetoimagematching.Astraditionalmethodsgenerallyusemanuallydesignedfeaturedescriptors,itisdifficulttoexpressdeeperandmoreabstractfeatures,andthematchingresultsbetweenimageswithlargedifferencesarepoor.Theexistingdeeplearningdescriptorsarenotsuitablefordirectuseinmultimodalremotesensingimagealignmentandgenerallysufferfromlowextractionofcorrectfeaturepointsandunstablealignment.Toaddresstheproblems,thepaperproposesamultimodalremotesensingimagematchingnetworkbasedoncross-attentionmechanism.Afterthat,theresearchtakesadvantageofphasecoherencetoobtainmorestablefeaturepoints,andthemulti-modalimagematchingdescriptorislearnedbythecross-attentionmechanismtorealizethegeneralizationtrainingofneuralnetworksonsmalldatasets.Experimentalresultsshowthattheproposedalgorithmperformswellonpubliclyavailablemulti-modaldatasetsandremainseffectiveonotherdomains.ʌKeywordsɔphaseconsistency;multi-modal;cross-attentionmechanism作者简介:石添鑫(1997-),女,硕士研究生,主要研究方向:图像匹配;曹帆之(1992-),男,博士研究生,主要研究方向:图像匹配;韩开杨(1999-),男,硕士研究生,主要研究方向:图像匹配;邓新蒲(1966-),男,博士,副教授,主要研究方向:视线确定;汪㊀璞(1986-),男,博士,讲师,主要研究方向:视线确定㊂通讯作者:汪㊀璞㊀㊀Email:wp421189@nudt.edu.cn收稿日期:2022-09-110㊀引㊀言近年来,多模态遥感图像匹配引起了广泛关注㊂该研究目的是在2张或多张由不同的传感器㊁不同的视角或不同的时间获得的图像中识别同名点㊂由于不同传感器成像机制㊁成像条件不同,多模态图像之间存在明显的非线性辐射失真(NRD)和几何畸变㊂因此多模态图像之间精确匹配仍然是一个具有挑战性的问题㊂最近研究表明,图像的结构和形状特性在不同的模态之间得以保留㊂Ye等学者[1]通过捕获了图像之间的形状相似性,提出了一种新的图像匹配相似度度量(DLSC),且与图像间强度无关㊂虽然该研究方法在处理图像间非线性强度差异效果较好,但如果图像包含很少的形状或轮廓信息,则DLSC的性能可能会下降㊂基于此,Ye等学者[2]又提出一种快速鲁棒的传统匹配框架,在所提框架中,图像的结构和形状属性由像素级特征表示,并将定向相位一致性直方图作为特征描述子,且获得了良好的结果㊂但该框架无法处理具有较大旋转和比例差异的图像㊂Li等学者[3]发现相位一致性图(PC)具有很好的辐射鲁棒性,并构建最大索引图来削弱多模态图像的NRD差异,提出了一种具有旋转不变性且对辐射变化不敏感的特征变换方法(RIFT)㊂但是RIFT方法不支持图像的尺度差异㊂Xie等学者[4]提出了基于logGabor滤波的扩展相位相关算法(LGEPC),更好地解决了NRD以及大尺度差异和旋转变换问题,但该方法配准精度不太令人满意㊂这些传统方法均是人工制作的描述子,而这些描述子通常来自图像的外观信息,如颜色㊁纹理和梯度,难以表达更深层次和更抽象的特征㊂此外,人工特征描述符的系数和最佳的参数需要大量的手动调整㊂因此深度学习的方法渐渐受到人们的关注㊂在图像匹配的领域,基于深度学习的算法吸引了许多关注[5-7]㊂但是在多模态遥感图像匹配中,深度学习的方法并没有表现出极大的优势㊂一方面,因为将图像匹配的任务重新设计为可区分的端到端过程是具有挑战的㊂另一方面,正如文献[8]中所述,当前用于训练的本地多模态数据集还不够多样化,无法学习高质量且广泛适用的描述符㊂目前该领域只有少量深度学习方法是针对多模态设计的,大多仅适用于某一种类型的跨模态,例如可见光与SAR图像匹配㊁红外与可见光图像匹配等㊂且现有的多模态匹配深度方法SFcNet[9]㊁CNet[10]普遍存在提取正确特征点个数较少的问题㊂针对上述问题,本文提出一种基于交叉注意力机制的多模态遥感图像匹配网络(PCM)㊂具体来说,利用相位一致性具有良好辐射鲁棒性,首先构建多模态图像的相位一致图(PC图),然后利用Fast算法在PC图上来获得更多㊁更稳定的特征点,接着通过交叉注意力机制学习多模态图像的共有特征,得到特征点的描述子㊂最后,计算描述子之间的余弦距离,选取距离最短的点作为匹配点㊂实验表明该算法在公开多模态遥感数据集上性能优异,且在其他领域的多模态数据上仍然有效㊂1㊀背景知识1.1㊀注意力机制在2017年,Google团队在论文‘Attentionisallyouneed“[11]中提出了一个自我注意的结构㊂这引起了巨大的反响,使注意机制成为最近研究的重要主题,该研究在各种NLP任务中取得了成功,同时在视觉领域也开始尝试把自我注意的结构应用于各类任务中,如语义分割㊁图像分类㊁人类姿势估计等㊂注意机制旨在自动探索有意义的功能,以增强其表示能力并提高最终性能㊂自注意力机制的计算方式如下:Attention(Q,K,V)=softmaxQKTdkæèçöø÷V(1)Q=WQX(2)K=WKX(3)V=WVX(4)㊀㊀其中,X表示输入的数据,Q,K,V的值都是通过X和超参W相乘得到的㊂这里,Q可理解为查询的变量,K为索引的变量,V为内容的变量㊂1.2㊀相位一致性相位一致性(phasecongruency
1.2  Phase congruency
Phase congruency (PC) is the set of points at which the Fourier components of an image are maximally in phase. It is a dimensionless quantity whose value is normalized to the range 0-1, and it is therefore little affected by changes in image brightness or contrast. Image phase information was first highlighted by Oppenheim et al. [12], who found that in the Fourier representation of a signal many important features are preserved in some cases when only the phase is retained. Subsequently, Morrone and Owens [13] found that the maxima of a local energy function occur at points of phase congruency and proposed an algorithm that detects and localizes feature points by constructing such a local energy function. Kovesi [14] improved the method to overcome problems such as noise, making its application practical. Phase congruency maps are now widely used in image edge detection.

1.3  Constructing the phase congruency map
This paper uses phase congruency to construct the phase congruency map (PC map) of each multimodal image, as shown in Fig. 1. Specifically, the computation is performed with log-Gabor wavelets over multiple scales and orientations, as given in Eq. (5):

$PC(x,y)=\dfrac{\sum_{o}\sum_{s}W_{o}(x,y)\,\big\lfloor A_{so}(x,y)\,\Delta\Phi_{so}(x,y)-T\big\rfloor}{\sum_{o}\sum_{s}A_{so}(x,y)+\varepsilon}$    (5)

where $PC(x,y)$ is the magnitude of phase congruency; $W_{o}$ is a weighting factor for the frequency spread; $A_{so}(x,y)$ is the amplitude at $(x,y)$ at wavelet scale $s$ and orientation $o$; $\varepsilon$ is a small constant that prevents division by zero; and the operator $\lfloor\cdot\rfloor$ prevents negative results, i.e. the enclosed value is kept when positive and set to zero otherwise. $\Delta\Phi_{so}(x,y)$ is a sensitive phase deviation function, defined by:

$A_{so}(x,y)\,\Delta\Phi_{so}(x,y)=\big(e_{so}(x,y)\,\bar{\phi}_{e}(x,y)+r_{so}(x,y)\,\bar{\phi}_{d}(x,y)\big)-\big|\,e_{so}(x,y)\,\bar{\phi}_{d}(x,y)-r_{so}(x,y)\,\bar{\phi}_{e}(x,y)\,\big|$    (6)
$\bar{\phi}_{e}(x,y)=\sum_{s}\sum_{o}e_{so}(x,y)\,/\,E(x,y)$    (7)
$\bar{\phi}_{d}(x,y)=\sum_{s}\sum_{o}r_{so}(x,y)\,/\,E(x,y)$    (8)

where $e_{so}(x,y)$ and $r_{so}(x,y)$ are the responses at scale $s$ and orientation $o$ obtained by convolving the image with the even-symmetric and odd-symmetric log-Gabor wavelets, respectively. $E(x,y)$ is a local energy function whose two components are obtained by convolving the signal with a quadrature pair of filters:

$E(x,y)=\sqrt{\Big(\sum_{s}\sum_{o}e_{so}(x,y)\Big)^{2}+\Big(\sum_{s}\sum_{o}r_{so}(x,y)\Big)^{2}}$    (9)

Fig. 1  Construction of multimodal image PC maps using phase congruency

2  Model
This section describes the proposed multimodal remote sensing image matching method. The overall workflow is shown in Fig. 2; the algorithm consists of three stages: feature point detection, feature descriptor extraction, and feature point matching.

2.1  Feature point detection
How to extract stable feature points with high repeatability and a uniform distribution is a current research focus in image matching. Because of the large non-linear radiometric distortions in multimodal images, detectors that work well on natural images are not fully applicable. This paper therefore exploits the radiometric robustness of phase congruency and constructs the PC map of each multimodal image, in which the structural properties shared across modalities are preserved. Feature points are then detected on the PC map: the phase congruency map is obtained with Eq. (5) of Section 1.3, and the FAST feature detector is run on the PC map to extract a given number of feature points, as shown in Fig. 3. Note that during training, 30 uniformly distributed feature points are selected from those extracted in this way.

Fig. 2  Flowchart of the proposed algorithm (phase congruency maps of the reference and sensed images, FAST feature points detected on the PC maps, descriptors obtained from the attention network, and matching by the cosine distance between descriptors)
Fig. 3  Feature points extracted on the PC map with the FAST algorithm
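The construction of Sections 1.3 and 2.1 can be sketched as follows. This is a simplified reading of Eqs. (5)-(9): the frequency-spread weight $W_{o}$ is set to 1 and the noise threshold $T$ to 0, the log-Gabor parameters (centre frequency, scale multiplier, bandwidth, angular spread, numbers of scales and orientations) are illustrative guesses, and the last function is only a stand-in for the FAST detector that the paper actually runs on the PC map.

```python
import numpy as np

def log_gabor_filters(rows, cols, n_scales=4, n_orients=6,
                      f0=1/6.0, mult=2.0, sigma_f=0.55, sigma_theta=0.5):
    """Oriented log-Gabor transfer functions in the frequency domain. Each filter is
    concentrated on one half of the frequency plane, so the inverse FFT of
    (image spectrum x filter) is complex, with even (real) and odd (imaginary) parts."""
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                                 # avoid log(0) at the DC term
    theta = np.arctan2(-fy, fx)
    for o in range(n_orients):
        angle = o * np.pi / n_orients
        d_theta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
        spread = np.exp(-d_theta**2 / (2 * sigma_theta**2))     # angular component
        for s in range(n_scales):
            fs = f0 / mult**s                                    # centre frequency of scale s
            radial = np.exp(-np.log(radius / fs)**2 / (2 * np.log(sigma_f)**2))
            radial[0, 0] = 0.0                                   # suppress DC
            yield radial * spread

def phase_congruency(img, eps=1e-4):
    """Simplified PC map: Eq. (5) with W = 1 and T = 0, energy as in Eqs. (7)-(9)."""
    F = np.fft.fft2(img.astype(float))
    sum_e = np.zeros(img.shape)
    sum_r = np.zeros(img.shape)
    sum_a = np.zeros(img.shape)
    for g in log_gabor_filters(*img.shape):
        resp = np.fft.ifft2(F * g)       # complex response: even part + i * odd part
        sum_e += resp.real
        sum_r += resp.imag
        sum_a += np.abs(resp)            # amplitude A_so
    energy = np.hypot(sum_e, sum_r)      # local energy E(x, y), Eq. (9)
    return energy / (sum_a + eps)        # PC value in [0, 1]

def strongest_points(pc, k=5000, border=8):
    """Stand-in for the FAST detector the paper runs on the PC map (e.g. via OpenCV):
    simply return the k strongest PC responses away from the image border."""
    valid = np.full(pc.shape, -np.inf)
    valid[border:-border, border:-border] = pc[border:-border, border:-border]
    idx = np.argsort(valid.ravel())[-k:]
    ys, xs = np.unravel_index(idx, pc.shape)
    return np.stack([xs, ys], axis=1)    # (x, y) pixel coordinates
```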
2.2  Feature descriptor extraction
After the feature point locations are obtained in the first stage, descriptors are still needed for the feature points. Hand-crafted descriptors have difficulty expressing deeper and more abstract features, and their coefficients and optimal parameters require extensive manual tuning, so this paper uses deep learning to obtain descriptors with better representational power and proposes a convolutional neural network based on a cross-attention mechanism. Because attention is a structure that searches for global features and requires considerable computation and memory, semi-dense descriptors are learned first to reduce both. The network structure is shown in Fig. 4. First, the reference image and the sensed image pass through a large-kernel 11x11 convolution that extracts shallow features with 64 channels, followed by three VGG basic blocks (each containing two convolutional layers, two BN layers and one dropout layer) that extract deep features and expand the feature dimension to 128. A second large-kernel 15x15 convolution then captures global features, and a final dropout layer discards useless features, yielding a feature map at one eighth of the original image size with 128 channels. Because the two images differ greatly, a cross-attention mechanism is used so that the two branches better learn the features they share. These steps produce the semi-dense descriptors, with a feature map at one eighth of the original resolution.

In addition, a descriptor is needed for every feature point. Since the feature map is one eighth of the original image size, features cannot be sampled directly at the feature point positions. The coordinates of the feature points in the original image are therefore first normalized and then rescaled in proportion to the size of the feature map, as in Eq. (10):

$(X,Y)=\Big(\dfrac{x\,h}{H},\ \dfrac{y\,w}{W}\Big)$    (10)

where $(X,Y)$ are the rescaled feature point coordinates, $(x,y)$ is the position of the feature point in the original image, $H$ and $W$ are the height and width of the original image, and $h$ and $w$ are the height and width of the feature map. The new position is generally not an integer pixel, so bilinear interpolation is applied, and the remaining feature channels are interpolated in the same way. This yields the descriptor corresponding to each feature point.

Fig. 4  PCM network structure: (a) the proposed overall framework; (b) the VGG basic block; (c) the cross-attention module built from self-attention (Q: query, K: key, V: value)

2.3  Feature point matching
Training is supervised, and the label of each image pair is known. First, the feature point positions $(x_r, y_r)$ on the reference image are obtained with the detector of Section 2.1, and the corresponding positions $(x_s, y_s)$ on the sensed image are computed from the image label, as in Eq. (11):

$(x_s,\,y_s,\,1)^{T}=H\cdot(x_r,\,y_r,\,1)^{T}$    (11)

where $H$ is a 3x3 matrix, namely the label of the image pair. In the matching stage, therefore, only a loss between descriptors needs to be considered, which reduces the difficulty of training. Following the loss used in SuperPoint [5], the loss function is defined as a hinge loss:

$L_{d}(D,D',S)=\dfrac{1}{(H_{c}W_{c})^{2}}\sum_{h=1,\,w=1}^{H_{c},\,W_{c}}\ \sum_{h'=1,\,w'=1}^{H_{c},\,W_{c}} l_{d}\big(d_{hw},\,d'_{h'w'};\,s_{hwh'w'}\big)$    (12)

where

$l_{d}(d,d';s)=\lambda_{d}\,s\cdot\max\big(0,\,m_{p}-d^{T}d'\big)+(1-s)\cdot\max\big(0,\,d^{T}d'-m_{n}\big)$    (13)

$s_{hwh'w'}=\begin{cases}1, & \text{if}\ \ \lVert\hat{H}P_{hw}-P_{h'w'}\rVert\le 5\\ 0, & \text{otherwise}\end{cases}$    (14)

Here $\lambda_{d}$ is a weighting factor; $s_{hwh'w'}$ indicates whether two points correspond; $P_{h'w'}$ is the feature point coordinate after bicubic interpolation; $\hat{H}P_{hw}$ denotes $P_{hw}$ transformed by the homography $H$; $d_{hw}$ is the descriptor of the predicted point and $d'_{h'w'}$ that of the ground-truth point. The more similar $d_{hw}$ and $d'_{h'w'}$ are, the smaller the loss. In this paper $\lambda_{d}=250$, $m_{p}=1$, $m_{n}=0.2$ and $\lambda=0.0001$.
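A compact sketch of the coordinate rescaling and bilinear sampling of Eq. (10), the hinge loss of Eqs. (12)-(14), and the cosine-similarity matching used at test time is given below. The array layouts, the unit-normalisation of descriptors and the mean reduction of the loss are assumptions made for illustration; the semi-dense (h, w, c) descriptor map itself would come from the cross-attention network of Fig. 4.

```python
import numpy as np

def sample_descriptors(feat, keypoints, img_h, img_w):
    """Bilinearly sample the 1/8-resolution descriptor map at keypoint locations (Eq. (10)).

    feat: (h, w, c) semi-dense descriptor map; keypoints: (n, 2) array of (x, y)
    pixel coordinates in the original image of size (img_h, img_w)."""
    h, w, _ = feat.shape
    xs = keypoints[:, 0] * w / img_w          # rescale to feature-map units
    ys = keypoints[:, 1] * h / img_h
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    ax, ay = (xs - x0)[:, None], (ys - y0)[:, None]
    d = (feat[y0, x0] * (1 - ax) * (1 - ay) + feat[y0, x0 + 1] * ax * (1 - ay)
         + feat[y0 + 1, x0] * (1 - ax) * ay + feat[y0 + 1, x0 + 1] * ax * ay)
    return d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8)   # unit-norm descriptors

def hinge_descriptor_loss(d, d_prime, s, lam_d=250.0, m_p=1.0, m_n=0.2):
    """Pairwise hinge loss of Eqs. (12)-(13); s is a 0/1 correspondence matrix (Eq. (14))
    of shape (n, m) for descriptor sets d: (n, c) and d_prime: (m, c)."""
    sim = d @ d_prime.T                                   # d^T d' for unit-norm descriptors
    pos = lam_d * s * np.maximum(0.0, m_p - sim)          # pull corresponding pairs together
    neg = (1.0 - s) * np.maximum(0.0, sim - m_n)          # push non-corresponding pairs apart
    return (pos + neg).mean()

def match_by_cosine(d_ref, d_sen):
    """Matching step of Fig. 2: each reference descriptor is paired with the most similar one."""
    sim = d_ref @ d_sen.T
    return np.argmax(sim, axis=1)
```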
3  Experiments and analysis
This section compares the proposed method with other mainstream methods in terms of matching performance, computational complexity and inference time, and finally verifies its generalization on data from the computer vision and medical imaging domains.

3.1  Experimental data
The training set consists of images of different bands taken from Landsat-8 satellite imagery with a ground resolution of 30 m; it contains 1153 image pairs of size 256x256. The test set is the multimodal image matching dataset proposed by Jiang et al. [15], which covers three domains: computer vision, medicine and remote sensing. The comparison experiments are mainly run on its remote sensing data, and the medical data are used to test the robustness of the algorithm. Training and testing are performed on an NVIDIA 3090 with 24 GB of memory.

3.2  Experimental setup
The performance metrics are matching accuracy (ACC), the number of correct matches (NCM) and the matching run time (RT). A correct match is a predicted match whose distance from the true match does not exceed 5 pixels, and ACC is the percentage of correct matches among all matches produced by an algorithm. Four baselines are compared: RIFT [3], HAPCG [16], 3MRS [17] and DFM [7]; DFM is a deep learning method, but its paper states that it requires no training. All methods are tested on the multimodal image matching dataset of Jiang et al. [15]. For a fair comparison, neither the traditional baselines nor the proposed method uses an outlier rejection module, and the number of initially detected feature points is fixed at 5000 for all methods.

3.3  Performance comparison
Table 1 compares the matching accuracy of the proposed algorithm with the traditional and deep baselines. The proposed algorithm achieves the highest accuracy on SAR-optical pairs, the deep method DFM is best on map-optical pairs, and the three traditional methods reach their best accuracy on infrared-optical pairs. The proposed algorithm outperforms the traditional methods on all types, although its accuracy is lower than DFM on some modal combinations. However, the per-pair accuracies in Fig. 5 show that DFM matches some images very well but drops to zero accuracy on some difficult images. Together, Table 1 and Fig. 5 indicate that the proposed algorithm is both accurate and stable.

Table 1  Matching accuracy (ACC, %) of the five methods on the multimodal dataset
Type            | RIFT  | HAPCG | 3MRS  | DFM   | OUR
CrossSeason     | 12.60 | 6.93  | 13.09 | 24.66 | 25.67
daynight        | 20.16 | 5.24  | 20.11 | 40.00 | 41.66
DepthOptical    | 30.48 | 6.18  | 31.68 | 61.77 | 54.45
InfraredOptical | 35.05 | 13.43 | 36.11 | 20.83 | 44.03
MapOptical      | 28.85 | 9.47  | 31.47 | 44.29 | 34.68
OpticalOptical  | 34.04 | 7.95  | 33.58 | 80.27 | 58.92
SAROptical      | 34.41 | 5.15  | 33.70 | 71.16 | 72.22

Table 2 compares the number of correct matches. For every data type the proposed algorithm obtains the best NCM, and among all types the optical-optical pairs are matched best.

Table 2  Number of correct matches (NCM) of the five methods on the multimodal dataset (columns: RIFT, HAPCG, 3MRS, DFM, OUR)
CrossSeason 12820112518338
daynight 2671742591497
DepthOptical 39327148810831
InfraredOptical 3524713473412
MapOptical 3924034882328
OpticalOptical 469256446146919
SAROptical 39021135212480

Table 3 compares the matching time of the five algorithms. For every data type the proposed algorithm runs 4-10 times faster than the traditional methods, and it is also faster than the deep baseline on most data types.

Table 3  Matching time (RT, s) of the five methods on the multimodal dataset
Type            | RIFT  | HAPCG | 3MRS  | DFM  | OUR
CrossSeason     | 9.52  | 7.34  | 9.94  | 1.38 | 2.43
daynight        | 10.48 | 5.00  | 10.17 | 1.30 | 1.22
DepthOptical    | 11.10 | 6.12  | 12.79 | 1.58 | 1.49
InfraredOptical | 7.23  | 4.63  | 6.79  | 1.52 | 1.27
MapOptical      | 11.65 | 6.26  | 13.05 | 1.84 | 1.39
OpticalOptical  | 10.02 | 4.46  | 9.92  | 1.73 | 1.27
SAROptical      | 8.57  | 5.60  | 7.96  | 1.96 | 1.56

3.4  Robustness experiment
Table 4 reports the results of the proposed algorithm on the medical multimodal data. Even on medical multimodal images the algorithm achieves good results on all three metrics, demonstrating its robustness.

Table 4  Results of the proposed algorithm on medical multimodal data
Type     | ACC/% | NCM  | RT/s
MR_PET   | 13.00 | 26   | 0.52
PD_T1    | 97.31 | 400  | 0.30
PD_T2    | 98.49 | 504  | 0.27
Retina   | 61.36 | 1688 | 3.70
SPECT_CT | 23.53 | 51   | 0.49
T1_T2    | 99.10 | 522  | 0.29
Average  | 65.47 | 532  | 0.93

Fig. 5  Per-pair matching accuracy of the five methods on the multimodal image matching dataset (CS: different seasons; DN: different illumination; DO: depth-optical; IO: infrared-optical; MO: map-optical; OO: optical-optical; SO: SAR-optical)

4  Conclusion
To address the difficulty of matching multimodal remote sensing data, where non-linear radiometric differences exist between images, this paper proposes a multimodal remote sensing image matching network based on a cross-attention mechanism. The network uses phase congruency to obtain more stable feature points, learns the features shared by multimodal images with a cross-attention mechanism, and is trained on an easily obtainable small multi-band remote sensing dataset. Experiments show that the method matches well on a public dataset and remains effective on multimodal data from other domains. Its performance drops, however, when there are large rotation or scale differences between images; future work will augment the training data and optimize the network structure to further improve matching speed.

References
[1] YE Yuanxin, LI Shen, HAO Ming, et al. Robust optical-to-SAR image matching based on shape properties[J]. IEEE Geoscience and Remote Sensing Letters, 2017, 14(4): 564-568.
[2] YE Yuanxin, BRUZZONE L, SHAN Jie, et al. Fast and robust matching for multimodal remote sensing image registration[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(11): 9059-9070.
[3] LI Jiayuan, HU Qingwu, AI Mingyao. RIFT: Multi-modal image matching based on radiation-variation insensitive feature transform[J]. IEEE Transactions on Image Processing, 2020, 29: 3296-3310.
[4] XIE Xunwei, ZHANG Yongjun, XIAO Ling, et al. A novel extended phase correlation algorithm based on log-Gabor filtering for multimodal remote sensing image registration[J]. International Journal of Remote Sensing, 2019, 40(14): 5429-5453.
[5] DETONE D, MALISIEWICZ T, RABINOVICH A. SuperPoint: Self-supervised interest point detection and description[J]. arXiv preprint arXiv:1712.07629, 2018.
[6] SARLIN P E, DETONE D, MALISIEWICZ T, et al. SuperGlue: Learning feature matching with graph neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE, 2020: 4937-4946.
[7] EFE U, INCE K G, ALATAN A A. DFM: A performance baseline for deep feature matching[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, TN, USA: IEEE, 2021: 4279-4288.
[8] SCHONBERGER J L, HARDMEIER H, SATTLER T, et al. Comparative evaluation of hand-crafted and learned local features[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI: IEEE, 2017: 6959-6968.
[9] ZHANG Han, NI Weiping, YAN Weidong, et al. Registration of multimodal remote sensing image based on deep fully convolutional neural network[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019, 12(8): 3028-3042.
[10] QUAN Dou, WANG Shuang, GU Yu, et al. Deep feature correlation learning for multi-modal remote sensing image registration[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-16.
[11] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach: NIPS Foundation, 2017: 5998-6008.
[12] OPPENHEIM A V, LIM J S. The importance of phase in signals[J]. Proceedings of the IEEE, 1981, 69(5): 529-541.
[13] MORRONE M C, OWENS R A. Feature detection from local energy[J]. Pattern Recognition Letters, 1987, 6(5): 303-313.
[14] KOVESI P. Image features from phase congruency[J]. Quarterly Journal, 1999, 1(3): 1-27.
[15] JIANG Xingyu, MA Jiayi, XIAO Guobao, et al. A review of multimodal image matching: Methods and applications[J]. Information Fusion, 2021, 73: 22-71.
[16] 姚永祥, 张永军, 万一, 等. 顾及各向异性加权力矩与绝对相位方向的异源影像匹配[J]. 武汉大学学报(信息科学版), 2021, 46(11): 1727-1736.
[17] FAN Zhongli, LIU Yuxian, LIU Yuxuan, et al. 3MRS: An effective coarse-to-fine matching method for multimodal remote sensing imagery[J]. Remote Sensing, 2022, 14(3): 478.
[18] QUAN Dou, WANG Shuang, GU Yu, et al. Deep feature correlation learning for multi-modal remote sensing image registration[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-16.


Generalization by Neural Networks
Shashi Shekhar    Minesh B. Amin

Department of Computer Science, University of Minnesota, Minneapolis, MN 55455.

ABSTRACT
Neural networks have traditionally been applied to recognition problems, and most learning algorithms are tailored to those problems. We discuss the requirements of learning for generalization, where the traditional methods based on gradient descent have limited success. We present a new stochastic learning algorithm based on simulated annealing in weight space. We verify the convergence properties and feasibility of the algorithm. We also describe an implementation of the algorithm and validation experiments.

1. Introduction
Neural networks are being applied to a wide variety of applications, from speech generation [1] to handwriting recognition [2]. The last decade has seen great advances in the design of neural networks for a class of problems called recognition problems, and in the design of learning algorithms [3-5, 5-7]. Learning the weights of a neural network for many recognition problems is no longer a difficult task. However, designing a neural network for generalization problems is not well understood.

Domains of neural network applications can be classified into two broad categories -- recognition and generalization [1, 8]. For both classes, we first train the neural network on a set of input-output pairs (I1, O1), (I2, O2), ..., (In, On). In recognition problems, the trained network is tested with a previously seen input Ij (1 <= j <= n) corrupted by noise, as shown in Fig. 1. The trained network is expected to reproduce the output Oj corresponding to Ij, in spite of the noise. Shape recognition [9, 10] and handwriting recognition [2] are examples of recognition problems. On the other hand, in generalization problems, the trained neural network is tested with an input In+1, which is distinct from the inputs I1, I2, ..., In used for training the network, as shown in Fig. 1. The network is expected to correctly predict the output On+1 for the input In+1 from the model it has learned through training. Typical examples of generalization problems are bond rating [11] and robotics [12]. Neural networks for generalization problems are important, since there are many applications of enormous importance in the real world [13-15] which would benefit from this work. In many of these applications it is difficult to successfully apply either conventional mathematical techniques (e.g., statistical regression) or standard AI approaches (e.g., rule-based systems). A neural network with generalization ability will be useful for such domains [11], because it does not require an a priori specification of a functional domain model; rather it attempts to learn the underlying domain model from the training input-output examples.

Figure 1: Classes of Problems -- training of a neural network on learning examples, followed by testing with a noise-corrupted, previously seen input (recognition problem) or with an unseen input (generalization problem).

The learning algorithm for generalization problems should be different from the learning algorithm for recognition problems. In recognition problems, the network is expected to reproduce one of the previously seen outputs. The network may remember the outputs and inputs by fitting a curve through the (Ii, Oi) pairs used for training. To remember the outputs, one often uses large networks with many nodes and weights. However, memorization of learning samples is not suited for generalization problems, since it can lead to worse performance during prediction of outputs on unseen inputs. Furthermore, generalization problems allow a small amount of error in the output predicted by the network, and hence the fitted curve need not pass through any (Ii, Oi) pair used for training. Networks addressing generalization problems may instead fit a simple curve (e.g. a low-degree polynomial, or basic analytical functions like log(x), sine(x), tangent(x), etc.) through the input-output pairs rather than fitting a crooked curve. The neural networks used in generalization problems tend to be simpler, with a small number of hidden nodes, layers, and interconnection edges and weights, enabling one to use computationally sophisticated algorithms.
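The contrast between memorizing the training pairs and fitting a simple curve can be seen in a small experiment; the quadratic target, the noise level and the polynomial degrees below are arbitrary choices for illustration, not an example taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 8)
y_train = x_train**2 + rng.normal(scale=0.05, size=x_train.size)   # noisy samples of a simple target
x_test = np.linspace(-1, 1, 50)                                    # unseen inputs
y_test = x_test**2

for degree in (2, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coeffs, x_test)
    rmse = np.sqrt(np.mean((pred - y_test) ** 2))
    print(f"degree {degree}: test RMSE {rmse:.3f}")

# The degree-7 fit passes (almost) exactly through every training pair, yet the
# low-degree fit typically predicts the unseen inputs better -- the generalization
# setting described above.
```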

Most of the earlier work in neural networks [4, 9] is related to recognition problems. There has been little research towards developing neural network models for generalization problems [16, 17].
We present a new learning algorithm, stochastic backpropagation, for generalization problems. We verify the convergence of the algorithm and provide theoretical arguments towards the capability of the proposed algorithm to discover optimal weights. We also describe an implementation of the algorithm and our experience with the algorithm in solving generalization problems.
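The authors' stochastic backpropagation algorithm is not reproduced here; the sketch below only illustrates the general idea of simulated annealing in weight space for a small feed-forward network. The cooling schedule, perturbation scale, architecture and toy dataset are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(weights, X):
    """Tiny one-hidden-layer network; W1: (d, h), W2: (h, 1)."""
    W1, W2 = weights
    return np.tanh(X @ W1) @ W2

def mse(weights, X, y):
    return float(np.mean((forward(weights, X).ravel() - y) ** 2))

def anneal(X, y, hidden=5, steps=20000, T0=1.0, cooling=0.9995, step_size=0.1):
    """Generic simulated annealing over the weight space (illustrative only)."""
    w = [rng.normal(scale=0.5, size=(X.shape[1], hidden)),
         rng.normal(scale=0.5, size=(hidden, 1))]
    cost, T = mse(w, X, y), T0
    for _ in range(steps):
        cand = [wi + rng.normal(scale=step_size, size=wi.shape) for wi in w]  # random move in weight space
        c = mse(cand, X, y)
        if c < cost or rng.random() < np.exp(-(c - cost) / max(T, 1e-12)):    # Metropolis acceptance
            w, cost = cand, c
        T *= cooling                                                          # geometric cooling schedule
    return w, cost

# toy generalization task: learn y = sin(pi * x) from a few samples, test on unseen inputs
X_train = rng.uniform(-1, 1, size=(20, 1)); y_train = np.sin(np.pi * X_train).ravel()
X_test = rng.uniform(-1, 1, size=(200, 1)); y_test = np.sin(np.pi * X_test).ravel()
w, train_cost = anneal(X_train, y_train)
print("train MSE", round(train_cost, 4), "test MSE", round(mse(w, X_test, y_test), 4))
```

Unlike gradient descent, the random moves and the Metropolis acceptance rule occasionally accept worse weight settings, which is what lets annealing-style search escape poor local minima at the cost of many more function evaluations.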

2. Problem Formulation
Generalization problems for neural networks have been formulated in three different ways: (a) analytical formalism [18-24], (b) constructive function learning [25-27], and (c) symbolic semantic networks [8]. The analytical formalism focuses on the existence of networks with a capability to generalize. It also provides worst-case time complexity to discover such networks to solve arbitrary generalization problems. It does not provide a way to discover the networks. The constructive function learning formalism approaches generalization problems in a complementary fashion. It studies algorithms to discover networks which can solve a class of generalization problems. It aims at discovering a function f mapping the input domain to the output domain from a set of learning examples. The function to be discovered may be defined over boolean numbers or over real numbers. The inputs and outputs are assumed to be numbers with no symbolic meaning, and the function and network do not represent symbolic meaning beyond the numeric computation. The third approach, the symbolic semantic network, associates symbolic meaning with the network. Generalization occurs by attaching a new node to an appropriate parent node in the network to inherit the properties of the parent.
