Incremental Semi-Supervised Subspace Learning for Image Retrieval


Domain Adaptation Methods in Transfer Learning - JindongWangisHere

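极视角 (Extreme Vision) academic talk
Jindong Wang (王晋东), Institute of Computing Technology, Chinese Academy of Sciences
December 14, 2017

1 Introduction to transfer learning

Background of transfer learning
⏹ The era of intelligent big data
⏹ Data volume and the variety of data types keep increasing
⏹ Requirements on machine learning models: fast construction and strong generalization
⏹ Data are plentiful, but most of them are unlabeled
⏹ Collecting labeled data, or building every model from scratch, is expensive and time-consuming
⏹ Reusing existing labeled data and models becomes possible
⏹ Traditional machine learning methods usually assume that all data follow the same distribution, an assumption that no longer holds
⏹ Data types: text, images and video, audio, behavior

Introduction to transfer learning
⏹ Transfer learning
⏹ Transfers knowledge by reducing the distribution difference between the source domain (auxiliary domain) and the target domain, so that the target data can be labeled.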

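⏹ Core idea
⏹ Find the relatedness between different tasks
⏹ Draw inferences from one example and learn by imitation, but avoid blind imitation that backfires (negative transfer)
[Figure: reduce the difference, transfer knowledge; source-domain data; labeled data are hard to obtain]

Application scenarios of transfer learning
⏹ Broad application prospects
⏹ Pattern recognition, computer vision, speech recognition, natural language processing, data mining, ...
⏹ Image recognition across viewpoints, backgrounds, and lighting conditions
⏹ Learning to translate between languages when corpora are scarce
⏹ Activity recognition across users, devices, and positions
⏹ Text translation and public opinion analysis across domains and contexts
⏹ Human-computer interaction across users, interfaces, and situations
⏹ Indoor localization across scenes, devices, and times
⏹ Data is king, and computation is the core
⏹ An era of data explosion
⏹ Computers have become more powerful
⏹ However, big data and large-scale computing are a game only the wealthy can play
⏹ Cloud + device models are widely deployed
⏹ They usually need specific optimization for the device, environment, and user
⏹ Personalized adaptation is usually complex and time-consuming
⏹ Different users require different privacy handling
⏹ Specific machine learning applications
⏹ The cold-start problem in recommender systems: how can recommendations be made without data?
⏹ Why transfer learning is needed
⏹ From the data perspective
⏹ Collecting data is difficult
⏹ Labeling data is time-consuming
⏹ Training one model per task is tedious
⏹ From the model perspective
⏹ Personalized models are complex
⏹ Cloud + device models need specific adaptation
⏹ From the application perspective
⏹ Cold start: without enough user data, a recommender system cannot work
Transfer learning is therefore necessary.

Introduction to transfer learning: methods
Common categories of transfer learning methods
• Instance-based transfer (instance based TL): transfers by reweighting and reusing samples from the source and target domains
• Feature-based transfer (feature based TL): transforms the features of the source and target domains into the same space
• Model-based transfer (parameter based TL): shares model parameters between the source and target domains
• Relation-based transfer (relation based TL): transfers the logical network of relations in the source domain

Introduction to transfer learning: research areas
[Figure: common research areas and method categories of transfer learning]

2 The domain adaptation problem
⏹ The domain adaptation (DA) problem
⏹ Categorized by whether the target domain has labels
⏹ The target domain is fully labeled: supervised DA
⏹ The target domain is partly labeled: semi-supervised DA
⏹ The target domain is entirely unlabeled: unsupervised DA
⏹ Unsupervised DA is the most challenging setting and is our focus

3 Domain adaptation methods

Domain adaptation: overview of methods
⏹ Basic assumptions
⏹ Data distribution perspective: the probability distributions of the source and target domains are similar
⏹ Minimize the distance between the probability distributions
⏹ Feature selection perspective: the source and target domains share some features
⏹ Select these common features
⏹ Feature transformation perspective: the source and target domains share some subspace
⏹ Transform both domains into the same subspace
⏹ Approaches
⏹ Distribution adaptation (data distribution)
⏹ Feature selection
⏹ Subspace learning (feature transformation)
⏹ Conditional distribution adaptation. Assumption: [formula omitted]
⏹ Joint distribution adaptation. Assumption: [formula omitted]
[Figure: source-domain data, target-domain data (1), target-domain data (2)]

⏹ Marginal distribution adaptation (1)
⏹ Transfer Component Analysis (TCA) [Pan, TNN-11]
⏹ Objective: [formula omitted]
⏹ Maximum Mean Discrepancy (MMD)
⏹ Marginal distribution adaptation (2)
⏹ Extensions of Transfer Component Analysis (TCA)
⏹ Adapting Component Analysis (ACA) [Dorri, ICDM-12]
⏹ Minimizes MMD while preserving the structure of the target domain during transfer
⏹ Domain Transfer Multiple Kernel Learning (DTMKL) [Duan, PAMI-12]
⏹ Multi-kernel MMD
⏹ Deep Domain Confusion (DDC) [Tzeng, arXiv-14]
⏹ Adds MMD to a neural network
⏹ Deep Adaptation Networks (DAN) [Long, ICML-15]
⏹ Adds MK-MMD to a neural network
⏹ Distribution-Matching Embedding (DME) [Baktashmotlagh, JMLR-16]
⏹ Computes the transformation matrix first, then performs the mapping
⏹ Central Moment Discrepancy (CMD) [Zellinger, ICLR-17]
⏹ Goes beyond first-order MMD and generalizes to order k
⏹ Conditional distribution adaptation
⏹ Domain Adaptation of Conditional Probability Models via Feature Subsetting [Satpal, PKDD-07]
⏹ Conditional random fields + distribution adaptation
⏹ Objective: [formula omitted]
⏹ Conditional Transferrable Components (CTC) [Gong, ICML-15]
⏹ Defines conditional transferable components and models them
⏹ Joint distribution adaptation (1)
⏹ Joint Distribution Adaptation (JDA) [Long, 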
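ICCV-13]
⏹ Directly inherits from TCA but adds conditional distribution adaptation
⏹ Objective: [formula omitted]
⏹ Problem: how can the conditional distribution be estimated?
⏹ Sufficient statistics: approximate the conditional probability with the class-conditional probability
⏹ Use a weak classifier to generate initial soft labels for the target domain
⏹ Final optimization form: [formula omitted]
⏹ Joint distribution adaptation generally outperforms adapting the marginal or the conditional distribution alone
⏹ Joint distribution adaptation (2)
⏹ Extensions of Joint Distribution Adaptation (JDA)
⏹ Adaptation Regularization (ARTL) [Long, TKDE-14]
⏹ Classifier learning + joint distribution adaptation
⏹ Visual Domain Adaptation (VDA) [Tahmoresnezhad, KIS-17]
⏹ Adds within-class and between-class distances
⏹ Joint Geometrical and Statistical Alignment (JGSA) [Zhang, CVPR-17]
⏹ Adds within-class distance, between-class distance, and label adaptation
⏹ [Hsu, TIP-16]: adds a structural invariance constraint
⏹ [Hsu, AVSS-15]: target-domain selection
⏹ Joint Adaptation Networks (JAN) [Long, ICML-17]
⏹ Proposes the JMMD measure and performs joint distribution adaptation in a deep network
⏹ Balance factor: when [condition omitted], the marginal distribution dominates and should be adapted first
⏹ Joint distribution adaptation (4)
⏹ Balanced Distribution Adaptation (BDA): the importance of the balance factor
⏹ Balanced Distribution Adaptation (BDA): solving for and estimating the balance factor
⏹ No exact estimation method exists yet; we use the A-distance for estimation
⏹ Compute the A-distance between the source and target domains as a whole
⏹ Cluster the target domain, then compute the A-distance between the source and target domains for each class
⏹ The ratio of these two distances gives the balance factor
⏹ The marginal and conditional distributions are not equally important for every task, so BDA weighs the two distributions against each other to reach the best result
⏹ Distribution adaptation: summary
⏹ Methods
⏹ Basis: most methods are optimized on the MMD distance
⏹ Marginal, conditional, or joint distributions are adapted separately
⏹ Performance: balanced (BDA) > joint (JDA) > marginal (TCA) > conditional
⏹ Usage
⏹ When the domains differ strongly overall (low similarity), the marginal distribution matters more
⏹ When the domains differ little overall (covariate shift), the conditional distribution matters more
⏹ Latest results
⏹ Deep learning + distribution adaptation often performs better (DDC, DAN, JAN)
[Figure: accuracy comparison of BDA, JDA, and TCA; comparison of DDC, DAN, and JAN with other methods]

⏹ Feature selection
⏹ Select and extract the features shared by the source and target domains, then build a unified model
⏹ Structural Correspondence Learning (SCL) [Blitzer, ECML-06]
⏹ Finds pivot features and uses them to align the source and target domains
⏹ Other extensions of feature selection
⏹ Joint feature selection and subspace learning [Gu, IJCAI-11]
⏹ Feature selection/transformation + subspace learning
⏹ Objective: [formula omitted]
⏹ Transfer Joint Matching (TJM) [Long, CVPR-14]
⏹ MMD distribution adaptation + source-domain sample selection
⏹ Objective: [formula omitted]
⏹ Feature Selection and Structure Preservation (FSSL) [Li, IJCAI-16]
⏹ Feature selection + information invariance
⏹ Objective: [formula omitted]
⏹ Feature selection: summary
⏹ Select and extract the features shared by the source and target domains, then build a unified model
⏹ Usually combined with distribution adaptation
⏹ Feature selection usually relies on sparse matrices

⏹ Subspace learning
⏹ Transform the source and target domains into the same subspace, then build a unified model
⏹ Statistical feature transformation
⏹ Transforms and aligns certain statistical features of the source and target domains
⏹ Manifold learning
⏹ Performs the subspace transformation in a manifold space
⏹ Statistical feature transformation (1)
⏹ Subspace Alignment (SA) [Fernando, ICCV-13]
⏹ Directly seeks a linear transformation that maps the source into the target space
⏹ Objective: [formula omitted]
⏹ The linear transformation has a closed-form solution: [formula omitted]
⏹ Subspace Distribution Alignment (SDA) [Sun, BMVC-15]
⏹ Subspace alignment + distribution adaptation
⏹ Subspace alignment methods are simple and computationally efficient
⏹ Statistical feature transformation (2)
⏹ CORrelation ALignment (CORAL) [Sun, AAAI-16]
⏹ Minimizes the difference between the second-order statistics of the source and target domains
⏹ Objective: [formula omitted]
⏹ Simple in form and efficient to solve
⏹ Deep CORAL [Sun, ECCV-16]
⏹ Adds CORAL to a deep network
⏹ CORAL loss: [formula omitted]
⏹ Manifold learning (1)
⏹ Sample Geodesic Flow (SGF) [Gopalan, 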
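ICCV-11]
⏹ Treats domain adaptation as an incremental "walking" problem
⏹ Walking from the source domain to the target domain completes one adaptation
⏹ Samples a finite number of points in the manifold space to build a geodesic flow
⏹ Geodesic Flow Kernel (GFK) [Gong, CVPR-12]
⏹ Inherits from SGF but samples infinitely many points
⏹ Recasts the problem as kernel learning on the Grassmann manifold, which yields the GFK
⏹ Objective: [formula omitted]
[Figure: the SGF method and the GFK method]
⏹ Manifold learning (2)
⏹ Domain-Invariant Projection (DIP) [Baktashmotlagh, CVPR-13]
⏹ Measuring the distribution distance directly works poorly: features are distorted in the original space
⏹ Manifold subspace learning alone cannot characterize the distribution distance
⏹ Solution: manifold mapping + distribution measure
⏹ Statistical Manifold [Baktashmotlagh, CVPR-14]
⏹ Measures distributions on a statistical manifold (a Riemannian manifold)
⏹ Uses the Fisher-Rao distance (Hellinger distance) as the measure
⏹ Subspace learning: summary
⏹ Two main families: statistical feature alignment and manifold learning
⏹ Works better when combined with distribution adaptation
⏹ Trend: combination with neural networks

4 Latest research results
⏹ Latest research results in domain adaptation (1)
⏹ Combination with deep learning
⏹ Deep Adaptation Networks (DAN) [Long, ICML-15]
⏹ Deep network + MMD minimization
⏹ Joint Adaptation Networks (JAN) [Long, ICML-17]
⏹ Deep network + joint distribution distance minimization
⏹ Simultaneous feature and task transfer [Tzeng, ICCV-15]
⏹ Transfers features and tasks at the same time
⏹ Deep Hashing Network (DHN) [CVPR-17]
⏹ Learns domain adaptation and deep hash features jointly in a deep network
⏹ Label Efficient Learning of Transferable Representations across Domains and Tasks [Luo, NIPS-17]
⏹ Performs task transfer in a deep network
⏹ Latest research results in domain adaptation (2)
⏹ Combination with adversarial learning
⏹ Domain-adversarial neural network [Ganin, JMLR-16]
⏹ Adds adversarial training to a deep network
⏹ Adversarial Discriminative Domain Adaptation (ADDA) [Tzeng, arXiv-17]
⏹ Adversarial + discriminative
⏹ Open-world domain adaptation
⏹ Open set domain adaptation [Busto, ICCV-17]
⏹ How to transfer when the source and target domains share only some of their classes
⏹ Combination with tensor representations
⏹ When DA meets tensor representation [Lu, ICCV-17]
⏹ Applies tensor ideas to domain adaptation
⏹ Combination with incremental learning
⏹ Learning to Transfer (L2T) [Wei, arXiv-17]
⏹ Extracts existing transfer learning experience and applies it to new tasks

5 References
[Figure: the Office+Caltech, USPS+MNIST, ImageNet+VOC, and COIL20 data sets]
•[Pan, TNN‐11] Pan S J, Tsang I W, Kwok J T, et al. Domain adaptation via transfer component analysis[J]. IEEE Transactions on Neural Networks, 2011, 22(2): 199‐210.•[Dorri, ICDM‐12] Dorri F, Ghodsi A. Adapting component analysis[C]//Data Mining (ICDM), 2012 IEEE 12th International Conference on. IEEE, 2012: 846‐851.•[Duan, PAMI‐12] Duan L, Tsang I W, Xu D. Domain transfer multiple kernel learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(3): 465‐479.•[Long, ICML‐15] Long M, Cao Y, Wang J, et al. 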
Learning transferable features with deep adaptation networks[C]//International Conference on Machine Learning.2015: 97‐105.•[Baktashmotlagh, JMLR‐16] Baktashmotlagh M, Harandi M, Salzmann M. Distribution‐matching embedding for visual domain adaptation[J]. The Journal of Machine Learning Research, 2016, 17(1): 3760‐3789.•[Zellinger, ICLR‐17] Zellinger W, Grubinger T, Lughofer E, et al. Central moment discrepancy (CMD) for domain‐invariant representation learning[J]. arXiv preprint arXiv:1702.08811, 2017.•[Satpal, PKDD‐07] Satpal S, Sarawagi S. Domain adaptation of conditional probability models via feature subsetting[C]//PKDD. 2007, 4702: 224‐235.•[Gong, ICML‐15] Gong M, Zhang K, Liu T, et al. Domain adaptation with conditional transferable components[C]//International Conference on Machine Learning.2016: 2839‐2848.•[Long, ICCV‐13] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, “Transfer feature learning with joint distribution adaptation,”in ICCV, 2013, pp. 2200–2207.•[Long, TKDE‐14] Long M, Wang J, Ding G, et al. Adaptation regularization: A general framework for transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(5): 1076‐1089.•[Tahmoresnezhad, KIS‐17] J. Tahmoresnezhad and S. Hashemi, “Visual domain adaptation via transfer feature learning,” Knowl. Inf. Syst., 2016.•[Zhang, CVPR‐17] Zhang J, Li W, Ogunbona P. Joint Geometrical and Statistical Alignment for Visual Domain Adaptation, CVPR 2017.•[Hsu, AVSS‐15] T. Ming Harry Hsu, W. Yu Chen, C.‐A. Hou, and H. T. et al., “Unsupervised domain adaptation with imbalanced cross‐domain data,” in ICCV, 2015, pp. 4121–4129.•[Hsu, TIP‐16] P.‐H. Hsiao, F.‐J. Chang, and Y.‐Y. Lin, “Learning discriminatively reconstructed source data for object recognition with few examples,” TIP, vol. 25, no.8, pp. 3518–3532, 2016.•[Long, ICML‐17] Long M, Wang J, Jordan M I. Deep transfer learning with joint adaptation networks. ICML 2017.•[Wang, ICDM‐17] Wang J, Chen Y, Hao S, Feng W, Shen Z. 
Balanced Distribution Adaptation for Transfer Learning. ICDM 2017. pp.1129‐1134.•[Blitzer, ECML‐06] Blitzer J, McDonald R, Pereira F. Domain adaptation with structural correspondence learning[C]//Proceedings of the 2006 conference on empirical methods in natural language processing. Association for Computational Linguistics, 2006: 120‐128.•[Gu, IJCAI‐11] Gu Q, Li Z, Han J. Joint feature selection and subspace learning[C]//IJCAI Proceedings‐International Joint Conference on Artificial Intelligence. 2011, 22(1): 1294.•[Long, CVPR‐14] Long M, Wang J, Ding G, et al. Transfer joint matching for unsupervised domain adaptation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 1410‐1417.•[Li, IJCAI‐16] Li J, Zhao J, Lu K. Joint Feature Selection and Structure Preservation for Domain Adaptation[C]//IJCAI. 2016: 1697‐1703.•[Fernando, ICCV‐13] Fernando B, Habrard A, Sebban M, et al. Unsupervised visual domain adaptation using subspace alignment[C]//Proceedings of the IEEE international conference on computer vision. 2013: 2960‐2967.•[Sun, BMVC‐15] Sun B, Saenko K. Subspace Distribution Alignment for Unsupervised Domain Adaptation[C]//BMVC. 2015: 24.1‐24.10.•[Sun, AAAI‐16] Sun B, Feng J, Saenko K. Return of Frustratingly Easy Domain Adaptation[C]//AAAI. 2016, 6(7): 8.•[Sun, ECCV‐16] Sun B, Saenko K. Deep coral: Correlation alignment for deep domain adaptation[C]//Computer Vision–ECCV 2016 Workshops. Springer International Publishing, 2016: 443‐450.•[Gopalan, ICCV‐11] Gopalan R, Li R, Chellappa R. Domain adaptation for object recognition: An unsupervised approach[C]//Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011: 999‐1006.•[Gong, CVPR‐12] Gong B, Shi Y, Sha F, et al. Geodesic flow kernel for unsupervised domain adaptation[C]//Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 2066‐2073.•[Baktashmotlagh, CVPR‐13] Baktashmotlagh M, Harandi M T, Lovell B C, et al. 
Unsupervised domain adaptation by domain invariant projection[C]//Proceedings of the IEEE International Conference on Computer Vision. 2013: 769‐776.•[Baktashmotlagh, CVPR‐14] Baktashmotlagh M, Harandi M T, Lovell B C, et al. Domain adaptation on the statistical manifold[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 2481‐2488.•[Ganin, JMLR‐16] Ganin Y, Ustinova E, Ajakan H, et al. Domain‐adversarial training of neural networks[J]. Journal of Machine Learning Research, 2016, 17(59): 1‐35.•[Busto, ICCV‐17] Panareda Busto P, Gall J. Open Set Domain Adaptation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017: 754‐763.•[Lu, ICCV‐17] Lu H, Zhang L, Cao Z, et al. When unsupervised domain adaptation meets tensor representations. ICCV 2017.•[Tzeng, arXiv‐17] Tzeng E, Hoffman J, Saenko K, et al. Adversarial discriminative domain adaptation[J]. arXiv preprint arXiv:1702.05464, 2017.•[Wei, arXiv‐17] Wei Y, Zhang Y, Yang Q. Learning to Transfer. arXiv1708.05629, 2017.。
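The slides above name the maximum mean discrepancy (MMD) as the distance behind TCA and most distribution adaptation methods, but the formula did not survive in the transcript. The following is a minimal sketch of the standard biased empirical estimate of the squared MMD with a Gaussian kernel; it is not the TCA objective itself. The function names, the bandwidth parameter `gamma`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd2(xs, xt, gamma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    k_ss = rbf_kernel(xs, xs, gamma).mean()
    k_tt = rbf_kernel(xt, xt, gamma).mean()
    k_st = rbf_kernel(xs, xt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st


# Toy data (hypothetical): one target sample near the source, one far away.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))
target_near = rng.normal(0.0, 1.0, size=(200, 2))
target_far = rng.normal(3.0, 1.0, size=(200, 2))

print("MMD^2 to near target:", mmd2(source, target_near))
print("MMD^2 to far target: ", mmd2(source, target_far))
```

A larger value indicates a larger gap between the two marginal distributions, which is the quantity that marginal distribution adaptation tries to reduce.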

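The CORAL slides above describe aligning second-order statistics, but the objective and the loss were lost with the slide images. As a hedged sketch: the CORAL loss is commonly written as the squared Frobenius distance between the two feature covariance matrices scaled by 1/(4d^2), and the linear CORAL transform whitens the source with its own covariance and re-colors it with the target covariance. The function names, the regularizer `eps`, and the toy data below are assumptions for illustration, not the authors' code.

```python
import numpy as np


def coral_loss(xs, xt):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False)
    ct = np.cov(xt, rowvar=False)
    return np.sum((cs - ct) ** 2) / (4.0 * d * d)


def sym_power(c, p):
    """Raise a symmetric positive definite matrix to the power p."""
    w, v = np.linalg.eigh(c)
    return (v * w ** p) @ v.T


def coral_align(xs, xt, eps=1.0):
    """Whiten the source features, then re-color them with the target covariance.

    eps * I is added to both covariances so that they are invertible
    (an assumed default; treat it as a tunable regularizer).
    """
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False) + eps * np.eye(d)
    ct = np.cov(xt, rowvar=False) + eps * np.eye(d)
    return xs @ sym_power(cs, -0.5) @ sym_power(ct, 0.5)


# Toy data (hypothetical): the target has much larger per-feature variance.
rng = np.random.default_rng(0)
xs = rng.normal(size=(500, 3))
xt = rng.normal(size=(500, 3)) * np.array([5.0, 4.0, 3.0])

print("loss before alignment:", coral_loss(xs, xt))
print("loss after alignment: ", coral_loss(coral_align(xs, xt), xt))
```

Because of the `eps` regularizer the aligned covariance does not match the target exactly; the sketch only shows that the gap shrinks.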
Reference (10): Semi-supervised and unsupervised extreme learning

Semi-supervised and unsupervised extreme learning machines

Gao Huang, Shiji Song, Jatinder N. D. Gupta, and Cheng Wu

Abstract: Extreme learning machines (ELMs) have proven to be an efficient and effective learning paradigm for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research studies have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit the learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multi-class classification or multi-cluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised and unsupervised ELMs can actually be put into a unified framework. 
This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.

Index Terms: Clustering, embedding, extreme learning machine, manifold regularization, semi-supervised learning, unsupervised learning.

This work was supported by the National Natural Science Foundation of China under Grant 61273233, the Research Fund for the Doctoral Program of Higher Education under Grants 20120002110035 and 20130002130010, the National Key Technology R&D Program under Grant 2012BAF01B03, the Project of China Ocean Association under Grant DY125-25-02, and the Tsinghua University Initiative Scientific Research Program under Grant 2011THZ07132. Gao Huang, Shiji Song, and Cheng Wu are with the Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: huang-g09@; shijis@; wuc@). Jatinder N. D. Gupta is with the College of Business Administration, The University of Alabama in Huntsville, Huntsville, AL 35899, USA (e-mail: guptaj@).

I. INTRODUCTION

Single layer feedforward networks (SLFNs) have been intensively studied during the past several decades. Most of the existing learning algorithms for training SLFNs, such as the famous back-propagation algorithm [1] and the Levenberg-Marquardt algorithm [2], adopt gradient methods to optimize the weights in the network. Some existing works also use forward selection or backward elimination approaches to construct the network dynamically during the training process [3]–[7]. However, neither the gradient based methods nor the grow/prune methods guarantee a globally optimal solution. Although various methods, such as genetic and evolutionary algorithms, have been proposed to handle the local minimum problem, they basically introduce high computational cost. 
One of the most successful algorithms for training SLFNs is the support vector machine (SVM) [8], [9], which is a maximal margin classifier derived under the framework of structural risk minimization (SRM). The dual problem of SVMs is a quadratic programming problem and can be solved conveniently. Due to their simplicity and stable generalization performance, SVMs have been widely studied and applied to various domains [10]–[14].

Recently, Huang et al. [15], [16] proposed the extreme learning machines (ELMs) for training SLFNs. In contrast to most of the existing approaches, ELMs only update the output weights between the hidden layer and the output layer, while the parameters of the hidden layer, i.e., the input weights and biases, are randomly generated. By adopting squared loss on the prediction error, the training of output weights turns into a regularized least squares (or ridge regression) problem which can be solved efficiently in closed form. It has been shown that even without updating the parameters of the hidden layer, the SLFN with randomly generated hidden neurons and tunable output weights maintains its universal approximation capability [17]–[19]. Compared to gradient based algorithms, ELMs are much more efficient and usually lead to better generalization performance [20]–[22]. Compared to SVMs, solving the regularized least squares problem in ELMs is also faster than solving the quadratic programming problem in standard SVMs. Moreover, ELMs can be used for multi-class classification problems directly. The prediction accuracy achieved by ELMs is comparable with or even higher than that of SVMs [16], [22]–[24]. The differences and similarities between ELMs and SVMs are discussed in [25] and [26], and new algorithms are proposed by combining the advantages of both models. In [25], an extreme SVM (ESVM) model is proposed by combining ELMs and the proximal SVM (PSVM). The ESVM algorithm is shown to be more accurate than the basic ELMs model due to the introduced regularization technique, and much more efficient 
than SVMs since there is no kernel matrix multiplication in ESVM. In [26], the traditional RBF kernel is replaced by an ELM kernel, leading to an efficient algorithm whose accuracy matches that of SVMs.

In the past years, researchers from various fields have made substantial contributions to ELM theories and applications. For example, the universal approximation ability of ELMs has been further studied in a classification context [23]. The generalization error bound of ELMs has been investigated from the perspective of the Vapnik-Chervonenkis (VC) dimension theory and the initial localized generalization error model (LGEM) [27], [28]. Various extensions have been made to the basic ELMs to make them more efficient and more suitable for specific problems, such as ELMs for online sequential data [29]–[31], ELMs for noisy/missing data [32]–[34], ELMs for imbalanced data [35], etc. From the implementation aspect, ELMs have recently been implemented using parallel techniques [36], [37], and realized on hardware [38], which makes ELMs feasible for large data sets and real time reasoning. 
Though ELMs have become popular in a wide range of domains, they are primarily used for supervised learning tasks such as classification and regression, which greatly limits their applicability. In some cases, such as text classification, information retrieval and fault diagnosis, obtaining labels for fully supervised learning is time consuming and expensive, while a multitude of unlabeled data are easy and cheap to collect.

To overcome the inability of supervised learning algorithms to make use of unlabeled data, semi-supervised learning (SSL) has been proposed to leverage both labeled and unlabeled data [39], [40]. The SSL algorithms assume that the input patterns from both labeled and unlabeled data are drawn from the same marginal distribution. Therefore, the unlabeled data naturally provide useful information for exploring the data structure in the input space. By assuming that the input data follow some cluster structure or manifold in the input space, SSL algorithms can incorporate both labeled and unlabeled data into the learning process. Since SSL requires less effort to collect labeled data and can offer higher accuracy, it has been applied to various domains [41]–[43]. In some other cases where no labeled data are available, people may be interested in exploring the underlying structure of the data. To this end, unsupervised learning (USL) techniques, such as clustering, dimension reduction or data representation, are widely used.

In this paper, we extend ELMs to handle both semi-supervised and unsupervised learning problems by introducing the manifold regularization framework. Both the proposed semi-supervised ELM (SS-ELM) and unsupervised ELM (US-ELM) inherit the computational efficiency and the learning capability of traditional ELMs. Compared with existing algorithms, SS-ELM and US-ELM are not only inductive (straightforward extension for out-of-sample examples at test time), but also can be used for multi-class classification or multi-cluster clustering 
directly. We test our algorithms on a variety of data sets, and make comparisons with other related algorithms. The results show that the proposed algorithms are competitive with state-of-the-art algorithms in terms of accuracy and efficiency.

It is worth mentioning that all the supervised, semi-supervised and unsupervised ELMs can actually be put into a unified framework, that is, all the algorithms consist of two stages: 1) random feature mapping; and 2) output weights solving. The first stage is to construct the hidden layer using randomly generated hidden neurons. This is the key concept in the ELM theory, which distinguishes it from many existing feature learning methods. Generating the feature mapping randomly enables ELMs to perform fast nonlinear feature learning and alleviates the problem of over-fitting. The second stage is to solve the weights between the hidden layer and the output layer, and this is where the main difference between supervised, semi-supervised and unsupervised ELMs lies. We believe that the unified framework for the three types of ELMs might provide us a new perspective to understand the underlying behavior of the random feature mapping in ELMs.

The rest of the paper is organized as follows. In Section II, we give a brief review of related existing literature on semi-supervised and unsupervised learning. Sections III and IV introduce the basic formulation of ELMs and the manifold regularization framework, respectively. We present the proposed SS-ELM and US-ELM algorithms in Sections V and VI. Experiment results are given in Section VII, and Section VIII concludes the paper.

II. RELATED WORKS

Only a few existing research studies on ELMs have dealt with the problem of semi-supervised learning or unsupervised learning. In [44] and [45], the manifold regularization framework was introduced into the ELMs model to leverage both labeled and unlabeled data, thus extending ELMs for semi-supervised learning. However, both of these works are limited to binary classification problems, thus they haven't 
explored the full power of ELMs. Moreover, both algorithms are only effective when the number of training patterns is more than the number of hidden neurons. Unfortunately, this condition is usually violated in semi-supervised learning since the training data are relatively scarce compared to the hidden neurons, whose number is commonly set to several hundreds or several thousands. Recently, a co-training approach has been proposed to train ELMs in a semi-supervised setting [46]. In this algorithm, the labeled training set is augmented gradually by moving a small set of the most confidently predicted unlabeled data to the labeled set at each loop, and ELMs are trained repeatedly on the pseudo-labeled set. Since the algorithm needs to train ELMs repeatedly, it introduces considerable extra computational cost.

The proposed SS-ELM is related to a few other manifold assumption based semi-supervised learning algorithms, such as the Laplacian support vector machines (LapSVMs) [47], the Laplacian regularized least squares (LapRLS) [47], semi-supervised neural networks (SSNNs) [48], and semi-supervised deep embedding [49]. It has been shown in these works that manifold regularization is effective in a wide range of domains and often leads to state-of-the-art performance in terms of accuracy and efficiency.

The US-ELM proposed in this paper is related to the Laplacian Eigenmaps (LE) [50] and spectral clustering (SC) [51] in that they all use spectral techniques for embedding and clustering. In all these algorithms, an affinity matrix is first built from the input patterns. The SC performs eigen-decomposition on the normalized affinity matrix, and then embeds the original data into a d-dimensional space using the first d eigenvectors (each row is normalized to have unit length and represents a point in the embedded space) corresponding to the d largest eigenvalues. The LE algorithm performs generalized eigen-decomposition on the graph Laplacian, and uses the d eigenvectors corresponding to the second through 
the $(d+1)$th smallest eigenvalues for embedding. When LE and SC are used for clustering, k-means is then adopted to cluster the data in the embedded space. Similar to LE and SC, the US-ELM is also based on the affinity matrix, and it is converted to solving a generalized eigen-decomposition problem. However, the eigenvectors obtained in US-ELM are not used for data representation directly, but are used as the parameters of the network, i.e., the output weights. Note that once the US-ELM model is trained, it can be applied to any presented data in the original input space. In this way, US-ELM provides a straightforward way of handling new patterns without recomputing eigenvectors as in LE and SC.

III. EXTREME LEARNING MACHINES

Consider a supervised learning problem where we have a training set with $N$ samples, $\{X, Y\} = \{x_i, y_i\}_{i=1}^{N}$. Here $x_i \in \mathbb{R}^{n_i}$, and $y_i$ is an $n_o$-dimensional binary vector with only one entry (corresponding to the class that $x_i$ belongs to) equal to one for multi-class classification tasks, or $y_i \in \mathbb{R}^{n_o}$ for regression tasks, where $n_i$ and $n_o$ are the dimensions of the input and output, respectively. ELMs aim to learn a decision rule or an approximation function based on the training data.

Generally, the training of ELMs consists of two stages. The first stage is to construct the hidden layer using a fixed number of randomly generated mapping neurons, which can be any nonlinear piecewise continuous functions, such as the Sigmoid function and Gaussian function given below.

1) Sigmoid function
$$g(x; \theta) = \frac{1}{1 + \exp(-(a^T x + b))}; \tag{1}$$

2) Gaussian function
$$g(x; \theta) = \exp(-b \|x - a\|); \tag{2}$$

where $\theta = \{a, b\}$ are the parameters of the mapping function and $\|\cdot\|$ denotes the Euclidean norm.

A notable feature of ELMs is that the parameters of the hidden mapping functions can be randomly generated according to any continuous probability distribution, e.g., the uniform distribution on $(-1, 1)$. This makes ELMs distinct from the traditional feedforward neural networks and SVMs. 
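The first stage described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Sigmoid mapping of (1), draws the hidden parameters $a$ and $b$ from the uniform distribution on $(-1, 1)$, and the function names and toy input are hypothetical.

```python
import numpy as np


def init_hidden_layer(n_in, n_hidden, rng):
    """Draw input weights a and biases b uniformly from (-1, 1); they are never trained."""
    a = rng.uniform(-1.0, 1.0, size=(n_in, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    return a, b


def hidden_output(x, a, b):
    """Sigmoid feature mapping of equation (1), applied to every row of x."""
    return 1.0 / (1.0 + np.exp(-(x @ a + b)))


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))          # 5 toy patterns with 3 input dimensions
a, b = init_hidden_layer(n_in=3, n_hidden=8, rng=rng)
h = hidden_output(x, a, b)           # one row of hidden-layer outputs per pattern
print(h.shape)                       # (5, 8)
```

Each row of `h` is the random feature representation of one input pattern; only the output weights applied to these features are learned in the second stage.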
The only free parameters that need to be optimized in the training process are the output weights between the hidden neurons and the output nodes. By doing so, training ELMs is equivalent to solving a regularized least squares problem, which is considerably more efficient than the training of SVMs or backpropagation algorithms.

In the first stage, a number of hidden neurons which map the data from the input space into an $n_h$-dimensional feature space ($n_h$ is the number of hidden neurons) are randomly generated. We denote by $h(x_i) \in \mathbb{R}^{1 \times n_h}$ the output vector of the hidden layer with respect to $x_i$, and by $\beta \in \mathbb{R}^{n_h \times n_o}$ the output weights that connect the hidden layer with the output layer. Then, the outputs of the network are given by
$$f(x_i) = h(x_i)\beta, \quad i = 1, \ldots, N. \tag{3}$$

In the second stage, ELMs aim to solve the output weights by minimizing the sum of the squared losses of the prediction errors, which leads to the following formulation
$$\min_{\beta \in \mathbb{R}^{n_h \times n_o}} \ \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\sum_{i=1}^{N}\|e_i\|^2 \quad \text{s.t.} \quad h(x_i)\beta = y_i^T - e_i^T, \quad i = 1, \ldots, N, \tag{4}$$
where the first term in the objective function is a regularization term which controls the complexity of the model, $e_i \in \mathbb{R}^{n_o}$ is the error vector with respect to the $i$th training pattern, and $C$ is a penalty coefficient on the training errors.

By substituting the constraints into the objective function, we obtain the following equivalent unconstrained optimization problem:
$$\min_{\beta \in \mathbb{R}^{n_h \times n_o}} \ L_{ELM} = \frac{1}{2}\|\beta\|^2 + \frac{C}{2}\|Y - H\beta\|^2 \tag{5}$$
where $H = [h(x_1)^T, \ldots, h(x_N)^T]^T \in \mathbb{R}^{N \times n_h}$.

The above problem is widely known as ridge regression or regularized least squares. By setting the gradient of $L_{ELM}$ with respect to $\beta$ to zero, we have
$$\nabla L_{ELM} = \beta - C H^T (Y - H\beta) = 0. \tag{6}$$

If $H$ has more rows than columns and is of full column rank, which is usually the case when the number of training patterns is larger than the number of hidden neurons, the above equation is overdetermined, and we have the following closed form solution for (5):
$$\beta^* = \left(H^T H + \frac{I_{n_h}}{C}\right)^{-1} H^T Y, \tag{7}$$
where $I_{n_h}$ is an identity matrix of dimension $n_h$.

Note that in practice, rather than explicitly 
inverting the $n_h \times n_h$ matrix in the above expression, we can use Gaussian elimination to directly solve a set of linear equations in a more efficient and numerically stable manner.

If the number of training patterns is less than the number of hidden neurons, then $H$ will have more columns than rows, which often leads to an underdetermined least squares problem. In this case, $\beta$ may have an infinite number of solutions. To handle this problem, we restrict $\beta$ to be a linear combination of the rows of $H$: $\beta = H^T \alpha$ ($\alpha \in \mathbb{R}^{N \times n_o}$). Notice that when $H$ has more columns than rows and is of full row rank, then $H H^T$ is invertible. Multiplying both sides of (6) by $(H H^T)^{-1} H$, we get
$$\alpha - C(Y - H H^T \alpha) = 0. \tag{8}$$
This yields
$$\beta^* = H^T \alpha^* = H^T \left(H H^T + \frac{I_N}{C}\right)^{-1} Y \tag{9}$$
where $I_N$ is an identity matrix of dimension $N$.

Therefore, in the case where training patterns are plentiful compared to the hidden neurons, we use (7) to compute the output weights, otherwise we use (9).

IV. THE MANIFOLD REGULARIZATION FRAMEWORK

Semi-supervised learning is built on the following two assumptions: (1) both the labeled data $X_l$ and the unlabeled data $X_u$ are drawn from the same marginal distribution $P_X$; and (2) if two points $x_1$ and $x_2$ are close to each other, then the conditional probabilities $P(y|x_1)$ and $P(y|x_2)$ should be similar as well. The latter assumption is widely known as the smoothness assumption in machine learning. To enforce this assumption on the data, the manifold regularization framework proposes to minimize the following cost function
$$L_m = \frac{1}{2}\sum_{i,j} w_{ij}\,\|P(y|x_i) - P(y|x_j)\|^2, \tag{10}$$
where $w_{ij}$ is the pair-wise similarity between two patterns $x_i$ and $x_j$. Note that the similarity matrix $W = [w_{ij}]$ is usually sparse, since we only place a nonzero weight between two patterns $x_i$ and $x_j$ if they are close, e.g., $x_i$ is among the $k$ nearest neighbors of $x_j$ or $x_j$ is among the $k$ nearest neighbors of $x_i$. The nonzero weights are usually computed using the Gaussian function $\exp(-\|x_i - x_j\|^2 / 2\sigma^2)$, or simply fixed to 1.

Intuitively, the formulation (10) penalizes large 
variation in the conditional probability $P(y|x)$ when $x$ has a small change. This requires that $P(y|x)$ vary smoothly along the geodesics of $P(x)$.

Since it is difficult to compute the conditional probability, we can approximate (10) with the following expression:
$$\hat{L}_m = \frac{1}{2}\sum_{i,j} w_{ij}\,\|\hat{y}_i - \hat{y}_j\|^2, \tag{11}$$
where $\hat{y}_i$ and $\hat{y}_j$ are the predictions with respect to patterns $x_i$ and $x_j$, respectively.

It is straightforward to simplify the above expression in a matrix form:
$$\hat{L}_m = \mathrm{Tr}(\hat{Y}^T L \hat{Y}), \tag{12}$$
where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, $L = D - W$ is known as the graph Laplacian, and $D$ is a diagonal matrix with its diagonal elements $D_{ii} = \sum_{j=1}^{l+u} w_{ij}$. As discussed in [52], instead of using $L$ directly, we can normalize it by $D^{-\frac{1}{2}} L D^{-\frac{1}{2}}$ or replace it by $L^p$ ($p$ is an integer), based on some prior knowledge.

V. SEMI-SUPERVISED ELM

In the semi-supervised setting, we have few labeled data and plenty of unlabeled data. We denote the labeled data in the training set as $\{X_l, Y_l\} = \{x_i, y_i\}_{i=1}^{l}$, and the unlabeled data as $X_u = \{x_i\}_{i=1}^{u}$, where $l$ and $u$ are the numbers of labeled and unlabeled data, respectively.

The proposed SS-ELM incorporates the manifold regularization to leverage unlabeled data to improve the classification accuracy when labeled data are scarce. By modifying the ordinary ELM formulation (4), we give the formulation of SS-ELM as:
$$\min_{\beta \in \mathbb{R}^{n_h \times n_o}} \ \frac{1}{2}\|\beta\|^2 + \frac{1}{2}\sum_{i=1}^{l} C_i \|e_i\|^2 + \frac{\lambda}{2}\mathrm{Tr}(F^T L F)$$
$$\text{s.t.} \quad h(x_i)\beta = y_i^T - e_i^T, \quad i = 1, \ldots, l, \qquad f_i = h(x_i)\beta, \quad i = 1, \ldots, l+u \tag{13}$$
where $L \in \mathbb{R}^{(l+u) \times (l+u)}$ is the graph Laplacian built from both labeled and unlabeled data, $F \in \mathbb{R}^{(l+u) \times n_o}$ is the output matrix of the network with its $i$th row equal to $f(x_i)$, and $\lambda$ is a tradeoff parameter.

Note that similar to the weighted ELM algorithm (W-ELM) introduced in [35], here we associate different penalty coefficients $C_i$ with the prediction errors of patterns from different classes. This is because we found that when the data is skewed, i.e., some classes have significantly more training patterns than other 
classes, traditional ELMs tend to fit the classes that have the majority of patterns quite well but fit other classes poorly. This usually leads to poor generalization performance on the testing set (the overall prediction accuracy may be high while some classes are neglected). Therefore, we propose to alleviate this problem by re-weighting instances from different classes. Suppose that $x_i$ belongs to class $t_i$, which has $N_{t_i}$ training patterns; then we associate $e_i$ with a penalty of

$$C_i = \frac{C_0}{N_{t_i}}, \tag{14}$$

where $C_0$ is a user-defined parameter as in traditional ELMs. In this way, the patterns from the dominant classes will not be overfitted by the algorithm, and the patterns from a class with fewer samples will not be neglected.

We substitute the constraints into the objective function, and rewrite the above formulation in a matrix form:

$$\min_{\beta \in \mathbb{R}^{n_h \times n_o}} \; \frac{1}{2}\|\beta\|^2 + \frac{1}{2}\|C^{1/2}(\tilde{Y} - H\beta)\|^2 + \frac{\lambda}{2}\mathrm{Tr}(\beta^T H^T L H \beta) \tag{15}$$

where $\tilde{Y} \in \mathbb{R}^{(l+u)\times n_o}$ is the training target with its first $l$ rows equal to $Y_l$ and the rest equal to 0, and $C$ is an $(l+u)\times(l+u)$ diagonal matrix with its first $l$ diagonal elements $[C]_{ii} = C_i$, $i = 1,\dots,l$, and the rest equal to 0.

Again, we compute the gradient of the objective function with respect to $\beta$:

$$\nabla L_{SS\text{-}ELM} = \beta - H^T C(\tilde{Y} - H\beta) + \lambda H^T L H \beta. \tag{16}$$

By setting the gradient to zero, we obtain the solution to the SS-ELM:

$$\beta^* = (I_{n_h} + H^T C H + \lambda H^T L H)^{-1} H^T C \tilde{Y}. \tag{17}$$

As in Section III, if the number of labeled data is fewer than the number of hidden neurons, which is common in SSL, we have the following alternative solution:

$$\beta^* = H^T (I_{l+u} + C H H^T + \lambda L H H^T)^{-1} C \tilde{Y}, \tag{18}$$

where $I_{l+u}$ is an identity matrix of dimension $l+u$.

Note that by setting $\lambda$ to zero and the diagonal elements $C_i$ ($i = 1,\dots,l$) to the same constant, (17) and (18) reduce to the solutions of traditional ELMs (7) and (9), respectively.

Based on the above discussion, the SS-ELM algorithm is summarized as Algorithm 1.

Algorithm 1: The SS-ELM algorithm
Input: The labeled patterns $\{X_l, Y_l\} = \{x_i, y_i\}_{i=1}^{l}$; the unlabeled patterns $X_u = \{x_i\}_{i=1}^{u}$.
Output: The mapping function of SS-ELM, $f: \mathbb{R}^{n_i} \to \mathbb{R}^{n_o}$.
Step 1: Construct the graph Laplacian $L$ from both $X_l$ and $X_u$.
Step 2: Initiate an ELM network of $n_h$ hidden neurons with random input weights and biases, and calculate the output matrix of the hidden neurons $H \in \mathbb{R}^{(l+u)\times n_h}$.
Step 3: Choose the tradeoff parameters $C_0$ and $\lambda$.
Step 4:
• If $n_h \le N$: compute the output weights $\beta$ using (17).
• Else: compute the output weights $\beta$ using (18).
return The mapping function $f(x) = h(x)\beta$.

VI. UNSUPERVISED ELM

In this section, we introduce the US-ELM algorithm for unsupervised learning. In an unsupervised setting, the entire training data $X = \{x_i\}_{i=1}^{N}$ are unlabeled ($N$ is the number of training patterns) and our target is to find the underlying structure of the original data.

The formulation of US-ELM follows from the formulation of SS-ELM. When there is no labeled data, (15) is reduced to

$$\min_{\beta \in \mathbb{R}^{n_h \times n_o}} \; \|\beta\|^2 + \lambda\,\mathrm{Tr}(\beta^T H^T L H \beta). \tag{19}$$

Notice that the above formulation always attains its minimum at $\beta = 0$. As suggested in [50], we have to introduce additional constraints to avoid a degenerate solution. Specifically, the formulation of US-ELM is given by

$$\min_{\beta \in \mathbb{R}^{n_h \times n_o}} \; \|\beta\|^2 + \lambda\,\mathrm{Tr}(\beta^T H^T L H \beta) \quad \text{s.t.}\quad (H\beta)^T H\beta = I_{n_o}. \tag{20}$$

Theorem 1: An optimal solution to problem (20) is given by choosing $\beta$ as the matrix whose columns are the eigenvectors (normalized to satisfy the constraint) corresponding to the first $n_o$ smallest eigenvalues of the generalized eigenvalue problem:

$$(I_{n_h} + \lambda H^T L H)\,v = \gamma\, H^T H\, v. \tag{21}$$

Proof: We can rewrite problem (20) as

$$\min_{\beta \in \mathbb{R}^{n_h \times n_o},\; \beta^T B \beta = I_{n_o}} \mathrm{Tr}(\beta^T A \beta), \tag{22}$$

where $A = I_{n_h} + \lambda H^T L H$ and $B = H^T H$. It is easy to verify that both $A$ and $B$ are Hermitian matrices. Thus, according to the Rayleigh-Ritz theorem [53], the above trace minimization problem attains its optimum if and only if the column span of $\beta$ is the minimum span of the eigenspace corresponding to the smallest $n_o$ eigenvalues of (21). Therefore, by stacking the normalized eigenvectors of (21) corresponding to the smallest $n_o$ generalized eigenvalues, we obtain an optimal solution to (20).

In the algorithm of Laplacian eigenmaps, the first eigenvector is discarded since it is always a constant vector proportional to $\mathbf{1}$ (corresponding to the smallest eigenvalue 0) [50]. In the US-ELM algorithm, the first eigenvector of (21) also leads to small variations in the embedding and is not useful for data representation. Therefore, we suggest discarding this trivial solution as well.

Let $\gamma_1, \gamma_2, \dots, \gamma_{n_o+1}$ ($\gamma_1 \le \gamma_2 \le \dots \le \gamma_{n_o+1}$) be the $(n_o+1)$ smallest eigenvalues of (21) and $v_1, v_2, \dots, v_{n_o+1}$ be their corresponding eigenvectors. Then, the solution to the output weights $\beta$ is given by

$$\beta^* = [\tilde{v}_2, \tilde{v}_3, \dots, \tilde{v}_{n_o+1}], \tag{23}$$

where $\tilde{v}_i = v_i / \|H v_i\|$, $i = 2,\dots,n_o+1$, are the normalized eigenvectors.

Algorithm 2: The US-ELM algorithm
Input: The training data $X \in \mathbb{R}^{N \times n_i}$.
Output:
• For the embedding task: the embedding in an $n_o$-dimensional space, $E \in \mathbb{R}^{N \times n_o}$.
• For the clustering task: the label vector of cluster indices, $y \in \mathbb{N}_+^{N \times 1}$.
Step 1: Construct the graph Laplacian $L$ from $X$.
Step 2: Initiate an ELM network of $n_h$ hidden neurons with random input weights, and calculate the output matrix of the hidden neurons $H \in \mathbb{R}^{N \times n_h}$.
Step 3:
• If $n_h \le N$: find the generalized eigenvectors $v_2, v_3, \dots, v_{n_o+1}$ of (21) corresponding to the second through the $(n_o+1)$-th smallest eigenvalues. Let $\beta = [\tilde{v}_2, \tilde{v}_3, \dots, \tilde{v}_{n_o+1}]$, where $\tilde{v}_i = v_i / \|H v_i\|$, $i = 2,\dots,n_o+1$.
• Else: find the generalized eigenvectors $u_2, u_3, \dots, u_{n_o+1}$ of (24) corresponding to the second through the $(n_o+1)$-th smallest eigenvalues. Let $\beta = H^T[\tilde{u}_2, \tilde{u}_3, \dots, \tilde{u}_{n_o+1}]$, where $\tilde{u}_i = u_i / \|H H^T u_i\|$, $i = 2,\dots,n_o+1$.
Step 4: Calculate the embedding matrix: $E = H\beta$.
Step 5 (for clustering only): Treat each row of $E$ as a point, and cluster the $N$ points into $K$ clusters using the k-means algorithm. Let $y$ be the label vector of cluster indices for all the points.
return $E$ (for the embedding task) or $y$ (for the clustering task).

TABLE I
DETAILS OF THE DATA SETS USED FOR SEMI-SUPERVISED LEARNING

| Data set | Class | Dimension | \|L\| | \|U\| | \|V\| | \|T\| |
|---|---|---|---|---|---|---|
| G50C | 2 | 50 | 50 | 314 | 50 | 136 |
| COIL20(B) | 2 | 1024 | 40 | 1000 | 40 | 360 |
| USPST(B) | 2 | 256 | 50 | 1409 | 50 | 498 |
| COIL20 | 20 | 1024 | 40 | 1000 | 40 | 360 |
| USPST | 10 | 256 | 50 | 1409 | 50 | 498 |

If the number of labeled data is fewer than the number of hidden neurons, problem (21) is
underdetermined. In this case, we have the following alternative formulation by using the same trick as in previous sections:

$$(I_u + \lambda L H H^T)\,u = \gamma\, H H^T u. \tag{24}$$

Again, let $u_1, u_2, \dots, u_{n_o+1}$ be the generalized eigenvectors corresponding to the $(n_o+1)$ smallest eigenvalues of (24); then the final solution is given by

$$\beta^* = H^T[\tilde{u}_2, \tilde{u}_3, \dots, \tilde{u}_{n_o+1}], \tag{25}$$

where $\tilde{u}_i = u_i / \|H H^T u_i\|$, $i = 2,\dots,n_o+1$, are the normalized eigenvectors.

If our task is clustering, then we can adopt the k-means algorithm to perform clustering in the embedded space. We summarize the proposed US-ELM in Algorithm 2.

Remark: Comparing the supervised ELM, the semi-supervised ELM and the unsupervised ELM, we can observe that all the algorithms have two similar stages in the training process, namely the random feature learning stage and the output weights learning stage. Under this two-stage framework, it is easy to find the differences and similarities between the three algorithms. Actually, all the algorithms share the same stage of random feature learning, and this is the essence of the ELM theory. This also means that no matter whether the task is a supervised, semi-supervised or unsupervised learning problem, we can always follow the same step to generate the hidden layer.
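The two-stage procedure for SS-ELM can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dense Gaussian-affinity Laplacian, and the toy data are all hypothetical choices made here, and the labeled patterns are assumed to occupy the first $l$ rows. The output weights follow (17) when $n_h$ is at most the number of training patterns and (18) otherwise.

```python
import numpy as np


def hidden_layer(X, W, b):
    """Stage 1: random sigmoid feature map H = g(XW + b)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def graph_laplacian(X, sigma=1.0):
    """Dense Gaussian-affinity Laplacian L = D - W (an assumed, simple graph)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W


def ss_elm_output_weights(H, Y_l, class_idx, L, C0=1.0, lam=0.1, primal=None):
    """Stage 2: closed-form output weights, eq. (17) or eq. (18).

    H         : (l+u, n_h) hidden-layer outputs, labeled rows first
    Y_l       : (l, n_o) targets of the labeled rows
    class_idx : (l,) integer class index of each labeled row
    """
    n, n_h = H.shape
    l, n_o = Y_l.shape
    Y = np.zeros((n, n_o))
    Y[:l] = Y_l                                    # augmented target, zeros for unlabeled rows
    c = np.zeros(n)
    c[:l] = C0 / np.bincount(class_idx)[class_idx]  # C_i = C0 / N_{t_i}, eq. (14)
    C = np.diag(c)
    if primal is None:
        primal = n_h <= n
    if primal:                                     # eq. (17)
        A = np.eye(n_h) + H.T @ C @ H + lam * (H.T @ L @ H)
        return np.linalg.solve(A, H.T @ C @ Y)
    A = np.eye(n) + C @ H @ H.T + lam * (L @ H @ H.T)  # eq. (18)
    return H.T @ np.linalg.solve(A, C @ Y)


# Toy usage: 30 patterns in 5 dimensions, the first 10 labeled, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
class_idx = np.array([0, 1] * 5)
Y_l = np.eye(2)[class_idx]                         # one-hot targets
W = rng.uniform(-1.0, 1.0, size=(5, 20))           # input weights drawn from (-1, 1)
b = rng.uniform(-1.0, 1.0, size=20)
H = hidden_layer(X, W, b)
beta = ss_elm_output_weights(H, Y_l, class_idx, graph_laplacian(X))
scores = H @ beta                                  # network outputs f(x) = h(x) beta
```

Because $(I + H^T M H)^{-1} H^T = H^T (I + M H H^T)^{-1}$ with $M = C + \lambda L$, the two branches return the same weights; the choice between them only affects the size of the linear system being solved.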
The differences among the three types of ELMs lie in the second stage, i.e., in how the output weights are computed. In supervised ELM and SS-ELM, the output weights are trained by solving a regularized least squares problem, while the output weights in the US-ELM are obtained by solving a generalized eigenvalue problem. The unified framework for the three types of ELMs might provide new perspectives to further develop the ELM theory.

VII. EXPERIMENTAL RESULTS

We evaluated our algorithms on a wide range of semi-supervised and unsupervised tasks. Comparisons were made with related state-of-the-art algorithms, e.g., Transductive SVM (TSVM) [54], LapSVM [47] and LapRLS [47] for semi-supervised learning; and Laplacian Eigenmap (LE) [50], spectral clustering (SC) [51] and deep autoencoder (DA) [55] for unsupervised learning. All algorithms were implemented using Matlab R2012a on a 2.60 GHz machine with 4 GB of memory.

TABLE III
TRAINING TIME (IN SECONDS) COMPARISON OF TSVM, LAPRLS, LAPSVM AND SS-ELM

| Data set | TSVM | LapRLS | LapSVM | SS-ELM |
|---|---|---|---|---|
| G50C | 0.324 | 0.041 | 0.045 | 0.035 |
| COIL20(B) | 16.82 | 0.512 | 0.459 | 0.516 |
| USPST(B) | 68.44 | 0.921 | 0.947 | 1.029 |
| COIL20 | 18.43 | 5.841 | 4.946 | 0.814 |
| USPST | 68.14 | 7.121 | 7.259 | 1.373 |

A. Semi-supervised learning results

1) Data sets: We tested the SS-ELM on five popular semi-supervised learning benchmarks, which have been widely used for evaluating semi-supervised algorithms [52], [56], [57].
• The G50C is a binary classification data set of which each class is generated by a 50-dimensional multivariate Gaussian distribution. This classification problem is explicitly designed so that the true Bayes error is 5%.
• The Columbia Object Image Library (COIL20) is a multi-class image classification data set which consists of 1440 gray-scale images of 20 objects. Each pattern is a 32×32 gray-scale image of one object taken from a specific view. The COIL20(B) data set is a binary classification task obtained from COIL20 by grouping the first 10 objects as Class 1 and the last 10 objects as Class 2.
• The USPST data set is a subset (the testing set) of the well-known handwritten digit recognition data set USPS. The USPST(B) data set is a binary classification task obtained from USPST by grouping the first 5 digits as Class 1 and the last 5 digits as Class 2.

2) Experimental setup: We followed the experimental setup in [57] to evaluate the semi-supervised algorithms. Specifically, each of the data sets was split into 4 folds, one of which was used for testing (denoted by T) and the remaining 3 folds for training. Each of the folds was used as the testing set once (4-fold cross-validation). As in [57], this random fold generation process was repeated 3 times, resulting in 12 different splits in total. Every training set was further partitioned into a labeled set L, a validation set V, and an unlabeled set U. When we trained a semi-supervised learning algorithm, the labeled data from L and the unlabeled data from U were used. The validation set, which consists of labeled data, was only used for model selection, i.e., finding the optimal hyperparameters $C_0$ and $\lambda$ in the SS-ELM algorithm. The characteristics of the data sets used in our experiment are summarized in Table I.

The training of SS-ELM consists of two stages: 1) generating the random hidden layer; and 2) training the output weights using (17) or (18). In the first stage, we adopted the sigmoid function for nonlinear mapping, and the input weights and biases were generated according to the uniform distribution on (-1, 1). The number of hidden neurons $n_h$ was fixed to 1000 for G50C, and 2000 for the remaining four data sets. In the second stage, we first need to build the graph Laplacian $L$. We followed the methods discussed in [52] and [57] to compute $L$, and the hyperparameter settings can be found in [47], [52] and [57]. The tradeoff parameters $C_0$ and $\lambda$ were selected from

Lightweight Image Super-resolution Reconstruction Using a Multi-scale Upsampling Method

Software Guide, Vol. 22, No. 4, Apr. 2023

CAI Jing, ZENG Sheng-qiang (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Abstract: At present, most image super-resolution networks improve reconstruction ability by deepening the convolutional neural network and widening it, which greatly increases model complexity. To address this, a lightweight image super-resolution algorithm is proposed. A two-branch feature extraction module lets the network fuse and output feature information of different scales in a single pass, and a pixel attention branch assigns a weight to each pixel, enhancing the feature expression of pixel details at the cost of only a few extra parameters. The upsampling part combines sub-pixel convolution with neighborhood interpolation to extract feature-depth and spatial-scale information, respectively, and outputs the final image. In addition, the sub-pixel convolution branch combined with the attention mechanism further strengthens important information, giving the output image a better visual effect. Experiments show that the model, with only 351K parameters, achieves reconstruction performance similar to that of the CARN model with 1 592K parameters, and its SSIM is higher than CARN's on some test sets. This confirms the effectiveness of the proposed method, which can provide a new solution for lightweight image super-resolution reconstruction.

Key Words: image super-resolution; lightweight; pixel attention; multi-scale upsampling; image processing

DOI: 10.11907/rjdk.221516
CLC number: TP391.41  Document code: A  Article ID: 1672-7800(2023)004-0168-07

0 Introduction

Image super-resolution reconstruction refers to reconstructing a low-resolution image into its corresponding high-resolution image, and it is a very important topic in machine vision and image processing.
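To make the two upsampling operations named in the abstract concrete, the NumPy sketch below shows the rearrangement step of sub-pixel convolution (often called pixel shuffle) next to nearest-neighbor interpolation. It is only an illustration of the two generic operations: the function names are hypothetical, the convolution that normally precedes the rearrangement is omitted, and nothing here reflects how the paper actually wires its branches together.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Output pixel (c, h*r + i, w*r + j) is taken from input channel c*r*r + i*r + j
    at position (h, w), so channel depth is traded for spatial resolution.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)       # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)     # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


def nearest_upsample(x, r):
    """Nearest-neighbor interpolation: repeat every pixel r times along H and W."""
    return np.repeat(np.repeat(x, r, axis=1), r, axis=2)


features = np.arange(4, dtype=float).reshape(4, 1, 1)   # 4 channels, one pixel each
shuffled = pixel_shuffle(features, 2)                    # shape (1, 2, 2)
repeated = nearest_upsample(np.ones((1, 3, 3)), 2)       # shape (1, 6, 6)
```

Pixel shuffle needs $r^2$ times as many input channels as output channels, whereas nearest-neighbor interpolation keeps the channel count and adds no parameters, which is one reason the two are natural candidates for complementary branches.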

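Applications of Deep Reinforcement Learning Algorithms in High-Dimensional State Spaces

Deep reinforcement learning (DRL) is an important technique in artificial intelligence that has shown great potential in many fields. This article discusses the application of DRL algorithms in high-dimensional state spaces and the challenges they face in solving practical problems.

I. Introduction

In traditional reinforcement learning algorithms, the state space usually consists of a finite number of discrete states. In many practical problems, however, such as robot control and autonomous driving, the state space is high-dimensional and continuous. Traditional reinforcement learning algorithms cannot handle such state spaces effectively. The emergence of DRL filled this gap and offers a new approach to problems with high-dimensional state spaces.

II. Overview of deep reinforcement learning algorithms

Deep reinforcement learning combines deep learning methods with reinforcement learning. It typically uses a deep neural network to approximate the value function or the policy function, so that a high-dimensional state space can be modeled and learned. Deep neural networks have strong representational capacity: they can represent complex state spaces and extract features from them, and thereby learn the optimal policy more effectively.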
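As a rough illustration of approximating a value function with a neural network, the sketch below implements the forward pass of a small action-value network in NumPy and picks the greedy action. The layer sizes, the initialization, and all function names are assumptions made for this example; no training loop is included, so the weights are random rather than learned.

```python
import numpy as np


def init_q_network(state_dim, n_actions, hidden=32, seed=0):
    """Randomly initialise a one-hidden-layer network mapping a state to Q-values."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, size=(hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }


def q_values(params, state):
    """Forward pass: one ReLU hidden layer, then one linear output per action."""
    hidden = np.maximum(0.0, state @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def greedy_action(params, state):
    """Pick the action with the largest estimated value."""
    return int(np.argmax(q_values(params, state)))


params = init_q_network(state_dim=8, n_actions=4)
state = np.ones(8)                 # a continuous 8-dimensional state
action = greedy_action(params, state)
```

The point of the design is that the same fixed set of parameters produces a value estimate for any continuous state vector, so no table over individual states is needed.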

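III. Applications of deep reinforcement learning in high-dimensional state spaces

1. Robot control

DRL is widely used in robot control. By building an environment simulator with a high-dimensional state space, a DRL algorithm lets a robot learn the optimal policy for a complex environment on its own. For example, in a maze task the robot must learn which actions lead to the exit. By modeling the state space and training on it, DRL enables the robot to find the optimal path efficiently.

2. Autonomous vehicles

Autonomous driving is a typical high-dimensional state-space problem. An autonomous vehicle must process and analyze many kinds of information (such as images and radar data) to build an understanding of the current state of its environment. By learning from large amounts of real driving data, DRL algorithms can give the vehicle the ability to make decisions autonomously and avoid obstacles.

IV. Challenges for deep reinforcement learning in high-dimensional state spaces

1. Curse of dimensionality

High-dimensional state spaces suffer from the curse of dimensionality: as the dimension of the state space increases, the sample space grows exponentially. This makes training DRL algorithms very difficult and easily leads to unstable and inefficient learning.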
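The exponential growth can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each state dimension is discretized into the same number of bins; the function name is hypothetical.

```python
def grid_cells(bins_per_dim: int, n_dims: int) -> int:
    """Number of cells when every one of n_dims dimensions is split into bins_per_dim bins."""
    return bins_per_dim ** n_dims


# With 10 bins per dimension, each extra dimension multiplies the cell count by 10.
for n_dims in (1, 2, 4, 8):
    print(n_dims, grid_cells(10, n_dims))
```

A tabular method would need at least one visit per cell to say anything about it, which is why function approximation is used instead of exhaustive enumeration as the dimension grows.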

MT4 Help: MQL4 Command Manual (Chinese Edition)

MQL4 Reference (MQL4 Command Manual)
(This manual was written with Office 2007.) February 2010

Contents

Basics
- Syntax: Comments; Identifiers; Reserved words
- Data types: Type casting; Integer constants; Literal constants; Boolean constants; Floating-point number constants (double); String constants; Color constants; Datetime constants
- Operations & Expressions: Expressions; Arithmetical operations; Assignment operation; Operations of relation; Boolean operations; Bitwise operations; Other operations; Precedence rules
- Operators: Compound operator; Expression operator; Break operator; Continue operator; Return operator; Conditional operator if-else; Switch operator; Cycle operator while; Cycle operator for
- Functions: Function call; Special functions
- Variables: Local variables; Formal parameters; Static variables; Global variables; Defining extern variables; Initialization of variables; External functions definition
- Preprocessor: Constant declaration; Controlling compilation; Including of files; Importing of functions

Standard constants
- Series arrays; Timeframes; Trade operations; Price constants; MarketInfo; Drawing styles; Arrow codes; Wingdings; Web colors; Indicator lines; Ichimoku Kinko Hyo; Moving Average methods; MessageBox; Object types; Object properties; Object visibility; Uninitialize reason codes; Special constants; Error codes

Predefined variables
- Ask (latest ask price); Bars (number of bars); Bid (latest bid price); Close[] (close prices); Digits (number of decimal places in the quote); High[] (high prices); Low[] (low prices); Open[] (open prices); Point (point size); Time[] (bar open times); Volume[] (volumes)

Program Run
- Program Run; Imported functions call; Runtime errors

Account information
- AccountBalance() (account balance); AccountCredit() (account credit); AccountCompany() (brokerage company name); AccountCurrency() (account currency); AccountEquity() (account equity); AccountFreeMargin() (free margin); AccountFreeMarginCheck() (free margin at the current price); AccountFreeMarginMode() (free margin calculation mode); AccountLeverage() (account leverage); AccountMargin() (account margin); AccountName() (account name); AccountNumber() (account number); AccountProfit() (account profit); AccountServer() (connected server); AccountStopoutLevel() (stop-out level); AccountStopoutMode() (stop-out mode)

Array functions
- ArrayBsearch() (search an array); ArrayCopy() (copy an array); ArrayCopyRates() (copy rates data); ArrayCopySeries() (copy a series array); ArrayDimension() (number of array dimensions); ArrayGetAsSeries() (whether the array is indexed as a series); ArrayInitialize() (initialize an array); ArrayIsSeries() (whether the array is a series array); ArrayMaximum() (position of the maximum); ArrayMinimum() (position of the minimum); ArrayRange() (number of elements in a given dimension); ArrayResize() (resize an array); ArraySetAsSeries() (set series indexing); ArraySize() (number of elements); ArraySort() (sort an array)

Checkup
- GetLastError() (last error); IsConnected() (connection status); IsDemo() (demo account); IsDllsAllowed() (whether DLL calls are allowed); IsExpertEnabled() (whether expert advisors are enabled); IsLibrariesAllowed() (whether library function calls are allowed); IsOptimization() (optimization mode in the strategy tester); IsStopped() (whether the program was told to stop); IsTesting() (testing mode); IsTradeAllowed() (whether trading is allowed); IsTradeContextBusy() (whether the trade context is busy); IsVisualMode() (visual mode); UninitializeReason() (uninitialization reason)

Client terminal
- TerminalCompany() (company that owns the terminal); TerminalName() (terminal name); TerminalPath() (terminal file path)

Common functions
- Alert (pop up an alert window); Comment (show text in the upper-left corner of the chart); GetTickCount (tick count); MarketInfo (market data for symbols in the Market Watch window); MessageBox (create a message box); PlaySound (play a sound); Print (print text to the log); SendFTP (FTP upload); SendMail (send e-mail); Sleep (suspend execution for a given interval)

Conversion functions
- CharToStr (character code to string); DoubleToStr (double to string); NormalizeDouble (round a floating-point value to a given precision); StrToDouble (string to double); StrToInteger (string to integer); StrToTime (string to datetime); TimeToStr (datetime to "yyyy.mm.dd hh:mi" string)

Custom indicators
- IndicatorBuffers; IndicatorCounted; IndicatorDigits; IndicatorShortName; SetIndexArrow; SetIndexBuffer; SetIndexDrawBegin; SetIndexEmptyValue; SetIndexLabel; SetIndexShift; SetIndexStyle; SetLevelStyle; SetLevelValue

Date & Time functions
- Day; DayOfWeek; Hour; Minute; Month; Seconds; TimeCurrent; TimeDay; TimeDayOfWeek; TimeDayOfYear; TimeHour; TimeLocal; TimeMinute; TimeMonth; TimeSeconds; TimeYear; Year

File functions
- FileClose (close a file); FileDelete (delete a file); FileFlush (flush buffered data to disk); FileIsEnding (end of file); FileIsLineEnding; FileOpen (open a file); FileOpenHistory (open a file in the history directory); FileReadArray (read a binary file into an array); FileReadDouble (read a double from a file); FileReadInteger (read an integer from the current binary file position); FileReadNumber; FileReadString (read a string from the current file position); FileSeek (move the file pointer); FileSize (file size); FileTell (current position of the file pointer); FileWrite (write to a file); FileWriteArray (write an array to a binary file); FileWriteDouble (write a double to a binary file); FileWriteInteger (write an integer to a binary file); FileWriteString (write a string to a binary file at the current position)

Global variables
- GlobalVariableCheck; GlobalVariableDel; GlobalVariableGet; GlobalVariableName; GlobalVariableSet; GlobalVariableSetOnCondition; GlobalVariablesTotal

Math & Trig
- MathAbs; MathArccos; MathArcsin; MathArctan; MathCeil; MathCos; MathExp; MathFloor; MathLog; MathMax; MathMin; MathMod; MathPow; MathRand; MathRound; MathSin; MathSqrt; MathSrand; MathTan

Object functions
- ObjectCreate (create an object); ObjectDelete (delete an object); ObjectDescription (object description); ObjectFind (find an object); ObjectGet (get an object property); ObjectGetFiboDescription (Fibonacci level description); ObjectGetShiftByValue; ObjectGetValueByShift; ObjectMove (move an object); ObjectName (object name); ObjectsDeleteAll (delete all objects); ObjectSet (change an object property); ObjectSetFiboDescription (change a Fibonacci level description); ObjectSetText (change the object description); ObjectsTotal (total number of objects); ObjectType (object type)

String functions
- StringConcatenate (concatenate strings); StringFind (search within a string); StringGetChar (character code at a given position); StringLen (string length); StringSubstr (extract a substring); StringTrimLeft; StringTrimRight

Technical indicators
- iAC (Bill Williams' Accelerator/Decelerator Oscillator); iAD (Accumulation/Distribution); iAlligator (Bill Williams' Alligator); iADX (Average Directional Movement Index); iATR (Average True Range); iAO (Bill Williams' Awesome Oscillator); iBearsPower (Bears Power); iBands (Bollinger Bands); iBandsOnArray (Bollinger Bands on an array); iBullsPower (Bulls Power); iCCI (Commodity Channel Index); iCCIOnArray (Commodity Channel Index on an array); iCustom (specified custom indicator); iDeMarker; iEnvelopes (Envelopes); iEnvelopesOnArray (Envelopes on an array); iForce (Force Index); iFractals (Fractals); iGator (Gator Oscillator); iIchimoku; iBWMFI (Bill Williams' Market Facilitation Index); iMomentum (Momentum); iMomentumOnArray; iMFI (Money Flow Index); iMA (Moving Average); iMAOnArray; iOsMA (Moving Average of Oscillator); iMACD (Moving Average Convergence/Divergence); iOBV (On Balance Volume); iSAR (Parabolic Stop and Reverse); iRSI (Relative Strength Index); iRSIOnArray; iRVI (Relative Vigor Index); iStdDev (Standard Deviation); iStdDevOnArray; iStochastic (Stochastic Oscillator); iWPR (Williams' Percent Range)

Timeseries access
- iBars (number of bars); iClose; iHigh; iHighest; iLow; iLowest; iOpen; iTime; iVolume

Trading functions
- Execution errors; OrderClose; OrderCloseBy; OrderClosePrice; OrderCloseTime; OrderComment; OrderCommission; OrderDelete; OrderExpiration; OrderLots; OrderMagicNumber; OrderModify; OrderOpenPrice; OrderOpenTime; OrderPrint; OrderProfit; OrderSelect; OrderSend; OrdersHistoryTotal; OrderStopLoss; OrdersTotal; OrderSwap; OrderSymbol; OrderTakeProfit; OrderTicket; OrderType

Window functions
- HideTestIndicators (hide indicators); Period (timeframe in use); RefreshRates (refresh predefined variables and series arrays); Symbol (current symbol); WindowBarsPerChart (number of visible bars); WindowExpertName (expert advisor name); WindowFind; WindowFirstVisibleBar (first visible bar); WindowHandle; WindowIsVisible (whether the chart subwindow is visible); WindowOnDropped; WindowPriceMax; WindowPriceMin; WindowPriceOnDropped; WindowRedraw; WindowScreenShot; WindowTimeOnDropped; WindowsTotal (number of indicator windows); WindowXOnDropped; WindowYOnDropped

Obsolete functions

MetaQuotes Language 4 (MQL4) is a new built-in language for programming trading strategies.

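FASISS Clustering Algorithm

FASISS (Fast and Scalable Incremental Subspace Clustering) is an incremental subspace clustering algorithm. Unlike traditional clustering algorithms, FASISS can perform subspace clustering efficiently as data arrive incrementally. This article describes the FASISS algorithm in detail and answers related questions step by step.

1. What is a clustering algorithm?

Clustering is an unsupervised learning method that divides data into groups. Clustering algorithms aim to reveal the intrinsic structure of the data by grouping data points with similar features, which helps us understand the data better.

2. What is subspace clustering?

Subspace clustering is a method that clusters data points according to their distribution in different feature subspaces. Compared with traditional clustering algorithms, subspace clustering is better suited to high-dimensional data because it can take into account how the data are correlated across different dimensions.

3. What is the principle of the FASISS algorithm?

The core principle of FASISS is incremental subspace clustering based on a combination of local distance and global distance. Specifically, FASISS uses a method called distance accumulation to measure the similarity between data points, and adds new data points to the clusters step by step through a pipeline mechanism.

4. What are the steps of the FASISS algorithm?

The steps of the FASISS algorithm are as follows:
- Step 1: Initialization. FASISS selects some data points as initial cluster centers and computes the distances between them.
- Step 2: Incremental clustering. FASISS adds new data points one at a time and assigns each to a suitable cluster center. For every new data point, FASISS computes its local distance and global distance and adds it to the cluster center with the smallest distance.
- Step 3: Cluster update. FASISS updates the cluster centers and recomputes the distances between data points. If a cluster center becomes unstable, FASISS removes it and selects a new cluster center.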
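The incremental assign-then-update loop in Steps 2 and 3 can be sketched as below. This is not FASISS itself: the text does not define the local and global distances, distance accumulation, or the stability test, so the sketch swaps in plain Euclidean distance to a running-mean centroid and a hypothetical `new_cluster_dist` threshold for opening a new cluster. All names are assumptions.

```python
import math


def add_point(centers, counts, point, new_cluster_dist):
    """Assign one new point to its nearest centroid, or open a new cluster.

    centers : list of centroids (each a list of floats), modified in place
    counts  : number of points absorbed by each centroid, modified in place
    Returns the index of the cluster the point was assigned to.
    """
    best, best_dist = None, math.inf
    for k, center in enumerate(centers):
        dist = math.dist(point, center)          # Euclidean distance (assumed metric)
        if dist < best_dist:
            best, best_dist = k, dist
    if best is None or best_dist > new_cluster_dist:
        centers.append(list(point))              # too far from everything: new cluster
        counts.append(1)
        return len(centers) - 1
    counts[best] += 1
    n = counts[best]
    # Running-mean update, so earlier points never have to be revisited.
    centers[best] = [c + (x - c) / n for c, x in zip(centers[best], point)]
    return best


centers, counts = [], []
stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
labels = [add_point(centers, counts, p, new_cluster_dist=2.0) for p in stream]
```

Each arriving point costs one pass over the current centroids, which is what makes this style of update suitable for data that grow over time.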

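5. How does the FASISS algorithm differ from traditional clustering algorithms?

Compared with traditional clustering algorithms, FASISS differs in the following ways:
- FASISS is an incremental clustering algorithm, so it can handle growing data efficiently.
- FASISS is based on subspace clustering, so it can cope with high-dimensional data and take into account how the data are correlated across different dimensions.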

Artificial Intelligence Vocabulary

常用英语词汇 -andrew Ng课程average firing rate均匀激活率intensity强度average sum-of-squares error均方差Regression回归backpropagation后向流传Loss function损失函数basis 基non-convex非凸函数basis feature vectors特点基向量neural network神经网络batch gradient ascent批量梯度上涨法supervised learning监察学习Bayesian regularization method贝叶斯规则化方法regression problem回归问题办理的是连续的问题Bernoulli random variable伯努利随机变量classification problem分类问题bias term偏置项discreet value失散值binary classfication二元分类support vector machines支持向量机class labels种类标记learning theory学习理论concatenation级联learning algorithms学习算法conjugate gradient共轭梯度unsupervised learning无监察学习contiguous groups联通地区gradient descent梯度降落convex optimization software凸优化软件linear regression线性回归convolution卷积Neural Network神经网络cost function代价函数gradient descent梯度降落covariance matrix协方差矩阵normal equations DC component直流重量linear algebra线性代数decorrelation去有关superscript上标degeneracy退化exponentiation指数demensionality reduction降维training set训练会合derivative导函数training example训练样本diagonal对角线hypothesis假定，用来表示学习算法的输出diffusion of gradients梯度的弥散LMS algorithm “least mean squares最小二乘法算eigenvalue特点值法eigenvector特点向量batch gradient descent批量梯度降落error term残差constantly gradient descent随机梯度降落feature matrix特点矩阵iterative algorithm迭代算法feature standardization特点标准化partial derivative偏导数feedforward architectures前馈构造算法contour等高线feedforward neural network前馈神经网络quadratic function二元函数feedforward pass前馈传导locally weighted regression局部加权回归fine-tuned微调underfitting欠拟合first-order feature一阶特点overfitting过拟合forward pass前向传导non-parametric learning algorithms无参数学习算forward propagation前向流传法Gaussian prior高斯先验概率parametric learning algorithm参数学习算法generative model生成模型activation激活值gradient descent梯度降落activation function激活函数Greedy layer-wise training逐层贪心训练方法additive noise加性噪声grouping matrix分组矩阵autoencoder自编码器Hadamard product阿达马乘积Autoencoders自编码算法Hessian matrix Hessian矩阵hidden layer隐含层hidden units隐蔽神经元Hierarchical grouping层次型分组higher-order features更高阶特点highly non-convex optimization problem高度非凸的优化问题histogram直方图hyperbolic 
tangent双曲正切函数hypothesis估值，假定identity activation function恒等激励函数IID 独立同散布illumination照明inactive克制independent component analysis独立成份剖析input domains输入域input layer输入层intensity亮度/灰度intercept term截距KL divergence相对熵KL divergence KL分别度k-Means K-均值learning rate学习速率least squares最小二乘法linear correspondence线性响应linear superposition线性叠加line-search algorithm线搜寻算法local mean subtraction局部均值消减local optima局部最优解logistic regression逻辑回归loss function损失函数low-pass filtering低通滤波magnitude幅值MAP 极大后验预计maximum likelihood estimation极大似然预计mean 均匀值MFCC Mel 倒频系数multi-class classification多元分类neural networks神经网络neuron 神经元Newton’s method牛顿法non-convex function非凸函数non-linear feature非线性特点norm 范式norm bounded有界范数norm constrained范数拘束normalization归一化numerical roundoff errors数值舍入偏差numerically checking数值查验numerically reliable数值计算上稳固object detection物体检测objective function目标函数off-by-one error缺位错误orthogonalization正交化output layer输出层overall cost function整体代价函数over-complete basis超齐备基over-fitting过拟合parts of objects目标的零件part-whole decompostion部分-整体分解PCA 主元剖析penalty term处罚因子per-example mean subtraction逐样本均值消减pooling池化pretrain预训练principal components analysis主成份剖析quadratic constraints二次拘束RBMs 受限 Boltzman 机reconstruction based models鉴于重构的模型reconstruction cost重修代价reconstruction term重构项redundant冗余reflection matrix反射矩阵regularization正则化regularization term正则化项rescaling缩放robust 鲁棒性run 行程second-order feature二阶特点sigmoid activation function S型激励函数significant digits有效数字singular value奇怪值singular vector奇怪向量smoothed L1 penalty光滑的L1 范数处罚Smoothed topographic L1 sparsity penalty光滑地形L1 稀少处罚函数smoothing光滑Softmax Regresson Softmax回归sorted in decreasing order降序摆列source features源特点Adversarial Networks抗衡网络sparse autoencoder消减归一化Affine Layer仿射层Sparsity稀少性Affinity matrix亲和矩阵sparsity parameter稀少性参数Agent 代理 /智能体sparsity penalty稀少处罚Algorithm 算法square function平方函数Alpha- beta pruningα - β剪枝squared-error方差Anomaly detection异样检测stationary安稳性（不变性）Approximation近似stationary stochastic process安稳随机过程Area Under ROC Curve／ AUC Roc 曲线下边积step-size步长值Artificial 
General Intelligence/AGI通用人工智supervised learning监察学习能symmetric positive semi-definite matrix Artificial Intelligence/AI人工智能对称半正定矩阵Association analysis关系剖析symmetry breaking对称无效Attention mechanism注意力体制tanh function双曲正切函数Attribute conditional independence assumptionthe average activation均匀活跃度属性条件独立性假定the derivative checking method梯度考证方法Attribute space属性空间the empirical distribution经验散布函数Attribute value属性值the energy function能量函数Autoencoder自编码器the Lagrange dual拉格朗日对偶函数Automatic speech recognition自动语音辨别the log likelihood对数似然函数Automatic summarization自动纲要the pixel intensity value像素灰度值Average gradient均匀梯度the rate of convergence收敛速度Average-Pooling均匀池化topographic cost term拓扑代价项Backpropagation Through Time经过时间的反向流传topographic ordered拓扑次序Backpropagation/BP反向流传transformation变换Base learner基学习器translation invariant平移不变性Base learning algorithm基学习算法trivial answer平庸解Batch Normalization/BN批量归一化under-complete basis不齐备基Bayes decision rule贝叶斯判断准则unrolling组合扩展Bayes Model Averaging／ BMA 贝叶斯模型均匀unsupervised learning无监察学习Bayes optimal classifier贝叶斯最优分类器variance 方差Bayesian decision theory贝叶斯决议论vecotrized implementation向量化实现Bayesian network贝叶斯网络vectorization矢量化Between-class scatter matrix类间散度矩阵visual cortex视觉皮层Bias 偏置 /偏差weight decay权重衰减Bias-variance decomposition偏差 - 方差分解weighted average加权均匀值Bias-Variance Dilemma偏差–方差窘境whitening白化Bi-directional Long-Short Term Memory/Bi-LSTMzero-mean均值为零双向长短期记忆Accumulated error backpropagation积累偏差逆传Binary classification二分类播Binomial test二项查验Activation Function激活函数Bi-partition二分法Adaptive Resonance Theory/ART自适应谐振理论Boltzmann machine玻尔兹曼机Addictive model加性学习Bootstrap sampling自助采样法／可重复采样Bootstrapping自助法Break-Event Point／ BEP 均衡点Calibration校准Cascade-Correlation级联有关Categorical attribute失散属性Class-conditional probability类条件概率Classification and regression tree/CART分类与回归树Classifier分类器Class-imbalance类型不均衡Closed -form闭式Cluster簇/ 类/ 集群Cluster analysis聚类剖析Clustering聚类Clustering ensemble聚类集成Co-adapting共适应Coding matrix编码矩阵COLT 国际学习理论会议Committee-based learning鉴于委员会的学习Competitive 
learning竞争型学习Component learner组件学习器Comprehensibility可解说性Computation Cost计算成本Computational Linguistics计算语言学Computer vision计算机视觉Concept drift观点漂移Concept Learning System /CLS观点学习系统Conditional entropy条件熵Conditional mutual information条件互信息Conditional Probability Table／ CPT 条件概率表Conditional random field/CRF条件随机场Conditional risk条件风险Confidence置信度Confusion matrix混杂矩阵Connection weight连结权Connectionism 连结主义Consistency一致性／相合性Contingency table列联表Continuous attribute连续属性Convergence收敛Conversational agent会话智能体Convex quadratic programming凸二次规划Convexity凸性Convolutional neural network/CNN卷积神经网络Co-occurrence同现Correlation coefficient有关系数Cosine similarity余弦相像度Cost curve成本曲线Cost Function成本函数Cost matrix成本矩阵Cost-sensitive成本敏感Cross entropy交错熵Cross validation交错考证Crowdsourcing众包Curse of dimensionality维数灾害Cut point截断点Cutting plane algorithm割平面法Data mining数据发掘Data set数据集Decision Boundary决议界限Decision stump决议树桩Decision tree决议树／判断树Deduction演绎Deep Belief Network深度信念网络Deep Convolutional Generative Adversarial NetworkDCGAN深度卷积生成抗衡网络Deep learning深度学习Deep neural network/DNN深度神经网络Deep Q-Learning深度Q 学习Deep Q-Network深度Q 网络Density estimation密度预计Density-based clustering密度聚类Differentiable neural computer可微分神经计算机Dimensionality reduction algorithm降维算法Directed edge有向边Disagreement measure不合胸怀Discriminative model鉴别模型Discriminator鉴别器Distance measure距离胸怀Distance metric learning距离胸怀学习Distribution散布Divergence散度Diversity measure多样性胸怀／差别性胸怀Domain adaption领域自适应Downsampling下采样D-separation（ Directed separation）有向分别Dual problem对偶问题Dummy node 哑结点General Problem Solving通用问题求解Dynamic Fusion 动向交融Generalization泛化Dynamic programming动向规划Generalization error泛化偏差Eigenvalue decomposition特点值分解Generalization error bound泛化偏差上界Embedding 嵌入Generalized Lagrange function广义拉格朗日函数Emotional analysis情绪剖析Generalized linear model广义线性模型Empirical conditional entropy经验条件熵Generalized Rayleigh quotient广义瑞利商Empirical entropy经验熵Generative Adversarial Networks/GAN生成抗衡网Empirical error经验偏差络Empirical risk经验风险Generative Model生成模型End-to-End 
端到端Generator生成器Energy-based model鉴于能量的模型Genetic Algorithm/GA遗传算法Ensemble learning集成学习Gibbs sampling吉布斯采样Ensemble pruning集成修剪Gini index基尼指数Error Correcting Output Codes／ ECOC纠错输出码Global minimum全局最小Error rate错误率Global Optimization全局优化Error-ambiguity decomposition偏差 - 分歧分解Gradient boosting梯度提高Euclidean distance欧氏距离Gradient Descent梯度降落Evolutionary computation演化计算Graph theory图论Expectation-Maximization希望最大化Ground-truth实情／真切Expected loss希望损失Hard margin硬间隔Exploding Gradient Problem梯度爆炸问题Hard voting硬投票Exponential loss function指数损失函数Harmonic mean 调解均匀Extreme Learning Machine/ELM超限学习机Hesse matrix海塞矩阵Factorization因子分解Hidden dynamic model隐动向模型False negative假负类Hidden layer隐蔽层False positive假正类Hidden Markov Model/HMM 隐马尔可夫模型False Positive Rate/FPR假正例率Hierarchical clustering层次聚类Feature engineering特点工程Hilbert space希尔伯特空间Feature selection特点选择Hinge loss function合页损失函数Feature vector特点向量Hold-out 留出法Featured Learning特点学习Homogeneous 同质Feedforward Neural Networks/FNN前馈神经网络Hybrid computing混杂计算Fine-tuning微调Hyperparameter超参数Flipping output翻转法Hypothesis假定Fluctuation震荡Hypothesis test假定考证Forward stagewise algorithm前向分步算法ICML 国际机器学习会议Frequentist频次主义学派Improved iterative scaling/IIS改良的迭代尺度法Full-rank matrix满秩矩阵Incremental learning增量学习Functional neuron功能神经元Independent and identically distributed/独Gain ratio增益率立同散布Game theory博弈论Independent Component Analysis/ICA独立成分剖析Gaussian kernel function高斯核函数Indicator function指示函数Gaussian Mixture Model高斯混杂模型Individual learner个体学习器Induction归纳Inductive bias归纳偏好Inductive learning归纳学习Inductive Logic Programming／ ILP归纳逻辑程序设计Information entropy信息熵Information gain信息增益Input layer输入层Insensitive loss不敏感损失Inter-cluster similarity簇间相像度International Conference for Machine Learning/ICML国际机器学习大会Intra-cluster similarity簇内相像度Intrinsic value固有值Isometric Mapping/Isomap等胸怀映照Isotonic regression平分回归Iterative Dichotomiser迭代二分器Kernel method核方法Kernel trick核技巧Kernelized Linear Discriminant Analysis／KLDA核线性鉴别剖析K-fold cross validation k折交错考证／k 倍交错考证K-Means Clustering K–均值聚类K-Nearest 
Neighbours Algorithm/KNN K近邻算法Knowledge base 知识库Knowledge Representation知识表征Label space标记空间Lagrange duality拉格朗日对偶性Lagrange multiplier拉格朗日乘子Laplace smoothing拉普拉斯光滑Laplacian correction拉普拉斯修正Latent Dirichlet Allocation隐狄利克雷散布Latent semantic analysis潜伏语义剖析Latent variable隐变量Lazy learning懒散学习Learner学习器Learning by analogy类比学习Learning rate学习率Learning Vector Quantization/LVQ学习向量量化Least squares regression tree最小二乘回归树Leave-One-Out/LOO留一法linear chain conditional random field线性链条件随机场Linear Discriminant Analysis／ LDA 线性鉴别剖析Linear model线性模型Linear Regression线性回归Link function联系函数Local Markov property局部马尔可夫性Local minimum局部最小Log likelihood对数似然Log odds／ logit对数几率Logistic Regression Logistic回归Log-likelihood对数似然Log-linear regression对数线性回归Long-Short Term Memory/LSTM 长短期记忆Loss function损失函数Machine translation/MT机器翻译Macron-P宏查准率Macron-R宏查全率Majority voting绝对多半投票法Manifold assumption流形假定Manifold learning流形学习Margin theory间隔理论Marginal distribution边沿散布Marginal independence边沿独立性Marginalization边沿化Markov Chain Monte Carlo/MCMC马尔可夫链蒙特卡罗方法Markov Random Field马尔可夫随机场Maximal clique最大团Maximum Likelihood Estimation/MLE极大似然预计／极大似然法Maximum margin最大间隔Maximum weighted spanning tree最大带权生成树Max-Pooling 最大池化Mean squared error均方偏差Meta-learner元学习器Metric learning胸怀学习Micro-P微查准率Micro-R微查全率Minimal Description Length/MDL最小描绘长度Minimax game极小极大博弈Misclassification cost误分类成本Mixture of experts混杂专家Momentum 动量Moral graph道德图／正直图Multi-class classification多分类Multi-document summarization多文档纲要One shot learning一次性学习Multi-layer feedforward neural networks One-Dependent Estimator／ ODE 独依靠预计多层前馈神经网络On-Policy在策略Multilayer Perceptron/MLP多层感知器Ordinal attribute有序属性Multimodal learning多模态学习Out-of-bag estimate包外预计Multiple Dimensional Scaling多维缩放Output layer输出层Multiple linear regression多元线性回归Output smearing输出调制法Multi-response Linear Regression／ MLR Overfitting过拟合／过配多响应线性回归Oversampling 过采样Mutual information互信息Paired t-test成对 t查验Naive bayes 朴实贝叶斯Pairwise 成对型Naive Bayes Classifier朴实贝叶斯分类器Pairwise Markov property成对马尔可夫性Named entity 
recognition命名实体辨别Parameter参数Nash equilibrium纳什均衡Parameter estimation参数预计Natural language generation/NLG自然语言生成Parameter tuning调参Natural language processing自然语言办理Parse tree分析树Negative class负类Particle Swarm Optimization/PSO粒子群优化算法Negative correlation负有关法Part-of-speech tagging词性标明Negative Log Likelihood负对数似然Perceptron感知机Neighbourhood Component Analysis/NCA Performance measure性能胸怀近邻成分剖析Plug and Play Generative Network即插即用生成网Neural Machine Translation神经机器翻译络Neural Turing Machine神经图灵机Plurality voting相对多半投票法Newton method牛顿法Polarity detection极性检测NIPS 国际神经信息办理系统会议Polynomial kernel function多项式核函数No Free Lunch Theorem／ NFL 没有免费的午饭定理Pooling池化Noise-contrastive estimation噪音对照预计Positive class正类Nominal attribute列名属性Positive definite matrix正定矩阵Non-convex optimization非凸优化Post-hoc test后续查验Nonlinear model非线性模型Post-pruning后剪枝Non-metric distance非胸怀距离potential function势函数Non-negative matrix factorization非负矩阵分解Precision查准率／正确率Non-ordinal attribute无序属性Prepruning 预剪枝Non-Saturating Game非饱和博弈Principal component analysis/PCA主成分剖析Norm 范数Principle of multiple explanations多释原则Normalization归一化Prior 先验Nuclear norm核范数Probability Graphical Model概率图模型Numerical attribute数值属性Proximal Gradient Descent/PGD近端梯度降落Letter O Pruning剪枝Objective function目标函数Pseudo-label伪标记Oblique decision tree斜决议树Quantized Neural Network量子化神经网络Occam’s razor奥卡姆剃刀Quantum computer 量子计算机Odds 几率Quantum Computing量子计算Off-Policy离策略Quasi Newton method拟牛顿法Radial Basis Function／ RBF 径向基函数Random Forest Algorithm随机丛林算法Random walk随机闲步Recall 查全率／召回率Receiver Operating Characteristic/ROC受试者工作特点Rectified Linear Unit/ReLU线性修正单元Recurrent Neural Network循环神经网络Recursive neural network递归神经网络Reference model 参照模型Regression回归Regularization正则化Reinforcement learning/RL加强学习Representation learning表征学习Representer theorem表示定理reproducing kernel Hilbert space/RKHS重生核希尔伯特空间Re-sampling重采样法Rescaling再缩放Residual Mapping残差映照Residual Network残差网络Restricted Boltzmann Machine/RBM受限玻尔兹曼机Restricted Isometry Property/RIP限制等距性Re-weighting重赋权法Robustness稳重性 / 鲁棒性Root node根结点Rule 
Engine规则引擎Rule learning规则学习Saddle point鞍点Sample space样本空间Sampling采样Score function评分函数Self-Driving自动驾驶Self-Organizing Map／ SOM自组织映照Semi-naive Bayes classifiers半朴实贝叶斯分类器Semi-Supervised Learning半监察学习semi-Supervised Support Vector Machine半监察支持向量机Sentiment analysis感情剖析Separating hyperplane分别超平面Sigmoid function Sigmoid函数Similarity measure相像度胸怀Simulated annealing模拟退火Simultaneous localization and mapping同步定位与地图建立Singular Value Decomposition奇怪值分解Slack variables废弛变量Smoothing光滑Soft margin软间隔Soft margin maximization软间隔最大化Soft voting软投票Sparse representation稀少表征Sparsity稀少性Specialization特化Spectral Clustering谱聚类Speech Recognition语音辨别Splitting variable切分变量Squashing function挤压函数Stability-plasticity dilemma可塑性 - 稳固性窘境Statistical learning统计学习Status feature function状态特点函Stochastic gradient descent随机梯度降落Stratified sampling分层采样Structural risk构造风险Structural risk minimization/SRM构造风险最小化Subspace子空间Supervised learning监察学习／有导师学习support vector expansion支持向量展式Support Vector Machine/SVM支持向量机Surrogat loss代替损失Surrogate function代替函数Symbolic learning符号学习Symbolism符号主义Synset同义词集T-Distribution Stochastic Neighbour Embeddingt-SNE T–散布随机近邻嵌入Tensor 张量Tensor Processing Units/TPU张量办理单元The least square method最小二乘法Threshold阈值Threshold logic unit阈值逻辑单元Threshold-moving阈值挪动Time Step时间步骤Tokenization标记化Training error训练偏差Training instance训练示例／训练例Transductive learning直推学习Transfer learning迁徙学习Treebank树库algebra线性代数Tria-by-error试错法asymptotically无症状的True negative真负类appropriate适合的True positive真切类bias 偏差True Positive Rate/TPR真切例率brevity简洁，简洁；短暂Turing Machine图灵机[800 ] broader宽泛Twice-learning二次学习briefly简洁的Underfitting欠拟合／欠配batch 批量Undersampling欠采样convergence收敛，集中到一点Understandability可理解性convex凸的Unequal cost非均等代价contours轮廓Unit-step function单位阶跃函数constraint拘束Univariate decision tree单变量决议树constant常理Unsupervised learning无监察学习／无导师学习commercial商务的Unsupervised layer-wise training无监察逐层训练complementarity增补Upsampling上采样coordinate ascent同样级上涨Vanishing Gradient Problem梯度消逝问题clipping剪下物；剪报；修剪Variational inference变分推测component重量；零件VC Theory 
VC维理论continuous连续的Version space版本空间covariance协方差Viterbi algorithm维特比算法canonical正规的，正则的Von Neumann architecture冯· 诺伊曼架构concave非凸的Wasserstein GAN/WGAN Wasserstein生成抗衡网络corresponds相切合；相当；通讯Weak learner弱学习器corollary推论Weight权重concrete详细的事物，实在的东西Weight sharing权共享cross validation交错考证Weighted voting加权投票法correlation互相关系Within-class scatter matrix类内散度矩阵convention商定Word embedding词嵌入cluster一簇Word sense disambiguation词义消歧centroids质心，形心Zero-data learning零数据学习converge收敛Zero-shot learning零次学习computationally计算(机)的approximations近似值calculus计算arbitrary任意的derive获取，获得affine仿射的dual 二元的arbitrary任意的duality二元性；二象性；对偶性amino acid氨基酸derivation求导；获取；发源amenable 经得起查验的denote预示，表示，是的标记；意味着，[逻]指称axiom 公义，原则divergence散度；发散性abstract提取dimension尺度，规格；维数architecture架构，系统构造；建筑业dot 小圆点absolute绝对的distortion变形arsenal军械库density概率密度函数assignment分派discrete失散的人工智能词汇discriminative有辨别能力的indicator指示物，指示器diagonal对角interative重复的，迭代的dispersion分别，散开integral积分determinant决定要素identical相等的；完整同样的disjoint不订交的indicate表示，指出encounter碰到invariance不变性，恒定性ellipses椭圆impose把强加于equality等式intermediate中间的extra 额外的interpretation解说，翻译empirical经验；察看joint distribution结合概率ennmerate例举，计数lieu 代替exceed超出，越出logarithmic对数的，用对数表示的expectation希望latent潜伏的efficient奏效的Leave-one-out cross validation留一法交错考证endow 给予magnitude巨大explicitly清楚的mapping 画图，制图；映照exponential family指数家族matrix矩阵equivalently等价的mutual互相的，共同的feasible可行的monotonically单一的forary首次试试minor较小的，次要的finite有限的，限制的multinomial多项的forgo 摒弃，放弃multi-class classification二分类问题fliter过滤nasty厌烦的frequentist最常发生的notation标记，说明forward search前向式搜寻na?ve 朴实的formalize使定形obtain获取generalized归纳的oscillate摇动generalization归纳，归纳；广泛化；判断（依据不optimization problem最优化问题足）objective function目标函数guarantee保证；抵押品optimal最理想的generate形成，产生orthogonal(矢量，矩阵等 ) 正交的geometric margins几何界限orientation方向gap 裂口ordinary一般的generative生产的；有生产力的occasionally有时的heuristic启迪式的；启迪法；启迪程序partial derivative偏导数hone 怀恋；磨property性质hyperplane超平面proportional成比率的initial最先的primal原始的，最先的implement履行permit同意intuitive凭直觉获知的pseudocode 
伪代码incremental增添的permissible可同意的intercept截距polynomial多项式intuitious直觉preliminary预备instantiation例子precision精度人工智能词汇perturbation不安，搅乱theorem定理poist 假定，假想tangent正弦positive semi-definite半正定的unit-length vector单位向量parentheses圆括号valid 有效的，正确的posterior probability后验概率variance方差plementarity增补variable变量；变元pictorially图像的vocabulary 词汇parameterize确立的参数valued经估价的；可贵的poisson distribution柏松散布wrapper 包装pertinent有关的总计 1038 词汇quadratic二次的quantity量，数目；重量query 疑问的regularization使系统化；调整reoptimize从头优化restrict限制；限制；拘束reminiscent回想旧事的；提示的；令人联想的（ of ）remark 注意random variable随机变量respect考虑respectively各自的；分其他redundant过多的；冗余的susceptible敏感的stochastic可能的；随机的symmetric对称的sophisticated复杂的spurious假的；假造的subtract减去；减法器simultaneously同时发生地；同步地suffice知足scarce罕有的，难得的split分解，分别subset子集statistic统计量successive iteratious连续的迭代scale标度sort of有几分的squares 平方trajectory轨迹temporarily临时的terminology专用名词tolerance容忍；公差thumb翻阅threshold阈，临界。

Review of Semi-supervised Deep Learning Image Classification Methods

LYU Haoyuan+, YU Lu, ZHOU Xingyu, DENG Xiang
College of Communication Engineering, Army Engineering University of PLA, Nanjing 210007, China
+ Corresponding author, E-mail: *******************

Abstract: Deep learning has been one of the most closely watched technologies in artificial intelligence over the past decade and has achieved excellent results in many applications, but current learning strategies rely heavily on large amounts of labeled data.

In many practical problems it is not feasible to obtain a large number of labeled training samples, which makes models harder to train, whereas large amounts of unlabeled data are easy to obtain.

Semi-supervised learning makes full use of unlabeled data. It offers solutions and effective methods for improving model performance when labeled data are limited, and it has achieved high recognition accuracy in image classification tasks.

This paper first gives an overview of semi-supervised learning and then introduces the basic ideas commonly used in classification algorithms. It focuses on a comprehensive review of recent image classification methods based on semi-supervised deep learning frameworks, including multi-view training, consistency regularization, diversity mixing, and semi-supervised generative adversarial networks. It summarizes the techniques these methods share, analyzes and compares the differences in their experimental results, and finally discusses open problems and feasible directions for future research.

Key words: semi-supervised deep learning; multi-view training; consistency regularization; diversity mixing; semi-supervised generative adversarial networks
Document code: A. CLC number: TP391.4
Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 1673-9418/2021/15(06)-1038-11, doi:10.3778/j.issn.1673-9418.2011020
Foundation item: National Natural Science Foundation of China (61702543).


ABSTRACT
Subspace learning techniques are widespread in pattern recognition research. They include Principal Component Analysis (PCA), Locality Preserving Projections (LPP), etc. These techniques are generally unsupervised, which allows them to model data in the absence of labels or categories. In a relevance feedback driven image retrieval system, the information provided by the user can be used to better describe the intrinsic semantic relationships between images. In this paper, we propose a semi-supervised subspace learning algorithm which incrementally learns an adaptive subspace by preserving the semantic structure of the image space, based on user interactions in a relevance feedback driven query-by-example system. Our algorithm is capable of accumulating knowledge from users, which could result in new feature representations for images in the database so that the system's future retrieval performance can be enhanced. Experiments on a large collection of images have shown the effectiveness and efficiency of our proposed algorithm.
Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing – Algorithms, Indexing methods.
General Terms
1. INTRODUCTION
Image representation and indexing have been fundamental problems for efficient clustering [4][7], classification and retrieval [3][12][13][14]. An image can be represented as a point in the vector space R^n. Throughout this paper, we denote by image space the set of all image vectors. The image space is typically a subspace of R^n, either linear or non-linear. It would be optimal to apply clustering, classification and retrieval techniques in the subspace rather than in the ambient space.
The typical subspace learning algorithms used for image indexing include Linear Discriminant Analysis (LDA) [1][15], Principal Component Analysis (PCA) [14][18], Locality Preserving Projections (LPP) [9][10], etc. PCA is an eigenvector method designed to model linear variation in high-dimensional data. PCA performs dimensionality reduction by projecting the original n-dimensional data onto the k-dimensional (k << n) linear subspace spanned by the leading eigenvectors of the data's covariance matrix. Its goal is to find a set of mutually orthogonal basis functions that capture the directions of maximum variance in the data.
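The PCA projection described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy; the function name `pca_project` and the synthetic data are hypothetical and do not come from the paper. It centers the data, forms the sample covariance matrix, and projects onto the k leading eigenvectors.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (m samples, n features) onto the k leading
    eigenvectors of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # n x n sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    W = eigvecs[:, order]                     # n x k, orthonormal columns
    return Xc @ W, W, eigvals[order]

# Hypothetical toy data: 200 points in R^5 with very unequal variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 0.5, 0.1, 0.1])
Y, W, variances = pca_project(X, k=2)
```

The columns of `W` are mutually orthogonal, and the variance of each projected coordinate in `Y` equals the corresponding eigenvalue, which is the sense in which the leading eigenvectors capture the directions of maximum variance.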
If the image space is a linearly embedded manifold, PCA is guaranteed to uncover the dimensionality of the manifold and to produce a compact representation. Unfortunately, the image space is probably highly nonlinear. In such a case, PCA fails to uncover the intrinsic manifold structure of the image space. Moreover, PCA is unsupervised, so the subspace it obtains cannot reflect human perception.
Unlike PCA, LDA is a supervised learning algorithm. Instead of maximizing the overall variance, LDA maximizes the variance between clusters and minimizes the variance within clusters. LDA is optimal in the sense of classification. However, in image retrieval, the images in the database are unlabelled. Therefore, LDA cannot simply be applied to learn a semantic subspace for image retrieval.
In this paper, we propose a semi-supervised algorithm to learn a semantic subspace for image retrieval based on LPP [9]. LPP is a linear dimensionality reduction algorithm. Unlike PCA, which implicitly assumes that the data space is Euclidean, LPP assumes that the data space is a manifold, either linear or nonlinear. LPP aims to preserve the local structure of the image space. Since neighboring images (data points in the high-dimensional space) are probably related to the same semantics, LPP can have more discriminating power than PCA. To be specific, an adjacency graph is constructed to model the local structure of the image space. Once the graph structure is obtained, LPP finds a projection which respects the graph structure. LPP can be performed in either a supervised or an unsupervised manner, depending on how the adjacency graph is constructed. During the process of image retrieval based on the user's relevance feedback, the accumulated knowledge can be incorporated into the adjacency graph, which could result in new feature representations for
Algorithms, Measurement, Performance, Experimentation, Theory.
Keywords
Locality Preserving Projections, Image Retrieval, ... Learning, Principal Component Analysis, Linear Discriminant Analysis
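The introduction describes LPP only at a high level: build an adjacency graph over the images, then find a linear projection that respects it. The sketch below illustrates the unsupervised case under several assumptions that are not stated in the text: a symmetrized k-nearest-neighbor graph, heat-kernel edge weights with parameter `t`, the commonly used generalized eigenproblem X^T L X a = λ X^T D X a, and a small ridge term `reg` for numerical stability. The function name `lpp` and all parameters are hypothetical, and this is not the paper's semi-supervised algorithm.

```python
import numpy as np

def lpp(X, k_neighbors=5, t=1.0, n_components=2, reg=1e-6):
    """Unsupervised LPP sketch. Rows of X are samples."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape

    # Pairwise squared Euclidean distances; exclude self-matches.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)

    # Adjacency graph: k nearest neighbors, heat-kernel weights (assumed).
    nn = np.argsort(d2, axis=1)[:, :k_neighbors]
    rows = np.repeat(np.arange(m), k_neighbors)
    cols = nn.ravel()
    S = np.zeros((m, m))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)                    # make the graph undirected

    D = np.diag(S.sum(axis=1))                # degree matrix
    L = D - S                                 # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(n)         # ridge keeps B positive definite

    # Solve A a = lam * B a by reducing to a symmetric standard problem.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    lam, V = np.linalg.eigh((M + M.T) / 2.0)  # ascending eigenvalues
    W = C_inv.T @ V[:, :n_components]         # smallest eigenvalues preserve locality
    return X @ W, W, lam[:n_components]

# Hypothetical toy data: 60 points in R^4.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
Y, W, lam = lpp(X)
```

In a supervised or feedback-driven variant, only the construction of `S` would change (for example, connecting images a user marked as relevant to the same query), while the eigenproblem would stay the same.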