Artificial Neural Networks for English Capital Letter Recognition


"Artificial Neural Networks" course slides

Momentum: builds on gradient descent by adding a momentum term to speed up convergence.
RMSProp: builds on AdaGrad by replacing the accumulated sum with an exponentially weighted moving average of squared gradients, improving stability and convergence speed.
Stochastic gradient descent (SGD): builds on gradient descent by updating with only one sample at a time, which speeds up training.
AdaGrad: an adaptive learning-rate algorithm that scales the learning rate by the accumulated sum of squared historical gradients, giving each parameter its own step size; its drawback is that this accumulation makes the effective learning rate decay over time, which RMSProp addresses.
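A minimal NumPy sketch of these four update rules follows for illustration; the function names and the hyperparameter values (learning rate, momentum and decay coefficients, eps) are assumptions rather than values taken from the slides.

```python
import numpy as np

def sgd_step(w, g, lr=0.01):
    # plain (stochastic) gradient descent: step against the gradient of one sample
    return w - lr * g

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    # momentum: accumulate a velocity term to accelerate convergence
    v = beta * v + g
    return w - lr * v, v

def adagrad_step(w, g, s, lr=0.1, eps=1e-8):
    # AdaGrad: per-parameter step size from the accumulated sum of squared gradients
    s = s + g ** 2
    return w - lr * g / (np.sqrt(s) + eps), s

def rmsprop_step(w, g, s, lr=0.001, rho=0.9, eps=1e-8):
    # RMSProp: exponentially weighted moving average instead of the full sum
    s = rho * s + (1 - rho) * g ** 2
    return w - lr * g / (np.sqrt(s) + eps), s
```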
Sentiment analysis: analyzing the emotional tendency of a text, e.g. positive or negative evaluations.
Recommender systems
A recommender system provides personalized recommendations based on users' historical behavior and preferences.
Recommendation algorithms: collaborative filtering, deep learning, matrix factorization, etc.
Application scenarios: e-commerce, social media, video websites, etc.
Application effects: higher user satisfaction, greater user stickiness, better conversion rates, etc.
Activation functions
Types: Sigmoid, Tanh, ReLU, etc.
Characteristics: nonlinear and differentiable.
Applications: deep learning, machine learning, and related fields.
Weight adjustment
Purpose of weight adjustment: to optimize the performance of the neural network.
Methods of weight adjustment: gradient descent, stochastic gradient descent, etc.
Steps of weight adjustment: compute the loss function, compute the gradients, update the weights.
Factors that influence weight adjustment: learning rate, batch size, choice of optimizer, etc.
Contents
02 Overview of Artificial Neural Networks
03 Neural Network Fundamentals
04 Artificial Neural Network Algorithms
05 Neural Network Application Cases
06 Neural Network Optimization and Improvement
07 Future Trends and Challenges
Deep learning algorithms
Convolutional neural networks (CNN): used for image processing and recognition.
Recurrent neural networks (RNN): used for sequential data, such as speech recognition and natural language processing.
Long short-term memory networks (LSTM): an improved RNN for handling long sequences.
Generative adversarial networks (GAN): used for generating new data, such as image and text generation.

Research on Handwritten English Letter Recognition Based on Deep Learning


... the category information of the English letters, demonstrating that the algorithm achieves good recognition performance on handwritten English letters.
Keywords: deep learning; handwritten English letters; autoencoder; combined autoencoder network
CLC number: TP391
Document code: A
Article ID: 1001-5922(2021)07-0084-04
Research on Handwritten English Font Recognition Based on Deep Learning
... the data features, resulting in low recognition accuracy. This study therefore improves the algorithm by combining a standard denoising autoencoder with a classification denoising autoencoder to form a combined autoencoder network algorithm, so as to raise its recognition accuracy.
1.2 Algorithm improvement
The combined autoencoder network algorithm consists of three parts: a denoising autoencoder, a classification denoising autoencoder, and a combined-feature classifier [7]. During pre-training, the denoising autoencoder and the classification denoising autoencoder independently extract data features and class features, which are then concatenated according to a feature ratio to form the combined features. The combined features are fed into the classifier for training, during which the model parameters are updated by minimizing the cost function. Finally, after a certain number of training iterations, the recognition result is obtained. The structure of the combined autoencoder network algorithm is shown in Fig. 1.
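The sketch below (Python/NumPy) illustrates how the combined feature described above could be assembled from two independently pre-trained encoders and passed on to a classifier. The encoder weights, feature sizes, and activation are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Stand-ins for the two pre-trained encoders: one extracting data features
# (denoising autoencoder) and one extracting class features
# (classification denoising autoencoder). Sizes are assumed.
W_data = rng.normal(scale=0.1, size=(784, 128))
W_cls = rng.normal(scale=0.1, size=(784, 64))

def combined_feature(x):
    f_data = sigmoid(x @ W_data)            # data-feature code
    f_cls = sigmoid(x @ W_cls)              # class-feature code
    return np.concatenate([f_data, f_cls])  # spliced "combined feature"

# The combined feature would then be fed to a classifier trained by
# minimizing a cost function, as described in the text.
x = rng.random(784)
z = combined_feature(x)
print(z.shape)  # (192,)
```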
Received: 2020-10-19. About the author: Gao Yanchao (1986- ), female, Han nationality, from Baoding, Hebei; M.A.; research interests: English translation and English informatization.
Research on Handwritten English Letter Recognition Based on Deep Learning
Gao Yanchao (Baoji Vocational and Technical College, Baoji 721000, China)
Abstract: To address the low recognition accuracy and missing category information in handwritten English letter recognition for chemical information, this study, based on deep learning, builds on the traditional ...
... is one of the commonly used image denoising methods; it maps the pixels of the original image onto a template (kernel) and computes the pixel values of the output image from it.
2.3 Binarization

Overview of Artificial Neural Networks


2.1 The perceptron
Learning rule of the single-layer perceptron:
Multilayer perceptron:
Inserting one or more layers of hidden units between the input layer and the output layer gives a multilayer perceptron, which improves the classification ability of the perceptron.
A two-layer perceptron can solve the "XOR" classification problem and can recognize any convex polygon or unbounded convex region.
Perceptron networks with more layers can recognize still more complex shapes.
2.2 The BP network
The back-propagation (BP) learning algorithm for multilayer feed-forward networks, BP algorithm for short, is a supervised (teacher-guided) learning algorithm; it is the application of gradient descent to multilayer feed-forward networks.
The basic perceptron
is a two-layer network with a single layer of computing neurons; it can only classify linearly separable input vectors.
The n inputs x1, x2, …, xn are real numbers; w1i, w2i, …, wni are the connection weights of the n inputs; b is the threshold of the perceptron; the transfer function f is usually a step function; and y is the perceptron output. By training the network weights, the perceptron's response to a set of input vectors can be driven to target outputs of 0 or 1, thereby classifying and recognizing the input vectors.
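A minimal training sketch of this perceptron in Python/NumPy follows; the learning rate, epoch count, and the AND-function example are illustrative assumptions, not material from the slides.

```python
import numpy as np

def step(u):
    # threshold (step) transfer function: output 1 if u >= 0, else 0
    return (u >= 0).astype(float)

def train_perceptron(X, t, lr=0.1, epochs=100):
    n = X.shape[1]
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = step(np.dot(w, x) + b)     # perceptron output
            w += lr * (target - y) * x     # error-driven weight update
            b += lr * (target - y)         # threshold (bias) update
    return w, b

# Example: learn the linearly separable logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
w, b = train_perceptron(X, t)
```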
Network structure
See the figure: u and y are the network's input and output vectors, and neurons are drawn as nodes. The network consists of input-layer, hidden-layer, and output-layer nodes; there may be one hidden layer (as in the figure) or several, and nodes in one layer connect to the next layer through weights. Because the BP learning algorithm is used, such a network is commonly called a BP neural network.
2.2 The BP network
The network's input/output samples, i.e. the teacher signal, are known.
The BP learning algorithm consists of a forward-propagation phase and a back-propagation phase:
net.trainParam.goal = 0.00001;
% Train and simulate the network:
[net,tr] = train(net,X,Y);
% Run a simulation / prediction:
XX1 = [0.556 0.556 0.556 0.556 0.556 0.556 0.556];

If the hidden layer has too few nodes, the network may not train at all or may perform very poorly; if it has too many nodes, the overall system error of the network can be made smaller, but on the other hand ...

Handwritten Digit Recognition Based on a BP Neural Network


Handwritten digit recognition is an important research direction in the field of artificial intelligence.

It refers to using a computer to recognize and classify images of handwritten digits, thereby identifying them automatically.

The BP neural network is a commonly used pattern-recognition method and can be applied to handwritten digit recognition.

A BP neural network, short for back-propagation neural network, is a multilayer feed-forward neural network.

Its core idea is to adjust the connection weights in the network through training, so as to classify and recognize input patterns.

A BP neural network consists of an input layer, one or more hidden layers, and an output layer; each neuron in one layer is connected to the neurons of the adjacent layers.

The basic steps of a handwritten digit recognition task are as follows:

1. Data preprocessing: the handwritten digit images must first be preprocessed, including grayscale conversion, binarization, and denoising.

This makes the input image data more uniform and easier for the network to learn from.
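As a concrete illustration of this preprocessing step, here is a small Python/NumPy sketch; the threshold value, the 3x3 median filter, and the random input image are assumptions made for the example.

```python
import numpy as np

def preprocess(img, threshold=128):
    gray = img.mean(axis=2)                          # grayscale: simple channel average
    binary = (gray > threshold).astype(np.uint8)     # binarization with a fixed threshold
    # crude denoising: a 3x3 median filter written out explicitly
    padded = np.pad(binary, 1, mode="edge")
    out = np.empty_like(binary)
    for i in range(binary.shape[0]):
        for j in range(binary.shape[1]):
            out[i, j] = np.median(padded[i:i+3, j:j+3])
    return out

img = np.random.randint(0, 256, size=(28, 28, 3))    # placeholder RGB digit image
clean = preprocess(img)
```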

2. Network construction: design a BP network structure suited to the handwritten digit recognition task.

Typically, the number of input neurons equals the number of image pixels, the number of hidden neurons can be chosen according to the problem, and the number of output neurons is usually 10, one for each of the digit classes 0-9.
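A sketch of constructing such a network in Python/NumPy is shown below; the 28x28 = 784 input size and the hidden-layer width are assumptions for illustration, not values prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 784, 64, 10            # input pixels, assumed hidden size, 10 digit classes
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))   # hidden -> output weights
b2 = np.zeros(n_out)
```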

3. Training the network: the network is trained with the back-propagation algorithm.

The connection weights are first initialized randomly, and input samples are propagated forward through the network to obtain its outputs.

The error between the network output and the sample label is then computed, and the connection weights are adjusted according to this error.

Training is iterated until the error between the network outputs and the sample labels falls below a preset threshold or the training converges.
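Continuing the construction sketch above, one back-propagation training step might look like the following; the sigmoid activations and squared-error cost are assumptions used to keep the example short.

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, target, lr=0.1):
    """One forward/backward pass for the two-layer network defined above."""
    global W1, b1, W2, b2
    # forward propagation
    h = sigmoid(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # error between output and one-hot label, propagated backwards
    delta_out = (y - target) * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    # weight updates (gradient descent)
    W2 -= lr * np.outer(h, delta_out); b2 -= lr * delta_out
    W1 -= lr * np.outer(x, delta_hid); b1 -= lr * delta_hid

x = np.random.rand(784)      # placeholder preprocessed image, flattened
target = np.eye(10)[3]       # one-hot label for the digit 3
train_step(x, target)
```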

4. Testing and evaluation: the trained network is evaluated on a test set.

Metrics such as recognition accuracy, recall, and precision can be computed to assess the network's performance.
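A small evaluation sketch using scikit-learn's metric helpers (assumed to be available) is given below; the label arrays are placeholders standing in for the real test-set labels and predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0, 1, 2, 2, 3])   # placeholder ground-truth digit labels
y_pred = np.array([0, 1, 2, 1, 3])   # placeholder network predictions

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")   # macro-averaged over classes
rec = recall_score(y_true, y_pred, average="macro")
print(f"accuracy={acc:.3f}  precision={prec:.3f}  recall={rec:.3f}")
```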

Handwritten digit recognition is a typical image classification problem; its main difficulties lie in the unstructured nature of images and the high variability of their features.

By training over many iterations and continually adjusting the connection weights, a BP network can gradually improve its classification performance and accuracy.

BP networks also have some drawbacks, such as a tendency to get stuck in local minima and long training times.

To further improve performance on handwritten digit recognition, improved methods such as convolutional neural networks (CNNs) can be adopted.

By introducing convolutional and pooling layers, a CNN can automatically extract local features from an image, improving the network's feature representation ability and classification accuracy.

The difference between bert-base-uncased and bert-base-cased


BERT (Bidirectional Encoder Representations from Transformers) is a pretrained neural network model based on the Transformer architecture that performs well across a wide range of natural language processing tasks.

BERT has two main pretrained variants: BERT-Base-Uncased and BERT-Base-Cased.

The difference between them is that the Uncased version lowercases the text, while the Cased version preserves the original casing.

BERT-Base-Uncased is pretrained on lowercased text.

During preprocessing, all text is converted to lowercase; that is, every uppercase letter is mapped to its lowercase form.

This preprocessing helps keep the vocabulary compact, because case variants of a word collapse to a single lowercase entry.

It means that "Hello" and "HELLO" are represented by the same token, "hello".

Pretraining on lowercased text is helpful for tasks that are insensitive to case, such as sentiment classification.

In addition, because the Uncased model keeps no case distinctions during pretraining and fine-tuning, it does not carry the extra information encoded in capitalization.

BERT-Base-Cased is pretrained on text whose case is preserved.

During preprocessing, the text is not lowercased; the original casing is kept.

As a result, "Hello" and "HELLO" are treated as different tokens.

The Cased model therefore retains the extra information carried by capitalization, at the cost of keeping case variants of the same word distinct in its vocabulary.

This kind of preprocessing suits tasks that need case information, such as named entity recognition or machine translation.

The two versions are trained in exactly the same way during pretraining and fine-tuning; the only difference is how letter case is handled during preprocessing.

The Uncased version often performs well on English NLP tasks because many of them are case-insensitive, while the Cased version is more appropriate for tasks where case matters.
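A quick way to see this difference is to tokenize the same strings with both vocabularies using the Hugging Face transformers library; this is a small illustrative sketch and assumes the library is installed and the pretrained tokenizers can be downloaded.

```python
from transformers import BertTokenizer

uncased = BertTokenizer.from_pretrained("bert-base-uncased")
cased = BertTokenizer.from_pretrained("bert-base-cased")

for text in ["Hello", "HELLO"]:
    print(text, uncased.tokenize(text), cased.tokenize(text))
# Expected behaviour: the uncased tokenizer maps both inputs to ["hello"],
# while the cased tokenizer keeps them distinct (e.g. ["Hello"] vs. word pieces for "HELLO").
```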

10 An Introduction to Artificial Neural Network (ANN) Methods (full version)

The brain contains roughly 10^11 neurons, which form a network through roughly 10^15 connections. Each neuron can independently receive, process, and transmit electrochemical signals, and this transmission is carried out along neural pathways.
Structure of a neuron
Dendrites extend from the cell body toward other neurons; the junctions at which neurons receive signals are called synapses. Signals arriving through the synapses have excitatory or inhibitory effects. When the accumulated excitation received by the cell body exceeds a threshold, the cell enters an excited state, generates an impulse, and sends it out along the axon.
[Figure: artificial neuron model with inputs x1, x2, …, xn, weights w1, w2, …, wn, and weighted sum Σ wi·xi]
The perceptron's activation function
After a neuron receives the network input, it is in an excited state when the integrated cumulative effect u(X) exceeds some threshold, and in an inhibited state otherwise. An activation function is constructed to represent this transition; it is required to be a monotonically increasing function taking values in [-1, 1]. Activation functions usually come in three types, which determine the output characteristics of the neuron.
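By way of illustration, three commonly cited activation types (hard threshold, piecewise linear, and a smooth S-shaped function) can be sketched as follows; the specific choices are assumptions, since the slide does not list them explicitly.

```python
import numpy as np

def threshold(u):
    return np.where(u >= 0, 1.0, -1.0)   # hard-limiting (step) type

def piecewise_linear(u):
    return np.clip(u, -1.0, 1.0)         # ramp / saturating linear type

def s_shaped(u):
    return np.tanh(u)                    # smooth S-shaped type, values in (-1, 1)
```

All three are monotonically increasing and map the integrated input u into the [-1, 1] range required above.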
The third stage
Breakthrough progress: in 1982, the Caltech physicist J. Hopfield proposed the Hopfield neural network (HNN) model, introduced the concept of an energy function, and used nonlinear dynamics to study ANNs, opening new avenues for applying ANNs to associative memory and optimization. In 1988, McClelland and Rumelhart solved the "exclusive OR (XOR)" problem using a learning algorithm for multilayer networks.
§10.2 The perceptron — the basic building block of artificial neural networks
1. The mathematical model of the perceptron: the M-P model
The perceptron was the earliest artificial neural network to be designed and implemented. W. McCulloch and W. Pitts summarized the basic physiological characteristics of biological neurons and proposed a simple mathematical model and construction method, establishing the threshold weighted-sum model, known for short as the M-P model ("A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, 1943(5): 115-133). The artificial neuron model is based on the M-P model.

A Neural Network Method for Letter Recognition


2 Experimental method
2.1 Digitized representation of the letters
Each capital letter is digitized as a 5×7 binary matrix of 0s and 1s; the letter A, for example, is encoded as such a pattern. In the second step, to make computation by the artificial neural network model easier, the 5×7 matrix representing each letter is converted into a 1×35 row vector. The artificial neural network model is implemented in software; it can only process numeric signals and cannot operate on the letter images directly.
Neural networks are well suited to problems in which complex, multivariate nonlinear relationships exist between the input and output elements [4]. In this paper, the author uses the pattern-recognition module of a neural network application software package as the platform to design a back-propagation artificial neural network model and applies it to the recognition of the 26 English capital letters. The structure of the ANN model is shown in Fig. 1. The network is trained with the back-propagation algorithm with an additional momentum term; the learning rate, the momentum coefficient, and the maximum number of training cycles are set to fixed values. The recognition result of the ANN model is determined from its output.

Artificial Neural Networks

Over the past decade or more, research on artificial neural networks has steadily deepened and has made great progress. ANNs have successfully solved many practical problems that conventional computing struggles with, in fields such as pattern recognition, intelligent robotics, automatic control, prediction and estimation, biology, medicine, and economics, and they have shown good intelligent behavior.
The neuron
As shown in the figure, a1-an are the components of the input vector, w1-wn are the weights of the neuron's synapses, b is the bias, f is the transfer function (usually nonlinear; hardlim() is assumed in what follows), and t is the neuron's output. Mathematically, t = f(WA' + b), where W is the weight vector, A is the input vector and A' its transpose, b is the bias, and f is the transfer function.
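The formula t = f(WA' + b) can be written out directly; the sketch below uses example values for W, A, and b (assumptions, for illustration only) and takes f to be the hard-limit function.

```python
import numpy as np

def hardlim(u):
    # hard-limit transfer function: 1 if the argument is >= 0, else 0
    return (u >= 0).astype(float)

W = np.array([0.5, -0.2, 0.1])     # example weight vector
A = np.array([1.0, 2.0, 3.0])      # example input vector
b = -0.1                           # bias
t = hardlim(W @ A + b)             # neuron output t = f(W·A' + b)
```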
Types of learning
Depending on the learning environment, neural network learning can be divided into supervised and unsupervised learning. In supervised learning, training samples are applied to the network input while the corresponding desired outputs are compared with the actual network outputs; the resulting error signal controls the adjustment of the connection weights, and after repeated training the weights converge to definite values. When the sample situation changes, further learning can modify the weights to adapt to the new environment. Neural network models that use supervised learning include back-propagation networks and perceptrons. In unsupervised learning, no reference samples are given in advance; the network is placed directly in its environment, and the learning stage and the working stage merge into one. In this case the learning behavior follows the evolution equation of the connection weights. The simplest example of unsupervised learning is the Hebb learning rule. Competitive learning is a more complex example of unsupervised learning; it adjusts weights according to the clusters that have been formed. Self-organizing maps and adaptive resonance theory networks are typical models related to competitive learning.
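For concreteness, the Hebb rule mentioned above can be sketched in a few lines of Python/NumPy; the learning rate and the example activities are assumptions for illustration.

```python
import numpy as np

def hebb_update(w, x, y, lr=0.01):
    # Hebb rule: strengthen each weight in proportion to the product of
    # presynaptic activity x and postsynaptic activity y (unsupervised)
    return w + lr * y * x

w = np.array([0.1, 0.0, 0.2, 0.05])   # current weights (example)
x = np.array([1.0, 0.0, 1.0, 1.0])    # presynaptic activities (example)
y = float(w @ x)                       # postsynaptic activity
w = hebb_update(w, x, y)
```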
Neural networks have already been applied successfully in many fields, but much remains to be studied. In particular, combining neural networks, with their advantages of distributed storage, parallel processing, self-learning, self-organization, and nonlinear mapping, with other techniques, and the hybrid methods and hybrid systems that result, has become a major research focus. Since other methods have strengths of their own, combining them with neural networks lets each compensate for the other's weaknesses and yields better applications. Current work along these lines includes combining neural networks with fuzzy logic, expert systems, genetic algorithms, wavelet analysis, chaos, rough set theory, fractal theory, evidence theory, and grey system theory.

Hidden Unit Reduction of Artificial Neural Network on English Capital Letter Recognition
Kietikul JEARANAITANAKIJ, Department of Computer Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Ouen PINNGERN, Department of Computer Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand

Abstract — We present an analysis of the minimum number of hidden units that an artificial neural network requires to recognize English capital letters. The letter font used as the case study is the System font. In order to have the minimum number of hidden units, the number of input features has to be minimized. Firstly, we apply our heuristic for pruning unnecessary features from the data set. The small number of remaining features leads the artificial neural network to have a small number of input units as well, because each feature has a one-to-one mapping onto an input unit. Next, hidden units are pruned away from the network by the hidden-unit pruning heuristic. Both pruning heuristics are based on the notion of information gain; they can efficiently prune unnecessary features and hidden units from the network. The experimental results show the minimum number of hidden units required to train the artificial neural network to recognize English capital letters in System font. In addition, the classification accuracy produced by the artificial neural network is practically high. As a result, the final artificial neural network we produce is remarkably compact and reliable.

Keywords — artificial neural network, letter recognition, hidden unit, pruning, information gain

I. INTRODUCTION
An artificial neural network can be defined as a model of reasoning based on the human brain. Recent developments in artificial neural networks have been used widely in character recognition because of their ability to generalize well to unseen patterns [1-8]. Recognition of both printed and handwritten letters is a typical domain where neural networks have been successfully applied. Letter recognition, commonly called OCR (optical character recognition), is the ability of a computer to translate character images into a text file using special software. It allows us to take a printed document and put it into a computer in editable form without retyping the document (Negnevitsky, 2002, [9]).
One issue in letter recognition with an artificial neural network as the learning model is the suitable number of hidden units. The number of neurons in the hidden layer affects both the accuracy of character recognition and the speed of training the network. Complex patterns cannot be detected by a small number of hidden units; however, too many of them can unpleasantly increase the computational burden. Another problem is overfitting. The greater the number of hidden units, the greater the ability of the network to recognize existing patterns. However, if the number of hidden units is too big, the network might simply memorize all training examples. This may prevent it from generalizing, or cause it to produce incorrect outputs when presented with patterns that were not used in training.
There are several proposed methods for reducing the number of hidden units in an artificial neural network.
Sietsma and Dow [10], [11] suggested an interactive method in which they inspect a trained network and identify a hidden unit that has a constant activation over all training patterns; a hidden unit that does not influence the output is then pruned away. Murase et al. [12] measured the Goodness Factors of the hidden units in the trained network; the unit with the lowest Goodness Factor is removed from the hidden layer. Hagiwara [13] presented the Consuming Energy and Weights Power methods for removing hidden units and weights, respectively. Jearanaitanakij and Pinngern [14] proposed an information-gain based pruning heuristic that can efficiently remove unnecessary hidden units within a nearly minimum period of time.
In this paper, we analyze the reduction of hidden units of the artificial neural network for recognizing English capital letters printed in System font. Each image of an English capital letter has 10x10 pixels, and each pixel (or feature) is represented by either '1' or '0'. Our objective is to determine the minimum number of hidden units required to classify the 26 English letters at a practical recognition rate. Firstly, unnecessary features are filtered out of the data set by the feature pruning heuristic [14]. Then the hidden-unit pruning heuristic [14] is used to find a suitable number of hidden units. The analysis of the experimental results shows that an exceedingly low number of hidden units is required for the classification process. In addition, the results support our heuristics [14] in terms of the compactness of the network and the nearly minimum pruning time.
The rest of this paper is organized as follows. In Section 2, we give a brief review of information gain and our hidden-unit pruning heuristic. In Section 3, the data set of English capital letters is described. In Section 4, we describe the experimental results and analysis. Finally, in Section 5, conclusions and possible future work are discussed.

II. HIDDEN UNIT PRUNING
We begin by briefly reviewing the notion of information gain and our hidden-unit pruning heuristic.

A. Information Gain
Entropy, a measure commonly used in information theory, characterizes the (im)purity of an arbitrary collection of examples. Given a collection S containing examples with each of the C outcomes, the entropy of S is

Entropy(S) = \sum_{I \in C} \left[ -p(I) \log_2 p(I) \right],    (1)

where p(I) is the proportion of S belonging to class I. Note that S is not a feature but an entire sample set. Entropy is 0 if all members of S belong to the same class; its scale runs from 0 (purity) to 1 (impurity). The next measure is the information gain, first defined by Shannon and Weaver [15] to measure the expected reduction in entropy. For a particular feature A, Gain(S, A) denotes the information gain of the sample set S on feature A and is defined by

Gain(S, A) = Entropy(S) - \sum_{v \in A} \frac{|S_v|}{|S|} \, Entropy(S_v),    (2)

where the summation is over all possible values v of feature A; S_v is the subset of S for which feature A has value v; |S_v| is the number of elements in S_v; and |S| is the number of elements in S. The merit of the information gain is that it indicates the degree of significance that a particular feature has on the classification output. Therefore, the more information gain a feature has, the more significant it is. We always prefer a feature with a high information gain to those with lower values.

B. Hidden Unit Pruning Heuristic
We describe a hidden-unit pruning heuristic (Jearanaitanakij and Pinngern, 2005, [14]) used as the ordering criterion for hidden-unit pruning in the artificial neural network. Before performing the hidden-unit pruning, we must calculate the information gains of all features and then pass these gains to the hidden units in the next layer.
The hidden-unit pruning heuristic is based on the information gains propagated from the feature units. Before going further, let us define some notation used in this section: the information gain of feature unit i (Gain_i), the incoming information gain of a hidden unit (Gain_In), the outgoing information gain of a hidden unit (Gain_Out), the weight from the i-th unit of the (n-1)-th layer to the j-th unit of the n-th layer (w_{ij}^{n-1,n}), and, similarly, the weight from the j-th unit of the n-th layer to the k-th unit of the (n+1)-th layer (w_{jk}^{n,n+1}). All notations are shown in Fig. 1.
[Figure 1: Network notations]
The amount of information received at a hidden unit is the summation, over training patterns, of the total squared products between the weights that connect the feature units to a hidden unit in a hidden layer and the information gains of all feature units. The result is then averaged over the number of training patterns and the number of feature units. We define the incoming information gain of the j-th hidden unit in the n-th layer (Gain_{In_j}^{n}) as

Gain_{In_j}^{n} = \frac{1}{P \times I} \sum_{P} \sum_{i} \left( w_{ij}^{n-1,n} \times Gain_i^{n-1} \right)^2,    (3)

where P and I are the number of training patterns and the number of feature units in the (n-1)-th layer, respectively. This Gain_{In_j}^{n} is, in turn, used for calculating the outgoing information gain of the j-th hidden unit. The degree of importance of a particular hidden unit can be determined by its outgoing information gain (Gain_{Out_j}^{n}). The outgoing information gain of a particular hidden unit is the summation, over training patterns, of the total squared products between the weights that connect the hidden unit to the output units and the incoming information gain of that hidden unit. The result is then averaged over the number of training patterns and the number of output units. The outgoing information gain of the j-th hidden unit in the n-th layer (Gain_{Out_j}^{n}) is given by

Gain_{Out_j}^{n} = \frac{1}{P \times O} \sum_{P} \sum_{k} \left( w_{jk}^{n,n+1} \times Gain_{In_j}^{n} \right)^2,    (4)

where O is the number of output units in the (n+1)-th layer. Note that the number of training patterns, P, in both (3) and (4) is the number of training patterns that the network has seen so far. The hidden unit with the lowest outgoing information gain should be removed first from the trained network, because it does not affect the convergence time for retraining the network very much. Only one hidden unit is removed at a time, until the network can no longer converge.
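The following Python/NumPy sketch illustrates the quantities in (1)-(4) and the pruning order they induce; the array shapes, toy data, and variable names are illustrative assumptions and do not reproduce the authors' implementation.

```python
import numpy as np

def entropy(labels):
    # Eq. (1): Entropy(S) = sum_I [ -p(I) log2 p(I) ]
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature_values, labels):
    # Eq. (2): Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)
    gain = entropy(labels)
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def incoming_gain(W_in, feature_gains, n_features):
    # Eq. (3); the squared products do not depend on the pattern index,
    # so the sum over the P patterns cancels with the 1/P in the average.
    return ((W_in * feature_gains[:, None]) ** 2).sum(axis=0) / n_features

def outgoing_gain(W_out, gain_in, n_outputs):
    # Eq. (4), averaged over the number of output units
    return ((W_out * gain_in[:, None]) ** 2).sum(axis=1) / n_outputs

# Toy example: 5 binary features, 3 hidden units, 2 output units, 8 patterns
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(8, 5))
y = rng.integers(0, 2, size=8)
gains = np.array([information_gain(X[:, i], y) for i in range(X.shape[1])])
W_in = rng.normal(size=(5, 3))
W_out = rng.normal(size=(3, 2))
g_in = incoming_gain(W_in, gains, n_features=5)
g_out = outgoing_gain(W_out, g_in, n_outputs=2)
prune_first = int(np.argmin(g_out))   # hidden unit with the lowest outgoing gain
```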
Then the last pruned unit and the network configuration are restored.

III. DATA SET
The data set used as the case study is the set of twenty-six English capital letters (A to Z) printed in System font. Each letter image is represented by 10x10 pixels, and a particular pixel can be either on ('1') or off ('0'). We scan each pixel in the enhanced letter image from top to bottom and from left to right to locate the capital printed letter on the paper. An assumption has been made that the letters are clearly separated from each other.
[Figure 2: Image transformation]
As shown in Figure 2, all pixels in an extracted letter are transformed to either '1' or '0'. These pixels represent the features of the training set of the artificial neural network. For a particular pattern, there can be only one letter that corresponds to it. A set of non-noised 26 letter images is shown in Figure 3. In order to be realistic, we add more letter images that have a noise probability of 0.05 in each pixel. Each letter has 5 non-noised and 5 noised images; therefore, 260 letter images are used as the data set. The data set is randomly decomposed into 130 letter images for the training set and 130 letter images for the test set. After a letter image has been transformed into an array of 10x10 binary-valued features, the artificial neural network is brought into the training procedure. The features connect to the input units in a one-to-one relationship. The output is encoded into 26 units, each standing for an English capital letter. For a particular classification, one of the 26 output units has value '1' whereas all other output units contain '0'.
[Figure 3: 26 English capital letter images without noise]

IV. EXPERIMENTAL RESULTS
We train the 26-letter data set with an initial artificial neural network that has 100 input units, 10 hidden units, and 26 output units. There is only one hidden layer between the input and output layers. The learning algorithm used in the training process is the standard back-propagation algorithm (Rumelhart et al., 1986, [17]), without momentum. All weights and thresholds are randomly reset into the range between -1 and +1. In order to obtain the highest recognition rate, the sum-squared error is required to converge below 0.3. Note that the number of hidden units here, i.e. 10, is not the minimum number but a number that allows the artificial neural network to converge easily. Our goal, however, is to find the minimum number of hidden units that still classifies the patterns correctly at a high recognition rate.
Since the number of hidden units depends on the number of input features, it is worthwhile to remove feature units before the hidden-unit pruning begins. The idea of feature removal is similar to hidden-unit pruning: instead of the outgoing information gain, the information gain of every feature is used as the pruning criterion. When the initial network is trained, the feature with the lowest information gain is removed first from the network. Only one feature unit is removed at a time, until the network can no longer converge. Then the final number of features is returned. The experimental result on the number of input features is depicted in Fig. 4.
[Figure 4: Number of features versus number of training epochs]
The number of features is constant at 100 units during the first 1473 training epochs. This is the period during which we train the initial neural network to convergence. Once the network is trained, the number of features keeps decreasing until it settles at 37. This means that the essential number of features for classifying the 26 English capital letters in System font is 37. This is not only the minimum number of features, but also the number of features that still maintains the highest recognition rate of the artificial neural network.
[Figure 5: Number of hidden units versus number of training epochs]
Fig. 5 illustrates the number of hidden units during the training process. After the feature pruning has been completed, at the 2000th epoch, the hidden unit with the lowest outgoing information gain is pruned away. The hidden-unit pruning process then pauses, shown as a horizontal line, until the network is retrained. The hidden units are removed in succession until the network can no longer be retrained. We find that the final number of hidden units that maintains the highest recognition rate is 6.
[Figure 6: Sum-squared error during the training]
Fig. 6 shows the sum-squared error throughout the training process. The error gradually decreases from 4.75 to 0.3 within 1473 training epochs. When the network reaches a convergence point, the feature pruning process starts removing features one by one. This causes the sum-squared error to increase slightly; however, the error decreases again within a short period of time. A similar situation occurs during hidden-unit pruning, which starts at the 2001st training epoch. At this point, the sum-squared error suddenly rises to 3.5. This high error does not prolong the network training, because the abrupt reduction of the error allows the network to converge quickly. The ripples in the sum-squared error indicate the places where hidden-unit reductions take place. The pruning process finishes when the sum-squared error has not decreased for 500 consecutive epochs. Then the network is restored to the previous convergence point. This restoration explains why the sum-squared error at the end of the graph in Fig. 6 suddenly falls below 0.3.
TABLE I. ACCURACY RATE ON THE TEST SET
Table I compares the accuracy rates on the test set of the conventional NN and our proposed method. The conventional approach has 10 hidden units and 100 input units, while the proposed method has 6 hidden units and 37 input units. We deliberately use different numbers of hidden units in order to investigate whether having fewer hidden units degrades the accuracy when classifying unseen data. The result in Table I shows that the conventional NN has a lower accuracy rate than the proposed method. This can be explained as the effect of the overfitting problem in the conventional NN: by having unnecessary hidden units, the conventional NN memorizes the training patterns instead of learning them. Moreover, our approach removes unimportant features from the original feature set, which can filter some noise out of the training set.

V. CONCLUSIONS
We give an analysis of the hidden units that are necessary to recognize the English capital letters printed in System font. The 10x10 pixels in the letter image are the features that are passed into the input units of the artificial neural network.
In the input layer, the information gain indicates the degree of importance of a feature. The feature with the smallest information gain is not important to the output classification and can be ignored; as a result, the number of epochs needed for retraining after that feature is pruned away is smallest. In the hidden layer, the hidden unit with the smallest outgoing information gain tends to propagate only a small amount of information to the output units in the next layer; consequently, removing that unit from the network has little effect on the retraining time. The experimental results show that the number of hidden units necessary to identify the 26 English capital letters in System font, using the 37 essential features, is 6. In addition, this small artificial neural network gives a testing accuracy rate of 97.17%. Removing unnecessary hidden units reduces the overfitting problem that may occur in the network: if the network has too many hidden units, it will memorize the training patterns instead of learning them, which may prevent the network from generalizing, or cause it to produce incorrect outputs when presented with patterns that were not used in training.

REFERENCES
[1] B. Widrow et al., "Layered neural nets for pattern recognition," IEEE Trans. on ASSP, vol. 36, no. 7, July 1988.
[2] V. K. Govindan and A. P. Shivaprasad, "Character recognition – a review," Pattern Recognition, vol. 23, no. 2, pp. 671-679, 1990.
[3] B. Boser et al., "Hardware requirements for neural network pattern classifiers," IEEE Micro, pp. 32-40, 1992.
[4] A. Shustorovich and C. W. Thrasher, "Neural network positioning and classification of handwritten characters," Neural Networks, vol. 9, no. 4, pp. 685-693, 1996.
[5] R. Parekh, J. Yang and V. Honavar, "Constructive neural network learning algorithms for pattern classification," IEEE Transactions on Neural Networks, vol. 11, no. 2, pp. 436-451, 2000.
[6] J. Kamruzzaman, Y. Kumagai and S. Mahfuzul Aziz, "Character recognition by double backpropagation neural network," Proceedings of IEEE Region 10 Annual Conference, Speech and Image Technologies for Computing and Telecommunications, vol. 1, pp. 411-414, 1997.
[7] J. Kamruzzaman, "Comparison of feed-forward neural net algorithms in application to character recognition," Proceedings of IEEE Region 10 International Conference on Electrical and Electronic Technology, vol. 1, pp. 165-169, 2001.
[8] D. Jacquet and G. Saucier, "Design of a digital neural chip: application to optical character recognition by neural network," Proceedings of the European Design and Test Conference, pp. 256-260, 1994.
[9] M. Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, 2002.
[10] J. Sietsma and R. J. F. Dow, "Creating artificial neural networks that generalize," Neural Networks, vol. 4, no. 1, pp. 67-69, 1991.
[11] J. Sietsma and R. J. F. Dow, "Neural net pruning – why and how," in Proc. IEEE Int. Conf. Neural Networks, vol. I (San Diego), pp. 325-333, 1988.
[12] K. Murase, Y. Matsunaga, and Y. Nakade, "A back-propagation algorithm which automatically determines the number of association units," Proc. IEEE Int. Conf. Neural Networks, vol. 1, pp. 783-788, 1991.
[13] M. Hagiwara, "Removal of hidden units and weights for back propagation networks," Proc. IJCNN'93, vol. 1, pp. 351-354, 1993.
[14] K. Jearanaitanakij and O. Pinngern, "Determining the orders of feature and hidden unit prunings of artificial neural networks," Proc. IEEE 2005 Fifth Int. Conf. on Information, Communications and Signal Processing (ICICS), w3c.3, pp. 353-356, 2005.
[15] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Illinois, 1949.
[16] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, issue 1, pp. 81-106, 1986.
[17] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations, eds. D. E. Rumelhart and J. L. McClelland, pp. 318-362, The MIT Press, Cambridge, Massachusetts, 1986.
