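...color (RGB) and depth (D) images of the same scene, a common approach is to use a device that combines a camera lens with a depth sensor, such as the widely commercialized Kinect. Compared with conventional RGB images, the additional depth information in RGB-D images conveys a stronger sense of three-dimensional space. Consequently, the academic community, especially in robotics and comput...

RGB-D Image Classification Based on Convolutional Neural Networks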




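Abstract: For object classification based on convolutional neural networks (CNNs), this paper explores which combination of inputs yields the best classification performance. We first introduce the relevant RGB-D dataset and select a subset of its images to form training, validation, and test sets. The selected images are then preprocessed, which includes removing the background from the RGB-D images and filling in missing depth values in the depth (D) images. Several CNNs are pre-trained on the depth maps and on images converted to different color spaces. Because the color image and the depth image in each pair depict the same content, they share similar features, so the pre-trained networks can complement one another. We add the probability vectors output by these CNNs element-wise, renormalize the sum, and use the resulting probability vector as the basis for the final classification. Experimental results show that, with the CNN architecture used in this paper, combining RGB, D, and RGB-D information achieves the highest classification accuracy, 95.0%, at least 5% higher than using any one of them alone. For the other color spaces, the pre-trained networks fail to converge, which indirectly supports the prevailing use of the RGB color space in image-based deep learning.

Keywords: object classification; supervised learning; convolutional neural networks; RGB-D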

CLC number: TP391.4    Document code: A    Article ID: 1674-6236(2018)24-0159-05

RGB-D object classification using convolutional neural networks

LIU Chang 1,2,3, XU Xiao-jie 1,2,3

(1. Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China; 2. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract: This paper explores the best combination of input images for improving the accuracy of object classification based on convolutional neural networks (CNNs). The paper first introduces related RGB-D datasets, then selects part of them to build the training, validation, and test sets. After that, we preprocess these sets so that the RGB-D images are free of background and the depth (D) images are depth-continuous. We then use them to pre-train several CNNs on D, RGB, or other color spaces. Since RGB and D images of the same object may share similar attributes, which may help each other's predictions, we sum the output probability vectors produced by these networks element-wise and renormalize them to the range 0~1. The resulting probability vector is used for the final prediction. Experimental results show that the combination of RGB, D, and RGB-D achieves the best accuracy, 95.0%, which exceeds the pre-training accuracy of any one of them alone by at least 5%. For other color spaces, the pre-training step fails, which supports the broad use of RGB images in the deep learning community.

Key words: object classification; supervised learning; CNNs; RGB-D
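
The fusion step described in the abstract (element-wise summation of the networks' probability vectors, followed by renormalization) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fuse_probabilities` and `predict`, the three-class example values, and the choice to renormalize by dividing by the total are all assumptions made here.

```python
from typing import List, Sequence


def fuse_probabilities(prob_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Sum class-probability vectors element-wise, then renormalize.

    Each inner sequence is assumed to be the softmax output of one
    pre-trained network (e.g. RGB, D, RGB-D) for the same input sample.
    """
    if not prob_vectors:
        raise ValueError("need at least one probability vector")
    num_classes = len(prob_vectors[0])
    if any(len(p) != num_classes for p in prob_vectors):
        raise ValueError("all probability vectors must have the same length")

    # Element-wise sum across networks: one total per class.
    summed = [sum(column) for column in zip(*prob_vectors)]

    total = sum(summed)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")

    # Renormalization, assumed here to mean dividing by the total so that
    # the fused vector again sums to 1 with every entry in [0, 1].
    return [s / total for s in summed]


def predict(prob_vectors: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_probabilities(prob_vectors)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of three networks for one sample (3 classes).
p_rgb = [0.6, 0.3, 0.1]
p_depth = [0.2, 0.5, 0.3]
p_rgbd = [0.3, 0.6, 0.1]

fused = fuse_probabilities([p_rgb, p_depth, p_rgbd])
label = predict([p_rgb, p_depth, p_rgbd])
```

In this made-up example the RGB network alone would choose class 0, while the fused vector favors class 1, which illustrates how networks trained on different inputs can correct one another.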

