Towards inducing HTN domain models from examples (short paper)


120 Recent English-Language References on Artificial Intelligence

Artificial intelligence is an emerging and challenging discipline. Since its birth, AI has developed rapidly and produced many branches, such as reinforcement learning, simulated environments, intelligent hardware, and machine learning. Today, the rapid advance of AI technology is bringing many conveniences to people's lives. The following English-language references on artificial intelligence were collected and organized for readers to consult.


Part 4: Fine-Tuning a Pretrained Model

Using a pretrained model has many benefits. Pretrained models save you compute cost and carbon emissions, and let you use state-of-the-art models without having to train one from scratch. Hugging Face Transformers provides thousands of pretrained models for a wide range of tasks. When you use a pretrained model, you train it on a dataset specific to your task. This is known as fine-tuning, an incredibly powerful training technique. In this tutorial, you will fine-tune a pretrained model with PyTorch.

Prepare a dataset

Before you can fine-tune a pretrained model, download a dataset and process it into a form suitable for training. Start by downloading the Yelp Reviews dataset:

from datasets import load_dataset

dataset = load_dataset("yelp_review_full")
dataset[100]
# {'label': 0, 'text': 'My expectations for McDonalds are t rarely high...'}

As you now know, you need a tokenizer to process the text, along with a padding and truncation strategy to handle variable sequence lengths.
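What padding and truncation do can be illustrated with a tiny self-contained sketch; `toy_tokenize` and its four-word vocabulary are hypothetical stand-ins for a real Transformers tokenizer called with padding="max_length" and truncation=True:

```python
# Hypothetical toy tokenizer: maps words to ids, truncates to max_length,
# then pads with pad_id so every sequence has the same fixed length.
def toy_tokenize(text, vocab, max_length=8, pad_id=0, unk_id=1):
    ids = [vocab.get(tok, unk_id) for tok in text.lower().split()][:max_length]  # truncation
    mask = [1] * len(ids) + [0] * (max_length - len(ids))  # 1 = real token, 0 = padding
    ids = ids + [pad_id] * (max_length - len(ids))         # padding to max_length
    return {"input_ids": ids, "attention_mask": mask}

vocab = {"the": 2, "food": 3, "was": 4, "great": 5}
short = toy_tokenize("the food was great", vocab, max_length=6)
long = toy_tokenize("the food was great " * 5, vocab, max_length=6)
```

A real tokenizer additionally handles subword splitting and special tokens, but the fixed-length output and attention mask follow the same shape.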

To preprocess the whole dataset in one step, use Dataset's map method to apply a preprocessing function over the entire dataset:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

You can also fine-tune on a smaller subset of the full dataset to reduce the time this takes.
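With the datasets library, a shuffled subset is typically taken with tokenized_datasets["train"].shuffle(seed=42).select(range(1000)). The helper below is a hedged, dependency-free sketch of that same pattern over a plain Python list, so the idea is visible without any downloads:

```python
import random

# Sketch of shuffle(seed=...).select(range(n)): pick a deterministic
# random subset of n examples by shuffling indices with a seeded RNG.
def shuffled_subset(examples, n, seed=42):
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    return [examples[i] for i in indices[:n]]

# Toy "dataset": 100 labeled reviews.
reviews = [{"label": i % 5, "text": f"review {i}"} for i in range(100)]
small_train = shuffled_subset(reviews, 10)
```

Because the RNG is seeded, the same subset is selected on every run, which keeps experiments reproducible.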

A Cross-Layer Multi-Scale Fusion Network Jointly Driven by a CNN and a Vision Transformer for Hyperspectral Image Classification

[Authors] 赵凤; 耿苗苗; 刘汉强; 张俊杰; 於俊
[Journal] 电子与信息学报 (Journal of Electronics & Information Technology)
[Year (Volume), Issue] 2024, 46(5)

[Abstract] Hyperspectral image (HSI) classification is one of the most closely watched research topics in earth science and remote sensing image processing.

In recent years, methods combining convolutional neural networks (CNNs) with vision Transformers have succeeded in HSI classification tasks by jointly considering local and global information. However, ground objects in HSIs have rich texture information and complex, diverse structures, and different objects differ in scale. Existing combined methods are usually limited in their ability to extract the texture and structural information of multi-scale objects. To overcome these limitations, this paper proposes an HSI classification method based on a cross-layer multi-scale fusion network jointly driven by a CNN and a vision Transformer. First, from the perspective of combining a CNN with a vision Transformer, a cross-layer multi-scale local-global feature extraction branch is designed, consisting mainly of a convolution-embedded vision Transformer and a cross-layer feature fusion module. Specifically, the convolution-embedded vision Transformer deeply fuses a multi-scale CNN with a vision Transformer to effectively extract multi-scale local-global feature information, strengthening the network's attention to objects at different scales. The cross-layer feature fusion module then deeply aggregates multi-scale local-global feature information from different levels, jointly accounting for the shallow texture information and deep structural information of ground objects. Second, a grouped multi-scale convolution branch is constructed to mine the latent multi-scale features of the dense spectral bands in the HSI. Finally, to strengthen the network's mining of local band details and the overall spectral information, a residual grouped convolution module is designed to extract local-global spectral features. Experimental results on the Indian Pines, Houston 2013, and Salinas Valley HSI datasets confirm the effectiveness of the proposed method.
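As a purely illustrative sketch (not the paper's actual network), the intuition behind the grouped multi-scale convolution branch can be shown in a few lines: filter the same dense spectral vector at several receptive-field sizes and concatenate the results. The window sizes and the moving-average filter below are arbitrary stand-ins for learned convolution kernels:

```python
# Length-preserving moving average over a 1-D spectrum; window size k plays
# the role of a convolution kernel's receptive field.
def smooth(spectrum, k):
    half = k // 2
    out = []
    for i in range(len(spectrum)):
        window = spectrum[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

# One "group" per scale; concatenating the filtered copies gives a feature
# vector that mixes fine band detail (k=1) with coarser spectral context.
def multiscale_features(spectrum, scales=(1, 3, 5)):
    feats = []
    for k in scales:
        feats.extend(smooth(spectrum, k))
    return feats

spectrum = [1.0, 2.0, 3.0, 4.0]
feats = multiscale_features(spectrum)
```

In the paper's setting the filters are learned and operate on image patches, but the design choice is the same: parallel branches at different scales whose outputs are fused.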

[Pages] 12 (pp. 2237-2248)
[Authors] 赵凤; 耿苗苗; 刘汉强; 张俊杰; 於俊
[Affiliations] School of Communications and Information Engineering, Xi'an University of Posts and Telecommunications; School of Computer Science, Shaanxi Normal University; School of Information Science and Technology, University of Science and Technology of China
[Language] Chinese
[CLC numbers] TN911.73; TP751

A Word-Class Expansion Method for Training Corpora of Domain-Restricted Language Models

计算机系统应用 (Computer Systems & Applications), 2011, Vol. 20, No. 11

黄韵竹, 韦玮, 罗杨宇, 李成荣
(中国科学院自动化研究所, 北京 100190)
(Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

摘要 (Abstract): Collecting the training corpus for a domain-restricted language model consumes a great deal of human and material resources, and if corpus collection is insufficient it often causes data ... reducing the complexity of the model through data smoothing ... expanding the corpus. In this paper, a semi-automatic method to expand the training corpus of a language model is proposed. A list of word classes is generated automatically by calculating the mutual information over a large-scale unrestricted corpus. Those word classes are related to ... speech recognition system.

Key words: corpus expansion; mutual information; language model; speech recognition; word classes

1 Introduction

A segmented text corpus can be used to estimate the parameters of the language model of a speech recognition system, and is referred to as the language model's training corpus.
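The abstract's use of mutual information to group candidate words can be sketched as follows. This is a hedged toy version, not the paper's exact algorithm: it computes sentence-level pointwise mutual information (PMI) over an invented four-sentence corpus, the kind of association statistic that lets related words be clustered into classes:

```python
import math
from collections import Counter

# PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), estimated from how often two
# words co-occur in the same sentence; positive scores mean association.
def pmi_scores(sentences):
    word_counts = Counter()
    pair_counts = Counter()
    n = len(sentences)
    for sent in sentences:
        words = set(sent.split())
        word_counts.update(words)
        # count each unordered pair once, keyed by sorted order
        pair_counts.update((a, b) for a in words for b in words if a < b)
    return {
        (a, b): math.log((c / n) / ((word_counts[a] / n) * (word_counts[b] / n)))
        for (a, b), c in pair_counts.items()
    }

corpus = ["beijing china capital", "beijing china city",
          "paris france capital", "paris france city"]
scores = pmi_scores(corpus)
```

Words that always co-occur (here "beijing" and "china") get a positive score, while independent pairs score near zero, which is the signal a word-class builder can threshold on.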

Deep Sparse Rectifier Neural Networks


Xavier Glorot (DIRO, Université de Montréal, Montréal, QC, Canada; glorotxa@iro.umontreal.ca), Antoine Bordes (Heudiasyc, UMR CNRS 6599, UTC, Compiègne, France, and DIRO, Université de Montréal; antoine.bordes@hds.utc.fr), Yoshua Bengio (DIRO, Université de Montréal, Montréal, QC, Canada; bengioy@iro.umontreal.ca)

Abstract

While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra unlabeled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labeled datasets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty in training deep but purely supervised neural networks, and closing the performance gap between neural networks learnt with and without unsupervised pre-training.

Appearing in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors.

1 Introduction

Many differences exist between the neural network models used by machine learning researchers and those used by computational neuroscientists. This is in part because the objective of the former is to obtain computationally efficient learners that generalize well to new examples, whereas the objective of the latter is to abstract out neuroscientific data while obtaining explanations of the principles involved, providing predictions and guidance for future biological experiments. Areas where both objectives coincide are therefore particularly worthy of investigation, pointing towards computationally motivated principles of operation in the brain that can also enhance research in artificial intelligence. In this paper we show that two common gaps between computational neuroscience models and machine learning neural network models can be bridged by using the following linear-by-part activation: max(0, x), called the rectifier (or hinge) activation function. Experimental results will show engaging training behavior of this activation function, especially for deep architectures (see Bengio (2009) for a review), i.e., where the number of hidden layers in the neural network is 3 or more.

Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures. This is in part inspired by observations of the mammalian visual cortex, which consists of a chain of processing elements, each of which is associated with a different representation of the raw visual input. This is particularly clear in the primate visual system (Serre et al., 2007), with its sequence of processing stages: detection of edges, primitive shapes, and moving up to gradually more complex visual shapes. Interestingly, it was found that the features learned in deep architectures resemble those observed in the first two of these stages (in areas V1 and V2 of visual cortex) (Lee et al., 2008), and that they become increasingly invariant to factors of variation (such as camera movement) in higher layers (Goodfellow et al., 2009).

Regarding the training of deep networks, something that can be considered a breakthrough happened in 2006, with the introduction of Deep Belief Networks (Hinton et al., 2006), and more generally the idea of initializing each layer by unsupervised learning (Bengio et al., 2007; Ranzato et al., 2007). Some authors have tried to understand
why this unsupervised procedure helps (Erhan et al., 2010) while others investigated why the original training procedure for deep neural networks failed (Bengio and Glorot, 2010). From the machine learning point of view, this paper brings additional results in these lines of investigation.

We propose to explore the use of rectifying non-linearities as alternatives to the hyperbolic tangent or sigmoid in deep artificial neural networks, in addition to using an L1 regularizer on the activation values to promote sparsity and prevent potential numerical problems with unbounded activation. Nair and Hinton (2010) present promising results of the influence of such units in the context of Restricted Boltzmann Machines compared to logistic sigmoid activations on image classification tasks. Our work extends this for the case of pre-training using denoising auto-encoders (Vincent et al., 2008) and provides an extensive empirical comparison of the rectifying activation function against the hyperbolic tangent on image classification benchmarks, as well as an original derivation for the text application of sentiment analysis.

Our experiments on image and text data indicate that training proceeds better when the artificial neurons are either off or operating mostly in a linear regime. Surprisingly, rectifying activation allows deep networks to achieve their best performance without unsupervised pre-training. Hence, our work proposes a new contribution to the trend of understanding and merging the performance gap between deep networks learnt with and without unsupervised pre-training (Erhan et al., 2010; Bengio and Glorot, 2010). Still, rectifier networks can benefit from unsupervised pre-training in the context of semi-supervised learning where large amounts of unlabeled data are provided. Furthermore, as rectifier units naturally lead to sparse networks and are closer to biological neurons' responses in their main operating regime, this work also bridges (in part) a machine learning/neuroscience
gap in terms of activation function and sparsity.

This paper is organized as follows. Section 2 presents some neuroscience and machine learning background which inspired this work. Section 3 introduces rectifier neurons and explains their potential benefits and drawbacks in deep networks. Then we propose an experimental study with empirical results on image recognition in Section 4.1 and sentiment analysis in Section 4.2. Section 5 presents our conclusions.

2 Background

2.1 Neuroscience Observations

For models of biological neurons, the activation function is the expected firing rate as a function of the total input currently arising out of incoming signals at synapses (Dayan and Abott, 2001). An activation function is termed, respectively, antisymmetric or symmetric when its response to the opposite of a strongly excitatory input pattern is respectively a strongly inhibitory or excitatory one, and one-sided when this response is zero. The main gaps that we wish to consider between computational neuroscience models and machine learning models are the following:

• Studies on brain energy expense suggest that neurons encode information in a sparse and distributed way (Attwell and Laughlin, 2001), estimating the percentage of neurons active at the same time to be between 1 and 4% (Lennie, 2003). This corresponds to a trade-off between richness of representation and small action potential energy expenditure. Without additional regularization, such as an L1 penalty, ordinary feedforward neural nets do not have this property. For example, the sigmoid activation has a steady state regime around 1/2; therefore, after initializing with small weights, all neurons fire at half their saturation regime. This is biologically implausible and hurts gradient-based optimization (LeCun et al., 1998; Bengio and Glorot, 2010).

• Important divergences between biological and machine learning models concern non-linear activation functions. A common biological model of neuron, the leaky integrate-and-fire (or LIF) (Dayan and
Abott, 2001), gives the following relation between the firing rate and the input current, illustrated in Figure 1 (left):

    f(I) = [ tau * log( (E + RI − V_r) / (E + RI − V_th) ) + t_ref ]^(−1),  if E + RI > V_th
    f(I) = 0,                                                              if E + RI ≤ V_th

where t_ref is the refractory period (minimal time between two action potentials), I the input current, V_r the resting potential and V_th the threshold potential (with V_th > V_r), and R, E, tau the membrane resistance, potential and time constant. The most commonly used activation functions in the deep learning and neural networks literature are the standard logistic sigmoid and the hyperbolic tangent (see Figure 1, right), which are equivalent up to a linear transformation. The hyperbolic tangent has a steady state at 0, and is therefore preferred from the optimization standpoint (LeCun et al., 1998; Bengio and Glorot, 2010), but it forces an antisymmetry around 0 which is absent in biological neurons.

[Figure 1: Left: common neural activation function motivated by biological data. Right: commonly used activation functions in the neural networks literature: logistic sigmoid and hyperbolic tangent (tanh).]

2.2 Advantages of Sparsity

Sparsity has become a concept of interest, not only in computational neuroscience and machine learning but also in statistics and signal processing (Candes and Tao, 2005). It was first introduced in computational neuroscience in the context of sparse coding in the visual system (Olshausen and Field, 1997). It has been a key element of deep convolutional networks exploiting a variant of auto-encoders (Ranzato et al., 2007, 2008; Mairal et al., 2009) with a sparse distributed representation, and has also become a key ingredient in Deep Belief Networks (Lee et al., 2008). A sparsity penalty has been used in several computational neuroscience (Olshausen and Field, 1997; Doi et al., 2006) and machine learning models (Lee et al., 2007; Mairal et al., 2009), in particular for deep architectures (Lee et al., 2008; Ranzato et al., 2007, 2008). However, in the latter, the neurons end up taking small but non-zero
activation orfiring probability.We show here thatusing a rectifying non-linearity gives rise to real zerosof activations and thus truly sparse representations.From a computational point of view,such representa-tions are appealing for the following reasons:•Information disentangling.One of theclaimed objectives of deep learning algo-rithms(Bengio,2009)is to disentangle thefactors explaining the variations in the data.Adense representation is highly entangled becausealmost any change in the input modifies most ofthe entries in the representation vector.Instead,if a representation is both sparse and robust tosmall input changes,the set of non-zero featuresis almost always roughly conserved by smallchanges of the input.•Efficient variable-size representation.Dif-ferent inputs may contain different amounts of in-formation and would be more conveniently repre-sented using a variable-size data-structure,whichis common in computer representations of infor-mation.Varying the number of active neuronsallows a model to control the effective dimension-ality of the representation for a given input andthe required precision.•Linear separability.Sparse representations arealso more likely to be linearly separable,or moreeasily separable with less non-linear machinery,simply because the information is represented ina high-dimensional space.Besides,this can reflectthe original data format.In text-related applica-tions for instance,the original raw data is alreadyvery sparse(see Section4.2).•Distributed but sparse.Dense distributed rep-resentations are the richest representations,be-ing potentially exponentially more efficient thanpurely local ones(Bengio,2009).Sparse repre-sentations’efficiency is still exponentially greater,with the power of the exponent being the numberof non-zero features.They may represent a goodtrade-offwith respect to the above criteria.Nevertheless,forcing too much sparsity may hurt pre-dictive performance for an equal number of neurons,because it reduces the 
effective capacity of the model.

[Figure 2: Left: sparse propagation of activations and gradients in a network of rectifier units. The input selects a subset of active neurons and computation is linear in this subset. Right: rectifier and softplus activation functions. The second one is a smooth version of the first.]

3 Deep Rectifier Networks

3.1 Rectifier Neurons

The neuroscience literature (Bush and Sejnowski, 1995; Douglas et al., 2003) indicates that cortical neurons are rarely in their maximum saturation regime, and suggests that their activation function can be approximated by a rectifier. Most previous studies of neural networks involving a rectifying activation function concern recurrent networks (Salinas and Abbott, 1996; Hahnloser, 1998).

The rectifier function rectifier(x) = max(0, x) is one-sided and therefore does not enforce a sign symmetry[1] or antisymmetry[1]: instead, the response to the opposite of an excitatory input pattern is 0 (no response). However, we can obtain symmetry or antisymmetry by combining two rectifier units sharing parameters.

[Footnote 1: The hyperbolic tangent absolute value non-linearity |tanh(x)| used by Jarrett et al. (2009) enforces sign symmetry. A tanh(x) non-linearity enforces sign antisymmetry.]

Advantages. The rectifier activation function allows a network to easily obtain sparse representations. For example, after uniform initialization of the weights, around 50% of hidden units' continuous output values are real zeros, and this fraction can easily increase with sparsity-inducing regularization. Apart from being more biologically plausible, sparsity also leads to mathematical advantages (see previous section).

As illustrated in Figure 2 (left), the only non-linearity in the network comes from the path selection associated with individual neurons being active or not. For a given input only a subset of neurons are active. Computation is linear on this subset: once this subset of neurons is selected, the output is a linear function of the
input (although a large enough change can trigger a discrete change of the active set of neurons). The function computed by each neuron or by the network output in terms of the network input is thus linear by parts. We can see the model as an exponential number of linear models that share parameters (Nair and Hinton, 2010). Because of this linearity, gradients flow well on the active paths of neurons (there is no gradient vanishing effect due to activation non-linearities of sigmoid or tanh units), and mathematical investigation is easier. Computations are also cheaper: there is no need for computing the exponential function in activations, and sparsity can be exploited.

Potential Problems. One may hypothesize that the hard saturation at 0 may hurt optimization by blocking gradient back-propagation. To evaluate the potential impact of this effect we also investigate the softplus activation: softplus(x) = log(1 + e^x) (Dugas et al., 2001), a smooth version of the rectifying non-linearity. We lose the exact sparsity, but may hope to gain easier training. However, experimental results (see Section 4.1) tend to contradict that hypothesis, suggesting that hard zeros can actually help supervised training. We hypothesize that the hard non-linearities do not hurt so long as the gradient can propagate along some paths, i.e., that some of the hidden units in each layer are non-zero. With the credit and blame assigned to these ON units rather than distributed more evenly, we hypothesize that optimization is easier. Another problem could arise due to the unbounded behavior of the activations; one may thus want to use a regularizer to prevent potential numerical problems. Therefore, we use the L1 penalty on the activation values, which also promotes additional sparsity. Also recall that, in order to efficiently represent symmetric/antisymmetric behavior in the data, a rectifier network would need twice as many hidden units as a network of symmetric/antisymmetric
activation functions.

Finally, rectifier networks are subject to ill-conditioning of the parametrization. Biases and weights can be scaled in different (and consistent) ways while preserving the same overall network function. More precisely, consider for each layer of depth i of the network a scalar alpha_i, and scaling the parameters as

    W_i' = W_i / alpha_i   and   b_i' = b_i / prod_{j=1..i} alpha_j.

The output unit values then change as follows: s' = s / prod_{j=1..n} alpha_j. Therefore, as long as prod_{j=1..n} alpha_j is 1, the network function is identical.

3.2 Unsupervised Pre-training

This paper is particularly inspired by the sparse representations learned in the context of auto-encoder variants, as they have been found to be very useful in training deep architectures (Bengio, 2009), especially for unsupervised pre-training of neural networks (Erhan et al., 2010).

Nonetheless, certain difficulties arise when one wants to introduce rectifier activations into stacked denoising auto-encoders (Vincent et al., 2008). First, the hard saturation below the threshold of the rectifier function is not suited for the reconstruction units. Indeed, whenever the network happens to reconstruct a zero in place of a non-zero target, the reconstruction unit cannot backpropagate any gradient.[2] Second, the unbounded behavior of the rectifier activation also needs to be taken into account. In the following, we denote x~ the corrupted version of the input x, sigma() the logistic sigmoid function and theta the model parameters (W_enc, b_enc, W_dec, b_dec), and define the linear reconstruction function as:

    f(x, theta) = W_dec * max(W_enc * x + b_enc, 0) + b_dec.

Here are the several strategies we have experimented:

1. Use a softplus activation function for the reconstruction layer, along with a quadratic cost: L(x, theta) = ||x − log(1 + exp(f(x~, theta)))||^2.

2. Scale the rectifier activation values coming from the previous encoding layer to bound them between 0 and 1, then use a sigmoid activation function for the reconstruction layer, along with a cross-entropy reconstruction cost: L(x, theta) = −x * log(sigma(f(x~, theta))) − (1 − x
) * log(1 − sigma(f(x~, theta))).

[Footnote 2: Why is this not a problem for hidden layers too? We hypothesize that it is because gradients can still flow through the active (non-zero) units, possibly helping rather than hurting the assignment of credit.]

3. Use a linear activation function for the reconstruction layer, along with a quadratic cost. We tried to use input unit values either before or after the rectifier non-linearity as reconstruction targets. (For the first layer, raw inputs are directly used.)

4. Use a rectifier activation function for the reconstruction layer, along with a quadratic cost.

The first strategy has proven to yield better generalization on image data and the second one on text data. Consequently, the following experimental study presents results using those two.

4 Experimental Study

This section discusses our empirical evaluation of rectifier units for deep networks. We first compare them to hyperbolic tangent and softplus activations on image benchmarks with and without pre-training, and then apply them to the text task of sentiment analysis.

4.1 Image Recognition

Experimental setup. We considered the image datasets detailed below. Each of them has a training set (for tuning parameters), a validation set (for tuning hyper-parameters) and a test set (for reporting generalization performance). They are presented according to their number of training/validation/test examples, their respective image sizes, as well as their number of classes:

• MNIST (LeCun et al., 1998): 50k/10k/10k, 28×28 digit images, 10 classes.

• CIFAR10 (Krizhevsky and Hinton, 2009): 50k/5k/5k, 32×32×3 RGB images, 10 classes.

• NISTP: 81,920k/80k/20k, 32×32 character images from the NIST database 19, with randomized distortions (Bengio et al., 2010), 62 classes. This dataset is much larger and more difficult than the original NIST (Grother, 1995).

• NORB: 233,172/58,428/58,320, taken from Jittered-Cluttered NORB (LeCun et al., 2004). Stereo-pair images of toys on a cluttered background, 6 classes. The data has been preprocessed similarly to (Nair and
Hinton, 2010): we subsampled the original 2×108×108 stereo-pair images to 2×32×32 and scaled the images linearly into the range [−1, 1]. We followed the procedure used by Nair and Hinton (2010) to create the validation set.

Table 1: Test error on networks of depth 3. Bold results represent statistical equivalence between similar experiments, with and without pre-training, under the null hypothesis of the pairwise test with p = 0.05.

    With unsupervised pre-training
    Neuron    | MNIST | CIFAR10 | NISTP  | NORB
    Rectifier | 1.20% | 49.96%  | 32.86% | 16.46%
    Tanh      | 1.16% | 50.79%  | 35.89% | 17.66%
    Softplus  | 1.17% | 49.52%  | 33.27% | 19.19%

    Without unsupervised pre-training
    Neuron    | MNIST | CIFAR10 | NISTP  | NORB
    Rectifier | 1.43% | 50.86%  | 32.64% | 16.40%
    Tanh      | 1.57% | 52.62%  | 36.46% | 19.29%
    Softplus  | 1.77% | 53.20%  | 35.48% | 17.68%

For all experiments except on the NORB data (LeCun et al., 2004), the models we used are stacked denoising auto-encoders (Vincent et al., 2008) with three hidden layers and 1000 units per layer. The architecture of Nair and Hinton (2010) has been used on NORB: two hidden layers with respectively 4000 and 2000 units. We used a cross-entropy reconstruction cost for tanh networks and a quadratic cost over a softplus reconstruction layer for the rectifier and softplus networks. We chose masking noise as the corruption process: each pixel has a probability of 0.25 of being artificially set to 0. The unsupervised learning rate is constant, and the following values have been explored: {.1, .01, .001, .0001}. We select the model with the lowest reconstruction error.
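The masking-noise corruption just described (each pixel independently set to 0 with probability 0.25) amounts to a one-liner; the function name and pure-Python setup below are ours, not from the paper:

```python
import random

def masking_noise(pixels, p=0.25, seed=None):
    """Corrupt an input by setting each value to 0 with probability p.

    `pixels` is a flat list of pixel values; returns a corrupted copy.
    The denoising auto-encoder is trained to reconstruct the clean
    input from this corrupted version.
    """
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v for v in pixels]
```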
For the supervised fine-tuning we chose a constant learning rate in the same range as the unsupervised learning rate, with respect to the supervised validation error. The training cost is the negative log likelihood −log P(correct class | input), where the probabilities are obtained from the output layer (which implements a softmax logistic regression). We used stochastic gradient descent with mini-batches of size 10 for both unsupervised and supervised training phases. To take into account the potential problem of rectifier units not being symmetric around 0, we use a variant of the activation function for which half of the units' output values are multiplied by −1. This serves to cancel out the mean activation value for each layer and can be interpreted either as inhibitory neurons or simply as a way to equalize activations numerically. Additionally, an L1 penalty on the activations with a coefficient of 0.001 was added to the cost function during pre-training and fine-tuning in order to increase the amount of sparsity in the learned representations.
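The L1 activation penalty just described, and the "exact sparsity" (fraction of exact zeros) reported in the results, can both be sketched in a few lines. The coefficient 0.001 comes from the text; the function names and toy structure are ours:

```python
def l1_activation_penalty(hidden_layers, coeff=0.001):
    """Sparsity-inducing term added to the training cost:
    coeff * sum of absolute activation values over all hidden layers."""
    return coeff * sum(abs(a) for layer in hidden_layers for a in layer)

def exact_sparsity(layer):
    """Fraction of exact zeros in a layer's activations.

    Rectifier units output true zeros (max(0, x)), so this fraction
    is well defined, unlike for sigmoid or tanh units."""
    return sum(1 for a in layer if a == 0.0) / len(layer)
```

The total fine-tuning cost is then the negative log-likelihood plus this penalty; increasing `coeff` trades a little accuracy for more zeros.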
Main results. Table 1 summarizes the results on networks of 3 hidden layers of 1000 hidden units each, comparing all the neuron types[3] on all the datasets, with or without unsupervised pre-training. In the latter case, the supervised training phase has been carried out using the same experimental setup as the one described above for fine-tuning. The main observations we make are the following:

[Figure 3: Influence of final sparsity on accuracy. 200 randomly initialized deep rectifier networks were trained on MNIST with various L1 penalties (from 0 to 0.01) to obtain different sparsity levels. Results show that enforcing sparsity of the activation does not hurt final performance until around 85% of true zeros.]

[Footnote 3: We also tested a rescaled version of the LIF and max(tanh(x), 0) as activation functions. We obtained worse generalization performance than those of Table 1, and chose not to report them.]

• Despite the hard threshold at 0, networks trained with the rectifier activation function can find local minima of greater or equal quality than those obtained with its smooth counterpart, the softplus. On NORB, we tested a rescaled version of the softplus defined by (1/alpha) * softplus(alpha * x), which allows to interpolate in a smooth manner between the softplus (alpha = 1) and the rectifier (alpha = infinity). We obtained the following alpha / test error couples: 1/17.68%, 1.3/17.53%, 2/16.9%, 3/16.66%, 6/16.54%, infinity/16.40%. There is no trade-off between those activation functions. Rectifiers are not only biologically plausible, they are also computationally efficient.

• There is almost no improvement when using unsupervised pre-training with rectifier activations, contrary to what is experienced using tanh or softplus. Purely supervised rectifier networks remain competitive on all 4 datasets, even against the pretrained tanh or softplus models.

• Rectifier networks are truly deep sparse networks. There is an average exact sparsity (fraction of zeros) of the hidden layers of 83.4% on MNIST, 72.0% on CIFAR10, 68.0% on NISTP
and 73.8% on NORB. Figure 3 provides a better understanding of the influence of sparsity. It displays the MNIST test error of deep rectifier networks (without pre-training) according to different average sparsity levels, obtained by varying the L1 penalty on the activations. Networks appear to be quite robust to it, as models with 70% to almost 85% of true zeros can achieve similar performances.

With labeled data, deep rectifier networks appear to be attractive models. They are biologically credible, and, compared to their standard counterparts, do not seem to depend as much on unsupervised pre-training, while ultimately yielding sparse representations.

This last conclusion is slightly different from those reported in (Nair and Hinton, 2010), in which it is demonstrated that unsupervised pre-training with Restricted Boltzmann Machines using rectifier units is beneficial. In particular, the paper reports that pre-trained rectified Deep Belief Networks can achieve a test error on NORB below 16%. However, we believe that our results are compatible with those: we extend the experimental framework to a different kind of models (stacked denoising auto-encoders) and different datasets (on which conclusions seem to be different). Furthermore, note that our rectified model without pre-training on NORB is very competitive (16.4% error) and outperforms the 17.6% error of the non-pretrained model from Nair and Hinton (2010), which is basically what we find with the non-pretrained softplus units (17.68% error).

Semi-supervised setting. Figure 4 presents results of semi-supervised experiments conducted on the NORB dataset. We vary the percentage of the original labeled training set which is used for the supervised training phase of the rectifier and hyperbolic tangent networks, and evaluate the effect of the unsupervised pre-training (using the whole training set, unlabeled). Confirming conclusions of Erhan et al. (2010), the network with hyperbolic tangent activations improves with unsupervised pre-training for any labeled set
size (even when all the training set is labeled). However, the picture changes with rectifying activations. In semi-supervised setups (with few labeled data), the pre-training is highly beneficial. But the more the labeled set grows, the closer the models with and without pre-training. Eventually, when all available data is labeled, the two models achieve identical performance. Rectifier networks can maximally exploit labeled and unlabeled information.

[Figure 4: Effect of unsupervised pre-training. On NORB, we compare hyperbolic tangent and rectifier networks, with or without unsupervised pre-training, and fine-tune only on subsets of increasing size of the training set.]

4.2 Sentiment Analysis

Nair and Hinton (2010) also demonstrated that rectifier units were efficient for image-related tasks. They mentioned the intensity equivariance property (i.e. without bias parameters the network function is linearly variant to intensity changes in the input) as an argument to explain this observation. This would suggest that rectifying activation is mostly useful to image data. In this section, we investigate on a different modality to cast a fresh light on rectifier units. A recent study (Zhou et al., 2010) shows that Deep Belief Networks with binary units are competitive with the state-of-the-art methods for sentiment analysis.
This indicates that deep learning is appropriate to this text task, which seems therefore ideal to observe the behavior of rectifier units on a different modality, and to provide a data point towards the hypothesis that rectifier nets are particularly appropriate for sparse input vectors, such as found in NLP. Sentiment analysis is a text mining area which aims to determine the judgment of a writer with respect to a given topic (see (Pang and Lee, 2008) for a review). The basic task consists in classifying the polarity of reviews, either by predicting whether the expressed opinions are positive or negative, or by assigning them star ratings on either 3, 4 or 5 star scales.

Following a task originally proposed by Snyder and Barzilay (2007), our data consists of restaurant reviews which have been extracted from the restaurant review site . We have access to 10,000 labeled and 300,000 unlabeled training reviews, while the test set contains 10,000 examples. The goal is to predict the rating on a 5 star scale, and performance is evaluated using Root Mean Squared Error (RMSE).[4]

[Footnote 4: Even though our tasks are identical, our database is
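RMSE on the 5-star rating task mentioned above is straightforward to compute; the function below is our own sketch, not code from the paper:

```python
import math

def rmse(predicted, actual):
    """Root Mean Squared Error between predicted and true star ratings."""
    assert len(predicted) == len(actual) and len(actual) > 0
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    )
```

Lower is better; a model that always predicts the exact rating scores 0.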

A Dual-Domain Transformer Coupled Feature Learning Model for CT Truncated-Data Reconstruction

Authors: WANG Chen; MENG Mingqiang; LI Mingqiang; WANG Yongbo; ZENG Dong; BIAN Zhaoying; MA Jianhua
Journal: Journal of Southern Medical University, 2024, 44(5)

Abstract:

Objective: To address the truncation artifacts and structural distortion caused by an insufficient CT scan field of view (FOV), this paper proposes DDTrans, a CT truncated-data reconstruction model based on coupled feature learning with projection-domain and image-domain Transformers.

Methods: Restoration models are built in the projection domain and the image domain using Transformer networks, exploiting the long-range dependency modeling capability of Transformer attention modules to capture global structural features, restore the projection data, and enhance the reconstructed image. A differentiable Radon back-projection operator layer is constructed between the projection-domain and image-domain networks, so that DDTrans can be trained end-to-end. In addition, a projection consistency loss is introduced to constrain the forward projection of the reconstructed image, further improving reconstruction accuracy.

Results: Experiments on simulated Mayo data show that, under both partial truncation and interior scanning, DDTrans outperforms the compared algorithms at removing truncation artifacts at the FOV edge and recovering information outside the FOV.

Conclusion: The DDTrans model can effectively remove CT truncation artifacts, ensuring accurate reconstruction of the data inside the FOV while achieving approximate reconstruction of the data outside the FOV.

Pages: 10 (pp. 950-959)
Affiliations: School of Biomedical Engineering, Southern Medical University; Pazhou Lab (Huangpu)
Original language: Chinese
Chinese Library Classification: TP3
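The projection-consistency idea described in the abstract above (penalizing disagreement between the forward projection of the reconstructed image and the measured projections inside the scan FOV) can be illustrated with a deliberately tiny toy. The row-sum "projector" and all names here are our own simplification, not the authors' implementation:

```python
def forward_project(image):
    """Toy parallel-beam projector: one detector bin per image row (row sums)."""
    return [sum(row) for row in image]

def projection_consistency_loss(image, measured, fov_mask):
    """Mean squared mismatch between the re-projected image and the
    measured projection data, evaluated only on detector bins that lie
    inside the scan field of view (fov_mask is True there)."""
    reproj = forward_project(image)
    terms = [(r - m) ** 2
             for r, m, keep in zip(reproj, measured, fov_mask) if keep]
    return sum(terms) / len(terms)
```

In DDTrans this term is one component of the training objective; truncated bins (mask False) are excluded because no reliable measurement exists for them.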

Towards Good Practices for Very Deep Two-Stream ConvNets


arXiv:1507.02159v1 [cs.CV] 8 Jul 2015

Towards Good Practices for Very Deep Two-Stream ConvNets

Limin Wang(1), Yuanjun Xiong(1), Zhe Wang(2), Yu Qiao(2)
(1) Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong
(2) Shenzhen Key Lab of Comp. Vis. & Pat. Rec., Shenzhen Institutes of Advanced Technology, CAS, China
{07wanglimin, bitxiong, buptwangzhe2012}@, yu.qiao@

Abstract

Deep convolutional networks have achieved great success for object recognition in still images. However, for action recognition in videos, the improvement of deep convolutional networks is not so evident. We argue that there are two reasons that could probably explain this result. First, the current network architectures (e.g. Two-stream ConvNets [12]) are relatively shallow compared with those very deep models in the image domain (e.g. VGGNet [13], GoogLeNet [15]), and therefore their modeling capacity is constrained by their depth. Second, probably more importantly, the training dataset of action recognition is extremely small compared with the ImageNet dataset, and thus it will be easy to over-fit on the training dataset.

To address these issues, this report presents very deep two-stream ConvNets for action recognition, by adapting recent very deep architectures into the video domain. However, this extension is not easy, as the size of the action recognition dataset is quite small. We design several good practices for the training of very deep two-stream ConvNets, namely (i) pre-training for both spatial and temporal nets, (ii) smaller learning rates, (iii) more data augmentation techniques, (iv) high drop out ratio. Meanwhile, we extend the Caffe toolbox into a multi-GPU implementation with high computational efficiency and low memory consumption. We verify the performance of very deep two-stream ConvNets on the dataset of UCF101, and it achieves a recognition accuracy of 91.4%.

1. Introduction

Human action recognition has become an important problem in computer vision and received a lot of research interests in this community
[12,16,19].The problem of action recognition is challenging due to the large intra-class variations,low video resolution,high dimension of video data,and so on.The past several years have witnessed great progress on action recognition from short clips [8,9,12,16,17,18,19].These research works can be roughly categorized into two types.The first type of algorithm focuses on the hand-crafted local features and Bag of Visual Words (BoVWs)representation.The most successful example is to extract improved trajectory features [16]and employ Fisher vector representation [11].The second type of algorithm utilizes deep convolutional networks (ConvNets)to learn video rep-resentation from raw data (e.g.RGB images or optical flow fields)and train recognition system in an end-to-end man-ner.The most competitive deep model is the two-stream ConvNets [12].However,unlike image classification [7],deep ConvNets did not yield significant improvement over these traditional methods.We argue that there are two possible reasons to explain this phenomenon.First,the concept of action is more complex than object and it is relevant to other high-level vision concepts,such as interacting object,scene con-text,human pose.Intuitively,the more complicated prob-lem will need the model of higher complexity.However,the current two-stream ConvNets are relatively shallow (5convolutional layers and 3fully-connected layers)com-pared with those successful models in image classification [13,15].Second,the dataset of action recognition is ex-tremely small compared the ImageNet dataset [1].For ex-ample,the UCF101dataset [14]only contains 13,320clips.However,these deep ConvNets always require a huge num-ber of training samples to tune the network weights.In order to address these issues,this report presents very deep two-stream ConvNets for action recognition.Very deep two-stream ConvNets contain high modeling capacity and are capable of handling the large complexity of action classes.However,due to the 
second problem above, training very deep models on such a small dataset is quite challenging due to the over-fitting problem. We propose several good practices to make the training of very deep two-stream ConvNets stable and reduce the effect of over-fitting. By carefully training our proposed very deep ConvNets on the action dataset, we are able to achieve the state-of-the-art performance on the dataset of UCF101. Meanwhile, we extend the Caffe toolbox [4] into a multi-GPU implementation with high efficiency and low memory consumption.

The remainder of this report is organized as follows. In Section 2, we introduce our proposed very deep two-stream ConvNets in detail, including network architectures, training details, and testing strategy. We report our experimental results on the dataset of UCF101 in Section 3. Finally, we conclude our report in Section 4.

2. Very Deep Two-stream ConvNets

In this section, we give a detailed description of our proposed method. We first introduce the architectures of very deep two-stream ConvNets. After that, we present the training details, which are very important to reduce the effect of over-fitting. Finally, we describe our testing strategies for action recognition.

2.1. Network architectures

Network architectures are of great importance in the design of deep ConvNets. In the past several years, many famous network structures have been proposed for image classification, such as AlexNet [7], ClarifaiNet [22], GoogLeNet [15], VGGNet [13], and so on. Some trends emerge during the evolution from AlexNet to VGGNet: smaller convolutional kernel size, smaller convolutional strides, and deeper network architectures. These trends have turned out to be effective on improving object recognition performance. However, their influence on action recognition has not been fully investigated in the video domain. Here, we choose two of the latest successful network structures to design very deep two-stream ConvNets, namely GoogLeNet and VGGNet.

GoogLeNet. It is essentially a deep convolutional network architecture codenamed Inception, whose basic idea is the Hebbian principle and the intuition of multi-scale processing. An important component in the Inception network is the Inception module. The Inception module is composed of multiple convolutional filters with different sizes alongside each other. In order to speed up the computational efficiency, a 1×1 convolutional operation is chosen for dimension reduction. GoogLeNet is a 22-layer network consisting of Inception modules stacked upon each other, with occasional max-pooling layers with stride 2 to halve the resolution of the grid. More details can be found in its original paper [15].

VGGNet. It is a new convolutional architecture with smaller convolutional size (3×3), smaller convolutional stride (1×1), smaller pooling window (2×2), and deeper structure (up to 19 layers). The VGGNet systematically investigates the influence of network depth on the recognition performance, by building and pre-training deeper architectures based on the shallower ones. Finally, two successful network structures are proposed for the ImageNet challenge: VGG-16 (13 convolutional layers and 3 fully-connected layers) and VGG-19 (16 convolutional layers and 3 fully-connected layers). More details can be found in its original paper [13].

Very Deep Two-stream ConvNets. Following these successful architectures in object recognition, we adapt them to the design of two-stream ConvNets for action recognition in videos, which we call very deep two-stream ConvNets. We empirically study both GoogLeNet and VGG-16 for the design of very deep two-stream ConvNets. The spatial net is built on a single frame image (224×224×3) and therefore its architecture is the same as those for object recognition in the image domain. The input of the temporal net is a 10-frame stacking of optical flow fields (224×224×20), and thus the convolutional filters in the first layer are different from those of image classification models.

2.2. Network training

Here we describe how to train very deep two-stream ConvNets on the UCF101 dataset. The
UCF101 dataset contains 13,320 video clips and provides 3 splits for evaluation. For each split, there are around 10,000 clips for training and 3,300 clips for testing. As the training dataset is extremely small and the concept of an action is relatively complex, training very deep two-stream ConvNets is quite challenging. From our empirical explorations, we discovered the following good practices for training very deep two-stream ConvNets.

Pre-training for Two-stream ConvNets. Pre-training has turned out to be an effective way to initialize deep ConvNets when not enough training samples are available. For spatial nets, as in [12], we choose ImageNet models as the initialization for network training. For the temporal net, the input modality is optical flow fields, which capture motion information and differ from static RGB images. Interestingly, we observe that pre-training the temporal net with an ImageNet model still works well. To make this pre-training reasonable, we make several modifications to the optical flow fields and the ImageNet model. First, we extract optical flow fields for each video and discretize them into the interval [0, 255] by a linear transformation. Second, as the input channel number of the temporal net differs from that of the spatial net (20 vs. 3), we average the first-layer filters of the ImageNet model across the channels, and then replicate the average 20 times as the initialization of the temporal net.

Smaller Learning Rate. As we pre-train the two-stream ConvNets with ImageNet models, we use a smaller learning rate than the original training in [12].
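The report gives no code for the two pre-training modifications described above. The snippet below is a minimal NumPy sketch of the idea; the function names are our own choice and this is an illustration, not the authors' implementation.

```python
import numpy as np

def discretize_flow(flow):
    """Linearly rescale an optical flow field into [0, 255] so that it
    can be stored and consumed like an ordinary 8-bit image."""
    lo, hi = float(flow.min()), float(flow.max())
    scaled = (flow - lo) / max(hi - lo, 1e-8) * 255.0
    return scaled.astype(np.uint8)

def adapt_first_layer(rgb_filters, flow_channels=20):
    """Adapt ImageNet first-layer filters of shape (out, 3, k, k) for a
    stacked-flow input: average across the 3 RGB channels, then
    replicate that average `flow_channels` times."""
    avg = rgb_filters.mean(axis=1, keepdims=True)   # (out, 1, k, k)
    return np.repeat(avg, flow_channels, axis=1)    # (out, flow_channels, k, k)
```

For example, with a VGG-16-style first layer this would turn a (64, 3, 3, 3) weight tensor into a (64, 20, 3, 3) tensor matching the 224×224×20 flow stack.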
Specifically, we set the learning rate as follows:

• For the temporal net, the learning rate starts at 0.005, is reduced to 1/10 of its value every 10,000 iterations, and training stops at 30,000 iterations.

• For the spatial net, the learning rate starts at 0.001, is reduced to 1/10 of its value every 4,000 iterations, and training stops at 10,000 iterations.

In total, the learning rate is decreased 3 times. At the same time, we notice that training the very deep two-stream ConvNets requires fewer iterations, which we attribute to the fact that the networks are pre-trained with ImageNet models.

More Data Augmentation Techniques. It has been demonstrated that data augmentation techniques such as random cropping and horizontal flipping are very effective for avoiding over-fitting. Here, we try two new data augmentation techniques for training very deep two-stream ConvNets:

• We design a corner cropping strategy, meaning that we only crop the 4 corners and the center of the images. We find that random cropping tends to select regions close to the image center, so the training loss drops quickly and over-fitting follows. If we instead explicitly constrain the cropping to the 4 corners or the center, the variation of the network input increases, which helps to reduce the effect of over-fitting.

• We use a multi-scale cropping method for training very deep two-stream ConvNets. Multi-scale representations have turned out to be effective for improving object recognition performance on the ImageNet dataset [13]. Here, we adapt this good practice to the task of action recognition, with a more efficient implementation than the one used for object recognition [13]. We fix the input image size at 256×340 and randomly sample the cropping width and height from {256, 224, 192, 168}. After that, we resize the cropped regions to 224×224. It is worth noting that this cropping strategy introduces not only multi-scale augmentation but also aspect-ratio augmentation.

High Dropout Ratio. Similar to
the original two-stream ConvNets [12], we also set high dropout ratios for the fully-connected layers of the very deep two-stream ConvNets. In particular, we set dropout ratios of 0.9 and 0.8 for the fully-connected layers of the temporal net, and 0.9 and 0.9 for the fully-connected layers of the spatial net.

Multi-GPU training. One great obstacle to applying deep learning models to video action recognition is the prohibitively long training time. The multi-frame input also increases the memory needed to store layer activations. We address these problems with data-parallel training on multiple GPUs. The training system is implemented with Caffe [4] and OpenMPI. Following a technique similar to [3], we avoid synchronizing the parameters of the fully-connected (fc) layers by gathering the activations from all worker processes before running the fc layers. With 4 GPUs, training is 3.7x faster for VGGNet-16 and 4.0x faster for GoogLeNet, and it takes 4x less memory per GPU. The system is publicly available.

2.3. Network testing

For a fair comparison with the original two-stream ConvNets [12], we follow their testing scheme for action recognition. At test time, we sample 25 frame images or optical flow fields for the testing of the spatial and temporal nets, respectively. From each of these selected frames, we obtain 10 inputs for the very deep two-stream ConvNets: the 4 corners, the center, and their horizontal flips. The final prediction score is obtained by averaging over the sampled frames and their cropped regions. For the fusion of the spatial and temporal nets, we use a weighted linear combination of their prediction scores, with weight 2 for the temporal net and 1 for the spatial net.

3. Experiments

Datasets and Implementation Details. In order to verify the effectiveness of the proposed very deep two-stream ConvNets, we conduct experiments on the UCF101 [14] dataset.
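The score fusion in the testing scheme above can be sketched in a few lines of NumPy; the function name is ours, not from the report. Each stream's class scores are averaged over its sampled frames and crops (25 frames × 10 crops), then combined with the fixed 1:2 spatial:temporal weighting.

```python
import numpy as np

def fused_prediction(spatial_scores, temporal_scores,
                     w_spatial=1.0, w_temporal=2.0):
    """spatial_scores / temporal_scores: arrays of shape
    (num_samples, num_classes), one score vector per sampled
    frame/crop.  Returns the fused class-score vector."""
    spatial = np.asarray(spatial_scores).mean(axis=0)
    temporal = np.asarray(temporal_scores).mean(axis=0)
    return w_spatial * spatial + w_temporal * temporal
```

The predicted class is then simply `np.argmax(fused_prediction(spatial_scores, temporal_scores))`.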
The UCF101 dataset contains 101 action classes with at least 100 video clips per class. The whole dataset contains 13,320 video clips, divided into 25 groups for each action category. We follow the evaluation scheme of the THUMOS13 challenge [5], adopt its three training/testing splits, and report the average recognition accuracy across classes over these three splits. For the extraction of optical flow fields, we follow the work of TDD [19] and choose the TV-L1 optical flow algorithm [21]. Specifically, we use the OpenCV implementation, due to its balance between accuracy and efficiency.

Results. We report the action recognition performance in Table 1, comparing three network architectures: ClarifaiNet, GoogLeNet, and VGGNet-16. From these results, we see that deeper architectures obtain better performance, with VGGNet-16 performing best. For spatial nets, VGGNet-16 outperforms the shallow network by around 5%, and for temporal nets it is better by around 4%. Overall, the very deep two-stream ConvNets outperform the original two-stream ConvNets by 3.4%.

Table 1. Performance comparison of different architectures on the UCF101 dataset (with our proposed good practices).

Architecture            | Spatial nets (Split1 / Split3) | Temporal nets (Split1 / Split3) | Two-stream (Split1 / Split3)
ClarifaiNet (from [12]) |     - / 73.0%                  |     - / 83.7%                   |     - / 88.0%
GoogLeNet               | 73.2% / 75.3%                  | 86.5% / 85.8%                   | 89.3% / 89.3%
VGGNet-16               | 77.3% / 78.4%                  | 88.2% / 87.0%                   | 91.6% / 91.4%

It is worth noting that in our earlier submission [20] to the THUMOS15 Action Recognition Challenge [2], we had already tried very deep two-stream ConvNets, but temporal nets with deeper structures did not yield good performance, as shown in Table 2. In that THUMOS15 submission, we trained the very deep two-stream ConvNets in the same way as the original two-stream ConvNets [12], without using the proposed good practices.

Table 2. Performance comparison of different architectures (without the proposed good practices).

             | ClarifaiNet | VGGNet-16 | GoogLeNet
Spatial nets | 42.3%       | 54.5%     | 39.9%

From the different performance of the very deep two-stream ConvNets on the two datasets, we conjecture that our proposed good practices are very effective at reducing the effect of over-fitting, thanks to (a) pre-training the temporal nets with ImageNet models and (b) using more data augmentation techniques.

Comparison. Finally, we compare our recognition accuracy with several recent methods; the results are shown in Table 3. We first compare with Fisher vector representations of hand-crafted features, such as Improved Trajectories (iDT) [16], and of deep-learned features, such as Trajectory-Pooled Deep-Convolutional Descriptors (TDD) [19]. Our result is better than all of these Fisher vector representations. Second, we compare the very deep two-stream ConvNets with other deep networks, such as DeepNets [6] and two-stream ConvNets with recurrent neural networks [9]. Our proposed very deep models outperform these previous ones and improve on the best prior result by 2.8%.

Table 3. Performance comparison with the state of the art on the UCF101 dataset.

Method               | Accuracy
iDT+FV [16]          | 85.9%
iDT+HSV [10]         | 87.9%
MIFS+FV [8]          | 89.1%
TDD+FV [19]          | 90.3%
Very deep two-stream | 91.4%

4. Conclusions

In this work we have evaluated very deep two-stream ConvNets for action recognition. Because action recognition datasets are extremely small, we proposed several good practices for training very deep two-stream ConvNets. With our carefully designed training strategies, the proposed very deep two-stream ConvNets achieve a recognition accuracy of 91.4% on the UCF101 dataset.
Meanwhile, we extended the Caffe toolbox into a multi-GPU implementation with high efficiency and low memory consumption.

References

[1] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255, 2009.
[2] A. Gorban, H. Idrees, Y.-G. Jiang, A. Roshan Zamir, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes, 2015.
[3] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, abs/1502.01852, 2015.
[4] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. B. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. CoRR, abs/1408.5093, 2014.
[5] Y.-G. Jiang, J. Liu, A. Roshan Zamir, I. Laptev, M. Piccardi, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes, 2013.
[6] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, pages 1725–1732, 2014.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.
[8] Z. Lan, M. Lin, X. Li, A. G. Hauptmann, and B. Raj. Beyond Gaussian pyramid: Multi-skip feature stacking for action recognition. In CVPR, pages 204–212, 2015.
[9] J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. In CVPR, pages 4694–4702, 2015.
[10] X. Peng, L. Wang, X. Wang, and Y. Qiao. Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice. CoRR, abs/1405.4506, 2014.
[11] J. Sánchez, F. Perronnin, T. Mensink, and J. J. Verbeek. Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105(3):222–245, 2013.
[12] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, pages 568–576, 2014.
[13] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[14] K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human action classes from videos in the wild. CoRR, abs/1212.0402, 2012.
[15] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
[16] H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, pages 3551–3558, 2013.
[17] L. Wang, Y. Qiao, and X. Tang. Mining motion atoms and phrases for complex action recognition. In ICCV, pages 2680–2687, 2013.
[18] L. Wang, Y. Qiao, and X. Tang. Motionlets: Mid-level 3D parts for human motion recognition. In CVPR, pages 2674–2681, 2013.
[19] L. Wang, Y. Qiao, and X. Tang. Action recognition with trajectory-pooled deep-convolutional descriptors. In CVPR, pages 4305–4314, 2015.
[20] L. Wang, Z. Wang, Y. Xiong, and Y. Qiao. CUHK&SIAT submission for THUMOS15 action recognition challenge. In THUMOS'15 Action Recognition Challenge, 2015.
[21] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L1 optical flow. In 29th DAGM Symposium on Pattern Recognition, 2007.
[22] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, pages 818–833, 2014.


Towards inducing HTN domain models from examples (short paper)

N. E. Richardson, T. L. McCluskey, M. M. West
School of Computing and Engineering, Department of Informatics, The University of Huddersfield, Huddersfield, HD1 3DH, UK
n.e.richardson@hud.ac.uk, t.l.mccluskey@hud.ac.uk, m.m.west@hud.ac.uk

Abstract

Domain modelling for AI Planning can be a complex process, especially if there is a large number of objects or actions, or both, to be modelled. This task can be facilitated by tools which induce operators or methods from examples. Further, large and complex domains are more easily constructed if domain languages are used which allow for hierarchical decomposition of domain components. Examples of such a decomposition are object class hierarchies and method hierarchies. This paper describes ongoing work which aims to produce algorithms that learn effective hierarchical decompositions from examples.

Introduction

Domain modelling is a complex, error-prone process, especially when the model is complex. Capturing dynamics and behaviour using operator structures (or compositions of operators called methods) lies at the heart of constructing planning domains. One way to facilitate the process is to use tools which induce operators or methods using task solutions as training examples. In our previous work we have shown how 'flat' domain operators can be induced from examples. Operators can be induced using opmaker (McCluskey, Richardson, & Simpson 2002), which has been embedded interactively in GIPO (Simpson et al. 2001; Simpson 2005). GIPO aids domain construction, offering editors, validation tools, a graphical life-history editor and planning tools. Output from GIPO is the completed and validated domain being modelled, in a variant of GIPO's internal language OCL (Liu & McCluskey 2000) or in PDDL.

Large and complex domains are more easily constructed if domain languages are used which allow for hierarchical decomposition of domain components. This makes for a richer language which more closely captures real-world situations. Methods composed of hierarchical task networks (HTNs) make better sense of these worlds but are, however, difficult to construct. We are working on an extension of the induction process whereby operators are combined into task networks. To illustrate the techniques we are using, we have created a hierarchical version of the familiar briefcase domain. Below we briefly describe this work towards creating procedures which input training sequences and a partial model containing object and class information, and output an HTN domain model.

Hierarchical Domains

To illustrate our method we have created a version of the familiar briefcase world containing a simple structural hierarchy of object "sorts", shown in Figure 1. The tree shows the hierarchical sort structure with predicates attached at appropriate levels. For example, inheritance in the sort tree means that the state at_carrier applies not only to carrier but to any other sort below it on the tree. The converse does not hold, so that goes_in applies to box, lunch_box and pencil_box only, and not to carrier or bag.

In the OCL language, planning domains may be hierarchical in two ways. First, the language structures the objects to be members of certain types called 'sorts'. For example, in the hierarchical briefcase domain (HBC):

sorts(carrier, [bag, box]).
sorts(bag, [briefcase, suitcase]).
objects(briefcase, [bc1]).
objects(suitcase, [sc1]).

This describes how bag (and box) are of sort carrier, whilst briefcase is of sort bag and bc1 is a specific object of sort briefcase. The second way in which domains are hierarchical involves the methods. Methods are constructed because the sequences of actions they perform need to be packaged together for efficiency and/or effectiveness, and hence they encapsulate domain heuristics. They can be thought of as 'mini-plans', where a plan is a sequence of actions achieving the state changes from a specified initial state to some predetermined goal state. Methods are structured into hierarchies, so that some methods decompose into others, or into both other methods and primitives, in order to complete their task. In complex domains this structure can be quite extensive, and it can be difficult to see the interlacing of tasks.

Work in Progress

Using partial domain models we have been able to replicate the GIPO-constructed operators and methods by induction, as follows. For each method in the GIPO-constructed domain we have compiled a file of example material including the partial domain (containing an object class hierarchy) but excluding all the operators and methods. For each notional task, the files each contain a solution in the form of a named operator sequence, initial states for the objects involved, and numbered example material indicating the states after the application of each operator. We can think of this as using a linear plan to induce the operators and provide the decomposition for the method. At this stage we are choosing the material for the example files carefully, so that methods do not overlap their tasks, with the aim of building method hierarchies.

[Figure 1: The Sort-Tree Showing the Levels at which Predicates Apply. The tree has place, carrier and thing at the top, with carrier dividing into box (lunch_box, pencil_box) and bag (briefcase, suitcase); predicates such as at_carrier, at_thing, goes_in, box_outside, box_in_bag, safe_in, fits_in, in_box and in_bag attach at the appropriate levels.]

opmaker, described in greater detail in (McCluskey, Richardson, & Simpson 2002), induces operator headings from those listed in the operator sequence and forms state transitions. The left-hand side of any induced transition comes either from the list of initial states in the example file or from the altered state of a previously induced operator. The right-hand sides of the induced transitions come from the numbered example inputs. (These can be null, and then induce a prevail transition.) The induction algorithm outputs, for each example file, a set of instantiated operators and an HTN method induced from the sequence. In each case, only those operators required for the methods we were replicating were induced from each file. An example induced operator, put_box_in_bag, is as follows:

operator(put_box_in_bag(Bag, Place, Box),
    % prevail
    [se(bag, Bag, [at_carrier(Bag, Place)])],
    % necessary
    [sc(box, Box,
        [box_outside(Box), at_carrier(Box, Place)]
        => [box_in_bag(Box, Bag), at_carrier(Box, Place), goes_in(Box, Bag)])],
    % conditional
    [sc(thing, Thing,
        [in_box(Thing, Box), at_thing(Thing, Place)]
        => [in_box(Thing, Box), at_thing(Thing, Place), safe_in(Thing, Box)])]).

Here the operator header lists the sorts of objects involved in the action, and the prevail transition states that the bag remains at the same place. The necessary transition states that the box changes state from being outside the bag at a place to being inside the bag at the same place. Finally, the conditional transition states that if a thing is in the box then it also undergoes a state change: in this case it is still in the box, but the box is now in the bag. A simple induced method operator is shown below; its task network is composed of the two induced operators put_in_box and put_box_in_bag.
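opmaker's actual algorithm is given in the cited paper. Purely to illustrate the idea described above — a transition's left-hand side is the object's prior state, its right-hand side the recorded result, and a null change yields a prevail condition — here is a toy Python sketch. All names are ours; this is not the authors' code.

```python
def induce_transition(sort, var, before, after):
    """Toy illustration of forming one state transition from example
    material.  `before` is the object's state prior to the action
    (from the initial states or a previously induced operator);
    `after` is the numbered example state recorded afterwards.
    An unchanged state yields a prevail (se) condition, otherwise
    a necessary (sc) state change."""
    before, after = frozenset(before), frozenset(after)
    if before == after:
        return ("se", sort, var, sorted(before))
    return ("sc", sort, var, sorted(before), sorted(after))
```

Applied to the example above, the bag's unchanged location would come out as an se clause and the box's change of state as an sc clause.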
