An Algorithm and Implementation for GeoOntologies Integration

An Algorithm and Implementation for GeoOntologies Integration
An Algorithm and Implementation for GeoOntologies Integration

VIII Brazilian Symposium on GeoInformatics, Campos do Jord?o, Brazil, November 19-22, 2006, INPE, p. 129-140.

An Algorithm and Implementation for GeoOntologies

Integration

Guillermo Nudelman Hess1,2,Cirano Iochpe1,3,Silvana Castano2 1Instituto de Inform′a tica–Universidade Federal do Rio Grande do Sul(UFRGS) Caixa Postal15.064–91.501-970–Porto Alegre–RS–Brazil

2DICo–Universit`a degli Studi di Milano

20135Milano–Italy

3Procempa–Empresa da Tecnologia da Informac??a o e Comunicac??a o de Porto Alegre

Porto Alegre–RS–Brazil

{hess,ciochpe}@inf.ufrgs.br,castano@dico.unimi.it Abstract.Sharing information through the web is a practice that many organi-

zations and users do daily.This generates a need of methodologies and tools

for semantic integrating the obtained information.With the GIS community

the scenario is not different,but the needs are a little different because of the

particularities of the geographic data.In this paper we present G-Match,an

algorithm and implementation for integration of geographic ontologies.Our

proposal combines some mathematical foundations and existing technologies in

order to achieve expressive results.

Resumo.O compartilhamento de informac??o es atrav′e s da web′e uma pr′a tica

utilizada diariamente por pessoas e organizac??o es.Esta pr′a tica gera uma ne-

cessidade por metodologias e ferramentas que fac?am a integrac??a o sem?a ntica

das informac??o es obtidas.Na comunidade de SIG o cen′a rio n?a o′e diferente,

com o agravante das particularidades inerentes aos dados geogr′a?cos.Neste

artigo n′o s apresentamos o G-Match,um algoritmo e implementac??a o para a

integrac??a o de ontologias geogr′a?cas.Nossa proposta combina alguns aspectos

matem′a ticos com tecnologias existentes com o objetivo de alcanc?ar resultados

expressivos.

1.Introduction

The high cost of the data acquisition for populating Geographic Information Systems (GIS)was,in the past,one of the main obstacles for its popularization.The Internet created a huge network where users,institutions and organizations can easily share in-formation.However,if on one hand,this interchange offers lots of bene?ts,on the other hand it generates the need to address the heterogeneities among the information obtained from distinct sources.It worth nothing obtaining a third part information if its meaning is not known or cannot be inferred automatically.

One way of making the information’s meaning more explicit is the use of Ontologies [Spaccapietra et al.2004],also in the geographic?eld.Some efforts on creating geo-graphic ontologies are found in[Apinar et al.2005,Chaves et al.2005]as well as the ISO19109standard.

As the ontologies may be created by different communities and thus heterogeneities problems may arise when integrating the information from two or more ontologies.A number of works and tools address the problem of integration(also known as align-ment)for conventional,non geogarphic ontologies[Castano et al.2006,Doan et al.2004, Giunchiglia et al.2005,Noy2004].However,as they are not designed or developed for dealing with the particularities of the GIS data,speci?cally with the spatial relationships [Kuhn2002,Schwering and Raubal2005],many times they do not achieve as good re-sults as the one obtained with conventional information.

In this paper we present the G-Match,an algorithm and an implementation of a geographic ontology matcher.Taking as input two different geographic ontologies,it measures the similarities of their concepts by considering their names,attributes,taxonomies and con-ventional as well as topological relationships.Except for the name comparison,G-Match considers both the commonalities and the differences for measuring the similarity be-tween two concepts.

The remaining of the paper is organized as follows.Some related work regarding geo-graphic information integration are brie?y presented in Section2.A motivating example presented in Section3.Section4comprises the G-MATCH algorithm explanation in de-tails while the results of the execution of our implementation are addressed in Section5. Finally,conclusions and future directions are discussed in Section6.

2.Related Work

2.1.Ontology mediated integration

Rodriguez,Egenhofer and Rugg[Rodr′?guez et al.1999]proposed an approach for as-sessing similarities among geospatial feature class de?nitions.The similarity evaluation is basically done over the semantic interrelation among classes.In that sense,they con-sider not only the IS-A(taxonomic)relations and the part-Of relations but also distinguish features(parts,functions and attributes)[Rodr′?guez and Egenhofer2003].In addition to semantic relations and distinguish features two more linguistic concepts are taken into consideration for the de?nition of entity classes:words and meanings,synonymy and polysemy(homonymy).Later work on using ontologies and based on the properties and operations of the set theory they determined semantic similarity among entity classes from different ontologies[Rodr′?guez and Egenhofer2003],but not considering the geospatial classes.

Fonseca et al.[Fonseca et al.2002]proposed an ontology-driven GIS architecture to enable geographic information integration.In that proposal,the ontology acts as a model-independent system integrator[Fonseca et al.2002].The work of Fonseca et al. [Fonseca et al.2003]focuses on the application level,in which they can work on the trans-lation of a conceptual schema to application ontology.A framework for mapping between ontologies and conceptual schemas de?nes the mappings between a term in a spatial on-tology and an entity in a conceptual schema for geographic information is de?ned,after the formalization of both conceptual schema and ontology’s elements.

Hakimpour and Timpf[Hakimpour and Timpf2001]proposed the use of ontology in the resolution of semantic heterogeneities especially those found in Geographic Information Systems.The goal was to establish equivalences between conceptual schemas or local ontologies.Basically the process is done in two phases[Hakimpour and Geppert2002]: First,a reasoning system is used to merge formal ontologies.The result of merging is

used by a schema integrator to build a global schema from local schemas.In the last phase,they?nd the possible meaningful mappings in the generated global schema and by that establish the mapping of data between the databases.Then the data(instances)from the local schemas are mapped.This process is composed by three parts:entity mapping, attribute mapping,data transformation[Hakimpour and Timpf2001].

Sotnykova et al.[Sotnykova et al.2005]state that the integration of spatial-temporal in-formation(both schema and then data)is a three-step process comprising pre-integration (resolution of syntactic con?icts),Inter-Schema Correspondence Assertions(ICAs)(res-olution of semantic con?icts)and integrated schema generation(resolution of structural con?icts).For semantic con?icts they propose an integration language,which allows for-mulating correspondences between different database schemas.Part of the work concerns on how to integrate the schemas.Not regarding to rules,formalization or measures of sim-ilarities,but in terms of how much information the integrated schema must have. Stoimenov and Djordjevic-Kajan[Stoimenov and Djordjevic-Kajan2005]propose the GeoNis framework to reach the semantic GIS data interoperability.It is based on me-diators,wrappers and ontologies.The use of ontologies was proposed as a knowledge base to solve semantic con?icts as homonyms,synonyms and taxonomic heterogeneities. Matching the geographical objects based on the matching of their child objects is the pro-posal of Cruz et al.[Cruz et al.2004].They have designed and implemented a tool for aligning ontologies,proposing a semi-automatic method for propagating such mappings along the ontologies,especially those ones for land use.In their approach,there must be a global ontology that is the reference for the alignment(not combining or merging) of the local ontologies.Alignment is the identi?cation of semantically related entities in different ontologies.The alignment process is semi-automate,which means that the values associated with the vertices may be assigned in two ways:as functions of the chil-dren vertices or of the user input.The user initially identi?es the hierarchy levels in the two ontologies that are aligned.Then the alignment component propagates to the parent nodes.

2.2.Semantic annotation based integration

The Knowledge and Information Management(KIM)platform provides an infrastructure and services for automatic semantic annotation,indexing and retrieval of unstructured and semi-structure content.The ontologies and knowledge bases are kept in reposi-tories based on cutting edge Semantic Web technology and standards including RDF repositories,ontology middleware and reasoning[Manov et al.].The main idea behind KIM is the semantic annotation,which means that the system looks at the description of an entity searching for key words and then associates it with a concept in the ontology (central knowledge base).The spatial features of a concept are described in a KIMO’s sub ontology.The goal was to include the most important and frequently used types of Locations(which are specializations of Entity),including relations between them(such as hasCapital,subRegionOf),relations between Locations and other Entities and various attributes.

2.3.Spatial relationship based integration

Focusing on the semantic relationships other than the taxonomic ones is the proposal of [Jiang and Conrath1997].Especially,the so called functional relations of concepts are of

interest,and are available in the glosses(descriptions)of the concepts.Doing that,it is possible to?nd that two concepts are semantically related even though are hierarchically not.The measurement of the concepts similarity,following the authors’approach,is to use conceptual regions,which means representing the concepts as n dimensional regions in a vector space i.e.the region is continuous completely closed and the hull of the region is convex.The measurement of semantic similarity between conceptual regions is based on applying previously de?ned distance measures[Schwering and Raubal2005]. Detecting similarities between geospatial data considering different geometries was pro-posed by Belussi et al.[Belussi et al.2005].In that work the authors make a deep com-parison on topological relationships pointing equivalences depending on the geometry of the involved objects.

3.Motivating Example

Let’s consider the scenario bellow,where the ontology from Figure1has to be integrated with ontology from Figure2.This example is quite simple,but complete in terms of the geographic usually found in geographic ontologies and the ones we address in this paper.Furthermore,a problem more speci?c to geographic ontologies and which is

Figure1.The ontology O

not supported by conventional matchers occurs when the concepts are designed using different ontologies.In these cases the different topological relationships may have the

Figure2.The ontology O

same semantics,as stated in[Belussi et al.2005]and illustred in Figure3.Basically,the differences can be enumerated as follows:

4.G-Match

The G-Match algorithm is iterative,which means that each concept c i from an ontology O is compared against all concepts c j from ontology O .Furthermore,the matching process is n:n,which means that more than one concept c j from the ontology O may be the match for a given concept c i from ontology O.In this case,all the possible matches are presented to the user.

As Figure4shows,G-Match takes as input two ontologies O and O and produces as output a list of similarity measures between the concepts from the two ontologies. The WordNet[Miller1995]thesaurus is used by the name matcher and by the attributes matcher modules to?nd synonyms and related terms.

Figure3.Equivalent topologies

Figure4.G-match architecture

4.1.De?nitions

A geographic ontology may contain both geographic as non-geographic(conventional) concepts.Furthermore,it describes the properties of a concepts and the relationships it has with the other concepts.

De?nition1.A concept c is a tuple of the form c=(T,S),where:

?T is the set of terms(synonyms)which nominates the concept c.A term t∈T is de?ned as a unary relation of the form t(c);

?S=(h,P)is the structure of the concept c,where h is the hierarchy in which the concept c is located,de?ned as a unary relation h(c),and P=(A,R)is the set of properties of the concept c.

A is the set of attributes associated with c.An attribute a∈A is a binary

relation of type a(c,dtp),where dtp is a data type(such as string,integer,etc.) R is the set of relations of c with other concepts.A relation r∈R is a binary relation r(c,c ).Furthermore,a relation r={g,tr,cr},where g is a relationship between the concept c and a concept c which denotes a geometry,tr is a topologi-cal relationship,i.e.,a special type of spatial relationship between two geospatial

concepts c and c and cr is a conventional relationship between two concepts c and c .

De?nition2.A geospatial concept sc is de?ned as sc={c∈C|?r(c,c ),r=g}

which means that a concept c is considered geospatial if and only if it has at least on relationship r of type g.

De?nition3.At last,a relationship of type tr is de?ned as tr={r(c,c )∈R|(?c,c ,(c=sc∧c =sc)}

which means it can occur only between two geospatial concepts.

4.2.The Algorithm

The G-Match has three main phases of similarity measure in its execution,as the shown by the diagram of Figure5.In the?rst one,the concepts names(SimName(c i),c j)and attributes(SimAt(c i),c j)are compared.Then,using the results from the name similarity measure,the taxonomies(SimTx(c i),c j),relationships(SimRel(c i),c j)and topology rela-tionships(SimTop(c i),c j)are evaluated.Finally,the last phase is the overall similarity measure.The algorithm’s steps are detailed in the sequence.

Figure5.G-match execution?ow

1.Load concept c i from ontology O.A concept c i is loaded.

2.Load concept c j from ontology O .A concept c j from the other is loaded to be

compared against the concept c i.

3.Measure the similarity between the names of c i and c https://www.360docs.net/doc/0d12307633.html,ing the WordNet

thesaurus as an auxiliary knowledge base,the name similarity SimName(c i,c i)is given by searching the correspondence of the two terms t(c i)and t(c j).In case the

WordNet returns0(not synonyms nor related)we calculate the string similarity of the terms using an adaptation of the metric proposed in[Stoilos et al.2005].

4.Measure the similarity between the attributes of c i and c j.Once again we use

the WordNet to assess the similarity between the terms used for each one of the attributes a(c i,dtp)∈A(c i)against each one of the attributes a(c j,dtp)∈A(c j).In this case,however,we only consider to be a match if the terms are synonyms(or the same).The?nal similarity measure regarding the attributes is given by:

SimAt(c i,c j)= ((a(c

i

)∩a(c j))?W a)

A(c i)∪A(c j)

(1)

where Wa is the weight of the attribute a in the ontology.This weight is de?ned by the number of concepts c that are associated with this attribute.The more concepts,the more generic the attribute is and thus the lower is its weight.

The steps3and4can be executed in parallel.

5.Measure the hierarchy similarity between c i and c j.Based on the results ob-

tained in the step3,the similarity in terms of the concepts taxonomy is measured.

This is done by checking the number of common sub-concepts the concepts c i and c j have and the level in hierarchy they are.The?nal value for the taxonomy similarity measure is given by:

SimT x(c i,c j)= ((h(c

i

)∩h(c j))?W level)

h(c i)∪h(c j)

(2)

where Wlevel is1.0if the subclasses are in the same level and0.7if they are in different levels in the ontologies.

6.Measure the relationship similarity between c i and c j.Again,using the results

from step3,the similarity of the conventional relationships is measured.This is done by simply counting the common relationships the two concepts c i and c j have in common and the different ones,as follows:

SimRel(c i,c j)=

ass(c i,r)∩(c j,r)

(ass(c i,r)∪(c j,r))

(3)

7.Measure the topological relationship similarity between c i and c j.Again,us-

ing the results from step3,the similarity of the topological relationships is mea-sured.We considered here the ones described in[Egenhofer and Franzosa1991] (disjoint,touch,inside,cover,coveredBy,overlap,equal,cross and contain).The similarity value is given by:

SimT op(c i,c j)=

ass(c i,t)∩(c j,t)

(ass(c i,t)∪(c j,t))

(4)

For the topological relationships,it is important to clarify that we do not consider only the name of the relationship,but also the geometries of the concepts.As stated and deeply detailed in[Belussi et al.2005]depending on the geometries of the concepts,the different topological relationship have the same meaning,that is, are equivalent.The G-Match is capable of detecting these equivalences during the similarity measurement,and this is the main feature that makes it more suitable for geographic ontologies than the conventional matcher.

8.Measure the overall similarity between c i and c j.In this step the similarity

obtained in the previous steps are combined in a weighed sum,as follows: Sim(ci,cj)=W N?SimName(c i,c j)+W A?SimAt(c i,c j)+

(5)

W H?SimT x(c i,c j)+W R?SimRel(c i,c j)+W T?SimT op(c i,c j).

9.Non relevant matches discharge.The pairs c i,c j with similarity values too low

must be discharged,in order to produce less results and make it easy to choose the correct matches.Thus,a threshold parameter must be set in the beginning of the G-Match execution and if the similarity value for the pair c i,c j does not reach the threshold,it is discharged.

10.c j iteration.If there are more concepts from ontology O to be processed,return

to step2.

11.c i iteration.If there are more concepts from ontology O to be processed,return

to step1.

5.Results

We executed the G-Match using as inputs the ontologies presented in the section2.Basi-cally,the differences can be enumerated as follows:

?The whole hierarchy of TransportFacility is present only in the ontology O ;

?The concept Attraction from ontology O has as equivalent the concept Sightseen in ontology O ;

?The most similar concept to Hotel from ontology O in ontology O is RegularHo-tel;

?Accommodation in ontology O is associated with Administration,while in ontol-ogy O the association is with Owner;

?In many concepts some attributes are present only in ontology O ;

We implemented the G-Match in two ways:as a stand-alone,complete matcher(called G-Match complete)and as a extension for an existing matcher,as the ones cited previ-ously,called G-Match.In the later case,only the relationships(conventional and topo-logical)similarities were measured by our tool.The tests were run establishing as the minimum threshold for analysis0.4.Table1shows the results in terms of recall and pre-cision.EM denotes the expected matches,AM the automatic matches(i.e.,similarity measured higher than0.7)and CAM the correct automatic matches.As can be seen us-ing a matcher specially tailored for the spatial relationships increases both the recall and precision.Furthermore,in the cases where the G-Match failed in choosing the correct match there were more than one pair(c i,c j)with similarity value higher than0.7.The expected correct match was one of the returned pairs,but not the one with higher sim-ilarity.When the G-Match did not?nd any pair(c i,c j)with similarity higher than the acceptance threshold,the correct pair was within the ones with similarity higher than the analysis threshold.

6.Conclusion and Future Works

The challenge faced here was to develop a methodology that achieves good practical re-sults when integrating two geographic ontologies by measuring their content similarities and differences.The similarity measure is balanced,that is,considers the features of

Table1.Precision and Recall results

Matcher EM AM CAM Precision Recall

Prompt15141393%87%

H-Match15121083%67%

G-Match15131185%73%

G-Match Complete15151493%93%

a concept separately and then gives some weights for each feature-name,attributes, taxonomy,conventional and topological relationships-to compute the overall similarity between two concepts.As the information may be de?ned in different levels of detail, there is not a perfect combination of the weight factors(WN,WA,WH,WR and WT). This combination depends on the characteristics of the input ontologies.The results ob-tained show that the G-Match is in the correct direction towards the development of a semantic matcher specially tailored for geographic ontologies.

As future work,we plan to study the impact of the other spatial relationships,such as distance relations,on the similarity measure between two ontologies.Furthermore,up to know the G-Match considers always all the features,independently of the concept being processed.Thus,if the ontology does not have,for example,topological relationships,the similarity measure decreases.Because of that,we intend to make the G-Match capable of self-adaptation depending on the input ontology,which means self-con?guration of the weights WN,WA,WH,WR and WT.

References

Apinar,I.B.,Sheth,A.,Ramakrishnan,C.,Usery,E.L.,Azami,M.,and Kwan,M.-P.

(2005).Geospatial ontology development and semantic analysis.In Wilson,J.P.and Fotheringham,S.,editors,Handbook of Geographic Information Science.Blackwell Publishing.

Belussi,A.,Catania,B.,and Podest`a,P.(2005).Towards topological consistency and similarity of multiresolution geographical maps.In GIS’05:Proceedings of the13th annual ACM international workshop on Geographic information systems,pages220–229,New York,NY,USA.ACM Press.

Castano,S.,Ferrara,A.,and Montanelli,S.(2006).Matching ontologies in open net-worked systems:Techniques and applications.In Spaccapietra,S.,Atzeni,P.,Chu, W.W.,Catarci,T.,and Sycara,K.P.,editors,J.Data Semantics V,Lecture Notes in Computer Science,pages25–63.Springer.

Chaves,M.S.,Silva,M.J.,and Martins,B.(2005).A geographic knowledge base for semantic web applications.In Heuser,C.A.,editor,SBBD,pages40–54.UFU. Cruz,I.F.,Sunna,W.,and Chaudhry,A.(2004).Semi-automatic ontology alignment for geospatial data integration.In Egenhofer,M.J.,Freksa,C.,and Miller,H.J.,editors, GIScience,volume3234of Lecture Notes in Computer Science,pages51–66.Springer. Doan,A.,Madhavan,J.,Domingos,P.,and Halevy,A.Y.(2004).Ontology matching:A machine learning approach.In Staab,S.and Studer,R.,editors,Handbook on Ontolo-gies,International Handbooks on Information Systems,pages385–404.Springer.

Egenhofer,M.J.and Franzosa,R.D.(1991).Point set topological relations.International Journal of Geographical Information Systems,5:161–174.

Fonseca,F.,Egenhofer,M.,Agouris,P.,and Camara,G.(2002).Using ontologies for integrated geographic information systems.Transactions in Geographic Information Systems,6(3).

Fonseca,F.T.,Davis,C.A.,and C?a mara,G.(2003).Bridging ontologies and conceptual schemas in geographic information integration.GeoInformatica,7(4):355–378.

Giunchiglia,F.,Shvaiko,P.,and Yatskevich,M.(2005).S-match:an algorithm and an implementation of semantic matching.In Kalfoglou,Y.,Schorlemmer,W.M.,Sheth,

A.P.,Staab,S.,and Uschold,M.,editors,Semantic Interoperability and Integration,

volume04391of Dagstuhl Seminar Proceedings.IBFI,Schloss Dagstuhl,Germany.

Hakimpour,F.and Geppert,A.(2002).Global schema generation using formal ontolo-gies.In Spaccapietra,S.,March,S.T.,and Kambayashi,Y.,editors,ER,volume2503 of Lecture Notes in Computer Science,pages307–321.Springer.

Hakimpour,F.and Timpf,S.(2001).Using ontologies for resolution of semantic hetero-geneity in gis.

Jiang,J.and Conrath,D.(1997).Semantic similarity based in corpus statistics and lex-ical taxonomy.In International Conference Reasearch in Computational Linguistics, ROCLING X,Taiwan.

Kuhn,W.(2002).Modeling the semantics of geographic categories through conceptual integration.In Egenhofer,M.J.and Mark,D.M.,editors,GIScience,volume2478of Lecture Notes in Computer Science,pages108–118.Springer.

Manov,D.,Kiryakov,A.,Popov,B.,Bontcheva,K.,and and,D.M.Experiments with geographic knowledge for information extraction.

Miller,G.A.(1995).Wordnet:A lexical database for https://www.360docs.net/doc/0d12307633.html,mun.ACM,38(11):39–

41.

Noy,N.F.(2004).Tools for mapping and merging ontologies.In Staab,S.and Studer,R., editors,Handbook on Ontologies,International Handbooks on Information Systems, pages365–384.Springer.

Rodr′?guez,M.A.and Egenhofer,M.J.(2003).Determining semantic similarity among entity classes from different ontologies.IEEE Trans.Knowl.Data Eng.,15(2):442–456.

Rodr′?guez,M.A.,Egenhofer,M.J.,and Rugg,R.D.(1999).Asessing semnatic similar-ities among geospatial feature class de?nitions.In Vckovski,A.,Brassel,K.E.,and Schek,H.-J.,editors,INTEROP,volume1580of Lecture Notes in Computer Science, pages189–202.Springer.

Schwering,A.and Raubal,M.(2005).Spatial relations for semantic similarity measure-ment.In Akoka,J.,Liddle,S.W.,Song,I.-Y.,Bertolotto,M.,Comyn-Wattiau,I., Cher?,S.S.-S.,van den Heuvel,W.-J.,Thalheim,B.,Kolp,M.,Bresciani,P.,Trujillo, J.,Kop,C.,and Mayr,H.C.,editors,ER(Workshops),volume3770of Lecture Notes in Computer Science,pages259–269.Springer.

Sotnykova,A.,Cullot,N.,and Vangenot,C.(2005).Spatio-temporal schema integra-tion with validation:A practical approach.In Meersman,R.,Tari,Z.,Herrero,P., M′e ndez,G.,Cavedon,L.,Martin,D.,Hinze,A.,Buchanan,G.,P′e rez,M.S.,Robles, V.,Humble,J.,Albani,A.,Dietz,J.L.G.,Panetto,H.,Scannapieco,M.,Halpin,T.A., Spyns,P.,Zaha,J.M.,Zim′a nyi,E.,Stefanakis,E.,Dillon,T.S.,Feng,L.,Jarrar,M., Lehmann,J.,de Moor,A.,Duval,E.,and Aroyo,L.,editors,OTM Workshops,volume 3762of Lecture Notes in Computer Science,pages1027–1036.Springer. Spaccapietra,S.,Cullot,N.,Parent,C.,and Vangenot,C.(2004).On spatial ontologies.

In Proceedings of the VI Brazilian Symposium on Geoinformatica(GEOINFO2004), Campos do Jord?a o,Brazil.

Stoilos,G.,Stamou,G.,and Kollias,S.(2005).A string metric for ontology alignment.

4th International Semantic Web Conference(ISWC2005),Galway,2005. Stoimenov,L.and Djordjevic-Kajan,S.(2005).An architecture for interoperable gis use in a local community https://www.360docs.net/doc/0d12307633.html,puters and Geosciences,31:211–220.

国密算法(国家商用密码算法简介)

国家商用密码算法简介 密码学是研究编制密码和破译密码的技术科学,起源于隐秘消息传输,在编码和破译中逐渐发展起来。密码学是一个综合性的技术科学,与语言学、数学、电子学、声学、信息论、计算机科学等有着广泛而密切的联系。密码学的基本思想是对敏感消息的保护,主要包括机密性,鉴别,消息完整性和不可否认性,从而涉及加密,杂凑函数,数字签名,消息认证码等。 一.密码学简介 密码学中应用最为广泛的的三类算法包括对称算法、非对称算法、杂凑算法。 1.1 对称密码 对称密码学主要是分组密码和流密码及其应用。分组密码中将明文消息进行分块加密输出密文区块,而流密码中使用密钥生成密钥流对明文消息进行加密。世界上应用较为广泛的包括DES、3DES、AES,此外还有Serpent,Twofish,MARS和RC6等算法。对称加密的工作模式包括电码本模式(ECB 模式),密码反馈模式(CFB 模式),密码分组链接模式(CBC 模式),输入反馈模式(OFB 模式)等。1.2 非对称密码 公钥密码体制由Diffie和Hellman所提出。1978年Rivest,Shamir和Adleman提出RAS密码体制,基于大素数分解问题。基于有限域上的离散对数问题产生了ElGamal密码体制,而基于椭圆曲线上的离散对数问题产生了椭圆曲线密码密码体制。此外出现了其他公钥密码体制,这些密码体制同样基于困难问题。目前应用较多的包括RSA、DSA、DH、ECC等。 1.3杂凑算法 杂凑算法又称hash函数,就是把任意长的输入消息串变化成固定长的输出串的一种函数。这个输出串称为该消息的杂凑值。一个安全的杂凑函数应该至少满足以下几个条件。 1)输入长度是任意的; 2)输出长度是固定的,根据目前的计算技术应至少取128bits长,以便抵抗生日攻击; 3)对每一个给定的输入,计算输出即杂凑值是很容易的; 4)给定杂凑函数的描述,找到两个不同的输入消息杂凑到同一个值是计算上不可行的,或给定 杂凑函数的描述和一个随机选择的消息,找到另一个与该消息不同的消息使得它们杂凑到同一个值是计算上不可行的。 杂凑函数主要用于完整性校验和提高数字签名的有效性,目前已有很多方案。这些算法都是伪随机函数,任何杂凑值都是等可能的。输出并不以可辨别的方式依赖于输入;在任何输入串中单个比特

基于国密算法、数字证书的安全问题理解

基于国密算法、数字证书的安全问题理解 1. 国密算法发展背景 随着金融安全上升到国家安全高度,近年来国家有关机关和监管机构站在国家安全和长远战略的高度提出了推动国密算法应用实施、加强行业安全可控的要求。摆脱对国外技术和产品的过度依赖,建设行业网络安全环境,增强我国行业信息系统的“安全可控”能力显得尤为必要和迫切。 密码算法是保障信息安全的核心技术,尤其是最关键的银行业核心领域长期以来都是沿用3DES、SHA-1、RSA等国际通用的密码算法体系及相关标准,为从根本上摆脱对国外密码技术和产品的过度依赖。2010年底,国家密码管理局公布了我国自主研制的“椭圆曲线公钥密码算法”(SM2算法)。为保障重要经济系统密码应用安全,国家密码管理局于2011年发布了《关于做好公钥密码算法升级工作的通知》,要求“自2011年3月1日期,在建和拟建公钥密码基础设施电子认证系统和密钥管理系统应使用SM2算法。自2011年7月1日起,投入运行并使用公钥密码的信息系统,应使用SM2算法。” 2. 国密算法 安全的本质是算法和安全系统 保证安全最根本的方法是基础软件和基础硬件都是自己控制,目前我国无法短期国产化的情况下,数据加密是最好的方式。如果加密算法以及实现都是外国提供的,安全性从何说起,所以我国国家密码局发布了自主可控的国密算法 国密算法:为了保障商用密码的安全性,国家商用密码管理办公室制定了一系列密码标准,包括SM1(SCB2)、SM2、SM3、SM4、SM7、SM9、祖冲之密码算法(ZUC)那等等。 其中SM1、SM4、SM7、祖冲之密码(ZUC)是对称算法;SM2、SM9是非对称算法;SM3是哈希算法。 目前SM1、SM7算法不公开,调用该算法时,需要通过加密芯片的接口进行调用 3. 国密算法的现状 虽然在SSL VPN、数字证书认证系统、密钥管理系统、金融数据加密机、签名验签服务器、智能密码钥匙、智能IC卡、PCI密码卡等产品上改造完毕,但是目前的信息系统整体架构中还有操作系统、数据库、中间件、浏览器、网络设备、负载均衡设备、芯片等软硬件,由于复杂的原因无法完全把密码模块升级为国产密码模块,导致整个信息系统还存在安全薄弱环节。 4. 浏览器与SSL证书 SSL证书的作用就是传输加密 如果网站安装了SSL证书,就启用了https访问,那么你的访问就更加安全了。

国密算法应用流程

RJMU401国密算法应用流程 一、国密芯片RJMU401数据加密传输、身份认证及数据完整性保证 1、传输信道中的数据都采用SM4分组加密算法,保证数据传输时数据的机密性; 2、使用散列算法SM3保证数据的完整性,以防止数据在传输的过程中被篡改; 3、使用非对称算法SM2的私钥签名来保证数据的不可抵赖性,确保数据是从某一个确 定的车载用户端发出; 4、具体流程如下: a、用户数据使用SM3进行散列运算得到数据摘要,再使用非对称算法SM2进行 摘要签名; b、同时使用对称算法SM4的密钥对数据摘要进行加密并传输给安全模块; c、使用同一个对称算法SM4密钥对用户数据进行加密,并将加密后的密文传输给 监控端; d、监控端收到数据密文后,使用对应的密钥进行对称算法SM4解密,并使用散列 算法对解密后的数据进行运算得到数据摘要1; e、监控端对收到的摘要签名进行对称算法SM4解密,再经过非对称算法解密得到 最初的数据摘要2; f、对比数据摘要1和数据摘要2,若两者相等则认为传输数据具备完整性;否则 认为数据出错; 图1、数据加密传输、数据完整性及签名认证流程 补充说明: 1、需要有一主机发起认证指令,监控端收到对应指令后,会产生一个随机数(会话密 钥),可用该随机数作为对称加密SM4的单次密钥,用于加密传输的数据; 2、此SM4的会话密钥不会明文传输,监控端查找对应车载用户端的公钥进行加密,传 给对应的车载用户端,车载用户端收到数据后,用自己的SM2私钥解密,即可得到此次会话密钥。(会话密钥采用非对称密钥机制进行传输) 3、每一个车载用户端存放一个或者两个SM2密钥对,可采用CA证书形式。证书在车 载用户端生产时候预置进安全芯片RJMU401,监控端存储所有的车载用户端的SM2密钥对(证书)。

国密算法

国密算法(SM1/2/4)芯片用户手册(UART接口) 注意:用户在实际使用时只需通过UART口控制国密算法芯片即可,控制协议及使用参考示例见下面 QQ:1900109344(算法芯片交流)

目录 1.概述 (3) 2.基本特征 (3) 3.通信协议 (3) 3.1.物理层 (3) 3.2.链路层 (4) 3.2.1.通讯数据包定义 (4) 3.2.2.协议描述 (4) 3.3.数据单元格式 (5) 3.3.1.命令单元格式 (5) 3.3.2.应答单元格式 (5) 3.4.SM1算法操作指令 (6) 3.4.1.SM1 加密/解密 (6) 3.4.2.SM1算法密钥导入指令 (6) 3.5.SM2算法操作指令 (7) 3.5.1.SM2_Sign SM2签名 (7) 3.5.2.SM2_V erify SM2验证 (7) 3.5.3.SM2_Enc SM2加密 (8) 3.5.4.SM2_Dec SM2解密 (9) 3.5.5.SM2_GetPairKey 产生SM2密钥对 (9) 3.5.6.SM2算法公钥导入 (10) 3.6.SM4算法操作指令 (10) 3.6.1.SM4加密/解密 (10) 3.6.2.SM4算法密钥导入指令 (11) 3.7.校验/修改Pin指令 (11) 3.8.国密算法使用示例(Uart口命令流) (12) 3.8.1.SM1算法操作示例 (12) 3.8.2.SM2算法操作示例 (13) 3.8.3.SM4算法操作示例 (14) 3.9.参考数据 (15) 3.9.1.SM1参考数据 (15) 3.9.2.SM2参考数据 (15) 3.9.3.SM4参考数据 (17)

国密算法芯片

国密算法芯片 用户手册 注意:用户在实际使用时需要通过UART口控制国密算法芯片,控制协议见下面说明,芯片本身只包含其中某几个算法,需要在购买时说明。 通过UART口发命令即可,方便用户使用,价格便宜 QQ:2425053909(注明加密芯片)

目录 1.概述 (2) 2.基本特征 (2) 3.通信协议 (2) 3.1.物理层 (2) 3.2.链路层 (3) 3.2.1. 通讯数据包定义 (3) 3.2.2. 协议描述 (3) 3.3.数据单元格式 (4) 3.3.1. 命令单元格式 (4) 3.3.2. 应答单元格式 (4) 4.国密芯片加解密指令 (5) 4.1.SM1算法操作指令 (5) 4.2.SM4算法操作指令 (5) 4.3.SM7算法操作指令 (6) 4.4.SSF33算法操作指令 (7) 4.5.SM3算法操作指令 (7)

1.概述 本文档适用于使用国密算法芯片进行终端开发的用户。终端开发者通过发送串口命令的方式操作国密芯片进行数据交换,国密产品应用开发。通过阅读本文档,终端开发者可以在无需考虑国密算法实现细节情况下,借助国密芯片来迅速改造现有系统使之适合国密应用。 2.基本特征 芯片的基本特征见下表: 3.通信协议 3.1.物理层 国密芯片采用系统供电方式,电压5V或者3.3V。国密芯片串口与系统MCU 串口相连,异步全双工通讯,波特率默认为115200bps。数据格式为1位起始位、8位数据位和1位停止位,无校验位。 系统MCU向国密芯片发送命令时,在同一个命令内,相连两个发送字符之间的间隔不应大于10个字符时间,否则国密芯片可能会认为命令超时导致无任何响应。

基于国密算法的文件安全系统研究与实现

基于国密算法的文件安全系统研究与实现 徐学东1,季才伟2 (1.长春工程学院机电学院,吉林长春,130012;2.长春易申软件有限公司,吉林长春,130012) 摘要:针对网络环境下电子文件的安全需求,研究基于国密算法的文件安全系统。提出C/S架构的系统模型,针对公共信道交互的安全性问题,提出“新鲜性标识符+挑战应答模式”的认证及密码协议方案,并完成了算法、密钥体系安全性设计和产品开发,目前已并通过了国密测试。 关键词:国密算法;公共信道;挑战应答模式;密钥体系 Research on the file security system based on the National commercial cipher algorithm Xu Xuedong1,Ji Caiwei2 (1.School of Mechanical&Electrical Engineering Changchun institute of Technology,ChangChun,130012; 2.Changchun E-sun Software Co.,ChangChun,130012) Abstract:According to the security requirement of the electronic file in the network environment,Analysis the file security system based on the national commercial cipher algorithm. Present C/S structure of the system model, “the fresh identifier + challenge response mode”authentication and cryptographic protocol is put forward to solve the problem of the safety of public channel interaction,and gives the algorithm and key security system design and specific implementation process.At present, has passed the testing of National commercial cipher algorithm. Keywords:National commercial cipher algorithm;Common channel;Challenge response mode;Key system 0 引言 重要电子文件传输、存储等过程中存在非法访问、窃取的风 险,需要通过密码产品进行保护。我国《商用密码管理条例》规定, 对信息进行加密保护或认证所使用的密码技术和产品,只能使用 经国家密码管理局审批认定的商用密码产品。商密产品必须基于 国密商用算法,且需通过国密产品检测获得产品型号。 国密商用算法是指国密 SM 系列算法。在国密产品检测中, 除了应用国密算法,基于PKI的加密体系,公共信道双向认证、算 法和密钥管理安全性和性能均有很高的要求。研究基于国密算法 的电子文件安全系统的设计方案,确保可信认证及无漏洞交互, 达到国密产品检测的要求。 1 系统架构设计 网络环境下,电子文件一般采取集中管理、分布使用的管理 方式。基于SC/SS架构模型,系统由服务端和客户端构成。总体 架构如图1。服务端负责用户的身份认证,权限控制,电子文件的 加密存储和行为审计。客户端负责文件的解密,监控文件的使用。 两者共同完成用户身份认证、会话协商密钥和客户端系统的安全 监控、防破解功能。 系统的加解密环节则需要在服务端和客户端分别实现。服务 端密码运算采用国密局审批通过的SJK1238加密卡,负责与客户 端协商会话密钥,验证客户端身份,对文档进行加解密。客户端采 用SJK1104 USBkey,负责与服务端协商会话密钥时的数字签名、 通讯数据加解密及客户端文档解密。为确保性能及密钥的安全 性,密码运算采用硬件加密。具体算法为:公钥密码算法为SM2, 杂凑算法为SM3,对称密码算法为SM4。 2 公共信道认证及密码协议 由于客户端和安全服务器的交互必须通过公共信道,其安全 设计至关重要。传统通过公共信道实现基于本地身份的交互认证 问题的研究已经较为深入。无密钥的安全传输机制也取得一些成 果。本设计的目标是简洁、高效、确保认证防止内部攻击。 2.1 新鲜性标识符概念 基于细粒度新鲜性的认证及密码协议是通过公共信道交互 的有效方案。其中新鲜性的概念是:协议运行期间,新鲜性标示 符N在产生时间t0之间没有被使用过。如果交互双方都相信新 鲜性标识符重复概率很低,攻击者在协议运行期间找到一个使用 过的标识符是实践上不可行的,则该新鲜性标识符只有若干合法 的主体拥有。 2.2 基于新鲜性标识符的挑战应答模式 挑战应答(Challenge/Response)模式是常用的认证方式,就 是每次认证时认证服务器端都给客户端发送一个"挑战"字串, 客户端程序收到这个"挑战"字串后,做出相应的"应答",认 证服务器将应答串与自己的计算结果比较,判断是否通过认证。 根据Dolev-Yao模型,每次交互过程双方均采用“新鲜性挑战”字 串,如加密算法完善,不同消息之间的格式不能相同,每个主体能 区分消息是否是由自己产生,则基于该新鲜性标识符进行双向认 证的过程就是安全的。 本系统客户端与服务器通讯采用标准双向认证协议,使用挑 战应答模式协商会话密钥,交互双方在交互过程中各产生一个新 鲜性随时数,作为签名验证的标识符。协商成功后通讯的数据均 使用会话密钥进行加密。协商过程如下: 基金项目:吉林省科技攻关计划项目(20140204060SF);吉林省教育厅科研规划项目(吉教科合字[2014]第337号)DOI:10.16520/https://www.360docs.net/doc/0d12307633.html,ki.1000-8519.2016.18.029

国密算法高速加密芯片

国密算法 --TF32A09系列芯片 赵玉山1371 899 2179 芯片概述: TF32A09系列芯片采用独有的数据流加解密处理机制,实现了对高速数据流同步加解密功能,在加解密速度上全面超越了国内同类型芯片。 TF32A09系列芯片集成度高、安全性强、接口丰富、加解密速度快,具有极高的性价比。该系列芯片可广泛的应用于金融、电子政务、电子商务、配电、视频加 密、安全存储、工业安全防护、物联网安全防护等安全领域。 芯片架构:

关键特性: ?CS320D 32位安全内核,外部总线支持8位/16位/32位访问; ?工作频率可达到100MHz; ?64k Byte ROM,可将成熟固件或受保护代码掩膜到ROM,密码算法使用MPU保护; ?20kByte 片内SRAM,从容完成高速数据处理; ?512kByte Nor Flash,满足不同客户应用要求; ?拥有两个USB—OTG 接口,可根据应用需求设置成Host或Device ; ?集成多种通信接口和多种信息安全算法(SM1、SM2、SM3、SM4、3DES、RSA等),可实现高度整合的单芯片解决方案; ?支持在线调试,IDE 调试环境采用CodeWarrior。 内部存储器: ?20KB SRAM ?64KB ROM ?512KB FLASH 芯片外部接口:

技术参数: 1. 工作电压: 2.4V~ 3.6V 2. 频率:最高可达100MHZ 3. 外形尺寸:TF32A9FAL1(LQFP176,20X20X1.4mm ) TF32A9FCL1(LQFP80,10X10X1.4mm ) TF32A9FDL1(LQFP64,10X10X1.4mm ) 芯片优势: ? 自主设计,国产安全芯片 ? 专利设计,加密传输速度快 ? 拥有两个USB —OTG 接口,高速流加密 ? 接口丰富 ? 集成国密算法 外设模块 176Pin 芯片 80Pin 芯片 64Pin 芯片 USB1 ● ● ● USB2 ● ● 7816-1(智能卡) ● ● ● 7816-2(智能卡) ● ● SPI1 ● ● ● SPI2 ● UART1 ● ● ● UART2 ● ● NFC(nandFlash) ● ● I2C ● ● ● KPP(键盘) ● GPIO 32 16 2

车联网安全之国密算法与其他算法的区别

车联网安全之国密算法与其他算法的区别 概述: 在车联网中,不管是端到云、还是云到车,每个环节、每个节点的信息安全保障都离不开加解密算法的支持,加解密算法的合理使用以及其计算能力直接影响车联网的性能和用户的体验。 今天和大家讨论一下国密算法与其他加解密算法(即国际算法是美国的安全局发布,是现今最通用的商用算法)的差别,以便于车联网的设计者们在不同的环节、节点采用最优的算法,从而提高产品的性价比。 国密算法: 即国产密码算法,是国家密码局制定标准的一系列算法,其中包括了对称加密算法,椭圆曲线非对称加密算法,杂凑算法等,在金融领域目前主要使用公开的SM2、SM3、SM4三类算法,分别是非对称算法、哈希算法和对称算法。在车联网中,目前还没有车厂采用国密算法,但作者认为随着互联网安全作为国家属性的重要度提升,未来国密算法将是车联网安全领域的发展趋势。 与国际算法差别: SM1是国家密码管理部门审批的分组密码算法,分组长度和密钥长度都为128比特,算法安全保密强度及相关软硬件实现性能与AES相当,该算法不公开,仅以 IP核的形式存在于芯片中; 图1:SM1与AES的比较 SM2算法和RSA算法都是公钥密码算法,SM2算法是一种更先进安全的算法,在我们国家商用密码体系中被用来替换RSA算法。SM2性能更优更安全:密码复杂度高、处理速度快、硬件性能消耗更小。

图2:SM2与RSA的比较 注: 1. 亚指数级算法复杂度低于指数级别的算法。 2. RSA秘钥生成速度较慢。例:主频1.5G赫兹的话,RSA需要2-3秒的,这在车联网中是根本无法接受的,而SM2只需要几十毫秒。 SM3是摘要加密算法,该算法适用于商用密码应用中的数字签名和验证以及随机数的生成,是在SHA-256基础上改进实现的一种算法。SM3算法采用Merkle-Damgard结构,消息分组长度为512位,摘要值长度为256位。 图3:SM3与Sha256的比较 SM4分组密码算法是我国自主设计的分组对称密码算法,用于实现数据的加密/解密运算,以保证数据和信息的机密性,是专门为无线局域网产品设计的加密算法。

相关主题
相关文档
最新文档