Spatial Indexing with Quadtrees and Hilbert Curves


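A Quadtree Index That Stores Containment Relations of Complex Polygons

A quadtree is a tree data structure that recursively partitions two-dimensional space, and it can be used to store and retrieve containment relations. When the containment relations of complex polygons have to be stored, a quadtree index makes both storage and queries more efficient.

A quadtree divides space into four quadrants: every internal node has four children, one per quadrant. The root represents the whole two-dimensional region, which is recursively split into quarters until a minimum cell size is reached.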

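To store the containment relations of complex polygons, the space can be partitioned, and the quadtree built, from the polygons' bounding boxes. A bounding box is the minimum axis-aligned rectangle that encloses a polygon. First compute the bounding box of the whole polygon set, then split it into four quadrants, determine which polygons fall into each quadrant, and represent them with quadtree nodes. Each node is then subdivided recursively until every leaf holds a single polygon. In practice a depth limit is needed as well, because polygons whose bounding boxes overlap cannot always be separated by further splitting.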

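A containment query uses the index as follows. First compute the bounding box of the query polygon, then test it against the region covered by each quadtree node. If the bounding box does not intersect a node, that node and all of its children can be skipped without further traversal. If it does intersect, the children are checked in turn until the leaves are reached. Finally, the polygons stored in the leaves that were reached are tested geometrically against the query polygon to decide whether containment actually holds.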
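The build and query steps described above can be sketched in Python as follows. This is a minimal illustration, not the method of any particular library: the names `Node`, `insert`, `query`, `MAX_ITEMS` and `MAX_DEPTH` are assumptions made for the example, and polygons whose bounding box straddles a split line are simply kept at the internal node (one possible policy). The query returns candidates only; an exact polygon test would still follow.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

MAX_ITEMS = 1   # assumed leaf capacity: split when a leaf holds more than this
MAX_DEPTH = 8   # assumed depth limit so overlapping boxes cannot split forever


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    box: Box
    depth: int = 0
    items: List[Tuple[str, Box]] = field(default_factory=list)
    children: Optional[List["Node"]] = None


def split(node: Node) -> None:
    """Create the four quadrant children and push stored items down."""
    xmin, ymin, xmax, ymax = node.box
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    d = node.depth + 1
    node.children = [Node((xmin, ymin, xm, ym), d), Node((xm, ymin, xmax, ym), d),
                     Node((xmin, ym, xm, ymax), d), Node((xm, ym, xmax, ymax), d)]
    old, node.items = node.items, []
    for item_id, item_box in old:
        insert(node, item_id, item_box)


def insert(node: Node, item_id: str, item_box: Box) -> None:
    if node.children is not None:
        for child in node.children:
            if contains(child.box, item_box):
                insert(child, item_id, item_box)
                return
        node.items.append((item_id, item_box))  # straddles a split line
        return
    node.items.append((item_id, item_box))
    if len(node.items) > MAX_ITEMS and node.depth < MAX_DEPTH:
        split(node)


def query(node: Node, qbox: Box, out: Optional[List[str]] = None) -> List[str]:
    """Return ids whose bounding box intersects qbox (candidates only)."""
    if out is None:
        out = []
    if not intersects(node.box, qbox):
        return out  # prune this node and its whole subtree
    out.extend(i for i, b in node.items if intersects(b, qbox))
    for child in node.children or []:
        query(child, qbox, out)
    return out
```

Keeping straddling boxes at the internal node avoids storing an object twice, at the cost of testing those objects on every query that reaches the node.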

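A quadtree index can greatly reduce the number of comparisons a query has to make. Because every node represents a region of space, testing the bounding box against node regions quickly rules out irrelevant nodes. Containment queries on complex polygons benefit in the same way: when each leaf holds only one polygon, the expensive exact geometric test is limited to the few candidates in the leaves that survive the bounding-box filter.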

Hilbert Curve Encoding

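This gives s[2:1] = 01, so the final result is s = 100001 in binary, i.e. s = 33. Conversely, the lookup table for the inverse operation is as follows.
Bitwise lookup table for the inverse Hilbert encoding operation
III. Hilbert encoding method
The Hilbert codes of an 8×8 plane obtained with the encoding method above are as follows.
[Figure: Hilbert codes of the 8×8 grid; only scattered cell values survive in this copy.]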
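Because the bit-table method itself is only partly preserved here, the sketch below uses a different, widely known iterative formulation of the Hilbert mapping (rotate and flip the quadrant at each level). The function names `xy2d`, `d2xy` and `rot` are assumptions for the example, and the orientation of the curve may differ from the one used in the slides, so individual codes need not match the table above.

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert code."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n, d):
    """Inverse mapping: Hilbert code d back to cell (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

For an 8×8 grid, `xy2d(8, x, y)` assigns each cell a distinct code in 0..63, and cells with consecutive codes are always edge neighbours, which is the locality property that makes the code useful as a one-dimensional sort key.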
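I. Introduction to the Hilbert curve
II. Significance of Hilbert encoding
III. Hilbert encoding method
IV. Applications of Hilbert encoding
I. Introduction to the Hilbert curve
The Hilbert curve is a fractal curve that fills an entire planar square (a space-filling curve). It was proposed by the German mathematician David Hilbert in 1891.
Construction of the Hilbert curve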
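In the paper "…究" (its title is cut off in this copy), a data-partitioning method based on the Hilbert curve is proposed. It effectively avoids data skew and thereby improves the efficiency of spatial data retrieval and queries. The algorithm partitions the data while taking full account of how unevenly the data volume is distributed over the individual elements, so that the amount of data actually stored on each disk approaches the average, instead of simply giving every disk the same number of elements.
The algorithm works as follows. The data volume of each element is split into the size of the spatial object entity, Vi, and the size of the non-spatial attribute values, Vother. The theoretical data volume of each of the N parts, Vavg, is computed. The whole spatial data set is scanned, a Hilbert curve is constructed, and every spatial object is assigned a Hilbert value. The data volume of the first subset is initialized to 0. Data volumes are then accumulated in Hilbert-code order and compared with Vavg: as soon as adding an element makes the volume of the current subset exceed Vavg, accumulation continues in the next subset, until all the data have been assigned.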
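The accumulate-and-compare loop described above can be sketched as follows. This is only an illustration under assumptions: the function name `partition_by_hilbert` and the tuple layout `(hilbert_value, v_i, v_other)` are invented for the example, and the last subset simply receives whatever remains.

```python
def partition_by_hilbert(objects, n_parts):
    """Split objects into at most n_parts subsets of similar data volume.

    objects: iterable of (hilbert_value, v_i, v_other) tuples, where v_i is the
    size of the spatial entity and v_other the size of its other attributes.
    Returns a list of subsets, each a list of Hilbert values in curve order.
    """
    objects = list(objects)
    total = sum(v_i + v_other for _, v_i, v_other in objects)
    v_avg = total / n_parts          # theoretical volume of each part

    parts = []
    current, acc = [], 0.0           # the current subset starts with volume 0
    for h, v_i, v_other in sorted(objects, key=lambda o: o[0]):
        current.append(h)
        acc += v_i + v_other
        # Once the running volume exceeds v_avg, continue with the next subset.
        if acc > v_avg and len(parts) < n_parts - 1:
            parts.append(current)
            current, acc = [], 0.0
    if current:
        parts.append(current)
    return parts
```

Because neighbouring Hilbert values tend to be neighbouring in space, each subset stays spatially compact while the split points follow data volume rather than object count.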
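…the degree of parallelism of queries and retrieval.
In addition, Hilbert curve encoding has important applications in areas such as data compression.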

The Purpose of Spatial Indexes, with a Brief Introduction to Grid and Quadtree Indexes [Repost]

[Repost] 2010-09-27 07:40
Before introducing spatial indexes, let's first talk about what an "index" is.

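We "index" a data set to make searching that data set more efficient.

A book's table of contents is the "index" of its content. When we pick up a new book and want to read the parts that interest us, we first look at the table of contents, work out which pages cover them, and turn straight to those pages. We don't start at chapter one and go through it word by word until we find what we want; that would be far too slow. Imagine how inconvenient a book without a table of contents would be… which shows how important a table of contents, and an index, really is~

Now that you have an intuitive feel for indexes, what is a "spatial index"? A "spatial index" is also an "index": it is a "table of contents" built over a collection of spatial shapes, and it makes finding a particular shape object in that collection more efficient.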

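Say we do a rectangle selection on a map layer and want to know which features on the layer are completely contained in the rectangle. Without a "spatial index", we would take every feature on the layer, one by one, and run a geometric containment test against the rectangle to find out which features lie entirely inside it.

Does that sound reasonable? Actually it isn't. Let's look at a grid index example first: we built a grid index on this point layer, and to decide which points lie inside the selection rectangle we no longer have to run the geometric containment test on every point in the layer; only the seven points a, b, c, d, e, f and g are tested.

You can imagine that if a point layer has 100,000 points and no spatial index, every selection like this has to traverse all the features of the layer, i.e. a for loop of 100,000 iterations. With an index the number of iterations drops dramatically, and efficiency goes up accordingly~ Hehe… By now you know what a spatial index is good for, and along the way you have also met the grid index for point layers. What other spatial indexes are in common use, and how are they implemented? With these questions in mind, several common spatial indexes are introduced below.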
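To make the saving concrete, here is a small Python sketch of a grid index for a point layer. The class name `GridIndex`, the `cell_size` parameter and the returned `tested` counter are assumptions made for this illustration; the counter just shows how many points actually went through the containment test.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid over a point layer: each cell lists the points inside it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def query_rect(self, xmin, ymin, xmax, ymax):
        """Return (hits, tested): points inside the rectangle, and how many
        points had to go through the exact containment test."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits, tested = [], 0
        # Only cells overlapped by the selection rectangle are visited.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for name, x, y in self.cells.get((cx, cy), ()):
                    tested += 1
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return hits, tested
```

With five points and a cell size of 5, a selection rectangle that overlaps one cell tests only the points stored in that cell; the others are never touched.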

MMSE Extension of V-BLAST based on Sorted QR Decomposition

Dirk Wübben, Ronald Böhnke, Volker Kühn, and Karl-Dirk Kammeyer
Department of Communications Engineering, University of Bremen
Otto-Hahn-Allee, D-28359 Bremen, Germany
Email: {wuebben, boehnke, kuehn, kammeyer}@ant.uni-bremen.de

Abstract: In rich-scattering environments layered space-time architectures like the BLAST system may exploit the capacity advantage of multiple antenna systems. In this paper, we present a novel, computationally efficient algorithm for detecting V-BLAST architectures with respect to the MMSE criterion. It utilizes a sorted QR decomposition of the channel matrix and leads to a simple successive detection structure. The new algorithm needs only a fraction of computational effort compared to the standard V-BLAST algorithm and achieves the same error performance.

Index Terms: BLAST, MIMO systems, Zero-Forcing and MMSE detection, wireless communication.

I. INTRODUCTION

In rich-scattering environments the V-BLAST (Vertical Bell Labs Layered Space-Time) architecture proposed in [1] exploits the capacity advantage of multiple antenna systems. It uses a vertically layered coding structure, where independent code blocks (called layers) are associated with a particular transmit antenna. At the receiver, these layers are detected by a successive interference cancellation technique which nulls the interferers by linearly weighting the received signal vector with a zero-forcing nulling vector (ZF-BLAST). This successive detection requires multiple calculations of pseudo-inverses, which is a computationally expensive task. A reduced complexity detection algorithm utilizing a sorted QR decomposition of the channel matrix was proposed by the authors in [2], [3]. It jointly calculates an optimized detection order and the QR decomposition of the channel matrix and is called ZF-SQRD (ZF Sorted QR Decomposition). An adaptation of the original ZF-BLAST to the MMSE criterion was presented in [4] and a version with lower complexity was introduced in [5].

In this paper, we extend the ZF-SQRD algorithm to the MMSE solution, called MMSE-SQRD. Similar to ZF-SQRD it does not always find the optimal detection order, and hence a performance degradation may occur. If this drawback is not acceptable for the specific application, a post-sorting algorithm (PSA) can be used, leading to the ideal detection sequence and thus to the performance of MMSE-BLAST. However, the combination of MMSE-SQRD and PSA requires only a fraction of computational effort compared to the BLAST detection algorithm.

This work was supported in part by the German ministry of education and research (BMBF) under grant 01BU153.

The remainder of this paper is as follows. In Section II, the system model and notation is introduced. In order to simplify later derivation we recall the linear ZF and MMSE filter and introduce an extended system model in Section III. The detection of BLAST systems using the QR decomposition of the channel matrix is investigated in Section IV. The computational effort and the performance analysis are given in Section V and VI, respectively. Concluding remarks can be found in Section VII.

II. SYSTEM DESCRIPTION

We consider a multiple antenna system with $n_T$ transmit and $n_R \ge n_T$ receive antennas. The data is demultiplexed into $n_T$ data substreams of equal length (called layers). These substreams are optionally encoded by a convolutional code (CC), bit-interleaved, mapped onto M-PSK or M-QAM symbols $s_i$ and transmitted over the $n_T$ antennas simultaneously. For simplicity we will assume uncoded substreams for the derivation of the detection algorithms, but will investigate the performance of coded and uncoded systems in Section VI.

Fig. 1. Model of a MIMO system with $n_T$ transmit and $n_R$ receive antennas.

In order to describe the MIMO system, one time slot of the time-discrete complex baseband model is investigated. Let (footnote 1) $s = [s_1 \dots s_{n_T}]^T$ denote the $n_T \times 1$ transmit signal vector, then the corresponding $n_R \times 1$ receive signal vector $x = [x_1 \dots x_{n_R}]^T$ is given by

$$x = Hs + n. \tag{1}$$

In (1), $n = [n_1 \dots n_{n_R}]^T$ represents white Gaussian noise of variance $\sigma_n^2$ observed at the $n_R$ receive antennas while the average transmit power of each antenna is normalized to one, i.e. $E\{ss^H\} = I_{n_T}$ and $E\{nn^H\} = \sigma_n^2 I_{n_R}$. The $n_R \times n_T$ channel matrix $H$ contains uncorrelated complex Gaussian fading gains with unit variance. We assume a flat fading environment, where the channel matrix $H$ is constant over a frame and changes independently from frame to frame (block fading channel). The distinct fading gains are assumed to be perfectly known by the receiver.

Footnote 1: Throughout this paper, $(\cdot)^T$ and $(\cdot)^H$ denote matrix transpose and hermitian transpose, respectively. Furthermore, $I_\alpha$ indicates the $\alpha \times \alpha$ identity matrix and $0_{\alpha,\beta}$ denotes the $\alpha \times \beta$ all zero matrix.

In order to detect the transmitted information, it would be optimal to use a maximum-likelihood (ML) detector. As the computational effort is of order $M^{n_T}$, ML detection is not feasible for larger number of transmit antennas or higher modulation schemes. Therefore, we present suboptimal detection schemes with reduced complexity in the following sections.

III. LINEAR DETECTION

In this section we recall the linear detection with respect to the zero-forcing (ZF) and to the minimum-mean-square-error (MMSE) criterion. By introducing an extended system model, we show the similarity of both criteria. This analogy will play a key role for the introduction of the MMSE based QR detection algorithm in Section IV.

A. Zero-Forcing Detector (ZF)

In a linear detector, the receive signal vector $x$ is multiplied with a filter matrix $G$, followed by a parallel decision on all layers. Zero-forcing means that the mutual interference between the layers shall be perfectly suppressed. This is accomplished by the Moore-Penrose pseudo-inverse (denoted by $(\cdot)^+$) of the channel matrix [6]

$$G_{ZF} = H^+ = \left(H^H H\right)^{-1} H^H, \tag{2}$$

where we assumed that $H$ has full column rank. The decision step consists of mapping each element of the filter output vector

$$\tilde{s}_{ZF} = G_{ZF}\,x = H^+ x = s + \left(H^H H\right)^{-1} H^H n \tag{3}$$

onto an element of the symbol alphabet by a minimum distance quantization. The estimation errors of the different layers correspond to the main diagonal elements of the error covariance matrix

$$\Phi_{ZF} = E\left\{(\tilde{s}_{ZF} - s)(\tilde{s}_{ZF} - s)^H\right\} = \sigma_n^2 \left(H^H H\right)^{-1} \tag{4}$$

which equals the covariance matrix of the noise after the receive filter. It is obvious that small eigenvalues of $H^H H$ will lead to large errors due to noise amplification. This effect is especially observed in systems with equal number of transmit and receive antennas. In fact, using a result from random matrix theory [7], it can be shown that in the large system limit for $n_T = n_R \to \infty$ the noise amplification tends to infinity almost surely. In order to improve the performance the noise term can be included in the design of the filter matrix $G$. This is done by the MMSE detection scheme, where the filter represents a trade-off between noise amplification and interference suppression.

B. MMSE Detector

The MMSE detector minimizes the mean squared error (MSE) between the actually transmitted symbols and the output of the linear detector and leads to the filter matrix [6]

$$G_{MMSE} = \left(H^H H + \sigma_n^2 I_{n_T}\right)^{-1} H^H. \tag{5}$$

The resulting filter output is given by

$$\tilde{s}_{MMSE} = G_{MMSE}\,x = \left(H^H H + \sigma_n^2 I_{n_T}\right)^{-1} H^H x. \tag{6}$$

The estimation errors of the different layers correspond to the main diagonal elements of the error covariance matrix

$$\Phi_{MMSE} = E\left\{(\tilde{s}_{MMSE} - s)(\tilde{s}_{MMSE} - s)^H\right\} = \sigma_n^2 \left(H^H H + \sigma_n^2 I_{n_T}\right)^{-1}. \tag{7}$$

With the definition of a $(n_T + n_R) \times n_T$ extended channel matrix $\underline{H}$ and a $(n_T + n_R) \times 1$ extended receive vector $\underline{x}$ through

$$\underline{H} = \begin{bmatrix} H \\ \sigma_n I_{n_T} \end{bmatrix} \quad \text{and} \quad \underline{x} = \begin{bmatrix} x \\ 0_{n_T,1} \end{bmatrix}, \tag{8}$$

the output of the MMSE filter given by (6) can be rewritten as

$$\tilde{s}_{MMSE} = \left(\underline{H}^H \underline{H}\right)^{-1} \underline{H}^H \underline{x} = \underline{H}^+ \underline{x}. \tag{9}$$

Furthermore, the error covariance matrix (7) becomes

$$\Phi_{MMSE} = \sigma_n^2 \left(\underline{H}^H \underline{H}\right)^{-1} = \sigma_n^2\, \underline{H}^+ \underline{H}^{+H}. \tag{10}$$

Comparing (9) and (10) to the corresponding expressions for the linear zero-forcing detector in (3) and (4), the only difference is that the channel matrix $H$ has been replaced by $\underline{H}$. This observation is extremely important for incorporating the MMSE criterion into the SQRD based detection algorithm.

IV. BLAST DETECTION

The V-BLAST detection algorithm [1] is based on the linear zero-forcing solution, but detects the signals one after another and not in parallel. In order to achieve the best performance, it is optimal to choose always the layer with the largest post detection signal-to-noise-ratio (SNR), or equivalently with the smallest estimation error. The adaptation to the MMSE criterion was presented in [4], [5], where the optimal sequence maximizes the signal-to-interference-and-noise ratio (SINR) in each detection step. The main drawback of the V-BLAST detection algorithms lies in the computational complexity, as it requires multiple calculations of the pseudo-inverse of the channel matrix [3].

A. Zero-Forcing BLAST with QR Decomposition

It was shown in several publications, e.g. [2], [3], [8], that the ZF-BLAST algorithm can be restated in terms of the QR decomposition of the channel matrix $H = QR$, where the $n_R \times n_T$ matrix $Q$ has orthogonal columns with unit norm and the $n_T \times n_T$ matrix $R$ is upper triangular. Multiplying the received signal $x$ with $Q^H$ yields the sufficient statistic

$$\tilde{s} = Q^H x = Rs + \eta \tag{11}$$

for the estimation of transmit vector $s$. As $Q$ is a unitary matrix, the statistical properties of the noise term $\eta = Q^H n$ remain unchanged. Due to the upper triangular structure of $R$, the $k$-th element of $\tilde{s}$ is given by

$$\tilde{s}_k = r_{k,k}\, s_k + \sum_{i=k+1}^{n_T} r_{k,i}\, s_i + \eta_k \tag{12}$$

and is free of interference from layers $1, \dots, k-1$. Thus, $\tilde{s}_{n_T}$ is totally free of interference and can be used to estimate $s_{n_T}$ after appropriate scaling with $1/r_{n_T,n_T}$. Proceeding with $\tilde{s}_{n_T-1}, \dots, \tilde{s}_1$ and assuming correct previous decisions, the interference can be perfectly cancelled in each step. Then it follows from (12) that the SNR of layer $k$ is determined by the diagonal element $|r_{k,k}|^2$.

As already mentioned, the detection sequence is crucial due to the risk of error propagation. It can be modified by permuting elements of $s$ and the corresponding columns of $H$ prior to the QR decomposition, leading to different matrices $Q$ and $R$ [3]. In order to find the optimum sequence, $|r_{k,k}|$, which represents the component of the column vector $h_k$ that is perpendicular to the space spanned by $h_1, \dots, h_{k-1}$, needs to be maximized for $k = n_T, \dots, 1$. This may be accomplished in a straightforward way by performing $O(n_T^2/2)$ QR decompositions of permutations of $H$ [9]. The heuristic approach ZF-SQRD does not assure the optimal order and therefore leads to a small performance degradation, but with only a fraction of the computational complexity [2], [3]. After introducing the QR based MMSE detection, we will present the extension of ZF-SQRD to the MMSE criterion.

B. MMSE QR Detection

In order to extend the QR based detection with respect to the MMSE criterion, we can apply the similarity of ZF and MMSE detection noted in Section III-B. We introduce the QR decomposition of the extended channel matrix (8)

$$\underline{H} = \begin{bmatrix} H \\ \sigma_n I_{n_T} \end{bmatrix} = \underline{Q}\,\underline{R} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} \underline{R} = \begin{bmatrix} Q_1 \underline{R} \\ Q_2 \underline{R} \end{bmatrix}, \tag{13}$$

where the $(n_T + n_R) \times n_T$ matrix $\underline{Q}$ with orthonormal columns was partitioned into the $n_R \times n_T$ matrix $Q_1$ and the $n_T \times n_T$ matrix $Q_2$. Obviously,

$$\underline{Q}^H \underline{H} = Q_1^H H + \sigma_n Q_2^H = \underline{R} \tag{14}$$

holds and from the relation $\sigma_n I_{n_T} = Q_2 \underline{R}$ it follows that

$$\underline{R}^{-1} = \frac{1}{\sigma_n} Q_2, \tag{15}$$

i.e. the inverse $\underline{R}^{-1}$ is a byproduct of the QR decomposition and $Q_2$ is an upper triangular matrix. This relation will be useful for the post-sorting algorithm proposed in Section IV-D. Using (15) and (14), the filtered receive vector becomes

$$\tilde{s} = \underline{Q}^H \underline{x} = Q_1^H x = \underline{R}\,s - \sigma_n Q_2^H s + Q_1^H n. \tag{16}$$

The second term on the right hand side of (16) including the lower triangular matrix $Q_2^H$ constitutes the remaining interference that can not be removed by the successive interference cancellation procedure. This points out the trade-off between noise amplification and interference suppression.

The optimum detection sequence now maximizes the signal-to-interference-and-noise ratio (SINR) for each layer, leading to minimal estimation error for the corresponding detection step. The estimation errors of the different layers in the first detection step correspond to the diagonal elements of the error covariance matrix (10)

$$\Phi = \sigma_n^2 \left(\underline{H}^H \underline{H}\right)^{-1} = \sigma_n^2\, \underline{R}^{-1} \underline{R}^{-H}. \tag{17}$$

The estimation error after perfect interference cancellation is given by $\sigma_n^2 / |\underline{r}_{k,k}|^2$. Thus, it is again optimal to choose the permutation that maximizes $|\underline{r}_{k,k}|$ in each detection step. The algorithm proposed in the next section determines an optimized detection sequence within a single sorted QR decomposition and thereby significantly reduces the computational complexity in comparison to standard MMSE-BLAST algorithms.

C. MMSE Sorted QR Decomposition (MMSE-SQRD)

In order to obtain the optimal detection order, first $|\underline{r}_{n_T,n_T}|$ has to be maximized over all possible permutations of the columns of the extended channel matrix $\underline{H}$, followed by $|\underline{r}_{n_T-1,n_T-1}|$, and so on. Unfortunately, using standard algorithms for the QR decomposition, the diagonal elements of $\underline{R}$ are calculated just in the opposite order, starting with $\underline{r}_{1,1}$. This makes finding the optimal order of detection a difficult task.

A heuristic approach of arranging the order of detection into the QR decomposition for the ZF detection was presented in [2], [3]. This sorted QR decomposition algorithm is basically an extension to the modified Gram-Schmidt procedure by reordering the columns of the channel matrix prior to each orthogonalization step. In the sequel we present an adapted version of this algorithm for MMSE detection.

The fundamental idea is that $|\underline{r}_{k,k}|$ is minimized in the order it is computed $(1, \dots, n_T)$ instead of being maximized in the order of detection $(n_T, \dots, 1)$. This is motivated by the fact that the layers detected last affect only few other layers through error propagation and may therefore have rather small SINRs, which increases the probability of large SINRs in the first layers. Now, $\underline{r}_{1,1}$ is simply the norm of the column vector $\underline{h}_1$, so the first optimization in the SQRD algorithm consists merely of permuting the column of $\underline{H}$ with minimum norm to this position. During the following orthogonalization of the vectors $\underline{h}_2, \dots, \underline{h}_{n_T}$ with respect to the normalized vector $\underline{h}_1$, the first row of $\underline{R}$ is obtained. Next, $\underline{r}_{2,2}$ is determined in a similar fashion from the remaining $n_T - 1$ orthogonalized vectors, et cetera. Thereby, the extended channel matrix $\underline{H}$ is successively transformed into the matrix $\underline{Q}$ associated with the desired ordering, while the corresponding $\underline{R}$ is calculated row by row. Note that the column norms have to be calculated only once in the beginning and can be easily updated afterwards. Hence, the computational overhead due to sorting is negligible. An in-place description of the whole MMSE-SQRD algorithm is given in Tab. 1, with $\underline{q}_i$ indicating column $i$ of $\underline{Q}$ and vector $p$ denoting the permutation of the columns of $\underline{H}$.

Tab. 1. MMSE-SQRD ALGORITHM
(1) $\underline{R} = 0$, $\underline{Q} = \underline{H}$, $p = (1, \dots, n_T)$
(2) for $i = 1, \dots, n_T$
(3)   $\mathrm{norm}_i = \|\underline{q}_i\|^2$
(4) end
(5) for $i = 1, \dots, n_T$
(6)   $k_i = \arg\min_{\ell = i, \dots, n_T} \mathrm{norm}_\ell$
(7)   exchange columns $i$ and $k_i$ in $\underline{R}$, $p$, norm and in the first $n_R + i - 1$ rows of $\underline{Q}$
(8)   $\underline{r}_{i,i} = \sqrt{\mathrm{norm}_i}$
(9)   $\underline{q}_i := \underline{q}_i / \underline{r}_{i,i}$
(10)  for $k = i+1, \dots, n_T$
(11)    $\underline{r}_{i,k} = \underline{q}_i^H \cdot \underline{q}_k$
(12)    $\underline{q}_k := \underline{q}_k - \underline{r}_{i,k} \cdot \underline{q}_i$
(13)    $\mathrm{norm}_k := \mathrm{norm}_k - \underline{r}_{i,k}^2$
(14)  end
(15) end

It should be emphasized that MMSE-SQRD does not always lead to the perfect detection sequence, but in many cases of interest the performance degradation is small compared to the reduced complexity. Furthermore, whenever MMSE-SQRD fails to find the optimal order, the post-sorting algorithm described in the sequel may be applied. It assures the optimal sorting and thereby achieves the same performance as MMSE-BLAST.

D. Post-Sorting-Algorithm (PSA)

In order to introduce the Post-Sorting-Algorithm (PSA), we investigate the structure of the error covariance matrix in case of optimal sorting in more detail. Due to the relation (15) the error covariance matrix (17) is given by

$$\Phi = Q_2 Q_2^H \tag{18}$$

and $Q_2$ is a square root of $\Phi$ [5]. As $Q_2$ is upper triangular, the $k$-th diagonal element of $\Phi$ is proportional to the norm of the $k$-th row of $Q_2$. Recalling the optimal ordering criterion, the last row of $Q_2$ must have minimum norm of all rows. Assume that this condition is fulfilled, then the last row of the upper left $(n_T - 1) \times (n_T - 1)$ submatrix of $Q_2$ must have minimum norm of all rows of this submatrix. In case of the correct sorting this condition is accomplished by all upper left submatrices.

Now assume that this condition is not fulfilled for the matrix $Q_2$. Then the row with minimum norm and the last row (as well as the corresponding elements of $p$) need to be exchanged at the expense of destroying the upper triangular structure. However, by right multiplying the permuted version of $Q_2$ with a proper unitary $n_T \times n_T$ Householder reflection matrix $\Theta$ (footnote 2), a block triangular matrix is achieved. Finally, $Q_1$ has to be updated to $Q_1 \Theta$. Instead of permuting columns of $\underline{R}$ and left multiplying with $\Theta^H$ in each step, we can alternatively invert $Q_2$ at the end of the PSA, due to the relation $\underline{R} = \sigma_n Q_2^{-1}$ that follows from (15). These ordering and reflection steps are then iterated for the upper left $(n_T - 1) \times (n_T - 1)$ submatrix of the such modified matrix $Q_2$ and the first $n_T - 1$ columns of the new matrix $Q_1$, resulting in the QR decomposition of the optimally ordered channel matrix $\underline{H}$. The whole post-sorting algorithm is given in Tab. 2 (footnote 3).

Footnote 2: The Householder matrix for a $1 \times n$ row vector $a$ with complex elements is given by $\Theta = I_n - (1 + w)\,u^H u$ with the definitions $u = \dfrac{a - \|a\|\, e_n}{\|a - \|a\|\, e_n\|}$, $e_n = [0_{1,n-1}\ 1]$ and $w = \dfrac{u a^H}{a u^H}$. Thus, $a\Theta = [0_{1,n-1}\ \|a\|]$ holds.

Footnote 3: $A(a{:}b,\, c{:}d)$ denotes the submatrix of $A$ with elements from rows $a, \dots, b$ and columns $c, \dots, d$.

Tab. 2. POST-SORTING ALGORITHM
(1) $k_{\min} = n_T$
(2) for $i = n_T, \dots, 2$
(3)   for $\ell = 1, \dots, i$
(4)     $\mathrm{error}_\ell = \|Q_2(\ell, 1{:}i)\|^2$
(5)   end
(6)   $k_i = \arg\min_{\ell = 1, \dots, i} \mathrm{error}_\ell$
(7)   $k_{\min} = \min(k_{\min}, k_i)$
(8)   if $k_i < i$
(9)     exchange rows $i$ and $k_i$ in $Q_2$ and col. $i$ and $k_i$ in $p$
(10)  end
(11)  if $k_{\min} < i$
(12)    calculate Householder reflector $\Theta$ such that elements of $Q_2(i, k_{\min}{:}i-1)$ become zero
(13)    $Q_2(1{:}i, k_{\min}{:}i) := Q_2(1{:}i, k_{\min}{:}i)\,\Theta$
(14)    $Q_1(:, k_{\min}{:}i) := Q_1(:, k_{\min}{:}i)\,\Theta$
(15)  end
(16) end
(17) $\underline{R} = \sigma_n Q_2^{-1}$

V. COMPUTATIONAL EFFORT

In this section we investigate the computational effort of the proposed sorting algorithm. Therefore, the complex floating point operations (flops) $f$ are specified according to the number of transmit and receive antennas. For simplicity, we count each addition as one flop and each multiplication as three flops. The MMSE-SQRD requires

$$f_{SQRD} = \tfrac{4}{3} n_T^3 + 4 n_T^2 n_R + \tfrac{13}{2} n_T^2 + 2 n_T n_R + \tfrac{25}{6} n_T$$

flops. The overhead in comparison to an unsorted QR decomposition is very small, only $2 n_T^2 - 2 n_T$ additional operations are necessary when the sorting steps are included in the decomposition. For the PSA the computational effort depends on the required number of permutations. We get an upper bound for the complexity by ignoring the upper triangular structure of $Q_2$. In this case,

$$f_{PSA} = \tfrac{14}{3} n_T^3 + 4 n_T^2 n_R + \tfrac{27}{2} n_T^2 + 3 n_T n_R + \tfrac{89}{6} n_T - 7 n_R - 30$$

complex floating point operations are necessary. The computational effort of the Hassibi approach [5] can be approximated by $f_{SQRD} + f_{PSA}$. In case of MMSE-SQRD an optimized sorting is already given by the decomposition. Consequently, the PSA is only required in a fraction of all transmissions and therefore the complexity of the Hassibi approach serves as an upper bound for the expected overall complexity.

Fig. 2. Number of operations $f$ in flops for unsorted MMSE-QRD, MMSE-SQRD and the algorithm by Hassibi [5].

Fig. 2 shows the required number of complex floating point operations for the unsorted MMSE-QR decomposition, the SQRD and the Hassibi approach for varying number of $n_T = n_R$ antennas. Obviously the computational overhead of SQRD is extremely small and a significant reduction in comparison to the worst case can be observed.

VI. PERFORMANCE ANALYSIS

In this section, we investigate the frame error rates (FER) for a MIMO system with $n_T = 4$ transmit and $n_R = 4$ receive antennas and QPSK modulation. We compare uncoded data streams and encoded streams, where the half rate $(7,5)_8$ convolutional code was used in each layer. $E_b$ denotes the average energy per information bit arriving at the receiver, thus $E_b/N_0 = n_R / (R_c \log_2(M)\, \sigma_n^2)$ holds. Fig. 3 shows the performance of MMSE-BLAST and MMSE-SQRD, both …

Fig. 3. Frame Error Rate of a system with $n_T = 4$ and $n_R = 4$ antennas, frame length $L = 100$, QPSK symbols, uncoded (continuous line) and convolutional encoded (dotted line) substreams.

Comparing the simulation results of uncoded transmission, the successive detection schemes achieve an improved performance in comparison to the linear MMSE detector. The strong impact of ordering becomes obvious by comparing the unsorted (MMSE-QRD) and the schemes with optimized detection order. As the MMSE-SQRD does not assure the optimal order, a performance gap between MMSE-BLAST and MMSE-SQRD is observed. This gap is completely closed by the post-sorting algorithm, as MMSE-BLAST and MMSE-SQRD with PSA find the same detection sequence. However, for the coded system the performance loss of MMSE-SQRD reduces to approximately 1 dB for a FER of $10^{-3}$ and is negligible for a FER of $10^{-2}$. On the other side, the gain in comparison to the coded MMSE-QRD is enormous. These observations can be explained in the following way. Optimal sorting maximizes the minimum SINR of all layers [8] and thereby the operation point of the convolutional code is achieved. As the effect of error propagation is reduced by the application of forward error correction codes, the influence of suboptimal sorting decreases. Thus, in many cases of interest, the MMSE-SQRD would be the first choice for implementation due to the reduced complexity.

VII. SUMMARY AND CONCLUSIONS

We have proposed a new detection algorithm for V-BLAST systems with respect to the MMSE criterion. The algorithm utilizes an optimized QR decomposition of the channel matrix and leads to a simple successive detection. For those cases where MMSE-SQRD does not find the correct ordering, a reordering can easily be applied, thereby resulting in an optimum algorithm with reduced complexity. However, for coded transmission, the performance degradation of MMSE-SQRD compared to MMSE-BLAST is negligible.

REFERENCES

[1] P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, "V-BLAST: An Architecture for Realizing Very High Data Rates Over the Rich-Scattering Wireless Channel," in Proc. ISSE, Pisa, Italy, September 1998.
[2] D. Wübben, R. Böhnke, J. Rinas, V. Kühn, and K. D. Kammeyer, "Efficient Algorithm for Decoding Layered Space-Time Codes," IEE Electronics Letters, vol. 37, no. 22, pp. 1348-1350, October 2001.
[3] D. Wübben, J. Rinas, R. Böhnke, V. Kühn, and K. D. Kammeyer, "Efficient Algorithm for Detecting Layered Space-Time Codes," in Proc. ITG Conference on Source and Channel Coding, Berlin, Germany, January 2002, pp. 399-405.
[4] A. Benjebbour, H. Murata, and S. Yoshida, "Comparison of Ordered Successive Receivers for Space-Time Transmission," in Proc. IEEE Vehicular Technology Conference (VTC), USA, Fall 2001.
[5] B. Hassibi, "An Efficient Square-Root Algorithm for BLAST," in Proc. IEEE Intl. Conf. Acoustic, Speech, Signal Processing, Istanbul, Turkey, June 2000, pp. 5-9.
[6] S. Verdu, Multiuser Detection, 2nd ed. Cambridge, U.K.: Cambridge University Press, 1998.
[7] J. Silverstein and Z. Bai, "On the Empirical Distribution of Eigenvalues of a Class of Large Dimensional Random Matrices," Journal of Multivariate Analysis, vol. 54, no. 2, pp. 175-192, 1995.
[8] R. Böhnke, D. Wübben, V. Kühn, and K. D. Kammeyer, "Reduced Complexity MMSE Detection for BLAST Architectures," in Proc. IEEE Global Communications Conference (Globecom'03), San Francisco, California, USA, December 2003.
[9] G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky, "Simplified Processing for High Spectral Efficiency Wireless Communications Employing Multi-Element Arrays," IEEE Journal on Selected Areas in Communications, vol. 17, no. 11, pp. 1841-1852, November 1999.
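The in-place MMSE-SQRD procedure of Tab. 1 can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function name `mmse_sqrd`, the column-list storage of the extended matrix and the zero-based indexing are assumptions of the example, and the norm update uses `abs(r) ** 2` so that it stays valid for complex entries.

```python
import math


def mmse_sqrd(H, sigma):
    """Sorted QR decomposition of the extended matrix [H; sigma*I].

    H is a list of n_R rows with n_T complex entries each. Returns (Q, R, p):
    Q as a list of n_T columns of length n_R + n_T, the upper triangular R as
    a list of rows, and the column permutation p (zero-based).
    """
    n_r, n_t = len(H), len(H[0])
    # Columns of the extended channel matrix: H on top, sigma * I below.
    Q = [[complex(H[r][c]) for r in range(n_r)]
         + [complex(sigma) if r == c else 0j for r in range(n_t)]
         for c in range(n_t)]
    R = [[0j] * n_t for _ in range(n_t)]
    p = list(range(n_t))
    norm = [sum(abs(v) ** 2 for v in col) for col in Q]

    for i in range(n_t):
        # Pick the remaining column with minimum norm (Tab. 1, step 6).
        k = min(range(i, n_t), key=lambda j: norm[j])
        if k != i:
            rows = n_r + i  # only the first n_R + i - 1 rows in 1-based terms
            Q[i][:rows], Q[k][:rows] = Q[k][:rows], Q[i][:rows]
            for row in R:
                row[i], row[k] = row[k], row[i]
            p[i], p[k] = p[k], p[i]
            norm[i], norm[k] = norm[k], norm[i]
        R[i][i] = math.sqrt(norm[i])
        Q[i] = [v / R[i][i] for v in Q[i]]
        # Orthogonalize the remaining columns against q_i (steps 10-14).
        for j in range(i + 1, n_t):
            r = sum(a.conjugate() * b for a, b in zip(Q[i], Q[j]))
            R[i][j] = r
            Q[j] = [b - r * a for a, b in zip(Q[i], Q[j])]
            norm[j] -= abs(r) ** 2
    return Q, R, p
```

The first `n_R` entries of each returned column form $Q_1$ and the remaining `n_T` entries form $Q_2$; swapping only the upper rows in step 7 is what keeps $Q_2$ upper triangular.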

A Quadtree Index That Stores Containment Relations of Complex Polygons

一种存储复杂多边形包含关系的四叉树索引作者：汪红松周晓光来源：《湖南大学学报·自然科学版》2020年第04期摘要：地表覆盖/土地利用矢量数据中存在大量包含成千上万个空洞（甚至嵌套空洞）的复杂多边形，现有空间数据索引没有表达复杂多边形及其空洞之间的包含关系，导致空间数据冲突检测与更新等处理存在计算量大、效率低等问题. 针对此问题，提出了一种存储多边形包含关系的四叉树索引方法. 该方法根据结点中的多边形与四叉树相应象限中轴线相交的方式将多边形对象分为5种类型，即仅与X正轴相交、仅与X负轴相交、仅与Y正轴相交、仅与Y 负轴相交以及与XY轴都相交，并将这些多边形对象分别存储在相应层次索引结点中的5个子列表（桶）中，然后在结点多边形对象中存储多边形之间的父子包含关系. 最后设计并实现了该索引及相应的查询、插入、删除等算法，并用实际地表覆盖数据验证了本文方法的有效性. 实验结果表明，采用本文索引方法的复杂地表覆盖矢量数据增量更新效率数倍于现有四叉树索引方法，且随着数据量的增加效率提高更明显.关键词：空间索引;复杂多边形;包含关系;四叉树;空间数据管理中图分类号：P208 文献标志码：AAbstract：There are a large number of complex polygons containing thousands of holes （or even nested holes） in the land cover/land use vector data， and the existing spatial data indexing method has failed to indicate the inclusion relationship between complex polygons and their holes，resulting in computationally heavy and inefficient processing such as spatial data conflict detection and updating. In order to solve this problem， an improved quadtree spatial index method with inclusion relations of the complex polygons is presented in this paper. The method classifies the polygons in the nodes into five types according to the way they intersect the axes in the corresponding quadrant of the quadtree， i.e.， intersect only the X positive axis， intersect only the X negative axis， intersect only the Y positive axis， intersect only the Y negative axis， and intersect both Xand Y axes， and stores each of these polygons in five sublists （buckets） in the corresponding hierarchical index nodes， and then stores the parent-child inclusion relationship between the polygons in the node polygon objects. The authors developed the spatial index structure with inclusion relations and the algorithms of the corresponding operations（e.g.，insert， delete and query）for the complex polygons. 
The effectiveness of the approach in this paper is verified by an experiment of land cover data incremental updating， experimental results show that the time efficiency of the incremental updating is increased about several times using the proposed index method than that of the traditional quadtree index， and the improvement in efficiency is more significant with increasing data volume.Key words：spatial index;complex polygon;inclusion relation;quadtrees;spatial data management随着全球30 m地表覆盖地图GlobeLand30[1-2]和全球10 m地表覆盖数据FROM-GLC10[3]的完成与发布，全球地表覆盖数据已成为联合国等国际组织开展全球变化与可持续发展等重大科学研究的基础数据. 全球地表覆盖数据的验证、服务与持续更新成为本领域的研究热点[4-8].在全球地表覆盖矢量数据更新方面，周晓光等[7]提出了一种基于二维交细分拓扑关系的地表覆盖/土地利用数据增量更新方法，但由于地表覆盖矢量数据中存在大量包含成千上万个空洞（甚至嵌套空洞）的复杂多边形，目前空间数据模型中没有表达复杂多边形及其空洞之間的包含关系，导致在计算增量多边形与已存在的复杂多边形间二维交时存在计算量大、效率低等问题.GIS中空间数据处理一般包括过滤和精化计算两个步骤，空间数据索引用来过滤掉大部分无关的目标，使得精化计算仅在少数密切相关目标间进行的效率提升的特殊空间数据结构. 目前，空间数据索引方法包括传统矢量数据索引的四叉树[9-10]、R树[11-12]、R+树[13]、R*树[14-15]、网格索引[16]、Hilbert R树[17]等和轨迹数据索引Geohash-Trees[18]等. 上述传统矢量数据索引方法均采用目标的最小外接矩形（MBR）减小索引结构的存储量并提高过滤效率. 但是对于复杂多边形，现有索引方法仅存储了其外边界的MBR，不能表达复杂多边形及其空洞间的包含关系，在空间数据冲突检测与更新处理中，无关空洞不能通过索引而过滤掉，大量空洞需参与精化计算，是导致计算量大、效率低等问题的根本原因.图1所示为地表覆盖矢量数据的局部示例，图中C为一个包含上千个空洞的复杂多边形，图1（a）中B、 D、 E、 F等都是其空洞，其中阴影部分为其他图层图斑，在当前图层中为空白区域. P1为增量多边形（图1（b）），P1只与C和它的一个空洞多边形B和空白区域存在二维交，需要进行更新处理. 但采用现有索引方法，需要计算P1与C及其所有空洞多边形的拓扑关系，导致更新处理效率极低. 如果在索引结构中能够存储复杂多边形与其空洞间的包含关系，无关的空洞通过索引过滤掉，那么地表覆盖数据更新效率有望大大提高. 根据上述分析，本文提出一种存储复杂多边形包含关系的空间数据索引方法.地表覆盖矢量数据嵌套复杂，多边形MBR重叠严重，若构建索引R树结构，则索引性能不佳[15]，同时R树索引无法避免地重复存储空间对象，造成空间数据更新时存储的包含关系一致性维护困难. 处理面目标的四叉树结构主要有线性四叉树、PMR四叉树、CIF四叉树等结构[9，19-20]. 线性四叉树用自定义大小的网格映射空间目标[21]，由于地表覆盖矢量数据面积分布极度不均，难以选择大小合适的网格，同时索引中空间对象也无法避免重复存储;PMR四叉树索引以线段而非以面目标作为整体概念;CIF四叉树索引以分层的网格映射空间目标[22]，索引结构形态不依赖空间对象插入的顺序，不重复存储空间对象，同时空间数据更新时，结点变更较小[21-23]，本文在CIF四叉树基础上提出一种存储拓扑包含关系的四叉树空间索引方法.1 包含关系四叉树索引的建立1.1 多边形包含关系的表达复杂多边形与其空洞多边形之间的嵌套包含关系类似于父子关系，可通过父-子-孙间的序关系来表达多边形之间嵌套包含关系，如图2所示.图2中H的父多边形为F，F与H互为父多边形与子多边形，H与C不存在直接包含关系（H为C的孙子多边形）. 
A child polygon is enclosed by the corresponding inner ring of its parent polygon (Ring in Parent, RIP). Polygon F lies inside inner ring rc1 of C (that is, the RIP of F is rc1). The inner ring is therefore the link between a parent polygon and its children, and nested inner rings express the complex containment relations among polygons. One inner ring can be shared by several child polygons: inner ring rf2 of F contains the three child polygons T, J and I. An inner ring is not an independent object, so a direct mapping from an inner-ring object to the child polygons it encloses cannot be built. The mapping can still be recovered by traversing the children of the polygon that owns the ring and grouping those that share the same RIP. The direct containment relation of a polygon can thus be written as {parent polygon, corresponding ring in the parent polygon, contained child polygons}.

Based on this analysis, let CP (Current Polygon) denote the current polygon, PP (Parent Polygon) the parent of CP, RIPID (Ring in Parent ID) the index of the inner ring of PP that encloses CP, and CPL (Children Polygon List) the array of pointers to the children of CP, with one child list per inner ring. The polygon object stored in a quadtree node is then {CP, PP, RIPID, CPL}.

Spatial data are usually organized in layers and cover the plane without gaps. A complex polygon and its hole polygons may be stored in different layers, which leaves many blank areas within a single layer, such as the shaded parts of Figure 1 (likewise below). The current layer stores only the RIP of a blank area and no corresponding polygon object. Because the RIP of a blank area cannot be stored as a polygon object on its own, this paper introduces inner-ring virtual polygon objects that fill the contiguous blank areas inside a complex polygon, so that the containment relations of such an RIP are fully represented. An inner-ring virtual polygon is a virtual object built from the boundary of a hole in a complex polygon, with the structure {CP, PP, -RIPID, Ø}, where CP is the virtual polygon, PP is its parent polygon, and -RIPID is the index of the ring in PP (negated to distinguish it from polygons that actually exist).

To determine containment between polygons, four rules are established. Given polygons P1 and P2, P1 directly contains P2 if the following rules hold:

Rule 1: P1 has inner rings.
Rule 2: The MBR of P2 is contained in the MBR of P1, and is equal to (F and rc1 in Figure 2) or intersects (T and rf2 in Figure 2) the MBR of some inner ring r of P1.
Rule 3: An arbitrary vertex of P2 lies inside or on ring r.
Rule 4: No polygon with an MBR smaller than that of P1 contains P2.

Each rule builds on the previous one. Rule 2 requires traversing the inner rings of P1; Rule 3 is then applied to the rings that satisfy Rule 2. If the child polygons of P2 are all simple polygons, the first three rules determine the containment relation between P1 and P2; otherwise Rule 4 is applied as a further constraint to exclude indirect, nested containment. Checking whether a polygon has inner rings (Rule 1) takes constant time, O(1). If the inner rings of P1 have e edges in total, the main cost of Rule 2 (traversing the inner rings of P1 and testing whether the MBRs intersect) is computing the ring MBRs during the traversal, with time complexity …

Spatial data processing in GIS generally consists of two steps, filtering and refinement. A spatial index is a special spatial data structure that improves efficiency by filtering out most irrelevant objects, so that refinement is carried out only among a few closely related objects. Current spatial indexing methods include, for traditional vector data, the quadtree [9-10], R-tree [11-12], R+-tree [13], R*-tree [14-15], grid index [16] and Hilbert R-tree [17], and, for trajectory data, Geohash-Trees [18]. These traditional vector indexes all use the minimum bounding rectangle (MBR) of an object to reduce index storage and speed up filtering. For complex polygons, however, existing indexes store only the MBR of the outer boundary and cannot express the containment relations between a complex polygon and its holes. During conflict detection and update processing of spatial data, irrelevant holes cannot be filtered out by the index, so a large number of holes take part in refinement; this is the root cause of the heavy computation and low efficiency.

Figure 1 shows a local example of land-cover vector data. C is a complex polygon with thousands of holes; B, D, E, F and others in Figure 1(a) are its holes, and the shaded parts are patches from other layers, which are blank in the current layer. P1 is an incremental polygon (Figure 1(b)) that has a two-dimensional intersection only with C, with one of its hole polygons, B, and with the blank area, and therefore requires update processing. With existing indexing methods, the topological relations between P1 and C together with all of its hole polygons must be computed, which makes update processing extremely inefficient. If the index stored the containment relations between a complex polygon and its holes, irrelevant holes could be filtered out by the index and land-cover updates could become much faster. On this basis, this paper proposes a spatial index that stores the containment relations of complex polygons.

Land-cover vector data are deeply nested and polygon MBRs overlap heavily. An R-tree built on such data performs poorly [15]; in addition, R-tree indexes inevitably store spatial objects redundantly, which makes it hard to keep the stored containment relations consistent when the data are updated. Quadtree structures for areal objects mainly include the linear quadtree, the PMR quadtree and the CIF quadtree [9, 19-20]. The linear quadtree maps spatial objects onto a grid of user-defined cell size [21]; because polygon areas in land-cover data are extremely uneven, a suitable cell size is hard to choose, and objects are again stored redundantly. The PMR quadtree indexes line segments rather than whole areal objects. The CIF quadtree maps spatial objects onto a hierarchy of grids [22]; the shape of the index does not depend on the insertion order, objects are not stored redundantly, and few nodes change when the data are updated [21-23]. This paper therefore builds on the CIF quadtree and proposes a quadtree spatial index that stores topological containment relations.

1 Building the containment-relation quadtree index

1.1 Representing polygon containment

The nested containment between a complex polygon and its hole polygons resembles a parent-child relation and can be expressed through the parent-child-grandchild ordering, as shown in Figure 2. In Figure 2 the parent of H is F: F and H are parent and child, while H and C have no direct containment relation (H is a grandchild of C).
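
As a rough illustration of the record {CP, PP, RIPID, CPL} and of Rules 1 and 2 described in this excerpt, the following Python sketch may help. The class name PolygonRecord, the helper names, the tuple layout of an MBR and the 1-based ring numbering are all assumptions made for this example and do not come from the paper; Rule 3 (the vertex-in-ring test) and Rule 4 are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed layout for this sketch: (min_x, min_y, max_x, max_y).
MBR = Tuple[float, float, float, float]


def mbr_contains(outer: MBR, inner: MBR) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def mbr_intersects(a: MBR, b: MBR) -> bool:
    """True if the rectangles share at least one point (equal MBRs included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass(eq=False)
class PolygonRecord:
    """Hypothetical in-memory form of {CP, PP, RIPID, CPL}."""
    name: str                                    # CP: the current polygon
    mbr: MBR                                     # MBR of the outer boundary
    ring_mbrs: List[MBR] = field(default_factory=list)  # one MBR per inner ring
    parent: Optional["PolygonRecord"] = field(default=None, repr=False)  # PP
    ripid: int = 0         # RIPID (1-based in this sketch); negative = virtual polygon
    children: List[List["PolygonRecord"]] = field(default_factory=list, repr=False)  # CPL


def candidate_rings(p1: PolygonRecord, p2: PolygonRecord) -> List[int]:
    """Apply Rules 1 and 2 only: numbers of the inner rings of p1 that could
    enclose p2. Rule 3 would then test a vertex of p2 against each of them."""
    if not p1.ring_mbrs:                         # Rule 1: p1 must have inner rings
        return []
    if not mbr_contains(p1.mbr, p2.mbr):         # Rule 2, first half
        return []
    return [i + 1 for i, ring in enumerate(p1.ring_mbrs)
            if mbr_intersects(ring, p2.mbr)]     # Rule 2, second half


c = PolygonRecord("C", (0, 0, 100, 100), ring_mbrs=[(10, 10, 40, 40), (60, 60, 90, 90)])
f = PolygonRecord("F", (10, 10, 40, 40))
print(candidate_rings(c, f))  # [1]
```

Only the cheap MBR comparisons are shown here; the exact geometric test is deliberately deferred, in the same filter-then-refine spirit the paper describes.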

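Hilbert Curve Spatial Indexing

The Hilbert curve is a space-filling curve that is often used for spatial indexing. It was proposed by the German mathematician David Hilbert in 1891 and is widely used in computer science. Because it yields compact one-dimensional keys and preserves spatial locality well, it suits indexing and querying data in multidimensional space.

The Hilbert curve is a continuous curve that maps coordinates in multidimensional space to one dimension. The mapping tends to place data that are close together in space close together in the one-dimensional order, which improves data locality.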

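The curve is built by applying one pattern repeatedly. In two dimensions, the square is divided into four equal quadrants, which are connected in a specific order to form a fractal curve. Each quadrant is then subdivided in the same way, and the process repeats until the required resolution is reached. Every point of the plane is thereby mapped to a position on the curve, and positions that are close on the curve remain close in the plane.

The construction can be implemented iteratively. Each iteration divides the current square into four equal parts and connects them in the prescribed order. This order is usually expressed as a binary code in which each level contributes two bits that identify the quadrant being visited.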

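Once the curve has been constructed, data points in multidimensional space can be mapped onto it, and the mapping can be used to index and query the data. In two dimensions, for example, the coordinates of each data point are mapped to a position on the Hilbert curve, and that position is used as the key for the point. Points that are close in the plane then tend to be close on the curve as well, which improves query efficiency.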
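
To make the mapping concrete, here is a minimal sketch that converts grid coordinates to a position on the curve and then sorts points by that position. It uses the widely published iterative rotate-and-accumulate formulation rather than anything specified in the text above; the function names rot and xy2d and the sample points are assumptions chosen for illustration.

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip the coordinates so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Position of cell (x, y) on the Hilbert curve over an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0          # which half along x at this level
        ry = 1 if y & s else 0          # which half along y at this level
        d += s * s * ((3 * rx) ^ ry)    # quadrant rank times cells per quadrant
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d


points = [(7, 7), (0, 0), (1, 0), (6, 7), (0, 1), (1, 1)]
by_curve = sorted(points, key=lambda p: xy2d(8, *p))
print(by_curve)
```

Storing the points in this order (for example as the key of an ordinary sorted index) is what lets a one-dimensional structure answer two-dimensional queries.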

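The Hilbert curve is widely used in computer science. On the one hand, it improves the efficiency of storing and querying spatial data: a geographic information system, for example, can index its geospatial data along a Hilbert curve and then retrieve the data within a given area quickly. On the other hand, it is also used in fields such as data compression and image processing, where mapping two-dimensional data points to one dimension reduces the dimensionality of the data and speeds up processing.

In short, the Hilbert curve is an effective tool for spatial indexing. It maps data points in multidimensional space onto a one-dimensional curve while largely preserving their proximity.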

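Google S2, Spherical Geometry and the Hilbert Curve

First published at /s2-geometry-sphere-cells-hilbert-curve/. Translated from "Google's S2, geometry on the sphere, cells and Hilbert curve".

The Google S2 library is a gem, not only because of how well it handles spatial indexing but also because it has existed for more than four years without getting the attention it deserves.

S2 is used by Google Maps, MongoDB and Foursquare, yet apart from a paper from Foursquare, a slide deck from Google and the comments in the source code, you will not find any articles or documentation about it.

You may also struggle to find bindings for S2: the official repository has lost the SWIG files for the Python library, and it is only thanks to a few forks that part of the Python bindings is still available.

Google is said to be actively developing S2 again, so more detailed information about the library may appear before long. In the meantime I decided to share some examples of using it and the reasons it is so cool.

Understanding cells

You will see the concept of a cell throughout the S2 code. Cells are a hierarchical decomposition of the sphere (the Earth for us, but not necessarily) that gives a compact representation of regions and points. Regions can also be approximated with the same cells, which have several nice properties:

- they are very compact (represented by 64-bit integers);
- they have enough resolution for geographical features;
- they are hierarchical (they have levels, and cells at the same level cover similar areas);
- containment queries for arbitrary regions are very fast.

First, S2 projects the points and regions of the sphere onto a cube. Each face of the cube carries a quadtree, and a point on the sphere is projected into that quadtree. Next, a transformation is applied (see Google's slides for the detailed reasons) and the space is discretized. Finally, the cells are enumerated along a Hilbert curve, which is why S2 works so well. The Hilbert curve is a space-filling curve that maps multiple dimensions to one and has a special spatial property: it preserves locality.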
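
The first step, choosing a cube face for a point on the sphere, can be sketched in a few lines. This is a simplified illustration written for this article and not the S2 API: the function names and the face numbering (0 to 2 for the +x, +y and +z faces, 3 to 5 for the opposite ones) are assumptions of the sketch, and the later steps (the nonlinear transform, the discretization and the Hilbert ordering) are left out.

```python
import math


def latlng_to_xyz(lat_deg, lng_deg):
    """Convert latitude/longitude in degrees to a unit vector (x, y, z)."""
    lat, lng = math.radians(lat_deg), math.radians(lng_deg)
    return (math.cos(lat) * math.cos(lng),
            math.cos(lat) * math.sin(lng),
            math.sin(lat))


def cube_face(p):
    """Pick the cube face hit by the ray from the centre through p.

    The face is decided by the coordinate with the largest magnitude;
    negative directions are numbered 3 to 5 (an assumption of this sketch).
    """
    axis = max(range(3), key=lambda i: abs(p[i]))
    return axis if p[axis] >= 0 else axis + 3


print(cube_face(latlng_to_xyz(90, 0)))  # north pole lies on the +z face: 2
```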

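Construction Algorithm for the Hilbert Curve

The Hilbert curve is a space-filling curve that connects the cells of a two-dimensional grid in a fixed order, so that consecutive points on the curve are always neighbours in the plane.

The construction proceeds as follows:

1. Choose a positive integer n as the order of the curve. The curve then covers a grid of 2^n * 2^n cells; for n = 2, for example, it is built on a 4 * 4 grid.

2. Allocate a two-dimensional array of size 2^n * 2^n that records the position of each grid cell along the curve.

3. Construct the curve recursively until a single cell is reached. Each recursive step works as follows:
- Divide the current square into four quadrants: upper left, upper right, lower right and lower left.
- Visit the four quadrants in a U-shaped sequence so that consecutive quadrants share an edge, and join the end of each sub-curve to the start of the next with a single horizontal or vertical segment.
- For each quadrant, reduce the order by one and call the construction recursively, rotating or reflecting the sub-curve so that its endpoints meet those of its neighbours.

4. The result is the Hilbert curve of order n.

The construction takes O(4^n) time for a curve of order n, because each of the 4^n cells is visited once.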
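
The recursive steps above can be sketched as a short generator. This is one common recursive formulation, shown as an illustration under the assumptions that the curve lives in the unit square and that each cell is represented by its centre; the function name hilbert_points and its parameters are choices made for this example.

```python
def hilbert_points(order, x0=0.0, y0=0.0, xi=1.0, xj=0.0, yi=0.0, yj=1.0):
    """Yield the cell centres of a Hilbert curve of the given order, in curve order.

    (x0, y0) is a corner of the current square; (xi, xj) and (yi, yj) are the
    edge vectors that span it. Swapping or negating the two vectors rotates or
    reflects the sub-curve so that neighbouring sub-curves join up.
    """
    if order == 0:
        yield (x0 + (xi + yi) / 2, y0 + (xj + yj) / 2)
        return
    # First quadrant: edge vectors swapped.
    yield from hilbert_points(order - 1, x0, y0,
                              yi / 2, yj / 2, xi / 2, xj / 2)
    # Second and third quadrants: same orientation as the parent.
    yield from hilbert_points(order - 1, x0 + xi / 2, y0 + xj / 2,
                              xi / 2, xj / 2, yi / 2, yj / 2)
    yield from hilbert_points(order - 1, x0 + xi / 2 + yi / 2, y0 + xj / 2 + yj / 2,
                              xi / 2, xj / 2, yi / 2, yj / 2)
    # Fourth quadrant: edge vectors swapped and negated.
    yield from hilbert_points(order - 1, x0 + xi / 2 + yi, y0 + xj / 2 + yj,
                              -yi / 2, -yj / 2, -xi / 2, -xj / 2)


curve = list(hilbert_points(2))
print(len(curve))  # 16 cells for order 2
```

Each call produces 4^order points, in line with the O(4^n) cost stated above.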


Damn Cool Algorithms: Spatial Indexing with Quadtrees and Hilbert Curves

Translated by demoZ for 伯乐在线; proofread by 黄利民.


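As more and more data and applications become geospatial, spatial indexing grows in importance. Querying geospatial data efficiently is a considerable challenge, however: the data is two-dimensional (sometimes higher), so standard indexing techniques cannot be used to query by location. Spatial indexes solve this with a variety of techniques. In this post I cover several of them: quadtrees, geohashes (not to be confused with geohashing) and space-filling curves, and show how they are related.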

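Quadtrees

A quadtree is a straightforward spatial indexing technique. Each node represents a bounding box that covers part of the indexed space, and the root node covers the whole area. A node is either a leaf, which holds a list of one or more indexed points and has no children, or an internal node, which has exactly four children, one for each quadrant obtained by halving the region along both axes; hence the name.

Figure 1: How a quadtree divides the indexed area. Source: Wikipedia

Inserting data into a quadtree is simple: start at the root and determine which quadrant your point falls in. Recurse into that node and repeat until you reach a leaf, then add the point to that leaf's list of points. If the list exceeds a predefined maximum number of entries, split the node and move its points into the appropriate children.

Figure 2: Internal structure of a quadtree

To query a quadtree, start at the root and check each child to see whether it intersects the query area. If it does, recurse into that child. On reaching a leaf, check each entry in its point list and return those that fall within the query area.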
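
The insertion and query procedures just described can be sketched as a small class. This is a minimal illustration, not a production index: the class name QuadTree, the default capacity of 4, the half-open bounding boxes and the max_depth guard are all assumptions made for the example.

```python
class QuadTree:
    """Point quadtree over the half-open box [x0, x1) x [y0, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=4, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity      # max points in a leaf before it splits
        self.max_depth = max_depth    # stops endless splitting on duplicate points
        self.points = []              # only used while this node is a leaf
        self.children = None          # list of four QuadTree nodes once split

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def insert(self, x, y):
        node = self
        while node.children is not None:      # descend to the leaf containing (x, y)
            node = node._child_for(x, y)
        node.points.append((x, y))
        if len(node.points) > node.capacity and node.max_depth > 0:
            node._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.max_depth - 1)
        self.children = [QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
                         QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args)]
        pending, self.points = self.points, []
        for px, py in pending:                # move the points into the new children
            self._child_for(px, py).insert(px, py)

    def query(self, qx0, qy0, qx1, qy1):
        """Return all points inside the half-open query box [qx0, qx1) x [qy0, qy1)."""
        x0, y0, x1, y1 = self.bounds
        if qx0 >= x1 or qx1 <= x0 or qy0 >= y1 or qy1 <= y0:
            return []                         # no overlap: prune this subtree
        if self.children is None:
            return [(x, y) for x, y in self.points
                    if qx0 <= x < qx1 and qy0 <= y < qy1]
        found = []
        for child in self.children:
            found.extend(child.query(qx0, qy0, qx1, qy1))
        return found


tree = QuadTree(0, 0, 100, 100)
for i in range(10):
    for j in range(10):
        tree.insert(i * 10 + 5, j * 10 + 5)
print(sorted(tree.query(0, 0, 20, 20)))  # [(5, 5), (5, 15), (15, 5), (15, 15)]
```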

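Note that a quadtree is very regular; it is in fact a trie, because the values of the tree nodes do not depend on the data being inserted. We can therefore number the nodes in a straightforward way: number each quadrant in binary (00 for upper left, 10 for upper right, and so on. Translator's note: a first bit of 0 means the left half and 1 the right half; a second bit of 0 means the upper half and 1 the lower half), and form the number of any node by concatenating the quadrant numbers of its ancestors, starting from the root. Under this scheme, the lower-right node in Figure 2 is numbered 1101.

If we fix a maximum depth for the tree, the number of the node containing a point can be computed without consulting the tree: normalize the point's coordinates to a suitable integer range (32-bit integers, say) and interleave the bits of the resulting x and y values. Each pair of bits identifies one quadrant of the hypothetical quadtree.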
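
The bit interleaving can be sketched as follows. The function names interleave and normalize are made up for this example, and the bit order (x bit first, then y bit) follows the numbering in the translator's note above; other sources may put the y bit first.

```python
def interleave(x, y, depth):
    """Interleave the low `depth` bits of x and y, x bit first."""
    code = 0
    for i in range(depth - 1, -1, -1):          # most significant bit first
        code = (code << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return code


def normalize(value, lo, hi, depth):
    """Map a coordinate in [lo, hi) to an integer cell index in [0, 2**depth)."""
    cell = int((value - lo) / (hi - lo) * (1 << depth))
    return min(cell, (1 << depth) - 1)


# Cell x = 2 (binary 10), y = 3 (binary 11) in a tree of depth 2.
print(format(interleave(2, 3, 2), '04b'))  # 1101
```

Read in pairs, 11 then 01 says "lower-right quadrant, then its lower-left subquadrant", which is how a node number encodes the path from the root.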

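(Translator's note: readers unfamiliar with this may want to look up Z-order; like the Hilbert curve discussed below, it is a way of mapping two-dimensional points to one dimension.)

Geohashes

The numbering system above may look familiar, and it is: it is a geohash! At this point you can throw the quadtree away. The node number, or geohash, contains everything we need to know about the node's position in the tree. Every leaf of a full-height tree is a complete geohash, and every internal node corresponds to the range from its smallest leaf to its largest. You can therefore locate all points under any internal node efficiently by indexing on the geohash and querying everything within the numeric range that the node covers.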
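
A short sketch of the range idea: an internal node (a prefix) corresponds to one contiguous range of full-depth codes, so everything under it can be fetched from a sorted list with two binary searches. The function names and the sample codes are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def prefix_range(prefix, prefix_depth, max_depth):
    """Inclusive range of full-depth codes under the node numbered `prefix`."""
    shift = 2 * (max_depth - prefix_depth)      # two bits per missing level
    lo = prefix << shift
    return lo, lo | ((1 << shift) - 1)


def codes_under(sorted_codes, prefix, prefix_depth, max_depth):
    """All stored codes that fall under the given node, via two binary searches."""
    lo, hi = prefix_range(prefix, prefix_depth, max_depth)
    return sorted_codes[bisect_left(sorted_codes, lo):bisect_right(sorted_codes, hi)]


index = sorted([0b0001, 0b0110, 0b1100, 0b1101, 0b1111])
print(codes_under(index, 0b11, 1, 2))  # [12, 13, 15]
```

In a database the sorted list would be an ordinary ordered index on the geohash column, and the slice would be a range scan.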

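Once we drop the quadtree, querying becomes a little more complicated. Instead of refining the search set recursively while descending the tree, we have to construct it up front. First, find the smallest prefix (that is, quadtree node. Translator's note: in our numbering system a node is represented by a bit string) that completely covers the query area. In the worst case this can be much larger than the query area itself: for a small region at the centre of the indexed area that touches all four quadrants, the search starts from the root node.

The goal now is to build a set of prefixes that covers the query area completely while including as little as possible of the area outside it. With no other constraint we could simply take every leaf node that intersects the query area, but that would produce a very large number of queries. So we add a constraint: use as few distinct ranges as possible.

One way to do this is to fix the maximum number of ranges we are willing to query. Build a set of ranges that initially contains only the prefix found above. From the set, pick the node that can be split without exceeding the maximum number of ranges and whose split removes the most unwanted area from the query. Repeat until no range in the set can be subdivided further. Finally, examine the resulting set and merge adjacent ranges where possible. The figure below shows how this works for a circular query area with a limit of 5 query ranges.

Figure 3: How a query for a region is broken down into a series of geohash prefixes/ranges

This approach works well and lets us avoid recursive lookups. The whole set of range lookups can be carried out in parallel. Since each lookup can be expected to cost one disk seek, parallelizing the queries greatly reduces the time needed to return results.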
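
The procedure described above (smallest covering prefix, greedy splitting under a range budget, final merge) might be sketched like this. It is one possible reading of the description, not the author's implementation: it assumes a rectangular query box on an integer grid instead of a circle, measures "unwanted area" as cell area outside the box, and uses the x-bit-first numbering from earlier; all function names are made up.

```python
def interleave(x, y, depth):
    """Z-order code of cell (x, y): bits interleaved, x bit first."""
    code = 0
    for i in range(depth - 1, -1, -1):
        code = (code << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return code


def overlap(cell, q):
    """Area shared by a cell (x, y, size) and the query box q = (x0, y0, x1, y1)."""
    x, y, size = cell
    w = min(x + size, q[2]) - max(x, q[0])
    h = min(y + size, q[3]) - max(y, q[1])
    return max(w, 0) * max(h, 0)


def waste(cell, q):
    """Area of the cell that lies outside the query box."""
    return cell[2] * cell[2] - overlap(cell, q)


def kids(cell, q):
    """Child cells (quadrants) that intersect the query box."""
    x, y, size = cell
    half = size // 2
    cells = [(x + dx, y + dy, half) for dx in (0, half) for dy in (0, half)]
    return [c for c in cells if overlap(c, q) > 0]


def cover(q, max_ranges, depth):
    """Greedy cover of box q on a 2**depth grid using at most max_ranges cells."""
    cell = (0, 0, 1 << depth)
    while cell[2] > 1 and len(kids(cell, q)) == 1:    # smallest single covering prefix
        cell = kids(cell, q)[0]
    cells = [cell]
    while True:
        best = None
        for c in cells:
            if c[2] == 1 or waste(c, q) == 0:
                continue                               # cannot or need not be split
            sub = kids(c, q)
            if len(cells) - 1 + len(sub) > max_ranges:
                continue                               # splitting would exceed the budget
            gain = waste(c, q) - sum(waste(k, q) for k in sub)
            if best is None or gain > best[0]:
                best = (gain, c, sub)
        if best is None:
            break
        cells.remove(best[1])
        cells.extend(best[2])
    ranges = sorted((interleave(x, y, depth), interleave(x, y, depth) + s * s - 1)
                    for x, y, s in cells)
    merged = [ranges[0]]
    for lo, hi in ranges[1:]:                          # merge ranges that touch
        if lo == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged


# A 2 x 2 box in the middle of a 4 x 4 grid touches all four quadrants.
print(cover((1, 1, 3, 3), 5, 2))  # [(3, 3), (6, 6), (9, 9), (12, 12)]
```

With a budget below 4 the same query cannot leave the root node and returns the single range (0, 15), which is the worst case the text mentions.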

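Still, we can do better. You may notice that all the areas we need to query in the figure above are adjacent, yet only two of them (the two at the lower right of the selected area) can be merged into a single range query, which still leaves 4 separate queries. (Translator's note: these two areas can be merged because they are adjacent along the Z-shaped path in which a geohash traverses the area.) This is partly a consequence of the order in which geohashes visit sub-regions: left to right, then top to bottom within each quadrant. The discontinuity in going from the upper-right quadrant to the lower-left one forces us to split ranges that could otherwise be contiguous. If we visited the regions in a different order, we might be able to minimize or eliminate these discontinuities, so that more areas would count as adjacent and could be fetched with a single query. With that gain in efficiency we could issue fewer queries for the same coverage or, conversely, include less irrelevant area for the same number of queries.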

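Figure 4: The order in which a geohash visits quadrants

Hilbert Curves

Suppose instead that we visit the regions in a U shape. Within each quadrant we also visit the subquadrants in a U shape, but we orient each U so that it joins up with the neighbouring quadrants. If the orientations are arranged correctly, the discontinuities disappear entirely: whatever resolution we choose, we traverse the whole area continuously, fully exploring each region before moving on to the next. This scheme not only removes the discontinuities but also improves overall locality. The resulting pattern may look familiar, and it is: it is the Hilbert curve.

Hilbert curves belong to a class of one-dimensional fractals known as space-filling curves, so called because, although they are one-dimensional lines, they fill all the space in a fixed area. They are fairly well known, partly thanks to XKCD's use of them for a map of the internet. As you can see, they are also useful for spatial indexing, because they exhibit exactly the locality and continuity we need. Looking again at the example of covering a circle with a set of queries, we find that using a Hilbert curve saves one more query: the small region at the lower left is now contiguous with the region to its right (one fewer), and although the two regions at the bottom are no longer contiguous (one more), the lower-right region is now contiguous with the one above it (one fewer).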
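
The effect on the number of range queries can be checked on a tiny grid. The sketch below is an illustration only and does not reproduce the circular example from the figures: it hard-codes the visiting order of a 4 x 4 grid for both curves (Z-order with the x-bit-first numbering used earlier, and an order-2 Hilbert curve that starts at the upper left and ends at the upper right) and counts how many contiguous ranges are needed to cover the 2 x 2 block in the middle. The table and function names are made up.

```python
# Position of each cell of a 4 x 4 grid along the two curves, indexed as [y][x]
# with y growing downwards.
Z_ORDER = [[0, 2, 8, 10],
           [1, 3, 9, 11],
           [4, 6, 12, 14],
           [5, 7, 13, 15]]
HILBERT = [[0, 1, 14, 15],
           [3, 2, 13, 12],
           [4, 7, 8, 11],
           [5, 6, 9, 10]]


def count_ranges(table, cells):
    """Number of contiguous runs of curve positions needed to cover the cells."""
    positions = sorted(table[y][x] for x, y in cells)
    return 1 + sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)


window = [(1, 1), (2, 1), (1, 2), (2, 2)]      # the 2 x 2 block in the middle
print(count_ranges(Z_ORDER, window), count_ranges(HILBERT, window))  # 4 3
```

The Hilbert order needs three ranges where Z-order needs four, because two of the four cells are consecutive on the Hilbert curve.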

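Figure 5: The order in which the Hilbert curve visits quadrants

So far our elegant system lacks one thing: a way to convert an (x, y) coordinate into the corresponding position on the Hilbert curve. For geohashes this is simple and obvious, since we only have to interleave the x and y coordinates, but there is no obvious way to adapt that scheme to the Hilbert curve. Searching the web, you will find many descriptions of how Hilbert curves are drawn but few of how to find the position of an arbitrary point on the curve. To work it out, we need to look more closely at how the curve is constructed recursively.

The first thing to note is that although most of the literature on Hilbert curves focuses on how the curve is drawn, this distracts from its essential property and the reason it matters: the curve defines an ordering of the points in the plane. If we express the Hilbert curve in terms of this ordering, drawing it becomes trivial: just connect the dots. So forget about how to join up subcurves and concentrate on how to enumerate the points recursively.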

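Figure 6: The Hilbert curve defines an ordering of the points in the two-dimensional plane

At the root level, enumerating the points is easy: pick a direction and a starting point, go around the four quadrants and number them 0 to 3. The difficulty arises when we have to decide the order in which to visit the subquadrants while preserving the overall adjacency property. On inspection, the curve in each subquadrant is a simple transformation of the original curve, and there are only four such transformations. The same naturally holds for sub-subquadrants, and so on. The curve drawn in a given quadrant is determined by the curve of the larger square that contains it and by the quadrant's position within that square. With a little effort we can build a table that summarizes all the cases.

Figure 7

Suppose we want to use this table to find the position of a point on a third-order Hilbert curve, and that the point has coordinates (5, 2). (Translator's note: refer to Figure 8.) Start with the first square in the figure above and find the quadrant that contains the point; here it is the upper-right quadrant. The first part of the point's position on the curve is therefore 3 (binary 11). Next we move to the square shown inside quadrant 3, which in this case is the second square (in Figure 7). Repeat the process: which subquadrant does the point fall in? This time it is the lower left, so the next part of the position is 1 (binary 01), and the square we move to is again the second one. Repeating the process one last time, the point falls in the upper-right sub-subquadrant, so the final part of the position is 3 (binary 11). Concatenating these parts gives the position of the point on the curve: binary 110111, or decimal 55.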

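Figure 8: A third-order Hilbert curve

Let's be more systematic and write a method that converts x, y coordinates to a position on the Hilbert curve. First we need to express Figure 7 in a form the computer can understand:

    # For each square ('a' to 'd'), map a quadrant's (x, y) to
    # (position of that quadrant along the curve, square to use next).
    hilbert_map = {
        'a': {(0, 0): (0, 'd'), (0, 1): (1, 'a'), (1, 0): (3, 'b'), (1, 1): (2, 'a')},
        'b': {(0, 0): (2, 'b'), (0, 1): (1, 'b'), (1, 0): (3, 'a'), (1, 1): (0, 'c')},
        'c': {(0, 0): (2, 'c'), (0, 1): (3, 'd'), (1, 0): (1, 'c'), (1, 1): (0, 'b')},
        'd': {(0, 0): (0, 'a'), (0, 1): (3, 'c'), (1, 0): (1, 'd'), (1, 1): (2, 'd')},
    }

In the code above, each element of hilbert_map corresponds to one of the four squares in Figure 7. To keep them apart, each square is identified by a letter: 'a' is the first square, 'b' the second, and so on. The value for each square is a dictionary that maps the x, y coordinates of a (sub)quadrant to its position along the curve (the first element of the tuple) and to the square to use next (the second element).

The following code shows how to use this to convert x, y coordinates to a position on the Hilbert curve:

    def point_to_hilbert(x, y, order=16):
        current_square = 'a'
        position = 0
        # Walk from the most significant bit to the least, one level per step.
        for i in range(order - 1, -1, -1):
            position <<= 2
            quad_x = 1 if x & (1 << i) else 0
            quad_y = 1 if y & (1 << i) else 0
            quad_position, current_square = hilbert_map[current_square][(quad_x, quad_y)]
            position |= quad_position
        return position

The function takes integer x and y coordinates and the order of the curve.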
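
Going the other way, from a position on the curve back to x, y coordinates, is not covered in the text above but follows from the same table. A possible sketch, in which hilbert_unmap and hilbert_to_point are names made up for this example:

```python
hilbert_map = {
    'a': {(0, 0): (0, 'd'), (0, 1): (1, 'a'), (1, 0): (3, 'b'), (1, 1): (2, 'a')},
    'b': {(0, 0): (2, 'b'), (0, 1): (1, 'b'), (1, 0): (3, 'a'), (1, 1): (0, 'c')},
    'c': {(0, 0): (2, 'c'), (0, 1): (3, 'd'), (1, 0): (1, 'c'), (1, 1): (0, 'b')},
    'd': {(0, 0): (0, 'a'), (0, 1): (3, 'c'), (1, 0): (1, 'd'), (1, 1): (2, 'd')},
}

# Invert the table: for each square, position on the curve -> (quadrant, next square).
hilbert_unmap = {
    square: {pos: (quad, nxt) for quad, (pos, nxt) in table.items()}
    for square, table in hilbert_map.items()
}


def hilbert_to_point(position, order=16):
    """Return the (x, y) cell at the given position on a curve of the given order."""
    current_square = 'a'
    x = y = 0
    for i in range(order - 1, -1, -1):
        quad_position = (position >> (2 * i)) & 3       # two bits per level
        (quad_x, quad_y), current_square = hilbert_unmap[current_square][quad_position]
        x = (x << 1) | quad_x
        y = (y << 1) | quad_y
    return x, y


print(hilbert_to_point(55, 3))  # (5, 2), the worked example above
```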
