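GIS Applications, Chapter 4

GIS Application Case Studies (courseware)

Detailed description
Combined with big data technology, GIS can process, analyze, and visualize data more efficiently, giving every sector more accurate and more complete geographic information services. For example, big data analysis of urban traffic flows can guide the layout of a city's transport network, and big data monitoring of natural hazards can improve disaster response.
GIS and artificial intelligence
Summary
Advances in artificial intelligence open new opportunities for GIS, and combining the two will further widen the range of GIS applications.
GIS is used in urban planning, traffic management, and the siting of public facilities, making city management more scientific and more efficient.
Public safety and emergency response
GIS is used in crime analysis, disaster relief, and emergency response, strengthening public safety and emergency response capability.
Chapter 02: GIS Application Case Studies
Urban planning
Decision support for urban planning
GIS provides visual analysis and spatial decision support. It helps planners understand a city's spatial structure and functional layout, which makes plans more scientific and better founded.
Tourist route planning
GIS can plan and recommend personalized routes based on visitors' needs and preferences, improving their experience and satisfaction.
Tourism emergency management
GIS can monitor safety conditions in tourist areas in real time so that safety problems are detected and handled promptly, protecting visitors' safety and rights.
Chapter 03: Future Trends in GIS
Integrating GIS with big data
Summary
As big data technology continues to develop, integrating GIS with big data is becoming a major trend.
Prospects and challenges for GIS technology
Technology development
Introduces trends in GIS technology, such as the use of big data, cloud computing, and artificial intelligence, and the future direction of GIS technology.
Expanding application fields
Explores the possibilities for applying GIS technology in more fields, such as smart tourism and smart healthcare, and how to make better use of GIS technology … the challenges it faces as it develops, such as data security and rapid technology turnover, while pointing out that GIS technology's …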

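《GIS应用软》 (GIS application software) courseware

1 Urban planning
GIS application software gives urban planners geographic data analysis and spatial simulation functions, helping them assess and plan a city's sustainable development.
2 Environmental protection
GIS application software plays an important role in environmental protection. It helps researchers monitor ecosystem change, analyze natural hazard risk, and so on.
3 Agricultural management
GIS application software helps agricultural managers with land evaluation, crop monitoring, and weather analysis, improving agricultural productivity and resource use.
4 Geological exploration
GIS application software is an important tool in geological exploration. It can be used to analyze terrain data and the distribution of mineral resources, and it supports decision making and risk assessment.
Software features
Data visualization
GIS application software turns geographic data into graphics, helping users grasp geographic phenomena and spatial relationships more intuitively.
Spatial analysis
GIS application software offers a rich set of spatial analysis tools, including buffer analysis and overlay analysis, that help users dig into the information behind geographic data.
《GIS应用软》 PPT courseware
Welcome to the 《GIS应用软》 PPT courseware. This course introduces the essentials of GIS application software: an overview of the software, its application fields, its features, how to use it, case studies, strengths and weaknesses, and a conclusion. Let's start exploring this exciting topic!
Software overview
What is GIS application software?
GIS application software is a tool built on geographic information system technology. It processes, analyzes, and visualizes geographic data, and it helps users understand spatial relationships and make decisions.
Data editing
GIS application software lets users edit and update geographic data so that it reflects changes in the real world promptly.
How to use the software
1. Acquire geographic data
Obtain the geographic data to be processed and analyzed, either from the data sources the GIS application software provides or by importing external data.
2. Preprocess the data
Clean and process the geographic data, including format conversion, clipping, and topology repair.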

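《GIS的应用》 (Applications of GIS) courseware

Summary
3D GIS visualizes and manages geographic information in three dimensions, so it reflects geographic data more realistically.
Detailed description
Using 3D model construction, 3D GIS presents geographic data in three dimensions. It shows terrain, landforms, buildings, and similar information more intuitively, and it supports 3D spatial analysis and simulation, giving more precise decision support to urban planning, land resource management, and other fields.
Virtual reality GIS
Summary
Environmental monitoring
Combined with remote sensing, GIS can monitor the environment in real time and provide data to support environmental management.
Environmental impact assessment
GIS can be used to assess the environmental impact of construction projects, helping to safeguard their sustainable development.
Traffic management
Traffic flow analysis
GIS can monitor and analyze traffic flow in real time, giving traffic management a scientific basis.
Transport planning
GIS can integrate many kinds of traffic data and carry out tasks such as travel demand forecasting in support of transport planning.
… fields with a more intuitive application experience.
Chapter 04: GIS in Practice
Acquiring and processing GIS data
GIS data sources
Maps, remote sensing imagery, GPS data, socioeconomic statistics, and so on.
Data preprocessing
Data cleaning, format conversion, unification of coordinate systems, and so on.
Data editing and updating
Adding, modifying, and deleting geographic features.
Spatial analysis methods in GIS
Spatial query
Retrieving data by spatial location and attribute conditions.
《GIS的应用》 PPT courseware
Contents
• Basic concepts of GIS • Application fields of GIS • Development trends of GIS • GIS in practice • GIS case studies
Chapter 01: Basic Concepts of GIS
Definition of GIS
Summary
Geographic information system
Detailed description
A geographic information system (GIS) is a system for processing, analyzing, and visualizing geographic data. It combines computer hardware, software, and the relevant geographic data to support querying, storing, analyzing, and visualizing information about the geographic environment.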

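(Complete edition) ArcGisChapter04

Part II: Basic Quantitative Methods and Their Applications

Chapter 4: GIS-Based Trade Area Analysis and Its Applications in Business Geography and Regional Planning

As Taneja (1999: 136) observes, offering good merchandise and service is not enough: a successful retailer must also contend with three factors, namely location, location, and location.

Routine trade area analysis is essential when choosing a store site. Ghosh and McLafferty (1987: 62) define a trade area as the main area over which a store's customers are distributed, within which its sales of goods or services exceed those of its competitors.

For a new store, studying trade areas can reveal market opportunities left open by existing competitors (including other outlets of the same chain) and so helps identify the best site. For an existing store, trade area analysis can be used to examine market potential and evaluate performance. It also helps a business decide where to concentrate advertising, identify weak areas with few customers, and plan expansion (Berman and Evans, 2001: 293-294).

Trade areas can be delineated by the analog method, the proximal area method, or the gravity method. The analog method is a non-geographic method, most often implemented by regression analysis. The proximal area method and the gravity method are geographic methods and can be implemented with GIS. The analog and proximal area methods are fairly simple and are introduced in Section 4.1. The gravity method is the focus of this chapter and is described in detail in Section 4.2.

Because this book emphasizes GIS applications, Sections 4.3 and 4.4 use examples to show how the two geographic methods (proximal area and gravity) are implemented in GIS. Case study 4A presents a traditional business geography problem from a new angle: instead of delineating the trade area of a typical retail store, it analyzes the distribution of fans of Chicago's two professional baseball teams. Case study 4B applies trade area analysis to regional planning by delineating the hinterlands (areas of influence) of the major cities in northeast China. Delineating hinterlands is a common and important task in regional planning.

Section 4.5 summarizes the chapter.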
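The proximal area method named above assigns each customer to the closest store. The excerpt does not give its details, so the following is only a minimal sketch under stated assumptions: straight-line (Euclidean) distance in planar coordinates, and hypothetical store names and customer locations invented for illustration.

```python
import math

def assign_to_nearest(customers, stores):
    """Group customers by the store that is closest to them.

    customers: list of (x, y) tuples in planar coordinates (hypothetical data).
    stores: dict mapping a store name to its (x, y) location.
    Assumption: straight-line distance stands in for travel distance.
    """
    areas = {name: [] for name in stores}
    for customer in customers:
        nearest = min(stores, key=lambda name: math.dist(customer, stores[name]))
        areas[nearest].append(customer)
    return areas

# Hypothetical example: two stores and four customers.
stores = {"Store A": (0.0, 0.0), "Store B": (10.0, 0.0)}
customers = [(1.0, 2.0), (4.0, -1.0), (6.0, 3.0), (9.0, 0.5)]
trade_areas = assign_to_nearest(customers, stores)
for name, members in trade_areas.items():
    print(name, members)
```

In a GIS the same idea is usually expressed with polygons around each store rather than a customer list, and travel distance or travel time may replace straight-line distance.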

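4.1 Basic Methods of Trade Area Analysis

4.1.1 The Analog Method and Regression Models

The analog method, first proposed by Applebaum (1966, 1968), was the first systematic retail forecasting model based on empirical data. The model predicts the sales of a store from the performance of similar existing stores. Applebaum's analog method did not originally include regression analysis. It relied on questionnaire surveys of a sample of customers to collect information about the customers of the analog stores: their geographic distribution, demographic characteristics, spending habits, and so on.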

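How to Use ArcGIS for Geospatial Data Analysis

Chapter 1: ArcGIS Basics

ArcGIS is a suite of geographic information system (GIS) software developed by Esri (Environmental Systems Research Institute), a U.S. company. It provides tools and functions for processing and analyzing geospatial data. Before using ArcGIS for geospatial data analysis, we need to understand some basics.

1.1 Components of ArcGIS

ArcGIS Desktop is built around three main components: ArcMap, ArcCatalog, and ArcToolbox.

- ArcMap: creates, edits, and analyzes maps, and displays geospatial data visually.

- ArcCatalog: manages geospatial data, including browsing, searching, importing, exporting, and organizing it.

- ArcToolbox: provides tools and models for analyzing and processing geospatial data.

1.2 Data formats

ArcGIS supports many geospatial data formats, including vector data (points, lines, polygons), raster data (such as DEMs and remote sensing imagery), and tabular data. For geospatial analysis, the data formats must be correct and consistent.

1.3 The ArcGIS workspace

In ArcGIS, a workspace is the location, such as a folder or a geodatabase, that stores geospatial data and analysis results. Creating and managing workspaces makes geospatial data easier to manage and analyze.

Chapter 2: The Geospatial Data Analysis Workflow

A typical ArcGIS analysis workflow consists of data preparation, data import, data preprocessing, data analysis, and output of results.

2.1 Data preparation

First define the goal of the study and the geographic data it involves. Choose suitable data sources for that goal, then collect and organize the data.

2.2 Data import

Use ArcCatalog to bring the data into ArcGIS, and create datasets as needed, such as feature classes, raster datasets, and tables.

2.3 Data preprocessing

Data usually has to be preprocessed before analysis. Editing, projecting, clipping, and topology checking are used to clean and improve the data.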

ArcGIS Application Tutorial (Essentials Edition), Part 4

ESRI China (Beijing) Training Center
ArcGIS Application Tutorial (Essentials Edition)
Chapter 6: Spatial Reference
• Spatial reference
• Coordinate systems
• Geodetic datums
• Projection and distortion
• The ArcGIS spatial reference strategy
What is a spatial reference?
Coordinate systems
• Geographic coordinate system: measured in degrees of latitude and longitude
• Projected coordinate system: measured in units of length
[Figure: the four quadrants of the X-Y plane, with one quadrant marked "Data usually here"]
Geodetic datums
• Local datum: NAD27, ellipsoid Clarke 1866
• Earth-centered datum: NAD83, ellipsoid GRS80
Positional reference
[Figure: I-10 through Redlands, CA, shown in UTM NAD27 and UTM NAD83]
Map projections
• Cone • Cylinder • Plane
Projection distortion
• Shape • Area • Distance • Direction
From the 3D Earth to a 2D map
Components of a coordinate system
[Diagram labels: Datum, Ellipsoid, Geographic Coordinate System, Equations, Parameters, Projected Coordinate System]
The ArcMap projection strategy
• Project on the fly
Chapter 6 summary
• Geographic coordinate system: 3D, units of latitude and longitude
• Projected coordinate system: 2D, units of length
• The ArcGIS projection strategy: project on the fly

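GIS Software and Its Applications: Review Notes

Chapter 1: Building a Population Database with MapGIS

The basic steps in building a population database with MapGIS are database design, input of spatial and attribute data, data editing and correction, and map finishing and output.

With reference to Lab 1 of this course, briefly describe the basic procedure for building a population database of Hunan Province with MapGIS.

Chapter 2: ArcGIS Application Basics

Spatial analysis studies spatial phenomena in terms of the positions of spatial objects and the relationships among them, in order to describe those phenomena quantitatively. Its main task is to describe and analyze spatial composition.

Spatial analysis derives information and new knowledge from the spatial relationships among GIS objects. Its objects of analysis are the spatial relationships of geographic objects.

Spatial analysis comprises topological spatial query, buffer analysis, overlay analysis, spatial set analysis, and geoscientific analysis.

By spatial data structure, GIS-based spatial analysis falls into two modes: raster data analysis and vector data analysis.

By the dimension of the objects analyzed, it includes one-dimensional, two-dimensional, three-dimensional, and multidimensional analysis.

By complexity, it can be divided into spatial query analysis, spatial information extraction, integrated spatial analysis, data mining and knowledge discovery, and model building.

ArcGIS 9, released by ESRI in 2004, consists of the ArcSDE data server and four basic frameworks: Desktop GIS, Server GIS, Embedded GIS, and Mobile GIS.

ArcMap, ArcCatalog, and Geoprocessing are the basic modules of ArcGIS.

ArcMap is used to display, query, edit, and analyze map data, and it has full cartographic functionality.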

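GIS-4-1

Single select, multiple select, radius select, rectangle select, boundary select
2. Query selection (Selection)
• Query selection queries a table and picks out the objects or records whose attributes satisfy a given condition.
• The result is named the Selection table and is highlighted in the map window.
• The Selection table supports the same operations as an ordinary table, but it is temporary; to keep it permanently, save it under a new name.
3. SQL selection
• SQL: Structured Query Language
• 1970s, IBM: SEQUEL (Structured English Query Language)
• 1981, IBM: SQL (Structured Query Language)
• 1986: the American National Standards Institute (ANSI) adopts SQL as a standard
• The International Organization for Standardization (ISO)
Applying GIS analysis methods
Build the database
• Distribution of forest areas
• Distribution of rivers, lakes, and sea
• Distribution of roads
• Distribution of cemeteries
• Terrain and landforms
Spatial analysis
• Harvestable area: area measurement, query, statistics
• Rivers, lakes, and sea: 1 km buffer
• Roads: accessibility analysis, 5 km buffer
• Cemeteries: 10 km buffer
• Terrain and landforms: build a DEM and DTM; delineate areas below 1000 m elevation with slope under 5°
Decision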
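The buffer steps in the analysis list (1 km around water, 5 km around roads) can be illustrated with a plain distance test. This is only a sketch: the slide does not say whether each buffer includes or excludes candidate areas, so the rule used here (keep sites within the road buffer and outside the water buffer) is an assumption, as are the coordinates, the site names, and the use of planar coordinates in meters.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(p, polyline, radius):
    """True if p lies within `radius` of any segment of the polyline."""
    return any(point_segment_distance(p, a, b) <= radius
               for a, b in zip(polyline, polyline[1:]))

# Hypothetical layers, coordinates in meters.
road = [(0.0, 0.0), (10000.0, 0.0)]
river = [(0.0, 4000.0), (10000.0, 4000.0)]
sites = {"A": (5000.0, 2000.0), "B": (5000.0, 3500.0), "C": (5000.0, 6000.0)}

# Assumed rule: reachable from a road (5 km) and not within 1 km of water.
suitable = [name for name, p in sites.items()
            if within_buffer(p, road, 5000.0) and not within_buffer(p, river, 1000.0)]
print(suitable)
```

A GIS package would build the buffers as polygons and overlay them; the point test above gives the same yes/no answer for individual candidate locations.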
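Chapter 4: GIS Analysis
1. Measurement of spatial information
2. Classification and statistics of spatial information: digitizing by layer is only the most basic form of classification; query, selection, and zoning are the commonly used classification methods
3. Geospatial analysis: spatial overlay, buffering
Measurement of spatial information
1. Length measurement
[Figure: a line through points P1 to P5]
$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$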
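Length measurement for a line through several points sums the straight-line distance between consecutive vertices. The sketch below assumes the formula above is the usual Euclidean distance in planar coordinates; the function name and the sample vertices are illustrative only.

```python
import math

def polyline_length(points):
    """Total length of a polyline given as a list of (x, y) vertices.

    Each segment contributes sqrt((x2 - x1)**2 + (y2 - y1)**2).
    """
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# Hypothetical vertices: a 3-4-5 segment followed by a vertical segment of length 6.
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(polyline_length(line))
```

For coordinates in degrees of latitude and longitude this planar formula does not apply directly; the data would first need to be projected or measured with a geodesic formula.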

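Guide to Geographic Information System Applications

I. Introduction

A geographic information system (GIS) is a tool for collecting, managing, analyzing, and displaying geographic data. It combines geographic data with maps to give users spatial analysis and decision support. This guide introduces the applications of GIS and provides operating instructions to help users make better use of it.

II. Basic Concepts of GIS

A GIS is built from geographic data and maps. Geographic data is digital information describing features and phenomena on the Earth's surface, such as terrain, landforms, climate, and population. A map presents geographic data as an image, helping people understand geographic information.

The basic concepts of GIS include:

1. Spatial data: GIS mainly handles spatial data, which consists of geometric elements such as points, lines, and polygons together with their attribute information.

2. Spatial reference systems: GIS uses a spatial reference system to define positions on the Earth's surface precisely. Common ones are latitude-longitude (geographic) coordinate systems and projected coordinate systems.

3. Data acquisition: GIS can obtain geographic data in many ways, including remote sensing imagery and ground surveys.

4. Data management: GIS has to manage the geographic data it collects, including data storage, updating, and quality control.

5. Spatial analysis: GIS can carry out many kinds of spatial analysis on geographic data, such as buffer analysis and overlay analysis, to reveal relationships among geographic phenomena.

III. Application Fields of GIS

GIS is widely used in many fields. Some common ones are:

1. Urban planning: GIS supplies the spatial data that urban development planning needs, including land use, transport networks, and population distribution, helping planners make sound decisions.

2. Environmental protection: GIS can be used to monitor and assess environmental change, such as water pollution and changes in forest cover, and to develop environmental protection policy.

3. Agricultural management: GIS helps farmers allocate land use sensibly, improve the mix of crops planted, and raise agricultural returns.

4. Water resources management: GIS can analyze and manage water resources, including protection of water source areas and reservoir operation.

5. Traffic management: GIS can improve transport network design, monitor traffic flow, provide navigation services, and relieve congestion.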

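Geographic Information Systems: GIS Technology and Applications (PPT courseware)

… forest dynamics monitoring information systems, water resources management information systems, mineral resources information systems, crop yield estimation information systems, grassland resources management information systems, soil erosion information systems, and so on.
2) Regional information systems (Regional GIS) aim mainly at integrated regional research and comprehensive information services. They exist at different scales, for example national, regional or provincial, municipal, and county-level systems serving administrative regions at each level. They may also be organized by natural region or by river basin.
Because logistics depends heavily on geographic space, a logistics management system built on GIS technology lets a company manage its logistics visually and dynamically in real time.
II. Characteristics and Structure of GIS Logistics Management Systems
In logistics management, GIS has distinctive technical advantages over plain database or CAD technology.
1. Advantages in graphic display and output 2. Advantages in analysis functions 3. Advantages in modeling and simulation
The second view holds that a GIS is a special case of an information system.
The third view stresses the social role of GIS, holding that GIS fundamentally changes the way an organization or department (such as a government planning department) operates.
I. Definition of GIS
Four features distinguish GIS from other information systems and information technologies:
First, GIS can handle geographic data.
Second, GIS inputs, organizes, stores, and manages (including updates) geographic data with a specific data model under a unified coordinate system for locating positions on the Earth's surface. It lets users access data by geographic location or by thematic attribute, and it presents geographic data as maps.
VI. Acquiring and Processing Geospatial Data
Spatial data processing
• Map sheet joining
• Graphic transformation
• Image rectification
• Data format conversion
• Image interpretation
Section 4: The Role of GIS in Logistics Information Systems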


Chapter 4: Data Structures

There are a number of different ways to organize the data inside any information system. The choice of a particular spatial data structure is one of the important early decisions in designing a geographic information system. While very few of us will ever design a GIS from start to finish, a knowledge of data structures is valuable from several points of view other than system design. Fundamentally, users must be aware of the characteristics of several different structures, since several different standard forms are commonly used, and the choice of a data structure can affect both data storage volume and processing efficiency. From another point of view, when we are collecting our own data, we must make a choice of data structure for storage. Also, in an operational or research environment, it is often necessary to convert datasets between several different data structures, either to work with several kinds of data at the same time, or to import an unusual dataset into an existing system. It is very important to be able to understand how these conversions affect the underlying information itself.

As we stated in the introduction, each different kind of spatial data or theme in a GIS is referred to as a data layer or data plane. In each of these data layers there are three primitive types of geometrical entities to encode (after Peucker and Chrisman, 1975): points, lines, and polygons or planes. Some authors make a distinction between the representation of a truly three-dimensional surface, such as elevation datasets, and a representation of space in two dimensions, such as legal boundaries of land ownership on a flat map. In this chapter, we will focus on the latter.

The essential function of the spatial data we store and manipulate is to subdivide the Earth's surface into meaningful entities or objects that can be characterized. In this way, the contents of a spatial database constitute a model of the Earth.
Points, such as the locations of oil and water wells, and lines, such as the centerlines of roadways or streams, are key elements of this breakdown into component parts. When we consider bounded regions, such as the borders of a subdivision or the edges of a reservoir, we often focus on the boundary lines, and call the enclosed regions polygons. These polygonal regions are not necessarily defined in the precise terms of geometry, where a polygon is ordinarily a planar figure bounded by a series of straight line segments. In spatial data processing, common usage relaxes the requirement that the bounding line segments be straight; we use the term polygon even when the boundaries are curved. We note, however, that not all GISs can work directly with curves as such, but more often permit a single line to have interior digitized points in addition to the end points. Many sophisticated applications have been developed around networks of lines, such as the network developed by the arteries of a transportation or communications system, or a variety of piping systems such as a sanitary sewer or a pressurized gas delivery system.

The above discussion concentrates on the geometry of the data. Equally important is the non-spatial or attribute data, which in some systems requires a greater amount of data storage. For a simple spatial object like a water well, the essential spatial information is the geodetic location of the well. The attribute data can include a wide range of ancillary information about that well, including its depth, date of drilling, production volume, ownership, and so forth. Many geographic information systems have specialized capabilities for storing and manipulating the attribute data in addition to the spatial information.

4.1 Raster Data Structures

One of the simplest data structures is a raster or cellular organization of spatial data.
In a raster structure, a value for the parameter of interest (elevation in meters above datum, land use class from a specified list, plant biomass in grams per square meter, and so forth) is developed for every cell in a (frequently regular) array over space. For example, in Figure 4.1, elevation in meters above mean sea level has been recorded at locations on a regular grid. The original data is from a topographic map, from which we have extracted the contour lines. The raster array of elevations is derived from these contour lines using procedures discussed in section 6.7. This kind of data structure is intuitive; we might imagine a survey team determining elevations at regular distances along lines of constant latitude.

4.1.1 Simple Raster Arrays

The horizontal dimension of the simplest raster, along the rows of the array, is often oriented parallel to the east-west direction for convenience. Following the conventional practice in image processing, raster elements in this direction along the rows of the array are sometimes called samples, and numbered from the left (or west) margin. Positions in the vertical direction, aligning with the columns of the array, are often numbered starting from the top (or northern) boundary. This numbering scheme comes from the computer graphics field, in which displays are often painted on the computer screen or printer from the top down. Thus, the origin of the raster is frequently the upper left corner. This location is considered position (1,1) in some systems of notation, and position (0,0) in others; please be aware of the difference!

Note that this referencing system for cells in a raster is different from more traditional georeferencing systems such as latitude-longitude in which one specific point on the Earth's surface (such as the point where the prime meridian crosses the equator) is the origin.
It is also different from the Universal Transverse Mercator system, where (in the northern hemisphere) the origin of the coordinate system is in the lower left corner, which is similar to a conventional Cartesian system. Often, the distances between cells in the raster are constant in both the row and column directions; in other words, the cells in the raster are square. In this case, it is natural to store the data on a computer in a two-dimensional array.

While a simple rectangular raster structure is a very popular approach, there are at least two limitations. First, for a raster structure at a particular scale, there is a finite limit to our ability to specify location. We are in either one cell or another; there is nothing in between. This is true because the line separating adjacent cells is considered to be infinitely narrow. This is the case for any raster data structure, including the non-rectangular forms we'll examine in a moment.

Second, adjoining cells may not be evenly spaced, depending on how we define the word adjoining. Consider the center cell in Figure 4.2a. The cells directly above, below, left, and right are 1 unit of distance from the center cell, while the cells on the diagonal are approximately 1.41 (the square root of 2) units of distance from the center. For searches through the data, if we include only the cells directly above, below, left, and right of a cell of interest, we are working in a 4-connected neighborhood (Figure 4.2a), and all cells are equidistant from their neighbors. Thus, neighboring cells share an edge. If we include elements on the diagonal, we are working in an 8-connected neighborhood (Figure 4.2b), and now cells are not evenly spaced. In this latter case, some cells in the neighborhood share an edge, while others share only a vertex. Since all the cells in these two examples have neighbors of the same size and shape, we say that we have spatial neighborhood similarity (Tobler, 1979).
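The difference between the two neighborhoods can be made concrete with a short sketch. It assumes square cells of unit size, zero-based (row, column) indices with the origin at the upper left, and a hypothetical helper named `neighbors`; none of these names come from the text.

```python
import math

# Row/column offsets for edge-sharing neighbors, then the diagonal ones.
OFFSETS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbors(row, col, n_rows, n_cols, connectivity=4):
    """Return [((r, c), distance), ...] for the neighbors of a raster cell.

    Distances are between cell centers, in units of one cell width.
    Cells outside the array are skipped.
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = OFFSETS_4 if connectivity == 4 else OFFSETS_8
    result = []
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        if 0 <= r < n_rows and 0 <= c < n_cols:
            result.append(((r, c), math.hypot(dr, dc)))
    return result

# Center cell of a 3-by-3 raster.
four = neighbors(1, 1, 3, 3, connectivity=4)
eight = neighbors(1, 1, 3, 3, connectivity=8)
print(sorted({round(d, 2) for _, d in four}))   # one distinct distance
print(sorted({round(d, 2) for _, d in eight}))  # two distinct distances
```

The 4-connected call yields only edge-sharing neighbors at distance 1, while the 8-connected call adds the vertex-sharing diagonals at the square root of 2.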
An alternative is that each cell in the neighborhood has a different size and organization. While this is certainly reasonable for many kinds of geographic phenomena (urban neighborhoods are of different sizes and shapes, for example), it is not a common data model in a raster geographic information system.

Whether it is more appropriate to base an analysis on 4-connected or 8-connected space must be determined by the characteristics of the data and the objectives of the exercise. In many metropolitan areas of recent design, the streets are laid out in a rectangular network. If a raster data layer is used to represent this transportation network, a 4-connected analysis of travel distances is appropriate, since diagonal motion is not ordinarily possible. In contrast, an 8-connected analysis of travel might be appropriate for a raster database of the weight of cotton harvested per hectare, where the objects traveling through space are airborne insect pests that do not respect the orientation of the local roads.

It isn't necessary that the size of a raster cell be the same in the row and column directions. But for reasons of simplicity and symmetry, applications based on the use of non-square rectangular raster cells are rare. It is also important to recognize that there are two different theoretical interpretations of the value stored in a cell of our sample raster of elevations. A cell's value might represent an elevation measured at the center of the cell, or the value might represent the average elevation of the entire cell. It is likely that neither interpretation is truly correct, although we typically operate as if the latter is our underlying model of truth. For practical purposes, the distinction rarely matters if we've chosen an appropriately small cell size.

The size of the raster cell in a dataset is sometimes confused with the minimum mapping unit, i.e., the smallest element we can uniquely represent in our data.
However, raster cell size and minimum mapping unit are not quite the same. Choosing an appropriate minimum mapping unit (or resel, an abbreviation for resolution element, as discussed in Chapter 1) for a study is a very important decision in the design phase of a project. Consider a dataset in which we wish to record vegetation types. In Figure 4.3a is a sketch map that might have come from a field mapping exercise. In the study area, a patch of evergreen trees is surrounded by grassland. One way to manually convert this data to a raster form is to overlay a regular grid on the sketch map (Figure 4.3b) and to assign a vegetation class to each cell, based on which class covers the majority of the cell.

Consider the problem shown in Figure 4.3c. A different grid with larger cells has replaced the first grid. Four raster cells overlay the small area where we have grassland surrounding the stand of evergreen trees. Based on the majority rule we chose above, the evergreen stand will not appear in the database. Each of four cells is covered by a small piece of the evergreen stand, but in each of these cells, the evergreen stand is only a minority component of the total cell area.

If the size of the raster cells had been different, the stand's existence would have been captured, as it was in Figure 4.3b. We might have even captured this stand if the cells in the raster had been oriented differently. This illustrates the need to understand the relationship between the minimum mapping unit and the raster cell size. A convenient rule of thumb, based on statistical sampling theory, is to use a raster cell half the length (or one-fourth the area) of the smallest feature you wish to record.
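The effect of cell size under the majority rule can be reproduced with a small experiment. The following is a sketch under stated assumptions: the sketch map is stood in for by a hypothetical 16-by-16 grid of fine cells ("G" for grassland, "E" for evergreen), the evergreen stand is a 4-by-4 patch, and ties are broken in favor of the class encountered first.

```python
from collections import Counter

def majority_resample(grid, block):
    """Aggregate a square grid into block-by-block cells using a majority rule.

    grid: list of equal-length rows of class labels; its size must be a
    multiple of `block`. Ties go to the class encountered first in the block.
    """
    n = len(grid) // block
    out = []
    for br in range(n):
        row = []
        for bc in range(n):
            cells = [grid[r][c]
                     for r in range(br * block, (br + 1) * block)
                     for c in range(bc * block, (bc + 1) * block)]
            row.append(Counter(cells).most_common(1)[0][0])
        out.append(row)
    return out

# Hypothetical study area: grassland with a 4-by-4 evergreen stand
# that straddles the center of the map.
fine = [["G"] * 16 for _ in range(16)]
for r in range(6, 10):
    for c in range(6, 10):
        fine[r][c] = "E"

coarse = majority_resample(fine, 8)   # cells twice the width of the stand
medium = majority_resample(fine, 2)   # cells half the width of the stand

print(sum(row.count("E") for row in coarse))  # evergreen cells that survive
print(sum(row.count("E") for row in medium))
```

With cells of width 8 the stand is split among four cells and is a minority in each, so it vanishes; with cells of width 2, half the width of the stand as in the rule of thumb, it is retained.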
A more conservative suggestion would be to use a raster cell one-third or one-fourth the length of the smallest desired feature.

To reinforce this important issue, the size of the resels in a dataset must be significantly larger than the size of the raster cells, or we run the risk of losing important spatial objects when we build the database.

Geometrical figures that completely cover a flat surface are called tessellations. The square raster structure we've just discussed is one such tessellation. Triangles and hexagons are two other tessellations of the plane (Figure 4.4). There has been a continuing interest in regular hexagonal cells as the basis of spatial data structures, in part because in a hexagonal tessellation of the plane, all neighboring cells are equidistant, unlike the situation in a raster of square cells (Burt, 1980). However, the use of a hexagonal or even a triangular data structure creates two problems that may be significant in some circumstances. First, the cells cannot be recursively subdivided into smaller cells of the same shape as the original cells, as is the case in a square system. Conversely, a hexagon made up of smaller hexagons will not be the same shape as those smaller hexagons. Second, and less important, a numbering system for a hexagonal system is more complex than that of a square system, imposing at least a small additional overhead in system operations.

Let's return briefly to the golf course example from the first chapter and create a raster database for this site. The simplest of the three datasets we considered was the map from the local planning agency, which describes the legal property boundaries, streets, and restrictions on construction and development due to easements for public utilities. In Figures 4.5a and 4.5b, respectively, we show a version of the original map, and a raster-converted representation of the map.
The numbers in each cell indicate the permitted land use for each cell, using a majority rule as discussed above. By adding up the number of cells in each category, we can determine the fractional area coverage of each land use in the map:

Land-Use Category           Class   Total Cells   Percent of Total
-------------------------   -----   -----------   ----------------
Roads                         1         213            39.0%
Easement Restrictions         2         129            23.6%
Unrestricted Development      3         204            37.4%

Raster datasets in practical use can be very large. As an example, satellite remote sensing data is frequently used to distinguish categories of land cover over large areas. The standard view of the earth from the U.S.'s Landsat series of satellites covers approximately 30,000 square kilometers, which at a nominal pixel size of 30 meters, corresponds to approximately 35 million raster cells or pixels (where pixel is a contraction of "picture element").

When dealing with such large datasets, there are several algorithms used to compress the data. Some of these algorithms are completely reversible; that is, we may recover exactly the original datasets. Others minimize the volume of the stored data by losing a (preferably) small and controlled amount of the original information. We will briefly mention two perfect-recovery compression mechanisms. The first, called run-length encoding, tries to exploit the fact that many datasets have large homogeneous regions. Consider the raster data in Figure 4.5b. The data attribute values at the beginning of the sixth row from the top are:

1 1 1 3 3 3 2 3 3 3 3 3 3 3 3 3

In a run-length encoded version, the original data is replaced by data pairs or tuples. The first number in the pair is a counter, indicating how many repetitions of the second number, the data value, occur starting at that point in the row. Thus, three cells in a row with data value 1 are compressed from three elements (1 1 1) to two (3 1).
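The pair scheme just described can be sketched in a few lines. This is an illustrative implementation, not the one used by any particular GIS; the function names are hypothetical, and the sample row is taken to be three 1s, three 3s, a single 2, and nine 3s (16 cells in all).

```python
from itertools import groupby

def rle_encode(row):
    """Replace each run of equal values with a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(row)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original row."""
    return [value for count, value in pairs for _ in range(count)]

# Beginning of row 6: three 1s, three 3s, one 2, nine 3s.
row = [1, 1, 1, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3]
pairs = rle_encode(row)
print(pairs)

# The encoding is perfectly reversible.
assert rle_decode(pairs) == row

# Worst case: no repeats at all, so every value costs two numbers.
worst = rle_encode([1, 2, 3, 4])
print(len(worst) * 2, "numbers stored for 4 cells")
```

Counting each count and each value as one stored element, the sample row shrinks from 16 elements to 8, while the row with no repeats doubles in size.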
The data from the beginning of row 6 in our example would then become:

(3 1) (3 3) (1 2) (9 3)

Thus, we have three 1s, followed by three 3s, followed by a single 2, and then nine 3s. In this case, the original data occupied 16 elements and the compressed data 8 elements, for a compression factor of 50%. Note that we have assumed that the data elements in the run-length encoded file, both attribute values and repeat counts, occupy the same amount of space. The effectiveness of this compression mechanism varies with the dataset. In the worst case, where there are no repeating sequences at all along the rows of the array, the algorithm will make the dataset twice as large. For binary data (in other words, data with only two possible classes), Burrough (1986) shows another kind of run-length coding, where in a single row, the position of a cell where a run begins is stored.

A second technique for compressing raster datasets uses what are called chain codes. In some instances we can consider a map as a set of spatially referenced objects placed on top of a background. The use of chain codes takes this point of view. The coordinates of a starting point on the border of an object (for example, a reservoir) are recorded, and then we store the sequence of cardinal directions of the cells that make up the boundary (Figure 4.3c). This may be an efficient means to store areas, particularly since each spatial object is kept as a separate entity in the database. However, some kinds of processing will require that the entire raster array be reconstituted, which may be an unacceptable cost.

4.1.2 Hierarchical Raster Structures

There has been a great deal of interest recently in a family of enhancements to the standard square-celled raster structure. We will introduce these modifications through an example. Consider a set of digital elevation data values, where the fundamental data are stored on a 50-meter square grid (that is, each cell represents a square that is 50 meters on a side).
Rather than storing this information as a single layer in our GIS, we shall store it in several interrelated layers. One layer corresponds to the original 50-meter interval raster data. A second layer consists of data resampled to a 100-meter interval. Each cell in the 100-meter layer is the algebraic average of four cells in the 50-meter layer (Figure 4.6a). A third layer is created by averaging four 100-meter cells to create 200-meter cells. And we could continue this spatial averaging process, decreasing the spatial resolution at each "higher" layer, until at the highest layer we might have a single pixel, whose elevation value is the numerical average of all the data in the original 50-meter layer. As an aside, this only works perfectly (that is, yields a single pixel in the highest layer) if the original 50-meter data was a square array of 2^n pixels on a side.

In general, this is called a pyramidal data structure, since we can imagine each of the derived layers stacked on top of previous layers, in the shape of a pyramid. If each higher level layer has pixels that are exactly twice as wide as the previous (and thus, four times the area), as in our example, this is called a quadtree data structure (Samet, 1984). The name quadtree comes from the four-fold reduction in number of pixels in each layer, and the fact that the structure is easily pictured (as well as represented in the digital database) as a tree (Figure 4.6b). Leaves at the bottom of the tree, in this case, represent the elevation values in the original 50-meter data layer, and branches in the tree represent the averaging process as connections between pixels or cells in the different layers.

Tobler and Chen (1986) discuss a modified quadtree system that may be useful for coding data for the entire Earth's surface. The single node at the top level of the tree represents the entire planet. At the 15th level, the resolution of the cells is comparable to that of meteorological satellites.
At the 26th level, the spatial resolution is comparable to most aerial photography, and at level 30 there is centimeter-scale resolution over the globe, which Tobler and Chen indicate is adequate for geodetic control points in many applications.

Data stored in a simple quadtree will occupy somewhat more storage space than the original raster. This is true because the original raster is a component of the pyramid, and forms the lowest layer in the quadtree. Consider a raster with 32 pixels on a side. This raster requires (32 times 32) or 1024 cells be stored. If we include the higher level layers, we require:

Layer   Width in Cells   Total Cells
-----   --------------   -----------
  1           32             1024
  2           16              256
  3            8               64
  4            4               16
  5            2                4
  6            1                1

which totals 1365 cells. For this small example, we increase the storage overhead 33%, since there are 33% more cells in the complete tree than in the original raster.

However, this storage cost can be offset by some distinct advantages. There may be some processing steps that do not require the full resolution of the original raster dataset. If the data have been stored in a quadtree, a given step in the analysis may be based on whatever layer in the quadtree holds information of the appropriate resolution or scale, potentially saving both computer system input/output and processing time. A situation in which this advantage may be significant is where different datasets in the geographic information system have different mean resolutions. This is one of many examples we will discuss of the classic database management trade-off of storage costs versus processing costs, which we will see again in our discussions.

Additionally, the hierarchical nature of the quadtree may permit us to minimize some kinds of search through the database (Smith et al., 1987).
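The layer counts and the 33% overhead quoted for the 32-by-32 raster can be checked by actually building the pyramid. The sketch below assumes the base raster is a square list of lists with 2**n cells on a side and that each higher layer is the plain average of 2-by-2 blocks; the function names and the synthetic elevation values are illustrative only.

```python
def coarsen(layer):
    """Average each 2-by-2 block of cells into one cell of the next layer."""
    n = len(layer) // 2
    return [[(layer[2 * r][2 * c] + layer[2 * r][2 * c + 1] +
              layer[2 * r + 1][2 * c] + layer[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(n)]
            for r in range(n)]

def build_pyramid(base):
    """Return all layers, from the full-resolution raster up to a single cell."""
    size = len(base)
    if size == 0 or size & (size - 1) or any(len(row) != size for row in base):
        raise ValueError("base must be a square array with 2**n cells on a side")
    layers = [base]
    while len(layers[-1]) > 1:
        layers.append(coarsen(layers[-1]))
    return layers

# Synthetic 32-by-32 "elevation" raster (values are arbitrary).
base = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
layers = build_pyramid(base)

widths = [len(layer) for layer in layers]
total_cells = sum(w * w for w in widths)
overhead = (total_cells - 32 * 32) / (32 * 32)
print(widths, total_cells, round(overhead * 100), "% extra")
```

Because every layer has one quarter the cells of the one below it, the overhead approaches one third for any raster size, which is why the figure for this small example is so close to 33%.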
For example, if we are looking for a high elevation in the database (but we don't necessarily need the highest elevation), we could instruct our system to work down the quadtree: first find the quadrant in the next-to-uppermost layer with the highest average elevation, and ignore the rest of the data. In Figure 4.7, the lower-right quadrant of the two-by-two layer of the quadtree is the highest, with an elevation of 121 meters. We therefore consider only this fraction of the data at the next level down, removing 75% of the data from further consideration.

Continuing down towards the highest-precision data layers in Figure 4.7, we find the highest cell in the lower-right corner of the four-by-four layer at an elevation of 127 meters. Repeating the procedure in the next level of the quadtree, we locate the cell marked as 134 meters. In total, we have examined 12 cells, out of a total database of 64 elemental cells at the bottom of the tree, or 84 cells in the complete tree. In this way, we can search only a small fraction of the entire database, and still find an acceptably high place for many purposes. However, with this search strategy there is no guarantee that we will find the highest location in the database.

If we store more than just a single data value at each node in the quadtree, we can even satisfy a specific search for the highest location in the database. Consider a version in which we store three values at each quadtree node: the mean elevation of the appropriate area, as well as the minimum and maximum known elevation values in this area. In this case, we can use a search strategy parallel to the one discussed above to be able to find the highest (or lowest) cell in the entire region in a very small amount of time. This is another example of the compromise between storage and processing.
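The guaranteed search that becomes possible when each node stores a maximum can be sketched as follows. For brevity this illustration stores only the maximum at each node (not the mean and minimum as well), assumes a square raster with 2**n cells on a side, and uses a hypothetical 8-by-8 elevation grid; the function names are not from the text.

```python
def max_pyramid(base):
    """Layers from full resolution up to 1-by-1; each cell holds the
    maximum of the four cells beneath it."""
    layers = [base]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        n = len(prev) // 2
        layers.append([[max(prev[2 * r][2 * c], prev[2 * r][2 * c + 1],
                            prev[2 * r + 1][2 * c], prev[2 * r + 1][2 * c + 1])
                        for c in range(n)]
                       for r in range(n)])
    return layers

def find_highest(base):
    """Walk down the tree, always entering the child with the largest stored
    maximum. Returns ((row, col), value, number_of_cells_examined)."""
    layers = max_pyramid(base)
    r = c = 0
    examined = 0
    for layer in reversed(layers[:-1]):  # from the 2-by-2 layer down to the base
        children = [(2 * r + dr, 2 * c + dc) for dr in (0, 1) for dc in (0, 1)]
        examined += len(children)
        r, c = max(children, key=lambda rc: layer[rc[0]][rc[1]])
    return (r, c), base[r][c], examined

# Hypothetical 8-by-8 elevations: a gentle ramp with one isolated peak.
grid = [[r + c for c in range(8)] for r in range(8)]
grid[2][5] = 134

location, value, examined = find_highest(grid)
print(location, value, examined)
```

On an 8-by-8 raster the descent looks at four cells in each of three layers, the same 12 cells as the search on mean values, but because a parent's stored maximum is always attained somewhere beneath it, the peak is found even though the averages around it are low.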
Here we have additional storage costs (since we store three values in each cell) in order to decrease the time required to search the database.

Another way in which a quadtree data structure can be used to minimize search effort involves categorical data. Imagine a quadtree for storing a number of land-use classes. At the base of the tree, each cell takes the attribute for the dominant class in the cell. However, each composite cell at each higher level of the tree does not have to store just the dominant class for the cell or just the fact that the cell is not homogeneous; the higher-level cells are not limited to storing a single value. Instead, each higher-level cell could store a list of all the land-use classes stored in the cells beneath it. In this way, when we search through the quadtree looking for areas with certain kinds of land use, going from less geographic detail to more, we can discard limbs of the tree with no error.

A modification of the quadtree can be used to minimize system storage under some circumstances. In a maximum block representation, we systematically eliminate all the redundant information in the tree. Consider the land-use example in Figure 4.8, where each cell in the original raster (Figure 4.8a) is coded in a binary fashion for land use. In this example, 0 represents undeveloped land surrounding a city, and 1 represents urban land use. In the associated quadtree, open circles represent the undeveloped land, and solid boxes represent the urban land use. For each lower-resolution level, we average the pixel values in the higher-resolution layer, to code for the fractional coverage of urban area in a cell.

The numbering scheme to specify cells in a quadtree proceeds from the upper left in the data array; cell numbers are recorded in the upper right corner of the individual cells in the accompanying figures.
Starting with the 2-by-2 area in the upper left, cells 1 through 4 are the upper left, upper right, lower left, and lower right, in sequence. The 2-by-2 area in the upper right is then numbered in the same pattern, then the area in the lower left, and finally we number the cells in the lower right. In a larger data array, we would then number the cells in the upper-right 4-by-4 array, and so on.
However, notice that there is redundancy in the companion quadtree to this raster array (Figure 4.8b). Cells 5, 6, 7, and 8 are all class 0, so their average is zero; specifying the exact attribute value of zero in the higher-level layer therefore tells us everything about the corresponding pixels in the lower-level layer. In this case, we could leave out the nodes in the tree at the lower (that is, higher-resolution) level, thus decreasing the storage costs without losing any information. This is the maximum block representation (Figures 4.8c and 4.8d) of the quadtree. In this case, the maximum block quadtree requires eight fewer nodes to describe the spatial arrangement of land use. This revised tree requires a bit more processing to construct and use than a simple quadtree, since one must test to determine whether a cell at a particular resolution exists as a unique entity, or is represented as part of a composite larger-area node in a different layer.
One problem with the quadtree data structure is that it is not invariant under translation, rotation, or scaling. This is the same problem that is found with any arbitrary subdivision of datasets. Arbitrary boundaries are placed on the underlying continuous data to develop the quadtree, and the locations of the boundaries can strongly affect the resulting derived data. Consider a simple example of translation along one of the major axes of the raster array. Figure 4.9a shows a simple object composed of six shaded raster cells, along with the maximum block quadtree representation.
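As a rough illustration of the maximum block representation and of its sensitivity to translation, the following Python sketch builds such a tree for a binary raster and counts its nodes before and after a one-cell shift to the east. The 8-by-8 raster and its six-cell object are invented for the example (they are not the contents of Figure 4.9), and `build_max_block`, `count_nodes`, and `shift_east` are hypothetical helper names.

```python
def build_max_block(grid, row=0, col=0, size=None):
    """Return 0 or 1 for a homogeneous block; otherwise a list of four
    subtrees in the order upper left, upper right, lower left, lower right."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first  # one node stands for the whole block
    half = size // 2
    return [build_max_block(grid, row + dr, col + dc, half)
            for dr in (0, half) for dc in (0, half)]


def count_nodes(tree):
    """Count internal nodes and leaves of a maximum block quadtree."""
    if isinstance(tree, list):
        return 1 + sum(count_nodes(t) for t in tree)
    return 1


def shift_east(grid, cells=1):
    """Translate the raster east, padding the west edge with background (0)."""
    return [[0] * cells + row[:len(row) - cells] for row in grid]


# Hypothetical object: six cells, two columns wide and three rows tall.
raster = [[0] * 8 for _ in range(8)]
for r in (2, 3, 4):
    for c in (2, 3):
        raster[r][c] = 1

print(count_nodes(build_max_block(raster)))              # 17
print(count_nodes(build_max_block(shift_east(raster))))  # 37
```

The object itself is unchanged by the shift, yet the tree more than doubles in size, because after the move the object straddles the boundaries between all four top-level quadrants.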
The shaded cells in the raster array are indicated in the tree by filled squares; the white background is indicated in the tree by open circles. What happens when we translate the object to the east by one cell width? This could represent either another object of the same kind in a different location, or another quadtree with the same cell size but a different starting location. The object in Figure 4.9b looks the same to the human interpreter, but evolves into a quadtree with a very different shape. One solution to this problem is offered by Scott and Iyengar (1986), who discuss a related data structure that is translation invariant: that is, if the entire figure is moved, the characteristics of the data structure will not change. Their system is based upon (1) recognizing homogeneous square regions in a raster, and (2) recording both the size of the block and the coordinates of the block's upper left-hand corner.
4.2 Vector Data Structures
A mathematician might define a vector as a quantity with a starting coordinate and an associated displacement and direction (or bearing). In a description of spatial data based on vectors, we make the assumption that an element may be located at any location, without the positional constraints of a raster array. To briefly review the previous section, raster data structures are based on (usually regular) decompositions (i.e., "breakdowns") of the plane. In raster-based systems for representing spatial data, our ability to specify a location in space is limited by the size of the raster elements, since we are unable to know anything about different locations within a raster cell. In other words, there is a limit to geographic specificity.
Vector data structures are based on elemental points whose locations are known to arbitrary precision, in contrast to the raster or cellular data structures we have described.
As a simple example, to store a circle in one of the raster data structures above, we might find and encode all the raster cells (of a pre-defined shape and size) whose locations correspond to the boundary of the circle. This might be called a low-level description of the circle. A high-level description, on the other hand, might efficiently store the circle by recording a point location for the center of the circle and specifying the radius. In this example, note that the high-level description, based on a vector representation, is more efficient in terms of the amount of data required, as well as more precise, provided we have a means to indicate the geometric object "circle". Many computer graphics and computer-aided design (CAD) systems use vector-like models as their internal data organization, using primitives such as points, lines, and circles. These elements may also be found in computer graphics language standards such as GKS (Enderle et al., 1984), as well as in many personal computer languages such as BASIC. These advantages may disappear, however, if we have to store the circle as a connected sequence of straight-line segments.
For spatial data in most geographic information systems, the coordinate data is encoded and, after input processing, is stored as some combination of points, lines, and areas or polygons (Males, 1977; Peucker and Chrisman, 1975; and Peuquet, 1977). Several forms of vector data structures are in common use, both as