数据仓库--外文翻译

合集下载

SQL数据库外文翻译--数据库的工作

SQL数据库外文翻译--数据库的工作

Working with DatabasesThis chapter describes how to use SQL statements in embedded applications to control databases. There are three database statements that set up and open databases for access: SET DATABASE declares a database handle, associates the handle with an actual database file, and optionally assigns operational parameters for the database.SET NAMES optionally specifies the character set a client application uses for CHAR, VARCHAR, and text Blob data. The server uses this information totransli terate from a database‟s default character set to the client‟s character set on SELECT operations, and to transliterate from a client application‟s character set to the database character set on INSERT and UPDATE operations.g CONNECT opens a database, allocates system resources for it, and optionally assigns operational parameters for the database.All databases must be closed before a program ends. A database can be closed by using DISCONNECT, or by appending the RELEASE option to the final COMMIT or ROLLBACK in a program. Declaring a databaseBefore a database can be opened and used in a program, it must first be declared with SET DATABASE to:CHAPTER 3 WORKING WITH DATABASES. Establish a database handle. Associate the database handle with a database file stored on a local or remote node.A database handle is a unique, abbreviated alias for an actual database name. Database handles are used in subsequent CONNECT, COMMIT RELEASE, and ROLLBACK RELEASE statements to specify which databases they should affect. Except in dynamic SQL (DSQL) applications, database handles can also be used inside transaction blocks to qualify, or differentiate, table names when two or more open databases contain identically named tables.Each database handle must be unique among all variables used in a program. Database handles cannot duplicate host-language reserved words, and cannot be InterBase reserved words.The following statement illustrates a simple database declaration:EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;This database declaration identifies the database file, employee.gdb, as a database the program uses, and assigns the database a handle, or alias, DB1.If a program runs in a directory different from the directory that contains the database file, then the file name specification in SET DATABASE must include a full path name, too. For example, the following SET DATABASE declaration specifies the full path to employee.gdb:EXEC SQLSET DATABASE DB1 = ‟/interbase/examples/employee.gdb‟;If a program and a database file it uses reside on different hosts, then the file name specification must also include a host name. The following declaration illustrates how a Unix host name is included as part of the database file specification on a TCP/IP network:EXEC SQLSET DATABASE DB1 = ‟jupiter:/usr/interbase/examples/employee.gdb‟;On a Windows network that uses the Netbeui protocol, specify the path as follows: EXEC SQLSET DATABASE DB1 = ‟//venus/C:/Interbase/examples/employee.gdb‟; DECLARING A DATABASEEMBEDDED SQL GUIDE 37Declaring multiple databasesAn SQL program, but not a DSQL program, can access multiple databases at the same time. In multi-database programs, database handles are required. A handle is used to:1. Reference individual databases in a multi-database transaction.2. Qualify table names.3. Specify databases to open in CONNECT statements.Indicate databases to close with DISCONNECT, COMMIT RELEASE, and ROLLBACK RELEASE.DSQL programs can access only a single database at a time, so database handle use is restricted to connecting to and disconnecting from a database.In multi-database programs, each database must be declared in a separate SET DATABASE statement. For example, the following code contains two SET DATABASE statements:. . .EXEC SQLSET DATABASE DB2 = ‟employee2.gdb‟;EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;. . .4Using handles for table namesWhen the same table name occurs in more than one simultaneously accessed database, a database handle must be used to differentiate one table name from another. The database handle is used as a prefix to table names, and takes the formhandle.table.For example, in the following code, the database handles, TEST and EMP, are used to distinguish between two tables, each named EMPLOYEE:. . .EXEC SQLDECLARE IDMATCH CURSOR FORSELECT TESTNO INTO :matchid FROM TEST.EMPLOYEEWHERE TESTNO > 100;EXEC SQLDECLARE EIDMATCH CURSOR FORSELECT EMPNO INTO :empid FROM EMP.EMPLOYEEWHERE EMPNO = :matchid;. . .CHAPTER 3 WORKING WITH DATABASES38 INTERBASE 6IMPORTANTThis use of database handles applies only to embedded SQL applications. DSQL applications cannot access multiple databases simultaneously.4Using handles with operationsIn multi-database programs, database handles must be specified in CONNECT statements to identify which databases among several to open and prepare for use in subsequent transactions.Database handles can also be used with DISCONNECT, COMMIT RELEASE, and ROLLBACKRELEASE to specify a subset of open databases to close.To open and prepare a database w ith CONNECT, see “Opening a database” on page 41.To close a database with DISCONNECT, COMMIT RELEASE, or ROLLBACK RELEASE, see“Closing a database” on page 49. To learn more about using database handles in transactions, see “Accessing an open database” on p age 48.Preprocessing and run time databasesNormally, each SET DATABASE statement specifies a single database file to associate with a handle. When a program is preprocessed, gpre uses the specified file to validate the program‟s table and column referenc es. Later, when a user runs the program, the same database file is accessed. Different databases can be specified for preprocessing and run time when necessary.4Using the COMPILETIME clause A program can be designed to run against any one of several identically structured databases. In other cases, the actual database that a program will use at runtime is not available when a program is preprocessed and compiled. In such cases, SET DATABASE can include a COMPILETIME clause to specify a database for gpre to test against during preprocessing. For example, the following SET DATABASE statement declares that employee.gdb is to be used by gpre during preprocessing: EXEC SQLSET DATABASE EMP = COMPILETIME ‟employee.gdb‟;IMPORTANTThe file specification that follows the COMPILETIME keyword must always be a hard-coded, quoted string.DECLARING A DATABASEEMBEDDED SQL GUIDE 39When SET DATABASE uses the COMPILETIME clause, but no RUNTIME clause, and does not specify a different database file specification in a subsequent CONNECT statement, the same database file is used both for preprocessing and run time. To specify different preprocessing and runtime databases with SET DATABASE, use both the COMPILETIME andRUNTIME clauses.4Using the RUNTIME clauseWhen a database file is specified for use during preprocessing, SET DATABASE can specify a different database to use at run time by including the RUNTIME keyword and a runtime file specification:EXEC SQLSET DATABASE EMP = COMPILETIME ‟employee.gdb‟RUNTIME ‟employee2.gdb‟;The file specification that follows the RUNTIME keyword can be either ahard-coded, quoted string, or a host-language variable. For example, the following C code fragment prompts the user for a database name, and stores the name in a variable that is used later in SET DATABASE:. . .char db_name[125];. . .printf("Enter the desired database name, including node and path):\n");gets(db_name);EXEC SQLSET DATABASE EMP = COMPILETIME ‟employee.gdb‟ RUNTIME : db_name; . . .Note host-language variables in SET DATABASE must be preceded, as always, by a colon.Controlling SET DATABASE scopeBy default, SET DATABASE creates a handle that is global to all modules in an application.A global handle is one that may be referenced in all host-language modules comprising the program. SET DATABASE provides two optional keywords to change the scope of a declaration:g STATIC limits declaration scope to the module containing the SET DATABASE statement. No other program modules can see or use a database handle declared STATIC.CHAPTER 3 WORKING WITH DATABASES40 INTERBASE 6EXTERN notifies gpre that a SET DATABASE statement in a module duplicates a globally-declared database in another module. If the EXTERN keyword is used, then another module must contain the actual SET DATABASE statement, or an error occurs during compilation.The STATIC keyword is used in a multi-module program to restrict database handle access to the single module where it is declared. The following example illustrates the use of theSTATIC keyword:EXEC SQLSET DATABASE EMP = STATIC ‟employee.gdb‟;The EXTERN keyword is used in a multi-module program to signal that SET DATABASE in one module is not an actual declaration, but refers to a declaration made in a different module. Gpre uses this information during preprocessing. The following example illustrates the use of the EXTERN keyword:EXEC SQLSET DATABASE EMP = EXTERN ‟employee.gdb‟;If an application contains an EXTERN reference, then when it is used at run time, the actual SET DATABASE declaration must be processed first, and the database connected before other modules can access it.A single SET DATABASE statement can contain either the STATIC or EXTERN keyword, but not both. A scope declaration in SET DATABASE applies to both COMPILETIME and RUNTIME databases.Specifying a connection character setWhen a client application connects to a database, it may have its own character set requirements. The server providing database access to the client does not know about these requirements unless the client specifies them. The client application specifies its character set requirement using the SET NAMES statement before it connects to the database.SET NAMES specifies the character set the server should use when translating data from the database to the client application. Similarly, when the client sends data to the database, the server translates the data from the client‟s character set to the database‟s default character set (or the character set for an individual column if it differs from the databas e‟s default character set). For example, the following statements specify that the client is using the DOS437 character set, then connect to the database:EXEC SQLOPENING A DATABASEEMBEDDED SQL GUIDE 41SET NAMES DOS437;EXEC SQLCONNECT ‟europe.gdb‟ USER ‟JAMES‟ PASSWORD ‟U4EEAH‟;For more information about character sets, see the Data Definition Guide. For the complete syntax of SET NAMES and CONNECT, see the Language Reference. Opening a databaseAfter a database is declared, it must be attached with a CONNECT statement before it can be used. CONNECT:1. Allocates system resources for the database.2. Determines if the database file is local, residing on the same host where the application itself is running, or remote, residing on a different host.3. Opens the database and examines it to make sure it is valid.InterBase provides transparent access to all databases, whether local or remote. If the database structure is invalid, the on-disk structure (ODS) number does not correspond to the one required by InterBase, or if the database is corrupt, InterBase reports an error, and permits no further access. Optionally, CONNECT can be used to specify:4. A user name and password combination that is checked against the server‟s security database before allowing the connect to succeed. User names can be up to 31 characters.Passwords are restricted to 8 characters.5. An SQL role name that the user adopts on connection to the database, provided that the user has previously been granted membership in the role. Regardless of role memberships granted, the user belongs to no role unless specified with this ROLE clause.The client can specify at most one role per connection, and cannot switch roles except by reconnecting.6. The size of the database buffer cache to allocate to the application when the default cache size is inappropriate.Using simple CONNECT statementsIn its simplest form, CONNECT requires one or more database parameters, each specifying the name of a database to open. The name of the database can be a: Database handle declared in a previous SET DATABASE statement.CHAPTER 3 WORKING WITH DATABASES42 INTERBASE 61. Host-language variable.2. Hard-coded file name.4Using a database handleIf a program uses SET DATABASE to provide database handles, those handles should be used in subsequent CONNECT statements instead of hard-coded names. For example,. . .EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;EXEC SQLSET DATABASE DB2 = ‟employee2.gdb‟;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .There are several advantages to using a database handle with CONNECT:1. Long file specifications can be replaced by shorter, mnemonic handles.2. Handles can be used to qualify table names in multi-database transactions. DSQL applications do not support multi-database transactions.3. Handles can be reassigned to other databases as needed.4. The number of database cache buffers can be specified as an additional CONNECT parameter.For more information about setting the number of database cache buffers, see “Setting d atabase cache buffers” on page 47. 4Using strings or host-language variables Instead of using a database handle, CONNECT can use a database name supplied at run time. The database name can be supplied as either a host-language variable or a hard-coded, quoted string.The following C code demonstrates how a program accessing only a single database might implement CONNECT using a file name solicited from a user at run time:. . .char fname[125];. . .printf(‟Enter the desired database name, including nodeand path):\n‟);OPENING A DATABASEEMBEDDED SQL GUIDE 43gets(fname);. . .EXEC SQLCONNECT :fname;. . .TipThis technique is especially useful for programs that are designed to work with many identically structured databases, one at a time, such as CAD/CAM or architectural databases.MULTIPLE DATABASE IMPLEMENTATIONTo use a database specified by the user as a host-language variable in a CONNECT statement in multi-database programs, follow these steps:1. Declare a database handle using the following SET DATABASE syntax:EXEC SQLSET DATABASE handle = COMPILETIME ‟ dbname‟;Here, handle is a hard-coded database handle supplied by the programmer, dbnameis a quoted, hard-coded database name used by gpre during preprocessing.2. Prompt the user for a database to open.3. Store the database name entered by the user in a host-language variable.4. Use the handle to open the database, associating the host-language variable with the handle using the following CONNECT syntax:EXEC SQLCONNECT : variable AS handle;The following C code illustrates these steps:. . .char fname[125];. . .EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;printf("Enter the desired database name, including nodeand path):\n");gets(fname);EXEC SQLCONNECT :fname AS DB1;. . .CHAPTER 3 WORKING WITH DATABASES44 INTERBASE 6In this example, SET DATABASE provides a hard-coded database file name for preprocessing with gpre. When a user runs the program, the database specified in the variable, fname, is used instead. 4Using a hard-coded database namesIN SINGE-DATABASE PROGRAMSIn a single-database program that omits SET DATABASE, CONNECT must contain a hard-coded, quoted file name in the following format:EXEC SQLCONNECT ‟[ host[ path]] filename‟; host is required only if a program and a dat abase file it uses reside on different nodes.Similarly, path is required only if the database file does not reside in the current working directory. For example, the following CONNECT statement contains ahard-coded file name that includes both a Unix host name and a path name:EXEC SQLCONNECT ‟valdez:usr/interbase/examples/employee.gdb‟;Note Host syntax is specific to each server platform.IMPORTANT A program that accesses multiple databases cannot use this form of CONNECT.IN MULTI-DATABASE PROGRAMSA program that accesses multiple databases must declare handles for each of them in separate SET DATABASE statements. These handles must be used in subsequent CONNECT statements to identify specific databases to open:. . .EXEC SQLSET DATABASE DB1 = ‟employee.gdb‟;EXEC SQLSET DATABASE DB2 = ‟employee2.gdb‟;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .Later, when the program closes these databases, the database handles are no longer in use. These handles can be reassigned to other databases by hard-coding a file name in a subsequent CONNECT statement. For example,OPENING A DATABASEEMBEDDED SQL GUIDE 45. . .EXEC SQLDISCONNECT DB1, DB2;EXEC SQLCONNECT ‟project.gdb‟ AS DB1;. . .Additional CONNECT syntaxCONNECT supports several formats for opening databases to provide programming flexibility. The following table outlines each possible syntax, provides descriptions and examples, and indicates whether CONNECT can be used in programs that access single or multiple databases:For a complete discussion of CONNECT syntax and its uses, see the Language Reference.Syntax Description ExampleSingle accessMultiple accessCONNECT …dbfile‟; Opens a single, hard-coded database file, dbfile.EXEC SQLCONNECT …employee.gdb‟;Yes NoCONNECT handle; Opens the database file associated with a previously declared database handle. This is the preferred CONNECT syntax.EXEC SQLCONNECT EMP;Yes YesCONNECT …dbfile‟ AS handle;Opens a hard-coded database file, dbfile, and assigns a previously declared database handle to it.EXEC SQLCONNECT …employee.gdb‟AS EMP;Yes Yes CONNECT :varname AS handle;Opens the database file stored in the host-language variable, varname, and assigns a previously declared database handle to it.EXEC SQL CONNECT :fname AS EMP;Yes YesTABLE 3.1 CONNECT syntax summaryCHAPTER 3 WORKING WITH DATABASES46 INTERBASE 6Attaching to multiple databasesCONNECT can attach to multiple databases. To open all databases specified in previous SETDATABASE statements, use either of the following CONNECT syntax options: EXEC SQLCONNECT ALL;EXEC SQLCONNECT DEFAULT;CONNECT can also attach to a specified list of databases. Separate each database request from others with commas. For example, the following statement opens two databases specified by their handles:EXEC SQLCONNECT DB1, DB2;The next statement opens two hard-coded database files and also assigns them to previously declared handles:EXEC SQLCONNECT ‟employee.gdb‟ AS DB1, ‟employee2.gdb‟ AS DB2;Tip Opening multiple databases with a single CONNECT is most effective when a program‟s database access is simple and clear. In complex programs that open and close several databases, that substitute database names with host-language variables, or that assign multiple handles to the same database, use separate CONNECT statements to make program code easier to read, debug, and modify.Handling CONNECT errors. The WHENEVER statement should be used to trap and handle runtime errors that occur during database declaration. The following C code fragment illustrates an error-handling routine that displays error messages and ends the program in an orderly fashion:. . .EXEC SQLWHENEVER SQLERRORGOTO error_exit;. . .OPENING A DATABASEEMBEDDED SQL GUIDE 47:error_exitisc_print_sqlerr(sqlcode, status_vector);EXEC SQLDISCONNECT ALL;exit(1);. . .For a complete discussion of SQL error handling, see Chapter 12, “Error Handling and Recovery.”数据库的工作这章描述怎样使用在嵌入式应用过程中的SQL语句控制数据库。

大数据技术-大数据数据仓库

大数据技术-大数据数据仓库

大数据数据仓库1、数据仓库基本概念数据仓库即Data Warehouse,简称DW,主要研究和解决从数据中获取信息的问题,为企业所有级别的决策制定过程,提供所有类型数据支持的战略集合。

本质上,数据仓库试图提供一种从操作型系统到决策支持系统的数据流架构模型。

主要是解决多重数据复制带来的高成本问题。

在有数仓之前,需要大量的冗余数据来支撑多个决策支持系统,尽管每个系统服务于不同的用户,但是这些系统经常需要大量相同的数据。

决策支持系统Decision support System,即DDS,是用于支持业务或组织决策活动的信息系统,服务于组织管理、运营和规划管理层(通常是中层或高级管理层),帮助人们对可能快速变化并且不容易预测结果的问题做出决策。

数据仓库就是为决策系统提供数据支持的。

1.1、数据仓库的概念数据仓库之父Bill Inmon提出了被广泛认可的数据仓库定义,把数据仓库定义为是一个面向主题的、集成的、非易失的和时变的数据集合,用于支持管理者的决策过程。

1.2、数据仓库的特性面向主题传统的操作型系统是围绕功能性应用来组织数据的,各个业务系统可能是相互隔离的,而数据仓库是面向主题的。

主题是一个抽象概念,简单的来说就是用户使用数据仓库进行决策时所关心的重点方面,一个主题通常与多个操作型系统相关。

例如一个保险公司要分析销售数据,就可以建立一个专注销售数据的数据仓库,通过这个数据仓库就可以得到“过去半年销售保险总额以及各险种的占比”,这个场景下销售就是一个数据主题,同时数据仓库设计时要排除对于决策无用的数据,上述场景决策关注销售,理赔不是这个主题的,因此理赔的数据就不需要在这个数据仓库中存储。

集成集成是与面向主题密切相关。

我们上面的保险的例子中,建立一个保险公司的销售数据仓库,不同的险种由不同的部门负责,他们有各自独立的销售数据库,此时要想从公司层面整体分析销售数据,必须将各分散的数据源统一成一致的、无歧义的数据格式后,再存储在数据仓库中,因此就需要解决各数据源的矛盾之处,例如字段的同名异意、异名同意、字段数据类型不一致,长度不一致,计量单位不一致等等。

外文翻译---数据库管理

外文翻译---数据库管理

英文资料翻译资料出处:From /china/ database英文原文:Database ManagementDatabase (sometimes spelled database) is also called an electronic database, referring to any collections of data, or information, that is specially organized for rapid search and retrieval by a computer. Databases are structured to facilitate the storage, retrieval modification and deletion of data in conjunction with various data-processing operations. Database can be stored on magnetic disk or tape, optical disk, or some other secondary storage device.A database consists of a file or a set of files. The information in the these files may be broken down into records, each of which consists of one or more fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.Database records and files must be organized to allow retrieval of the information. Early system were arranged sequentially (i.e., alphabetically, numerically, or chronologically); the development of direct-access storage devices made possible random access to data via indexes. Queries are the main way users retrieve database information. Typically the user provides a string of characters, and the computer searches the database for a corresponding sequence and provides the source materials in which those characters appear. A user can request, for example, all records in which the content of the field for a person’s last name is the word Smith.The many users of a large database must be able to manipulate the information within it quickly at any given time. Moreover, large business and other organizations tend to build up many independent files containing related and even overlapping data, and their data, processing activities often require the linking of data from several files.Several different types of database management systems have been developed to support these requirements: flat, hierarchical, network, relational, and object-oriented.In flat databases, records are organized according to a simple list of entities; many simple databases for personal computers are flat in structure. The records in hierarchical databases are organized in a treelike structure, with each level of records branching off into a set of smaller categories. Unlike hierarchical databases, which provide single links between sets of records at different levels, network databases create multiple linkages between sets by placing links, or pointers, to one set of records in another; the speed and versatility of network databases have led to their wide use in business. Relational databases are used where associations among files or records cannot be expressed by links; a simple flat list becomes one table, or “relation”, and multiple relations can be mathematically associated to yield desired information. Object-oriented databases store and manipulate more complex data structures, called “objects”, which are organized into hierarchical classes that may inherit properties from classes higher in the chain; this database structure is the most flexible and adaptable.The information in many databases consists of natural-language texts of documents; number-oriented database primarily contain information such as statistics, tables, financial data, and raw scientific and technical data. Small databases can be maintained on personal-computer systems and may be used by individuals at home. These and larger databases have become increasingly important in business life. Typical commercial applications include airline reservations, production management, medical records in hospitals, and legal records of insurance companies. The largest databases are usually maintained by governmental agencies, business organizations, and universities. These databases may contain texts of such materials as catalogs of various kinds. Reference databases contain bibliographies or indexes that serve as guides to the location of information in books, periodicals, and other published literature. Thousands of these publicly accessible databases now exist, covering topics ranging from law, medicine, and engineering to news and current events, games, classified advertisements, and instructional courses. Professionals such as scientists,doctors, lawyers, financial analysts, stockbrokers, and researchers of all types increasingly rely on these databases for quick, selective access to large volumes of information.一、DBMS Structuring TechniquesSequential, direct, and other file processing approaches are used to organize and structure data in single files. But a DBMS is able to integrate data elements from several files to answer specific user inquiries for information. That is, the DBMS is able to structure and tie together the logically related data from several large files.Logical Structures. Identifying these logical relationships is a job of the data administrator. A data definition language is used for this purpose. The DBMS may then employ one of the following logical structuring techniques during storage access, and retrieval operations.List structures. In this logical approach, records are linked together by the use of pointers. A pointer is a data item in one record that identifies the storage location of another logically related record. Records in a customer master file, for example, will contain the name and address of each customer, and each record in this file is identified by an account number. During an accounting period, a customer may buy a number of items on different days. Thus, the company may maintain an invoice file to reflect these transactions. A list structure could be used in this situation to show the unpaid invoices at any given time. Each record in the customer in the invoice file. This invoice record, in turn, would be linked to later invoices for the customer. The last invoice in the chain would be identified by the use of a special character as a pointer.Hierarchical (tree) structures. In this logical approach, data units are structured in multiple levels that graphically resemble an “upside down”tree with the root at the top and the branches formed below. There’s a superior-subordinate relationship in a hierarchical (tree) structure. Below the single-root data component are subordinate elements or nodes, each of which, in turn, “own”one or more other elements (or none). Each element or branch in this structure below the root has only a single owner. Thus, a customer owns an invoice, and the invoice has subordinate items. Thebranches in a tree structure are not connected.Network Structures. Unlike the tree approach, which does not permit the connection of branches, the network structure permits the connection of the nodes in a multidirectional manner. Thus, each node may have several owners and may, in turn, own any number of other data units. Data management software permits the extraction of the needed information from such a structure by beginning with any record in a file.Relational structures. A relational structure is made up of many tables. The data are stored in the form of “relations”in these tables. For example, relation tables could be established to link a college course with the instructor of the course, and with the location of the class.To find the name of the instructor and the location of the English class, the course/instructor relation is searched to get the name (“Fitt”), and the course/locati on relation is a relatively new database structuring approach that’s expected to be widely implemented in the future.Physical Structures. People visualize or structure data in logical ways for their own purposes. Thus, records R1 and R2 may always be logically linked and processed in sequence in one particular application. However, in a computer system it’s quite possible that these records that are logically contiguous in one application are not physically stored together. Rather, the physical structure of the records in media and hardware may depend not only on the I/O and storage devices and techniques used, but also on the different logical relationships that users may assign to the data found in R1and R2. For example, R1 and R2 may be records of credit customers who have shipments send to the same block in the same city every 2 weeks. From the shipping department manager’s perspective, then, R1 and R2 are sequential entries on a geographically organized shipping report. But in the A/R application, the customers represented by R1 and R2 may be identified, and their accounts may be processed, according to their account numbers which are widely separated. In short, then, the physical location of the stored records in many computer-based information systems is invisible to users.二、Database Management Features of OracleOracle includes many features that make the database easier to manage. We’ve divided the discussion in this section into three categories: Oracle Enterprise Manager, add-on packs, backup and recovery.1.Oracle Enterprise ManagerAs part of every Database Server, Oracle provides the Oracle Enterprise Manager (EM), a database management tool framework with a graphical interface used to manage database users, instances, and features (such as replication) that can provide additional information about the Oracle environment.Prior to the Oracle8i database, the EM software had to be installed on Windows 95/98 or NT-based systems and each repository could be accessed by only a single database manager at a time. Now you can use EM from a browser or load it onto Windows 95/98/2000 or NT-based systems. Multiple database administrators can access the EM repository at the same time. In the EM repository for Oracle9i, the super administrator can define services that should be displayed on other administrators’consoles, and management regions can be set up.2. Add-on packsSeveral optional add-on packs are available for Oracle, as described in the following sections. In addition to these database-management packs, management packs are available for Oracle Applications and for SAP R/3.(1) standard Management PackThe Standard Management Pack for Oracle provides tools for the management of small Oracle databases (e.g., Oracle Server/Standard Edition). Features include support for performance monitoring of database contention, I/O, load, memory use and instance metrics, session analysis, index tuning, and change investigation and tracking.(2) Diagnostics PackYou can use the Diagnostic Pack to monitor, diagnose, and maintain the health of Enterprise Edition databases, operating systems, and applications. With both historical and real-time analysis, you can automatically avoid problems before theyoccur. The pack also provides capacity planning features that help you plan and track future system-resource requirements.(3)Tuning PackWith the Tuning Pack, you can optimise system performance by identifying and tuning Enterprise Edition databases and application bottlenecks such as inefficient SQL, poor data design, and the improper use of system resources. The pack can proactively discover tuning opportunities and automatically generate the analysis and required changes to tune the systems.(4) Change Management PackThe Change Management Pack helps eliminate errors and loss of data when upgrading Enterprise Edition databases to support new applications. It impact and complex dependencies associated with application changes and automatically perform database upgrades. Users can initiate changes with easy-to-use wizards that teach the systematic steps necessary to upgrade.(5) AvailabilityOracle Enterprise Manager can be used for managing Oracle Standard Edition and/or Enterprise Edition. Additional functionality is provided by separate Diagnostics, Tuning, and Change Management Packs.3. Backup and RecoveryAs every database administrator knows, backing up a database is a rather mundane but necessary task. An improper backup makes recovery difficult, if not impossible. Unfortunately, people often realize the extreme importance of this everyday task only when it is too late –usually after losing business-critical data due to a failure of a related system.The following sections describe some products and techniques for performing database backup operations.(1) Recovery ManagerTypical backups include complete database backups (the most common type), database backups, control file backups, and recovery of the database. Previously,Oracle’s Enterprise Backup Utility (EBU) provided a similar solution on some platforms. However, RMAN, with its Recovery Catalog stored in an Oracle database, provides a much more complete solution. RMAN can automatically locate, back up, restore, and recover databases, control files, and archived redo logs. RMAN for Oracle9i can restart backups and restores and implement recovery window policies when backups expire. The Oracle Enterprise Manager Backup Manager provides a GUI-based interface to RMAN.(2) Incremental backup and recoveryRMAN can perform incremental backups of Enterprise Edition databases. Incremental backups back up only the blocks modified since the last backup of a datafile, tablespace, or database; thus, they’re smaller and faster than complete backups. RMAN can also perform point-in-time recovery, which allows the recovery of data until just prior to a undesirable event.(3) Legato Storage ManagerVarious media-management software vendors support RMAN. Oracle bundles Legato Storage Manager with Oracle to provide media-management services, including the tracking of tape volumes, for up to four devices. RMAN interfaces automatically with the media-management software to request the mounting of tapes as needed for backup and recovery operations.(4)AvailabilityWhile basic recovery facilities are available for both Oracle Standard Edition and Enterprise Edition, incremental backups have typically been limited to Enterprise Edition.Data IndependenceAn important point about database systems is that the database should exist independently of any of the specific applications. Traditional data processing applications are data dependent. COBOL programs contain file descriptions and record descriptions that carefully describe the format and characteristics of the data.Users should be able to change the structure of the database without affecting the applications that use it. For example, suppose that the requirements of yourapplications change. A simple example would be expanding ZIP codes from five digits to nine digits. On a traditional approach using COBOL programs each individual COBOL application program that used that particular field would have to be changed, recompiled, and retested. The programs would be unable to recognize or access a file that had been changed and contained a new data description; this, in turn, might cause disruption in processing unless the change were carefully planned.Most database programs provide the ability to change the database structure by simply changing the ZIP code field and the data-entry form. In this case, data independence allows for minimal disruption of current and existing applications. Users can continue to work and can even ignore the nine-digit code if they choose. Eventually, the file will be converted to the new nine-digit ZIP code, but the ease with which the changeover takes place emphasizes the importance of data independence.Data IntegrityData integrity refers to the accuracy, correctness, or validity of the data in the database. In a database system, data integrity means safeguarding the data against invalid alteration or destruction arise. The first has to do with many users accessing the database concurrently. For example, if thousands of travel agents and airline reservation clerks are accessing the database concurrently. For example, if thousands of travel agents and airline reservation clerks are accessing the same database at once, and two agents book the same seat on the same flight, the first agent’s booking will be lost. In such case the technique of locking the record or field provides the means for preventing one user from accessing a record while another user is updating the same record.The second complication relates to hardwires, software, or human error during the course of processing and involves database transactions treated as a single . For example, an agent booking an airline reservation involves several database updates (i.e., adding the passenger’s name and address and updating the seats-available field), which comprise a single transaction. The database transaction is not considered to be completed until all updates have been completed; otherwise, none of the updates will be allowed to take place.Data SecurityData security refers to the protection of a database against unauthorized or illegal access or modification. For example, a high-level password might allow a user to read from, write to, and modify the database structure, whereas a low-level password history of the modifications to a database-can be used to identify where and when a database was tampered with and it can also be used to restore the file to its original condition.三、Choosing between Oracle and SQL ServerI have to decide between using the Oracle database and WebDB vs. Microsoft SQL Server with Visual Studio. This choice will guide our future Web projects. What are the strong points of each of these combinations and what are the negatives?Lori: Making your decision will depend on what you already have. For instance, if you want to implement a Web-based database application and you are a Windows-only shop, SQL Server and the Visual Studio package would be fine. But the Oracle solution would be better with mixed platforms.There are other things to consider, such as what extras you get and what skills are required. WebDB is a content management and development tool that can be used by content creators, database administrators, and developers without any programming experience. WebDB is a browser-based tool that helps ease content creation and provides monitoring and maintenance tools. This is a good solution for organizations already using Oracle. Oracle also scales better than SQL Server, but you will need to have a competent Oracle administrator on hand.The SQL Sever/Visual Studio approach is more difficult to use and requires an experienced object-oriented programmer or some extensive training. However, you do get a fistful of development tools with Visual Studio: Visual Basic, Visual C++, and Visual InterDev for only $1,619. Plus, you will have to add the cost of the SQL Server, which will run you $1,999 for 10 clients or $3,999 for 25 clients-a less expensive solution than Oracle’s.Oracle also has a package solution that starts at $6,767, depending on the platform selected. The suite includes not only WebDB and Oracle8i butalso other tools for development such as the Oracle application server, JDeveloper, and Workplace Templates, and the suite runs on more platforms than the Microsoft solution does. This can be a good solution if you are a start-up or a small to midsize business. Buying these tools in a package is less costly than purchasing them individually.Much depends on your skill level, hardware resources, and budget. I hope this helps in your decision-making.Brooks: I totally agree that this decision depends in large part on what infrastructure and expertise you already have. If the decision is close, you need to figure out who’s going to be doing the work and what your priorities are.These two products have different approaches, and they reflect the different personalities of the two vendors. In general, Oracle products are designed for very professional development efforts by top-notch programmers and project leaders. The learning period is fairly long, and the solution is pricey; but if you stick it out you will ultimately have greater scalability and greater reliability.If your project has tight deadlines and you don’t have the time and/or money to hire a team of very expensive, very experienced developers, you may find that the Oracle solution is an easy way to get yourself in trouble. There’s nothing worse than a poorly developed Oracle application.What Microsoft offers is a solution that’s aimed at rapid development and low-cost implementation. The tools are cheaper, the servers you’ll run it on are cheaper, and the developers you need will be cheaper. Choosing SQL Sever and Visual Studio is an excellent way to start fast.Of course, there are trade-offs. The key problem I have with Visual Studio and SQL Server is that you’ll be tied to Microsoft operating systems and Intel hardware. If the day comes when you need to support hundreds of thousands of users, you really don’t have anywhere to go other than buying hundreds of servers, which is a management nightmare.If you go with the Microsoft approach, it sounds like you may not need morethan Visual Interdev. If you already know that you’re going to be developing ActiveX components in Visual Basic or Visual C++, that’s warning sign that maybe you should look at the Oracle solution more closely.I want to emphasize that, although these platforms have their relative strengths and weaknesses, if you do it right you can build a world-class application on either one. So if you have an organizational bias toward one of the vendors, by all means go with it. If you’re starting out from scratch, you’re going to have to ask yourself whether your organization leans more toward perfectionism or pragmatism, and realize that both “isms”have their faults.中文译文:数据库管理数据库(也称DataBase)也称为电子数据库,是指由计算机特别组织的用下快速查找和检索的任意的数据或信息集合。

数据库 专业英语翻译

数据库 专业英语翻译

数据库数据库(Database)是按照数据结构来组织、存储和管理数据的仓库,它产生于距今五十年前,随着信息技术和市场的发展,特别是二十世纪九十年代以后,数据管理不再仅仅是存储和管理数据,而转变成用户所需要的各种数据管理的方式。

数据库有很多种类型,从最简单的存储有各种数据的表格到能够进行海量数据存储的大型数据库系统都在各个方面得到了广泛的应用。

一.数据管理的诞生数据库的历史可以追溯到五十年前,那时的数据管理非常简单。

通过大量的分类、比较和表格绘制的机器运行数百万穿孔卡片来进行数据的处理,其运行结果在纸上打印出来或者制成新的穿孔卡片。

而数据管理就是对所有这些穿孔卡片进行物理的储存和处理。

然而,1 9 5 1 年雷明顿兰德公司(Remington Rand Inc.)的一种叫做Univac I 的计算机推出了一种一秒钟可以输入数百条记录的磁带驱动器,从而引发了数据管理的革命。

1956 年IBM生产出第一个磁盘驱动器——the Model 305 RAMAC。

此驱动器有50 个盘片,每个盘片直径是2 英尺,可以储存5MB的数据。

使用磁盘最大的好处是可以随机地存取数据,而穿孔卡片和磁带只能顺序存取数据。

数据库系统的萌芽出现于60 年代。

当时计算机开始广泛地应用于数据管理,对数据的共享提出了越来越高的要求。

传统的文件系统已经不能满足人们的需要。

能够统一管理和共享数据的数据库管理系统(DBMS)应运而生。

数据模型是数据库系统的核心和基础,各种DBMS 软件都是基于某种数据模型的。

所以通常也按照数据模型的特点将传统数据库系统分成网状数据库、层次数据库和关系数据库三类。

二.结构化查询语言(SQL)1974 年,IBM的Ray Boyce和Don Chamberlin将Codd关系数据库的12条准则的数学定义以简单的关键字语法表现出来,里程碑式地提出了SQL(Structured Query Language)语言。

数据仓库原理

数据仓库原理

数据仓库原理-by zvane 1.数据仓库概念因为,管理人员往往传统数据库以及OLTP(On-Line Transaction Processing 联机事务处理)在日常的管理事务处理中获得了巨大的成功,但是对管理人员的决策分析要求却无法满足。

希翼能够通过对组织中的大量数据进行分析,了解业务的发展趋势。

而传统数据库只保留了当前的业务处理信息,缺乏决策分析所需要的大量的历史信息。

为满足管理人员的决策分析需要,就需要在数据库的基础上产生适应决策分析的数据环境——数据仓库(Data Warehouse)。

1.1定义William H.Inmon 在1993 年所写的论著《Building the DataWarehouse》首先系统地阐述了关于数据仓库的思想、理论,为数据仓库的发展奠定了历史基石。

文中他将数据仓库定义为:A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management decisions.一个面向主题的、集成的、非易失性的、随时间变化的数据的集合,以用于支持管理层决策过程。

1.2特性1.2.1subject-oriented(面向主题性)面向主题表示了数据仓库中数据组织的基本原则,数据仓库中的数由数据都是环绕着某一主题组织展开的。

由于数据仓库的用户大多是企业的管理决策者,这些人所面对的往往是一些比较抽象的、层次较高的管理分析对象。

例如,企业中的客户、产品、供应商等都可以作为主题看待。

从信息管理的角度看,主题就是在一个较高的管理层次上对信息系统的数据按照某一具体的管理对象进行综合、归类所形成的分析对象。

从数据组织的角度看,主题是一些数据集合,这些数据集合对分析对象作了比较完整的、一致的描述,这种描述不仅涉及到数据自身,而且涉及到数据之间的关系。

数据仓库(外文翻译)

数据仓库(外文翻译)

DATA WAREHOUSEData warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latestmust-have marketing weapon —— a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.(3).Time-variant: Data are stored to provide information from a historical perspective(e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information onwhich an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.“How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,(3) analyzing operations and looking for sources of profit,(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which informationfrom multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1.Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as, purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.The major distinguishing features between OLTP and OLAP are summarized as follows.(1). Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.(2). Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making.(3). Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application -oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.(4). View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.(5). Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics and so on.2.But, why have a separate data warehouse?“Since operational databases store huge amounts of data", you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kindsof data, it is necessary to maintain separate databases.数据仓库数据仓库为商务运作提供结构与工具,以便系统地组织、理解和使用数据进行决策。

SQL数据库外文翻译

SQL数据库外文翻译

Working with DatabasesThis chapter describes how to use SQL statements in embedded applications to control databases. There are three database statements that set up and open databases for access: SET DATABASE declares a database handle, associates the handle with an actual database file, and optionally assigns operational parameters for the database.SET NAMES optionally specifies the character set a client application uses for CHAR, VARCHAR, and text Blob data. The server uses this information totransli terate from a database’s default character set to the client’s character set on SELECT operations, and to transliterate from a client application’s character set to the database character set on INSERT and UPDATE operations.g CONNECT opens a database, allocates system resources for it, and optionally assigns operational parameters for the database.All databases must be closed before a program ends. A database can be closed by using DISCONNECT, or by appending the RELEASE option to the final COMMIT or ROLLBACK in a program. Declaring a databaseBefore a database can be opened and used in a program, it must first be declared with SET DATABASE to:CHAPTER 3 WORKING WITH DATABASES. Establish a database handle. Associate the database handle with a database file stored on a local or remote node.A database handle is a unique, abbreviated alias for an actual database name. Database handles are used in subsequent CONNECT, COMMIT RELEASE, and ROLLBACK RELEASE statements to specify which databases they should affect. Except in dynamic SQL (DSQL) applications, database handles can also be used inside transaction blocks to qualify, or differentiate, table names when two or more open databases contain identically named tables.Each database handle must be unique among all variables used in a program. Database handles cannot duplicate host-language reserved words, and cannot be InterBase reserved words.The following statement illustrates a simple database declaration:EXEC SQLSET DATABASE DB1 = ’employee.gdb’;This database declaration identifies the database file, employee.gdb, as a database the program uses, and assigns the database a handle, or alias, DB1.If a program runs in a directory different from the directory that contains the database file, then the file name specification in SET DATABASE must include a full path name, too. For example, the following SET DATABASE declaration specifies the full path to employee.gdb:EXEC SQLSET DATABASE DB1 = ’/interbase/examples/employee.gdb’;If a program and a database file it uses reside on different hosts, then the file name specification must also include a host name. The following declaration illustrates how a Unix host name is included as part of the database file specification on a TCP/IP network:EXEC SQLSET DATABASE DB1 = ’jupiter:/usr/interbase/examples/employee.gdb’;On a Windows network that uses the Netbeui protocol, specify the path as follows: EXEC SQLSET DATABASE DB1 = ’//venus/C:/Interbase/examples/employee.gdb’; DECLARING A DATABASEEMBEDDED SQL GUIDE 37Declaring multiple databasesAn SQL program, but not a DSQL program, can access multiple databases at the same time. In multi-database programs, database handles are required. A handle is used to:1. Reference individual databases in a multi-database transaction.2. Qualify table names.3. Specify databases to open in CONNECT statements.Indicate databases to close with DISCONNECT, COMMIT RELEASE, and ROLLBACK RELEASE.DSQL programs can access only a single database at a time, so database handle use is restricted to connecting to and disconnecting from a database.In multi-database programs, each database must be declared in a separate SET DATABASE statement. For example, the following code contains two SET DATABASE statements:. . .EXEC SQLSET DATABASE DB2 = ’employee2.gdb’;EXEC SQLSET DATABASE DB1 = ’employee.gdb’;. . .4Using handles for table namesWhen the same table name occurs in more than one simultaneously accessed database, a database handle must be used to differentiate one table name from another. The database handle is used as a prefix to table names, and takes the formhandle.table.For example, in the following code, the database handles, TEST and EMP, are used to distinguish between two tables, each named EMPLOYEE:. . .EXEC SQLDECLARE IDMATCH CURSOR FORSELECT TESTNO INTO :matchid FROM TEST.EMPLOYEEWHERE TESTNO > 100;EXEC SQLDECLARE EIDMATCH CURSOR FORSELECT EMPNO INTO :empid FROM EMP.EMPLOYEEWHERE EMPNO = :matchid;. . .CHAPTER 3 WORKING WITH DATABASES38 INTERBASE 6IMPORTANTThis use of database handles applies only to embedded SQL applications. DSQL applications cannot access multiple databases simultaneously.4Using handles with operationsIn multi-database programs, database handles must be specified in CONNECT statements to identify which databases among several to open and prepare for use in subsequent transactions.Database handles can also be used with DISCONNECT, COMMIT RELEASE, and ROLLBACKRELEASE to specify a subset of open databases to close.To open and prepare a database w ith CONNECT, see “Opening a database” on page 41.To close a database with DISCONNECT, COMMIT RELEASE, or ROLLBACK RELEASE, see“Closing a database” on page 49. To learn more about using database handles in transactions, see “Accessing an open database” on p age 48.Preprocessing and run time databasesNormally, each SET DATABASE statement specifies a single database file to associate with a handle. When a program is preprocessed, gpre uses the specified file to validate the program’s table and column referenc es. Later, when a user runs the program, the same database file is accessed. Different databases can be specified for preprocessing and run time when necessary.4Using the COMPILETIME clause A program can be designed to run against any one of several identically structured databases. In other cases, the actual database that a program will use at runtime is not available when a program is preprocessed and compiled. In such cases, SET DATABASE can include a COMPILETIME clause to specify a database for gpre to test against during preprocessing. For example, the following SET DATABASE statement declares that employee.gdb is to be used by gpre during preprocessing: EXEC SQLSET DATABASE EMP = COMPILETIME ’employee.gdb’;IMPORTANTThe file specification that follows the COMPILETIME keyword must always be a hard-coded, quoted string.DECLARING A DATABASEEMBEDDED SQL GUIDE 39When SET DATABASE uses the COMPILETIME clause, but no RUNTIME clause, and does not specify a different database file specification in a subsequent CONNECT statement, the same database file is used both for preprocessing and run time. To specify different preprocessing and runtime databases with SET DATABASE, use both the COMPILETIME andRUNTIME clauses.4Using the RUNTIME clauseWhen a database file is specified for use during preprocessing, SET DATABASE can specify a different database to use at run time by including the RUNTIME keyword and a runtime file specification:EXEC SQLSET DATABASE EMP = COMPILETIME ’employee.gdb’RUNTIME ’employee2.gdb’;The file specification that follows the RUNTIME keyword can be either ahard-coded, quoted string, or a host-language variable. For example, the following C code fragment prompts the user for a database name, and stores the name in a variable that is used later in SET DATABASE:. . .char db_name[125];. . .printf("Enter the desired database name, including node and path):\n");gets(db_name);EXEC SQLSET DATABASE EMP = COMPILETIME ’employee.gdb’ RUNTIME : db_name; . . .Note host-language variables in SET DATABASE must be preceded, as always, by a colon.Controlling SET DATABASE scopeBy default, SET DATABASE creates a handle that is global to all modules in an application.A global handle is one that may be referenced in all host-language modules comprising the program. SET DATABASE provides two optional keywords to change the scope of a declaration:g STATIC limits declaration scope to the module containing the SET DATABASE statement. No other program modules can see or use a database handle declared STATIC.CHAPTER 3 WORKING WITH DATABASES40 INTERBASE 6EXTERN notifies gpre that a SET DATABASE statement in a module duplicates a globally-declared database in another module. If the EXTERN keyword is used, then another module must contain the actual SET DATABASE statement, or an error occurs during compilation.The STATIC keyword is used in a multi-module program to restrict database handle access to the single module where it is declared. The following example illustrates the use of theSTATIC keyword:EXEC SQLSET DATABASE EMP = STATIC ’employee.gdb’;The EXTERN keyword is used in a multi-module program to signal that SET DATABASE in one module is not an actual declaration, but refers to a declaration made in a different module. Gpre uses this information during preprocessing. The following example illustrates the use of the EXTERN keyword:EXEC SQLSET DATABASE EMP = EXTERN ’employee.gdb’;If an application contains an EXTERN reference, then when it is used at run time, the actual SET DATABASE declaration must be processed first, and the database connected before other modules can access it.A single SET DATABASE statement can contain either the STATIC or EXTERN keyword, but not both. A scope declaration in SET DATABASE applies to both COMPILETIME and RUNTIME databases.Specifying a connection character setWhen a client application connects to a database, it may have its own character set requirements. The server providing database access to the client does not know about these requirements unless the client specifies them. The client application specifies its character set requirement using the SET NAMES statement before it connects to the database.SET NAMES specifies the character set the server should use when translating data from the database to the client application. Similarly, when the client sends data to the database, the server translates the data from the client’s character set to the database’s default character set (or the character set for an individual column if it differs from the databas e’s default character set). For example, the following statements specify that the client is using the DOS437 character set, then connect to the database:EXEC SQLOPENING A DATABASEEMBEDDED SQL GUIDE 41SET NAMES DOS437;EXEC SQLCONNECT ’europe.gdb’ USER ’JAMES’ PASSWORD ’U4EEAH’;For more information about character sets, see the Data Definition Guide. For the complete syntax of SET NAMES and CONNECT, see the Language Reference. Opening a databaseAfter a database is declared, it must be attached with a CONNECT statement before it can be used. CONNECT:1. Allocates system resources for the database.2. Determines if the database file is local, residing on the same host where the application itself is running, or remote, residing on a different host.3. Opens the database and examines it to make sure it is valid.InterBase provides transparent access to all databases, whether local or remote. If the database structure is invalid, the on-disk structure (ODS) number does not correspond to the one required by InterBase, or if the database is corrupt, InterBase reports an error, and permits no further access. Optionally, CONNECT can be used to specify:4. A user name and password combination that is checked against the server’s security database before allowing the connect to succeed. User names can be up to 31 characters.Passwords are restricted to 8 characters.5. An SQL role name that the user adopts on connection to the database, provided that the user has previously been granted membership in the role. Regardless of role memberships granted, the user belongs to no role unless specified with this ROLE clause.The client can specify at most one role per connection, and cannot switch roles except by reconnecting.6. The size of the database buffer cache to allocate to the application when the default cache size is inappropriate.Using simple CONNECT statementsIn its simplest form, CONNECT requires one or more database parameters, each specifying the name of a database to open. The name of the database can be a: Database handle declared in a previous SET DATABASE statement.CHAPTER 3 WORKING WITH DATABASES42 INTERBASE 61. Host-language variable.2. Hard-coded file name.4Using a database handleIf a program uses SET DATABASE to provide database handles, those handles should be used in subsequent CONNECT statements instead of hard-coded names. For example,. . .EXEC SQLSET DATABASE DB1 = ’employee.gdb’;EXEC SQLSET DATABASE DB2 = ’employee2.gdb’;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .There are several advantages to using a database handle with CONNECT:1. Long file specifications can be replaced by shorter, mnemonic handles.2. Handles can be used to qualify table names in multi-database transactions. DSQL applications do not support multi-database transactions.3. Handles can be reassigned to other databases as needed.4. The number of database cache buffers can be specified as an additional CONNECT parameter.For more information about setting the number of database cache buffers, see “Setting d atabase cache buffers” on page 47. 4Using strings or host-language variables Instead of using a database handle, CONNECT can use a database name supplied at run time. The database name can be supplied as either a host-language variable or a hard-coded, quoted string.The following C code demonstrates how a program accessing only a single database might implement CONNECT using a file name solicited from a user at run time:. . .char fname[125];. . .printf(’Enter the desired database name, including nodeand path):\n’);OPENING A DATABASEEMBEDDED SQL GUIDE 43gets(fname);. . .EXEC SQLCONNECT :fname;. . .TipThis technique is especially useful for programs that are designed to work with many identically structured databases, one at a time, such as CAD/CAM or architectural databases.MULTIPLE DATABASE IMPLEMENTATIONTo use a database specified by the user as a host-language variable in a CONNECT statement in multi-database programs, follow these steps:1. Declare a database handle using the following SET DATABASE syntax:EXEC SQLSET DATABASE handle = COMPILETIME ’ dbname’;Here, handle is a hard-coded database handle supplied by the programmer, dbnameis a quoted, hard-coded database name used by gpre during preprocessing.2. Prompt the user for a database to open.3. Store the database name entered by the user in a host-language variable.4. Use the handle to open the database, associating the host-language variable with the handle using the following CONNECT syntax:EXEC SQLCONNECT : variable AS handle;The following C code illustrates these steps:. . .char fname[125];. . .EXEC SQLSET DATABASE DB1 = ’employee.gdb’;printf("Enter the desired database name, including nodeand path):\n");gets(fname);EXEC SQLCONNECT :fname AS DB1;. . .CHAPTER 3 WORKING WITH DATABASES44 INTERBASE 6In this example, SET DATABASE provides a hard-coded database file name for preprocessing with gpre. When a user runs the program, the database specified in the variable, fname, is used instead. 4Using a hard-coded database namesIN SINGE-DATABASE PROGRAMSIn a single-database program that omits SET DATABASE, CONNECT must contain a hard-coded, quoted file name in the following format:EXEC SQLCONNECT ’[ host[ path]] filename’; host is required only if a program and a dat abase file it uses reside on different nodes.Similarly, path is required only if the database file does not reside in the current working directory. For example, the following CONNECT statement contains ahard-coded file name that includes both a Unix host name and a path name:EXEC SQLCONNECT ’valdez:usr/interbase/examples/employee.gdb’;Note Host syntax is specific to each server platform.IMPORTANT A program that accesses multiple databases cannot use this form of CONNECT.IN MULTI-DATABASE PROGRAMSA program that accesses multiple databases must declare handles for each of them in separate SET DATABASE statements. These handles must be used in subsequent CONNECT statements to identify specific databases to open:. . .EXEC SQLSET DATABASE DB1 = ’employee.gdb’;EXEC SQLSET DATABASE DB2 = ’employee2.gdb’;EXEC SQLCONNECT DB1;EXEC SQLCONNECT DB2;. . .Later, when the program closes these databases, the database handles are no longer in use. These handles can be reassigned to other databases by hard-coding a file name in a subsequent CONNECT statement. For example,OPENING A DATABASEEMBEDDED SQL GUIDE 45. . .EXEC SQLDISCONNECT DB1, DB2;EXEC SQLCONNECT ’project.gdb’ AS DB1;. . .Additional CONNECT syntaxCONNECT supports several formats for opening databases to provide programming flexibility. The following table outlines each possible syntax, provides descriptions and examples, and indicates whether CONNECT can be used in programs that access single or multiple databases:For a complete discussion of CONNECT syntax and its uses, see the Language Reference.Syntax Description ExampleSingle accessMultiple accessCONNECT ‘dbfile’; Opens a single, hard-coded database file, dbfile.EXEC SQLCONNECT ‘employee.gdb’;Yes NoCONNECT handle; Opens the database file associated with a previously declared database handle. This is the preferred CONNECT syntax.EXEC SQLCONNECT EMP;Yes YesCONNECT ‘dbfile’ AS handle;Opens a hard-coded database file, dbfile, and assigns a previously declared database handle to it.EXEC SQLCONNECT ‘employee.gdb’AS EMP;Yes Yes CONNECT :varname AS handle;Opens the database file stored in the host-language variable, varname, and assigns a previously declared database handle to it.EXEC SQL CONNECT :fname AS EMP;Yes YesTABLE 3.1 CONNECT syntax summaryCHAPTER 3 WORKING WITH DATABASES46 INTERBASE 6Attaching to multiple databasesCONNECT can attach to multiple databases. To open all databases specified in previous SETDATABASE statements, use either of the following CONNECT syntax options: EXEC SQLCONNECT ALL;EXEC SQLCONNECT DEFAULT;CONNECT can also attach to a specified list of databases. Separate each database request from others with commas. For example, the following statement opens two databases specified by their handles:EXEC SQLCONNECT DB1, DB2;The next statement opens two hard-coded database files and also assigns them to previously declared handles:EXEC SQLCONNECT ’employee.gdb’ AS DB1, ’employee2.gdb’ AS DB2;Tip Opening multiple databases with a single CONNECT is most effective when a program’s database access is simple and clear. In complex programs that open and close several databases, that substitute database names with host-language variables, or that assign multiple handles to the same database, use separate CONNECT statements to make program code easier to read, debug, and modify.Handling CONNECT errors. The WHENEVER statement should be used to trap and handle runtime errors that occur during database declaration. The following C code fragment illustrates an error-handling routine that displays error messages and ends the program in an orderly fashion:. . .EXEC SQLWHENEVER SQLERRORGOTO error_exit;. . .OPENING A DATABASEEMBEDDED SQL GUIDE 47:error_exitisc_print_sqlerr(sqlcode, status_vector);EXEC SQLDISCONNECT ALL;exit(1);. . .For a complete discussion of SQL error handling, see Chapter 12, “Error Handling and Recovery.”数据库的工作这章描述怎样使用在嵌入式应用过程中的SQL语句控制数据库。

SAP数据仓库帮助目录翻译

SAP数据仓库帮助目录翻译
Analysis and Repair Environment .分析和修理环境
Archivation of Request Administration Data .请求管理数据的档案集
BI Background Management .BI后台管理
Clients in BW .BW的客户
Transferring Transaction Data Using Web Services (RDA) .使用WEB服务传送和处理事务数据
Monitor for Real-Time Data Acquisition .实时数据获取的监视(监控)
Troubleshooting Real-Time Data Acquisition. 实时数据获取的困难解决
InfoSource .信息源。
Migration of Update Rules, 3.x InfoSources, and Transfer Rules .对更新规则、 3.x 信息源和传输规则的迁移。
Old Transformation Concept .旧的转型概念。
Further Processing Data .进一步处理数据。
Data Warehousing .数据仓库。
ห้องสมุดไป่ตู้
The Data Warehouse Concept .数据仓库概念。
Using a Data Warehouse .使用数据仓库。
Architecture of a Data Warehouse .数据仓库的构建。
Enterprise Data Warehouse (EDW) .企业数据仓库(EDW)。
Controlling Real-Time Data Acquisition with Process Chains .带处理链的实时数据获取控制
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

数据仓库数据仓库为商务运作提供结构与工具,以便系统地组织、理解和使用数据进行决策。

大量组织机构已经发现,在当今这个充满竞争、快速发展的世界,数据仓库是一个有价值的工具。

在过去的几年中,许多公司已花费数百万美元,建立企业范围的数据仓库。

许多人感到,随着工业竞争的加剧,数据仓库成了必备的最新营销武器——通过更多地了解客户需求而保住客户的途径。

“那么”,你可能会充满神秘地问,“到底什么是数据仓库?”数据仓库已被多种方式定义,使得很难严格地定义它。

宽松地讲,数据仓库是一个数据库,它与组织机构的操作数据库分别维护。

数据仓库系统允许将各种应用系统集成在一起,为统一的历史数据分析提供坚实的平台,对信息处理提供支持。

按照W. H. Inmon,一位数据仓库系统构造方面的领头建筑师的说法,“数据仓库是一个面向主题的、集成的、时变的、非易失的数据集合,支持管理决策制定”。

这个简短、全面的定义指出了数据仓库的主要特征。

四个关键词,面向主题的、集成的、时变的、非易失的,将数据仓库与其它数据存储系统(如,关系数据库系统、事务处理系统、和文件系统)相区别。

让我们进一步看看这些关键特征。

(1)、面向主题的:数据仓库围绕一些主题,如顾客、供应商、产品和销售组织。

数据仓库关注决策者的数据建模与分析,而不是构造组织机构的日常操作和事务处理。

因此,数据仓库排除对于决策无用的数据,提供特定主题的简明视图。

(2)、集成的:通常,构造数据仓库是将多个异种数据源,如关系数据库、一般文件和联机事务处理记录,集成在一起。

使用数据清理和数据集成技术,确保命名约定、编码结构、属性度量的一致性等。

(3)、时变的:数据存储从历史的角度(例如,过去5-10 年)提供信息。

数据仓库中的关键结构,隐式或显式地包含时间元素。

(4)、非易失的:数据仓库总是物理地分离存放数据;这些数据源于操作环境下的应用数据。

由于这种分离,数据仓库不需要事务处理、恢复和并行控制机制。

通常,它只需要两种数据访问:数据的初始化装入和数据访问。

概言之,数据仓库是一种语义上一致的数据存储,它充当决策支持数据模型的物理实现,并存放企业决策所需信息。

数据仓库也常常被看作一种体系结构,通过将异种数据源中的数据集成在一起而构造,支持结构化和启发式查询、分析报告和决策制定。

“好”,你现在问,“那么,什么是建立数据仓库?”根据上面的讨论,我们把建立数据仓库看作构造和使用数据仓库的过程。

数据仓库的构造需要数据集成、数据清理、和数据统一。

利用数据仓库常常需要一些决策支持技术。

这使得“知识工人”(例如,经理、分析人员和主管)能够使用数据仓库,快捷、方便地得到数据的总体视图,根据数据仓库中的信息做出准确的决策。

有些作者使用术语“建立数据仓库”表示构造数据仓库的过程,而用术语“仓库DBMS”表示管理和使用数据仓库。

我们将不区分二者。

“组织机构如何使用数据仓库中的信息?”许多组织机构正在使用这些信息支持商务决策活动,包括:(1)、增加顾客关注,包括分析顾客购买模式(如,喜爱买什么、购买时间、预算周期、消费习惯);(2)、根据季度、年、地区的营销情况比较,重新配置产品和管理投资,调整生产策略;(3)、分析运作和查找利润源;(4)、管理顾客关系、进行环境调整、管理合股人的资产开销。

从异种数据库集成的角度看,数据仓库也是十分有用的。

许多组织收集了形形色色数据,并由多个异种的、自治的、分布的数据源维护大型数据库。

集成这些数据,并提供简便、有效的访问是非常希望的,并且也是一种挑战。

数据库工业界和研究界都正朝着实现这一目标竭尽全力。

对于异种数据库的集成,传统的数据库做法是:在多个异种数据库上,建立一个包装程序和一个集成程序(或仲裁程序)。

这方面的例子包括IBM 的数据连接程序和Informix的数据刀。

当一个查询提交客户站点,首先使用元数据字典对查询进行转换,将它转换成相应异种站点上的查询。

然后,将这些查询映射和发送到局部查询处理器。

由不同站点返回的结果被集成为全局回答。

这种查询驱动的方法需要复杂的信息过滤和集成处理,并且与局部数据源上的处理竞争资源。

这种方法是低效的,并且对于频繁的查询,特别是需要聚集操作的查询,开销很大。

对于异种数据库集成的传统方法,数据仓库提供了一个有趣的替代方案。

数据仓库使用更新驱动的方法,而不是查询驱动的方法。

这种方法将来自多个异种源的信息预先集成,并存储在数据仓库中,供直接查询和分析。

与联机事务处理数据库不同,数据仓库不包含最近的信息。

然而,数据仓库为集成的异种数据库系统带来了高性能,因为数据被拷贝、预处理、集成、注释、汇总,并重新组织到一个语义一致的数据存储中。

在数据仓库中进行的查询处理并不影响在局部源上进行的处理。

此外,数据仓库存储并集成历史信息,支持复杂的多维查询。

这样,建立数据仓库在工业界已非常流行。

1.操作数据库系统与数据仓库的区别由于大多数人都熟悉商品关系数据库系统,将数据仓库与之比较,就容易理解什么是数据仓库。

联机操作数据库系统的主要任务是执行联机事务和查询处理。

这种系统称为联机事务处理(OLTP)系统。

它们涵盖了一个组织的大部分日常操作,如购买、库存、制造、银行、工资、注册、记帐等。

另一方面,数据仓库系统在数据分析和决策方面为用户或“知识工人”提供服务。

这种系统可以用不同的格式组织和提供数据,以便满足不同用户的形形色色需求。

这种系统称为联机分析处理(OLAP)系统。

OLTP 和OLAP 的主要区别概述如下。

(1)、用户和系统的面向性:OLTP 是面向顾客的,用于办事员、客户、和信息技术专业人员的事务和查询处理。

OLAP 是面向市场的,用于知识工人(包括经理、主管、和分析人员)的数据分析。

(2)、数据内容:OLTP 系统管理当前数据。

通常,这种数据太琐碎,难以方便地用于决策。

OLAP 系统管理大量历史数据,提供汇总和聚集机制,并在不同的粒度级别上存储和管理信息。

这些特点使得数据容易用于见多识广的决策。

(3)、数据库设计:通常,OLTP 系统采用实体-联系(ER)模型和面向应用的数据库设计。

而OLAP 系统通常采用星形或雪花模型和面向主题的数据库设计。

(4)、视图:OLTP 系统主要关注一个企业或部门内部的当前数据,而不涉及历史数据或不同组织的数据。

相比之下,由于组织的变化,OLAP 系统常常跨越数据库模式的多个版本。

OLAP 系统也处理来自不同组织的信息,由多个数据存储集成的信息。

由于数据量巨大,OLAP 数据也存放在多个存储介质上。

(5)、访问模式:OLTP 系统的访问主要由短的、原子事务组成。

这种系统需要并行控制和恢复机制。

然而,对OLAP 系统的访问大部分是只读操作(由于大部分数据仓库存放历史数据,而不是当前数据),尽管许多可能是复杂的查询。

OLTP 和OLAP 的其它区别包括数据库大小、操作的频繁程度、性能度量等。

2.但是,为什么需要一个分离的数据仓库“既然操作数据库存放了大量数据”,你注意到,“为什么不直接在这种数据库上进行联机分析处理,而是另外花费时间和资源去构造一个分离的数据仓库?”分离的主要原因是提高两个系统的性能。

操作数据库是为已知的任务和负载设计的,如使用主关键字索引和散列,检索特定的记录,和优化“罐装的”查询。

另一方面,数据仓库的查询通常是复杂的,涉及大量数据在汇总级的计算,可能需要特殊的数据组织、存取方法和基于多维视图的实现方法。

在操作数据库上处理OLAP 查询,可能会大大降低操作任务的性能。

此外,操作数据库支持多事务的并行处理,需要加锁和日志等并行控制和恢复机制,以确保一致性和事务的强健性。

通常,OLAP 查询只需要对数据记录进行只读访问,以进行汇总和聚集。

如果将并行控制和恢复机制用于这种OLAP 操作,就会危害并行事务的运行,从而大大降低OLTP 系统的吞吐量。

最后,数据仓库与操作数据库分离是由于这两种系统中数据的结构、内容和用法都不相同。

决策支持需要历史数据,而操作数据库一般不维护历史数据。

在这种情况下,操作数据库中的数据尽管很丰富,但对于决策,常常还是远远不够的。

决策支持需要将来自异种源的数据统一(如,聚集和汇总),产生高质量的、纯净的和集成的数据。

相比之下,操作数据库只维护详细的原始数据(如事务),这些数据在进行分析之前需要统一。

由于两个系统提供很不相同的功能,需要不同类型的数据,因此需要维护分离的数据库。

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon ——a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling andanalysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.(3).Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.“How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,(3) analyzing operations and looking for sources of profit,(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it is highly desirable, yet challenging.Much effort has been spent in the database industry and research community towards achieving this goal.The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1. Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as, purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.The major distinguishing features between OLTP and OLAP are summarized as follows.(1). Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.(2). Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making.(3). Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application -oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.(4). View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.(5). Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics and so on.2. But, why have a separate data warehouse?“Since operational databases store huge amounts of data", you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.。

相关文档
最新文档