Teradata Reference Materials (internal training materials used by a well-known multinational company)


Teradata Overview


Figure 8-16 illustrates Teradata's internal parallel processing mechanism. Assume the system is configured with 4 virtual processors (VPROCs) and that the optimizer has decomposed a complex query into 7 steps; SUPPLIERS, PARTS, PARTSUPP, and so on are the names of tables in the database. As each step executes, the 4 VPROCs simultaneously process the data blocks relevant to each of them. For example, when the SUPPLIERS table is scanned (step 1.1), its rows having been evenly distributed by a hash algorithm across the disks managed by the four VPROCs, all 4 VPROCs scan at the same time to retrieve the qualifying rows; this is so-called query parallelism. Steps 1.1 and 1.2, and likewise 2.1 and 2.2, also execute simultaneously; this is so-called multi-step parallelism. Step 2.2 (or step 1.2) contains three operations, which achieve parallelism within the step by means of a pipeline mechanism.

Beyond the multidimensional parallelism described above, Teradata adds further optimizations and extensions that make responses to complex queries even faster. For example, in a multi-user environment the queries issued by the many users of a department are often broadly similar, and after the optimizer decomposes them they share some identical steps. Because the result of each step is held temporarily in a system buffer, an identical step often needs to be executed only once, which greatly reduces disk I/O and improves response time.

For an OLTP system, queries are relatively simple and building appropriate indexes is enough to guarantee query speed, so little is demanded of the RDBMS's parallel processing capability. A data warehouse, however, mainly serves OLAP applications, and many business questions are quite complex. Relying on indexes to speed up queries raises two problems. First, too many indexes occupy too much disk space and increase the system's complexity and management cost. When many OLTP RDBMSs are used for data warehousing, their disk ratio (the ratio of database size to actual user data) exceeds 5, and sometimes reaches 10, for precisely this reason; a data warehouse built on Teradata generally has a disk ratio between 1.5 and 3. Second, building an index amounts to defining certain related questions in advance; when other questions are raised, additional indexes are often needed. In other words, indexes can only answer predefined questions, such as routine business reports. Besides producing large numbers of business reports, a major use of a data warehouse is answering unpredictable, dynamic business queries, which we call ad-hoc queries. One cannot imagine a DBA responding to a manager's question with: "Sorry, I didn't expect you to ask that; please wait a moment while I build an index."

Strong parallel processing capability in the RDBMS is therefore the key to a successful data warehouse application. From its inception, Teradata was designed specifically for decision support; its specialty is not OLTP but the integrated analysis and processing of data, and its internal parallel processing mechanism is very thoroughly engineered. To date, NCR has successfully implemented, for customers in all kinds of industries worldwide, more than 1,000
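The within-step pipeline parallelism described above can be mimicked with Python generators: each operation consumes rows as the previous one produces them, rather than waiting for a complete intermediate result. This is a toy sketch; the PARTS-style rows and the filter predicate are invented for illustration.

```python
# Three chained operations, as in step 2.2: scan -> filter/project -> aggregate.
# Generators hand rows along one at a time, like a pipeline: no operation
# waits for the previous one to finish the entire table.

parts = [("P1", "blue", 10.0), ("P2", "red", 4.5),
         ("P3", "blue", 7.25), ("P4", "red", 12.0)]

def scan(table):
    for row in table:                   # operation 1: retrieve rows
        yield row

def pick_blue(rows):
    for name, color, price in rows:     # operation 2: filter + project
        if color == "blue":
            yield price

total = sum(pick_blue(scan(parts)))     # operation 3: aggregate
print(total)                            # -> 17.25
```

The same three operations run as a chain; nothing materializes the full filtered list before the aggregate begins.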

Teradata Study Notes


Basics: the Teradata access architecture

- PE (Parsing Engine): converts SQL commands into messages that the AMPs can recognize; receives and passes on data.
- MPL (Message Passing Layer): responsible for dispatching messages to the appropriate AMPs.
- AMP (Access Module Processor): the smallest logical processing unit in Teradata; directly responsible for reading and writing the disk data it owns; receives commands from the PE and then reads and writes its vdisk.
- Vdisk (Virtual Disk): generally made up of more than one physical disk.

How AMPs work:
◆ Each table's rows are spread evenly across all of the AMPs.
◆ Each AMP controls one logical store (its vdisk), which consists of multiple physical disks.
◆ Each AMP manages only the data on its own vdisk.
◆ A database may have a great many AMPs (several hundred or more).
◆ In a full-table scan, all AMPs work in parallel, each scanning the data it manages.

Teradata scales linearly.

Teradata objects: tables, views, macros, triggers, stored procedures, user-defined functions, and indexes (join and hash).

Macros: a predefined group of SQL statements, used for frequently executed work.
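A miniature sketch of this PE → MPL → AMP flow, purely for illustration: the modulo routing, the request shapes, and the table contents are invented stand-ins, not Teradata's real row hash or message protocol.

```python
N_AMPS = 4

# Each "AMP" owns only its own slice of the table (shared nothing).
amp_data = {amp: {} for amp in range(N_AMPS)}  # vdisk: {pk: row}

def mpl_route(pk: int) -> int:
    """Message Passing Layer: pick the AMP that owns this row."""
    return pk % N_AMPS  # toy stand-in for the real row hash

def pe_execute(request):
    """Parsing Engine: turn a request into AMP work messages."""
    kind, payload = request
    if kind == "insert":
        pk, row = payload
        amp_data[mpl_route(pk)][pk] = row              # single-AMP message
    elif kind == "select_by_pk":
        return amp_data[mpl_route(payload)].get(payload)  # one AMP answers
    elif kind == "full_scan":
        # All AMPs scan their own vdisk (in parallel on real hardware);
        # the results are merged.
        return sorted(r for amp in range(N_AMPS)
                      for r in amp_data[amp].values())

for pk in range(1, 9):
    pe_execute(("insert", (pk, f"row-{pk}")))

print(pe_execute(("select_by_pk", 3)))        # served by a single AMP
print(len(pe_execute(("full_scan", None))))   # every AMP participates
```

The contrast to notice: a primary-index lookup involves exactly one AMP, while the full scan touches all of them.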

A macro can contain only one transaction; it is stored in the dictionary tables, accepts parameters, and can be executed interactively.

Some Teradata commands:
● HELP: displays information about database objects
● SHOW: displays the DDL of a database object
● EXPLAIN: displays the execution plan of an SQL statement

EXPLAIN SELECT last_name, department_number FROM Employee;

Explanation (full)
---------------------------------------------------------------------------
1) First, we lock a distinct CUSTOMER_SERVICE."pseudo table" for read on a RowHash to prevent global deadlock for CUSTOMER_SERVICE.Employee.
2) Next, we lock CUSTOMER_SERVICE.Employee for read.
3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.Employee by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated to be 24 rows. The estimated time for this step is 0.15 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0 hours and 0.15 seconds.

Teradata database architecture: Teradata and MPP systems. The BYNET (Banyan Network) is a combined hardware and software product that provides the high-performance network interconnect for Teradata MPP (Massively Parallel Processing) systems.

Teradata Basics Tutorial (Chinese)


Teradata SQL Basics Tutorial

Chapter 1: Relational Database Fundamentals

1.1 The Relational Database Model

Relational database theory was first proposed by Dr. Codd. The mathematical description of a relation is in fact a two-dimensional table; such tables, combined according to the way a business operates, form a relational database model.

This model concisely expresses how an enterprise or organization operates and captures the essence of things, which makes it very practical.

Each two-dimensional table is called an entity (Entity); it can represent a person, a place, a thing, and so on.

Each column in a table is called an attribute (Attribute) or field (Field); each row of the table represents one particular instance of the entity and is called a record (Record).

Tables 1-1, 1-2, and 1-3 give examples of an employee table, a department table, and a job table, respectively.

Table 1-1 Employee Table

employee_number (PK) | manager_employee_number (FK) | department_number (FK) | job_code (FK) | last_name | first_name | hire_date | birth_date | salary_amount
1018 | 1017 | 501 | 512101 | Ratzlaff | Larry | 1978-07-15 | 1954-05-31 | 54000.00
1022 | 1003 | 401 | 412102 | Machado | Albert | 1979-03-01 | 1957-07-14 | 32300.00
1014 | 1011 | 402 | 422101 | Crane | Robert | 1978-01-15 | 1960-07-04 | 24500.00
1003 | 801 | 401 | 411100 | Trader | James | 1976-07-31 | 1947-06-19 | 37850.00
1007 | 1005 | 403 | 432101 | Villegas | Arnando | 1977-01-02 | 1937-01-31 | 49700.00
1010 | 1003 | 401 | 412101 | Rogers | Frank | 1977-03-01 | 1935-04-23 | 46000.00

Table 1-2 Department Table

department_number (PK) | department_name | budget_amount | manager_employee_number (FK)
402 | software support | 308000.00 | 1011
401 | customer support | 982300.00 | 1003
201 | technical operations | 293800.00 | 1025
100 | president | 400000.00 | 801
501 | marketing sales | 308000.00 | 1017
403 | education | 932000.00 | 1005

Table 1-3 Job Table

job_code (PK) | description | hourly_billing_rate | hourly_cost_rate
421100 | Manager - Software Support | 0.00 | 0.00
512101 | Sales Rep | 0.00 | 0.00
511100 | Manager - Marketing Sales | 0.00 | 0.00
312101 | Software Engineer | 0.00 | 0.00
411100 | Manager - Customer Support | 0.00 | 0.00
431100 | Manager - Education | 0.00 | 0.00
413201 | Dispatcher | 0.00 | 0.00
432101 | Instructor | 0.00 | 0.00
422101 | Software Analyst | 0.00 | 0.00
321100 | Manager - Product Planning | 0.00 | 0.00

In a relational database model, tables are related to one another, and these relationships are commonly represented with an E-R diagram (Entity-Relationship Diagram).
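The PK/FK relationship between the employee and department tables can be exercised with a small join. The sketch below uses SQLite (not Teradata) and only a subset of the rows above, purely to show how the foreign key resolves each employee's department name:

```python
import sqlite3

# A toy instance of Tables 1-1 and 1-2 (subset of rows) in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        department_number INTEGER PRIMARY KEY,
        department_name   TEXT
    );
    CREATE TABLE employee (
        employee_number   INTEGER PRIMARY KEY,
        department_number INTEGER REFERENCES department(department_number),
        last_name         TEXT
    );
    INSERT INTO department VALUES (401, 'customer support'), (501, 'marketing sales');
    INSERT INTO employee VALUES
        (1018, 501, 'Ratzlaff'),
        (1022, 401, 'Machado'),
        (1003, 401, 'Trader');
""")

# The FK lets a join resolve each employee's department name.
rows = conn.execute("""
    SELECT e.last_name, d.department_name
    FROM employee e
    JOIN department d ON e.department_number = d.department_number
    ORDER BY e.last_name
""").fetchall()
print(rows)
```

Each row of the result pairs an employee record with the department record its foreign key points at; this is exactly the relationship an E-R diagram draws as a line between the two entities.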

Corporate Trainer Review Materials (Level 3)


Chapter 1: Job Position Description

Module 1

Competency point 1 – Basic steps in the groundwork for job description:
1. Determine the positions to be analyzed
2. Set up a job-analysis and description working group
3. Draw up a concrete work plan
4. Collect relevant materials and documents
5. Design the job-analysis charts and forms
6. Train the job-analysis staff
7. Divide into working teams according to the scale of the work and its technical requirements
8. Have staff fill in the various job-description forms
9. Review the completed forms
10. Print, submit, and finalize
11. Compile the job-analysis archive

Competency point 2 – Key points in collecting job information: 1. collect the information as comprehensively as possible; 2. collect it accurately; 3. organize it clearly.

Competency point 3 – Common methods of collecting job-description information: 1. observation; 2. questionnaires; 3. interviews; 4. work practice and work-diary recording; 5. functional job analysis; 6. the critical-incident method.

Competency point 4 – Correctly understanding the basic contents of a job description: 1. a description of the position's basic situation; 2. a description of the main content or scope of its production activities; 3. a description of the equipment and technical support the position requires; 4. a description of the staffing required.

Knowledge point 1 – The meaning of job description: a chart describing the composite qualities required for a particular position, drawn up according to the attributes of the occupation or trade and the activities undertaken by a relatively independent organizational unit.

Knowledge point 2 – Functions of job description: 1. improves economic efficiency and productivity; 2. makes technical investment more rational; 3. provides a basis for two-way selection; 4. serves as the organization's internal staffing standard; 5. functions as a training outline; 6. provides a yardstick for performance appraisal.

Knowledge point 3 – Basic principles of job description: 1. practicality; 2. expert conduct; 3. individualization; 4. dynamic management.

Knowledge point 4 – Basic concepts involved in job description: 1. basic units (micro-motions, elements, tasks, duties); 2. position (also called post: the set of all work tasks and responsibilities making up one employee's job); 3. trade (a classification of the objects of work or labor, also called job category); 4. occupation (the range of activities that a person with a given working ability performs, using his or her abilities, to earn a living). The three characteristics of an occupation: social, economic, and skill-related.

Knowledge point 5 – The relationship between job description and vocational training work (p. 17).

Module 2

Competency point 1 – Filling in the job-description tool forms (p. 20).

Competency point 2 – Techniques for using job-description archives: 1. targeted use; 2. flexibility; 3. training orientation; 4. further refinement; 5. follow-up evaluation.

Knowledge point 1 – Basic types of job-description tool forms (p. 22): the occupation specification sheet, the position specification sheet, and the operation specification sheet.

Knowledge point 2 – Position definition and job analysis: 1. positions (four factors); 2. job analysis.

Knowledge point 3 – Main factors affecting job analysis: 1. the experts' competence; 2. the composition of the expert team; 3. the accuracy of the information collected; 4. organization, management, and process control.

Chapter 2: Training Program Development

Module 1: Training and Enterprise Development

Competency point – Methods and techniques for improving one's understanding of training: 1. continuous learning; 2. personal practice; 3. evaluating training results.

Knowledge point 1 – Elements of enterprise development: an enterprise is an independent economic unit engaged in production or service activities that supplies products or labor to society and obtains economic benefit. The elements of enterprise development include: 1. personnel; 2. information; 3. time; 4. capital; 5. goods and service offerings; 6. premises and facilities.

Knowledge point 2 – The meaning of personnel quality: enterprise personnel quality comprises both individual quality and group quality: 1. In terms of individual quality.

Big Data Professional Title Examination Study Materials


Big Data Professional Title Examination (Systems R&D, Intermediate) Textbook

1. Big data laws, regulations, policy documents, and related standards

1.1 Laws and regulations

1.1.1 Overview of the privacy and personal-information provisions of the Civil Code

1) The basic right of privacy. Article 1032 of the Civil Code of the People's Republic of China (hereinafter "the Civil Code") provides: "Natural persons enjoy the right of privacy.

No organization or individual may infringe upon another person's right of privacy by prying, intrusion, disclosure, publication, or other means.

Privacy is the peace of a natural person's private life, together with the private spaces, private activities, and private information that he or she does not wish others to know.

" This article establishes the legal status of privacy as a basic right of natural persons and makes clear its core content and the scope of its protection.

2) Specific protection of the right of privacy. The Civil Code further refines the measures protecting privacy.

For example, Article 1033 enumerates specific acts that infringe the right of privacy, including: "(1) disturbing the peace of another person's private life by telephone, text message, instant-messaging tools, e-mail, leaflets, or similar means; (2) entering, photographing, or peeping into private spaces such as another person's residence or hotel room; (3) photographing, peeping into, eavesdropping on, or disclosing another person's private activities; (4) photographing or peeping at private parts of another person's body; (5) processing another person's private information; (6) infringing upon another person's privacy in other ways.

" These provisions give concrete guidance to judicial practice concerning the right of privacy.

3) Personal information protection. On the protection of personal information, Article 1034 of the Civil Code provides: "The personal information of natural persons is protected by law.

Personal information is information, recorded electronically or by other means, that alone or in combination with other information can identify a specific natural person, including the person's name, date of birth, identity-document number, biometric information, address, telephone number, e-mail address, health information, whereabouts, and the like.

Private information within personal information is governed by the provisions on the right of privacy; where there is no such provision, the provisions on the protection of personal information apply.

" This article defines personal information and its relationship to the right of privacy, providing the legal foundation for personal information protection.

4) Principles for processing personal information. In the area of personal information processing, the Civil Code establishes a series of principles.

For example, Article 1035 provides: "The processing of personal information shall follow the principles of lawfulness, legitimacy, and necessity; shall not be excessive; and shall meet the following conditions: (1) consent has been obtained from the natural person or his or her guardian, except where laws or administrative regulations provide otherwise; (2) the rules for processing the information are made public; (3) the purpose, method, and scope of the processing are expressly stated; (4) the processing does not violate laws or administrative regulations or the agreement between the parties.

Teradata Advanced Documentation


Tera Blog collection: Teradata SQL Tuning

1. The tuning process: locate bottlenecks according to run time, data volume, and complexity.

Examine the SQL execution plan and judge whether it is reasonable.

Performance monitoring ==> target selection ==> performance analysis ==> process optimization ==> run tracking (back to performance monitoring). Note: each phase produces its required documentation.

2. Performance analysis:
- Review the PDM: table definitions; the choice of PI; row counts and space usage per table.
- Review the SQL: the tables involved; the complexity of the processing logic; the overall logic; redundant processing.
- Test runs: response time.
- Review the EXPLAIN output: locate the bottleneck.

3. Process optimization:
- Understand the business rules, and choose a sensible data access path.
- PDM design: adjust the PDM.
- SQL written without regard for Teradata's mechanisms and characteristics: adjust the SQL.
- The optimizer lacks sufficient statistics: COLLECT STATISTICS.

4. Multiple INSERT/SELECTs --> a Multi-Statement INSERT/SELECT
* Parallel inserts into an empty table write no Transient Journal.
* Exploits Teradata's fast INSERT into empty tables and its parallel operation.

Current state (executed serially, as multiple transactions):

INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...) SELECT … FROM SRC1 ;
INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...) SELECT … FROM SRC2 ;
INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...) SELECT … FROM SRC3 ;

After optimization (executed in parallel, as a single transaction; in BTEQ, placing each semicolon at the start of the line carrying the next statement makes the three INSERTs one multi-statement request):

INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...) SELECT … FROM SRC1
;INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...) SELECT … FROM SRC2
;INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...) SELECT … FROM SRC3 ;

5. INSERT/SELECT with UNION/UNION ALL --> a Multi-Statement INSERT/SELECT
* UNION must eliminate duplicate records; UNION ALL need not, but both consume large amounts of spool space and force the data to be reorganized.

Current state:

INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...)
SELECT … FROM SRC1
UNION ALL SELECT … FROM SRC2
UNION ALL SELECT … FROM SRC3 ;
…

Adjusted: three separate INSERT/SELECT statements, as in item 4:

INSERT INTO ${TARGETDB}.DES (Party_Id, Party_Name ...) SELECT … FROM SRC1 ;
INSERT INTO ${TARGETDB}.T01_DES (Party_Id, Party_Name ...) SELECT … FROM SRC2 ;
INSERT INTO ${TARGETDB}.T01_DES (Party_Id, Party_Name ...) SELECT … FROM SRC3 ;

6. Eliminating duplicate records
* Use the ROW_NUMBER function to de-duplicate within a single table.
* The subquery form adds a layer of subquery and a large amount of data redistribution time.

Current state:

……
INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id, Party_Name ...)
SELECT COALESCE(b1.Party_Id,'-1'), COALESCE(TRIM(b1.Party_name),'') ...
FROM (
    SELECT party_id, party_name, …,
           ROW_NUMBER() OVER (PARTITION BY Party_Id ORDER BY Party_Name) AS rownum
    FROM ${TEMPDB}.T01_INDIV b1 …
) AA
WHERE AA.rownum = 1 ;
……

Recommended: first load the staging table, then de-duplicate into the target:

INSERT INTO ${TEMPDB}.T01_INDIV … ;
INSERT INTO ${TEMPDB}.T01_INDIV … ;
……
INSERT INTO ${TARGETDB}.T01_INDIV (Party_Id, Party_Name ...)
SELECT party_id, party_name, …
FROM ${TEMPDB}.T01_INDIV b1
QUALIFY ROW_NUMBER() OVER (PARTITION BY Party_Id ORDER BY Party_Name) = 1 ;

* Uses QUALIFY + the ROW_NUMBER function
* The SQL is concise and clear
* Avoids the subquery

EXPLAIN before optimization:
…… 4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs. 5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 16.01 seconds. 6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of an all-rows scan with a condition of ("ROWNUMBER = 1") into Spool 8 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 8 by row hash. The result spool file will not be cached in memory. The size of Spool 8 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute. 7) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 8 (Last Use).

EXPLAIN after optimization:
…… 4) We do an all-AMPs STAT FUNCTION step from PTEMP.VT_T01_INDIV_cur by way of an all-rows scan with no residual conditions into Spool 5 (Last Use), which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 3 (all_amps), which is built locally on the AMPs. 5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of an all-rows scan with a condition of ("Field_10 = 1") into Spool 1 (all_amps), which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by row hash. The result spool file will not be cached in memory. The size of Spool 1 is estimated with no confidence to be 6,781,130 rows. The estimated time for this step is 1 minute. 6) We do an all-AMPs MERGE into PDATA.T01_INDIV from Spool 1 (Last Use).

The LENGTH function cannot be used in BTEQ: LENGTH() is not a standard Teradata function, although Teradata SQL Assistant supports it.
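The QUALIFY ROW_NUMBER() = 1 pattern keeps, for each Party_Id, the first row in Party_Name order. The same logic can be sketched in plain Python (the sample rows are invented; this is only a model of the de-duplication, not of Teradata's execution):

```python
from itertools import groupby

# Invented sample rows: (party_id, party_name); duplicates share a party_id.
rows = [
    (1, "Beta"), (2, "Gamma"), (1, "Alpha"), (3, "Delta"), (2, "Alpha"),
]

# PARTITION BY party_id ORDER BY party_name, then keep row number 1:
rows.sort(key=lambda r: (r[0], r[1]))
deduped = [next(group) for _, group in groupby(rows, key=lambda r: r[0])]

print(deduped)  # one row per party_id; the smallest party_name wins
```

Sorting plays the role of PARTITION BY ... ORDER BY, and taking the first element of each group corresponds to keeping the rows where the window function returns 1.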

Teradata Basics (Condensed)

SQL coding standards

1) Indentation
- In stored-procedure files, indent by 8 spaces.
- In SQL strings inside C#, there is no indentation: no line of the string may begin with a space.

2) Line breaks
1. SELECT / FROM / WHERE / ORDER BY / GROUP BY and the other clauses must each start on a new line.
2. If the SELECT list contains only one item, it is written on the same line as SELECT.
3. If the SELECT list contains more than one item, each item takes its own line, indented 8 spaces from SELECT (no indent in C#).
4. If the FROM clause contains only one item, it is written on the same line as FROM.
5. If the FROM clause contains more than one item, each item takes its own line, indented 8 spaces from FROM (no indent in C#).
6. If the WHERE clause has multiple conditions, each condition takes its own line, begins with AND, and is not indented.
7. Each item of an UPDATE's SET clause takes its own line, not indented.
8. Each column of an INSERT takes its own line, not indented; each VALUES item takes its own line, not indented.
9. No blank lines are allowed inside an SQL statement.
10. In C#, a single quote must stay on the same line as the SQL clause it belongs to, and the concatenation operator ("+") must appear at the start of the line.

3) Spaces
1. The two elements joined by an arithmetic or logical operator must be separated from it by spaces.
2. A comma must be followed by one space.
3. There must be one space between a keyword or reserved word and a left parenthesis.

BASE

What is an AMP? AMP, an acronym for "Access Module Processor," is the type of vproc used to manage the database, handle file tasks, and manipulate the disk subsystem in the multi-tasking and possibly parallel-processing environment of the Teradata Database.

What is BTEQ? BTEQ (Basic TEradata Query) is a Teradata native query tool for DBAs and programmers: a command-driven utility used to 1) access and manipulate data, and 2) format reports for both print and screen output.

Teradata's help system consists mainly of three commands: HELP, SHOW, and EXPLAIN.
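A query laid out according to these rules might look as follows (an illustrative sketch only; the table and column names are invented):

```sql
SELECT
        customer_id,
        customer_name,
        open_dt
FROM customer
WHERE region_cd = '01'
AND status_cd = 'A'
ORDER BY customer_id
```

The SELECT list items sit on their own lines indented 8 spaces, the single FROM item shares FROM's line, and each extra WHERE condition starts its own unindented line with AND.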

Teradata 2014 Internal Training Courseware – 05 File System (FS) Writes

File System Writes
After completing this module, you will be able to:
• Describe File System Write access.
• Describe what happens when Teradata inserts a new row into a table.
[Slide diagram: a cylinder index entry pointing to a data block, with columns for start sector (0270, 0301, 0349, 0470, 0481, 0550) and sector count (3, 5, 5, 4, 6, 5). The first action is to read the block into memory (FSG cache).]
New Row INSERT – Part 2 (continued)
[Slide diagram: data block descriptors (DBDs) in the cylinder index, listing for each block its partition number, lowest row ID, and highest row hash. Read the block into memory (FSG cache).]
1. If the block has enough free contiguous bytes, insert the row into the block and update the CI.
2. If the block has enough free bytes, defragment the block, then insert the row and update the CI.

Teradata Database Study Notes

A Teradata system consists of processing nodes, an internal high-speed interconnect for communication between the nodes, and data storage media (generally disk arrays).

Each node is a single SMP machine; the physical and logical structure of a node is shown in Figure 1. A single node is one SMP processing unit: a computer with multiple CPUs or multiple cores.

The hardware includes CPUs, memory, local disks for the operating system and application software, and the network cards and BYNET ports that connect the node to the outside world. Node network cards come in two kinds: Channel Adapters for connecting to IBM mainframes, and LAN cards. A node usually has only one kind of card but many individual cards, used for different connections (for example, backup) and for redundancy.

Multiple nodes together form an MPP system; the high-speed internal interconnect among the nodes is implemented by the BYNET hardware.

Shared-Nothing Architecture: The Teradata Database virtual processors, or vprocs (the PEs and AMPs), share the components of the nodes (memory and CPU). The main point of the "shared-nothing" architecture is that each AMP manages its own dedicated portion of the system's disk space (called the vdisk), and this space is not shared with other AMPs. Each AMP uses system resources independently of the other AMPs, so they can all work in parallel for high overall system performance.

Module 2: A relational database is a collection of related tables stored in a relational database management system.

03. TD New-Hire Technical Training: Teradata Load/Unload Utilities, V1 (compilation)



• Flexible and easy-to-use report writer.
• Limited ability to branch forward to a LABEL, based on a return code or an activity count.
• BTEQ does error reporting, not error capture.
• The .OS command allows the execution of operating system commands.

    .IF ERRORLEVEL >= 16 THEN .QUIT 16 ;

You can assign an error level (SEVERITY) for each error code returned and make decisions based on the level you assign.
– ERRORCODE: tests the last SQL statement only.
– ERRORLEVEL: set by the user and retained until reset.

Capabilities:
– BTEQ (Basic Teradata Query) operates in either a batch or an interactive mode.
– BTEQ is a CLI-based utility.
• Runs on every supported platform — laptop to mainframe.
• Exports data to a client system from the Teradata database:
  – as displayable characters suitable for reports, or
  – in native host format, suitable for other applications.
• Imports data from a host or server data file and can use that data within SQL statements (INSERT, UPDATE, or DELETE).

[Slide diagram: Data Warehouse Usage Evolution, culminating in STAGE 5 – ACTIVE WAREHOUSING: MAKING it happen! The workload evolves from primarily batch, through an increase in ad hoc queries and growth in analytical modeling, to continuous update and time-sensitive queries.]
• Based on an enterprise-wide model
• Can begin small but may grow large rapidly
• Populated by extracting/loading data from operational systems
• Responds to end-user "what if" queries
• Can store detailed as well as summary data
Module 1: Teradata Product Overview
After completing this module, you will be able to:
• Describe the purpose of the Teradata product
• Give a brief history of the product
• List major architectural features of the product
Designed for Today’s Business
Teradata’s Charter meets the business needs of today and tomorrow with:
• Relational database – the standard for database design
• Enormous capacity – billions of rows, terabytes of data
• High performance parallel processing
• Single database server for multiple clients – a "Single Version of the Truth"
• Network and mainframe connectivity
• Industry standard access language – Structured Query Language (SQL)

[Slide diagram: Win 2000 / Win XP, UNIX, and mainframe clients all connect to the Teradata Database.]

Teradata – A Brief History

1979 – Teradata Corp founded in Los Angeles, California; development begins on a massively parallel computer
1982 – YNET technology is patented
1984 – Teradata markets the first database computer, the DBC/1012; first system purchased by Wells Fargo Bank of California; total revenue for the year: $3 million
1987 – First public offering of stock
1989 – Teradata and NCR partner on the next generation of the DBC
1991 – NCR Corporation is acquired by AT&T; Teradata revenues reach $280 million
1992 – Teradata is merged into NCR
1996 – AT&T spins off NCR Corp. with the Teradata product
1997 – The Teradata database becomes the industry leader in data warehousing
2000 – 100+ terabyte system in production
2002 – Teradata V2R5 released 12/2002; a major release including features such as PPI, roles and profiles, multi-value compression, and more
2003 – Teradata V2R5.1 released 12/2003; includes UDFs, BLOBs, CLOBs, and more

What is a Data Warehouse?

A Data Warehouse is a central, enterprise-wide database that contains information extracted from Operational Data Stores (ODS).

How Large is a Trillion?

1 Kilobyte = 10^3 bytes = 1,000 bytes
1 Megabyte = 10^6 bytes = 1,000,000 bytes
1 Gigabyte = 10^9 bytes = 1,000,000,000 bytes
1 Terabyte = 10^12 bytes = 1,000,000,000,000 bytes
1 Petabyte = 10^15 bytes = 1,000,000,000,000,000 bytes
• Primarily batch feeds and updates
• Ad hoc queries to support strategic decisions that return in minutes and maybe hours

Active Data Warehousing … is the timely, integrated, logically consistent store of detailed data available for strategic, tactically driven business decisions.
[Slide table: evolution of data processing. DSS workloads touch large volumes of data with response times of seconds or minutes; OLCP ("Today") handles requests such as instant credit – How much credit can be extended to this person? – while an analytic request might be: Show the top ten selling items across all stores for 2003.]
1 million seconds = 11.57 days
1 billion seconds = 31.6 years
1 trillion seconds = 31,688 years
1 million inches = 15.7 miles
1 trillion inches = 15,700,000 miles (30 round trips to the moon)

1 million square inches = .16 acres = .0002 square miles
1 trillion square inches = 249 square miles (larger than Singapore)

$1 million = less than $.01 for every person in the U.S.
$1 billion = $3.64 for every person in the U.S.
$1 trillion = $3,636 for every person in the U.S.
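The time and distance equivalences above are easy to reproduce; the figures are approximate, and this sketch assumes a 365.25-day year and 63,360 inches per mile:

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 86_400 * 365.25
INCHES_PER_MILE = 63_360  # 12 in/ft * 5,280 ft/mile

print(round(1e6 / SECONDS_PER_DAY, 2), "days")    # 1 million seconds
print(round(1e9 / SECONDS_PER_YEAR, 1), "years")  # 1 billion seconds
print(round(1e12 / SECONDS_PER_YEAR), "years")    # 1 trillion seconds
print(round(1e6 / INCHES_PER_MILE, 1), "miles")   # 1 million inches
print(round(1e12 / INCHES_PER_MILE), "miles")     # 1 trillion inches
```

Depending on the year length assumed, the year figures shift slightly, which is why slide decks quote 31.6 vs. 31.7 years for a billion seconds.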
Data Warehouse Usage Evolution
STAGE 1 – REPORTING: WHAT happened?
STAGE 2 – ANALYZING: WHY did it happen?
STAGE 3 – PREDICTING: WHAT will happen?
STAGE 4 – OPERATIONALIZING: WHAT IS happening?
[Slide diagram: the workload evolves from batch toward continuous update and short queries, as event-based triggering takes hold.]
What is Active Data Warehousing?
Data Warehousing … is the timely, integrated, logically consistent store of detailed data available for analytic business decision making.
[Slide diagram: sources such as ATM, PeopleSoft®, and point-of-service systems feed the Teradata Database; access tools such as Teradata Warehouse Miner, Cognos®, and MicroStrategy® serve the end users.]
Query Language (SQL)
• Manageable growth via modularity
• Fault tolerance at all levels of hardware and software
• Data integrity and reliability
Evolution of Data Processing

[Slide table fragment: per-request data volumes range from small to moderate (possibly across multiple databases) up to a large number of detail rows or a moderate number of summary rows.]