Common New Terms in Data Warehousing and Data Modeling
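25 Big Data Terms: Essential Knowledge for Getting Started with Big Data

Big data refers to large, fast-moving, and varied collections of structured and unstructured data that exceed the capabilities of traditional data processing and cannot be handled or managed with conventional database tools.
With the rapid development of information technology, big data has become one of the most discussed topics today.
Knowing its core terminology is essential both for practitioners and for anyone interested in the field.
This article introduces 25 common big data terms to help readers get started quickly.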
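1. Data Mining: Data mining is the process of analyzing large volumes of data to discover the patterns and associations hidden in it.
Data mining techniques extract valuable knowledge and information from massive data sets to support decisions and business growth.
2. Machine Learning: Machine learning is an artificial intelligence technique in which computer systems learn from data and improve, gaining the ability to learn and make decisions on their own.
It plays an important role in big data processing, uncovering patterns and regularities in large data sets.
3. Cloud Computing: Cloud computing is an Internet-based model of computing that delivers computing resources and services over the network.
It processes big data by distributing computing tasks across large clusters of machines, improving computational efficiency and resource utilization.
4. Stream Processing: Stream processing is the real-time analysis and processing of data streams as they are produced.
In big data, it performs continuous computation and analysis over massive volumes of live data, enabling real-time decisions and real-time applications.
5. Data Lake: A data lake is a store that holds structured and unstructured data of all kinds and can accommodate large amounts of raw data.
It does not require data to be preprocessed or converted to a fixed format first, which makes acquiring and using data more flexible and efficient.
6. Data Warehouse: A data warehouse is a centralized storage system for storing and managing an enterprise's data.
It integrates and cleans data from different sources to provide reliable data support for business decisions.
7. Data Visualization: Data visualization is the process of expressing data through charts, images, and other visual forms.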
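The stream processing entry above describes continuous computation over data as it arrives. A minimal sketch of that idea, assuming a hypothetical stream of numeric sensor readings and a fixed window size (neither comes from the source), keeps a running average that is updated with every new reading instead of recomputing over the full history:

```python
from collections import deque


def sliding_average(stream, window):
    """Yield the mean of the most recent `window` readings after each new reading."""
    recent = deque(maxlen=window)  # the oldest reading is evicted automatically
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # subtract the reading that is about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


# Hypothetical readings; in a real system these would arrive from a message queue.
readings = [10.0, 20.0, 30.0, 40.0]
averages = list(sliding_average(readings, window=2))
print(averages)
```

Because the function is a generator, each result is available as soon as its input arrives, which is the property that distinguishes stream processing from batch processing.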
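50 Big Data Terms You Should Know

1. Big Data - Refers to data collections that are very large, complex, and constantly changing.
It accumulates and is generated continuously in every field, and covers structured, semi-structured, and unstructured data.
2. Data Mining - The process of automatically discovering and extracting useful information from big data.
It uses statistics, pattern recognition, machine learning, and related techniques to help interpret data and uncover hidden patterns and regularities.
3. Cloud Computing - A model for delivering computing resources and services over the Internet.
Big data usually demands very large compute and storage capacity, and cloud computing provides elastic, reliable resources to meet that demand.
4. Data Warehouse - A centralized system for storing and managing structured data.
Its data has been cleaned and integrated, which makes complex analysis and querying easier for users.
5. Data Lake - A centralized storage system that holds data of all types and formats.
Unlike a data warehouse, a data lake does not require the data schema and structure to be defined in advance, so it can handle complex analysis needs more flexibly.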
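6. Hadoop - An open-source distributed computing framework for processing large data sets.
It is built around the MapReduce programming model and distributes both the data and the processing across a cluster.
7. MapReduce - A parallel programming model for processing large data sets.
It splits the data into many small chunks, hands them to multiple compute nodes to process in parallel, and finally merges the partial results into the answer.
8. Spark - A fast, general-purpose engine for big data processing.
It supports in-memory computation and can run complex processing and analysis over large data sets.
9. Data Visualization - The process of presenting data as charts, graphics, and other visual forms.
It helps users understand and analyze data and discover latent information and insights.
10. Data Cleaning - The process of handling and correcting errors, missing values, and inconsistencies in data.
Cleaned data is more accurate and reliable, which benefits later analysis and applications.
11. Data Integration - The process of merging data from different sources into one unified data set.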
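The MapReduce entry above describes three steps: split the data into chunks, process the chunks in parallel, and merge the results. The sketch below imitates those steps in a single process with a word count; the chunk contents and function names are illustrative assumptions, and a real framework such as Hadoop would run the map and reduce calls on separate nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Hypothetical input, already split into chunks.
chunks = ["big data big", "data lake", "big lake"]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Each `map_phase` call depends only on its own chunk, and each `reduce_phase` call only on its own key, which is what lets a cluster run them independently.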
Data Warehouse Glossary

A
Access Path: The path chosen by a database management system to retrieve the requested data.
Access Provider: A company which provides its customers a service whereby they can access the Internet. The user normally connects to the access provider's computer via a modem using a dial-up connection.
Active Attack: A persistent security assault by someone trying to gain restricted access by altering data. There are multiple techniques, decryption for example, which can be used to lead the attack.
Active Server Pages (ASP): Active server pages are a set of software components that run on a Web server and allow Web developers to build dynamic Web pages.
Activity-Based Costing (ABC): Activity-based costing (ABC) is an information system that maintains and processes data on a firm's activities and products. It identifies the activities performed, traces cost to these activities, and then uses various cost drivers to trace the cost of activities to products.
Activity-Based Management (ABM): Activity-based management (ABM) is the use of the activity-based costing tool by process owners to control and improve their operations. Because process analysis is conducted in the building of an activity-based cost model, management knows its business much better and can consequently evaluate value-added and non-value-added activities. Because a certain volume of work produces a certain outcome, "what if" analysis can be conducted to determine what resources are required if operations are scaled back or expanded.
Ad Clicks: Also called clickthroughs. The number of times a user "clicks" on an online ad, often measured as a function of time ("ad clicks per day").
Ad Hoc Query: Any query that cannot be determined prior to the moment the query is issued. A query that consists of dynamically constructed SQL, which is usually constructed by desktop-resident query tools.
Ad Hoc Query Tool: An end-user tool that accepts an English-like or point-and-click request for data and constructs an ad hoc query to retrieve the desired result.
Administrative Data: In a data warehouse, the data that helps a warehouse administrator manage the warehouse. Examples of administrative data are user profiles and order history data.
Aggregate Data: Data that is the result of applying a process to combine data elements. Data that is taken collectively or in summary form.
Aggregator: An e-commerce business model in which the Web site sells products or services which it does not produce or warehouse. An aggregator creates an environment where multiple providers (sellers) must compete on terms determined by the user.
Alerts: A notification from an event that has exceeded a pre-defined threshold.
Analyst: Someone who creates views for analytic interpretation of data, performs calculations and distributes the resulting information in the form of reports.
Analytic Applications: Packaged software that meets three distinct conditions: process support, separation of function and time-oriented, integrated data. Analytic applications expand the reach of business intelligence to an extended user base, packaging these technologies in a business context.
Analytics: The process and techniques for the exploration and analysis of business data to discover and identify new and meaningful information and trends that allow for analysis to take place.
Applet: A small Java program that can be embedded in an HTML page. Applets cannot access certain resources on local computers, such as files and serial devices, and are prohibited from communicating with most other computers across a network.
Application Service Provider (ASP): ASPs provide the infrastructure needed to deliver reliable application access, including enterprise applications, hardware platforms, operating systems, database systems, network hardware as well as the technical expertise to make it all work for a monthly service charge.
ASCII: American Standard Code for Information Interchange. A seven-bit code for character representation, commonly stored in eight bits with the eighth bit used for parity.
ASP: Application Service Provider. A company that offers access over the Internet to application programs and related services that would otherwise have to be located in their own personal or enterprise computers.
Atomic Data: Data elements that represent the lowest level of detail. For example, in a daily sales report, the individual items sold would be atomic data, while rollups such as invoice and summary totals from invoices are aggregate data.
Attribute: A field represented by a column within an object (entity). An object may be a table, view or report. An attribute is also associated with an SGML (HTML) tag used to further define the usage.
Authorization Request: A request initiated by a consumer to access data for which the consumer does not presently have access privileges.
Authorization Rules: Criteria used to determine whether or not an individual, group, or application may access reference data or a process.
Availability: User access to applications and/or data stores that reside and execute on computing systems accessing information that resides in files and databases supported by an organization's various operating environments.

B
B2B: Business-to-business commerce conducted over the Web.
B2C: Business-to-consumer commerce conducted over the Internet. It links consumers to commercial entities in one-way networks.
Balanced Scorecard: A comprehensive, top-down view of organizational performance with a strong focus on vision and strategy. In 1992 the founding fathers of the Balanced Scorecard, Drs. Robert Kaplan and David Norton, debuted their methodology in the Harvard Business Review. Then, in 1996, they released The Balanced Scorecard: Translating Strategy into Action, the so-called bible of the Balanced Scorecard.
Balanced Scorecard Collaborative: A professional services firm dedicated to the worldwide awareness, use, enhancement and integrity of the balanced scorecard as a value-added management process.
Balanced Scorecard Collaborative Certification: An industry-standard certification offered to software providers whose balanced scorecard applications meet the functional standards of Kaplan and Norton. These are applications that will enable end users to achieve the benefits of the balanced scorecard management process.
Baldrige Criteria for Performance Excellence: Criteria providing a systems perspective for understanding performance management. They reflect validated, leading management practices against which an organization can measure itself. With their acceptance nationally and internationally as the model for performance excellence, the criteria represent a common language for communication among organizations for sharing best practices.
Banner: A picture or graphic that stretches horizontally across a Web page. Banners can be used to title the Web page, start or separate different sections, create links to other Web pages, or provide a place for advertisements.
Banner Advertising: A marketing mechanism that contains strips of advertisements that are sporadically positioned on a Web page and are extremely popular on the World Wide Web. These types of ads generally take up a considerable amount of bandwidth and are sometimes disturbing to the Web user.
Base Tables: The normalized data structures maintained in the target warehousing database. Also known as the detail data.
Basel II New Accord (Basel 2, New Accord): This is a set of banking standards, which will regulate finance and banking for countries in the European Union. The Basel Committee on Banking Supervision is tasked with the goal to complete the New Accord by mid-year 2004, with implementation to take effect in member countries by year-end 2006. To that end, work already has begun in a number of countries on draft rules that would integrate Basel capital standards with national capital regimes. Basel II is focused specifically on global banks and financial institutions and ensures liquidity of those institutions for the protection of public trust.
Benchmarking: A point of reference for measurement.
Benefit Segmentation: The process of grouping customers into market segments according to the benefits they seek from the product. Refers to their needs and wants only.
Best Practices: A case study considered to be a good example of a business discipline.
Bidirectional Extracts: The ability to extract, cleanse and transfer data in two directions among different types of databases, including hierarchical, networked and relational databases.
Braking Mechanism: A software mechanism that prevents users from querying the operational database once transaction loads reach a certain level.
Bricks and Mortar: Refers to businesses that exist in the real world as opposed to just the cyber world, such as bricks-and-mortar retail outlets, bricks-and-mortar warehouses, etc.
Browser: The generic term for software programs that retrieve, display and print information from the World Wide Web. The most popular browsers are Microsoft Internet Explorer, Netscape Navigator and Mosaic. Mosaic was the first browser to introduce graphics. Previously, users were only allowed to view the text of Web pages. At the time of writing, Microsoft Internet Explorer was the most popular browser in the world.
Bulk Data Transfer: A software-based mechanism designed to move large data files. It supports compression, blocking and buffering to optimize transfer times.
Business Activity Monitoring (BAM): BAM is a business solution that is supported by an advanced technical infrastructure that enables rapid insight into new business strategies, the reduction of operating cost by real-time identification of issues and improved process performance.
Business Architecture: One of the four layers of an information systems architecture. A business architecture describes the functions a business performs and the information it uses.
Business Continuity: The degree to which an organization may achieve uninterrupted stability of systems and operational procedures.
Business Data: Information about people, places, things, business rules, and events, which is used to operate the business. It is not metadata. (Metadata defines and describes business data.)
Business Drivers: The people, information, and tasks that support the fulfillment of a business objective.
Business Intelligence (BI): Business intelligence is actually an environment in which business users receive data that is reliable, consistent, understandable, easily manipulated and timely. With this data, business users are able to conduct analyses that yield overall understanding of where the business has been, where it is now and where it will be in the near future. Business intelligence serves two main purposes. It monitors the financial and operational health of the organization (reports, alerts, alarms, analysis tools, key performance indicators and dashboards). It also regulates the operation of the organization, providing two-way integration with operational systems and information feedback analysis.
Business Intelligence Platform: A foundation of enabling tools and technologies necessary for the development and deployment of business intelligence and business performance management applications.
Business Intelligence Service Provider (BISP): A natural extension of the ASP: the application of data warehousing and business intelligence (BI) methodologies and technologies to the ASP model. BISPs tie into information systems behind a corporation's firewall, providing traditional data warehouse and analytic application capabilities for Internet-based e-businesses, especially e-commerce Web sites, and are hosted off site.
Business Intelligence Software: A category of software that enables companies to access, analyze and share information to understand how the business is performing and to improve decision making.
Business Intelligence Tools: The tools and technologies used to access and analyze business information. They include online analytical processing (OLAP) technologies, data mining and advanced analytics; end-user tools for ad hoc query and analysis, enterprise class query, analysis and reporting including dashboards for performance monitoring; and production reporting against all enterprise data sources.
Business Model: A view of the business at any given point in time. The view can be from a process, data, event or resource perspective, and can be the past, present or future state of the business.
Business Performance Calibration (BPC): The continuous, near real-time forecasting and analysis of related performance metrics to achieve balanced performance, i.e., efficient growth and the optimal management of resources.
Business Performance Intelligence (BPI): A subset of the BI market that involves planning and budgeting, Balanced Scorecard performance management and activity-based costing.
Business Performance Management (BPM): Applications that help direct modeling or scenario exploration activities. Rather than simply exploring what happened and why, the application can help the user consider the implications of alternative courses of action before they become operational. Performance management suggests an explicit relationship to action, and modeling is the key link to do this.
Business Performance Measurement: Applications that provide support for specific KPIs (key performance indicators) enable a business to measure its performance. This is often coupled with comparative information from industry sources, so a company can compare its performance against that of others in its industry. Business performance measurement applications support the analysis phase of the business improvement cycle.
Business Transaction: A unit of work acted upon by a data capture system to create, modify, or delete business data. Each transaction represents a single valued fact describing a single business event.

C
C-Commerce (Collaborative-Commerce): A business strategy that motivates value-chain partners with a common business interest to generate value through sharing information at all phases of the business cycle (from product development to distribution).
C2B: The financial interaction, initiated by a consumer, between a consumer and business.
Cache: Pronounced "cash".
The storage of recently visited sites and data which can be accessed from computer memory instead of linking to the server each time you return to the site. This speeds the access time, but does not reflect any changes to the site while in the cache. On rapidly changing sites you may need to click the reload button in order to read the most recent changes.
Call Center: The part of an organization that handles inbound/outbound communications with customers.
Campaign Management: Detailed tracking, reporting and analysis that provides precise measurements regarding current marketing campaigns, how they are performing and the types of leads they attract.
Cartesian Product: A Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.
Cascading Style Sheet (CSS): Cascading style sheets is a style sheet language that enables authors and users to attach style (fonts, spacing and aural cues) to structured documents, including HTML and XML applications.
CASE: Computer Aided Software Engineering.
CASE Management: The management of information between multiple CASE "encyclopedias," whether the same or different CASE tools.
Catalog: A component of a data dictionary that contains a directory of its DBMS objects as well as attributes of each object.
Cell: Data point defined by one member of each dimension of a multidimensional structure. Often, potential cells in multidimensional structures are empty, leading to "sparse" storage.
Central Warehouse: A database created from operational extracts that adheres to a single, consistent, enterprise data model to ensure consistency of decision-support data across the corporation. A style of computing where all the information systems are located and managed from a single physical location.
Change Data Capture: The process of capturing changes made to a production data source. Change data capture is typically performed by reading the source DBMS log. It consolidates units of work, ensures data is synchronized with the original source, and reduces data volume in a data warehousing environment.
Churn: Describes customer attrition. A high churn rate implies high customer disloyalty.
Classic Data Warehouse Development: The process of building an enterprise business model, creating a system data model, defining and designing a data warehouse architecture, constructing the physical database, and lastly populating the warehouse's database.
Clicks and Mortar: A business that has successfully integrated its online existence with its offline, real-world existence. For example, a retail store that allows customers to order products online or purchase products at its store location.
Clickthrough: The percentage of advertisements or other content a user clicks on or chooses to view.
Client: A software program used to contact and obtain data from a server software program on another computer. Each client program is designed to work with one or more specific kinds of server programs, and each server requires a specific kind of client.
Client/Server: A distributed technology approach where the processing is divided by function. The server performs shared functions: managing communications, providing database services, etc. The client performs individual user functions: providing customized interfaces, performing screen-to-screen navigation, offering help functions, etc.
Client/Server Architecture: A networked environment where a smaller system such as a PC interacts with a larger, faster system. This allows the processing to be performed on the larger system, which frees the user's PC. The larger system is able to connect and disconnect from the clients in order to more efficiently process the data.
Client/Server Processing: A form of cooperative processing in which the end-user interaction is through a programmable workstation (desktop) that must execute some part of the application logic over and above display formatting and terminal emulation.
Collection: A set of data that resulted from a DBMS query.
COM+: Provides an enterprise development environment, based on the Microsoft Component Object Model (COM), for creating component-based, distributed applications.
Component Object Model (COM): The Component Object Model is an object-based programming specification, designed to provide object interoperability through sets of predefined routines called interfaces.
Common Object Request Broker Architecture (CORBA): Common Object Request Broker Architecture is the Object Management Group (OMG) vendor-independent architecture and infrastructure, which computer applications use to work together over networks.
Communications Integrity: An operational quality that ensures transmitted data has been accurately received at its destination.
Consolidation: The process that takes data from different systems and entities, and possibly disparate formats, and combines and aggregates that information to create a unified view.
Consumer: An individual, group or application that accesses data/information in a data warehouse.
Consumer Profile: Identification of an individual, group or application and a profile of the data they request and use: the kinds of warehouse data, physical relational tables needed, and the required location and frequency of the data (when, where, and in what form it is to be delivered).
Content Management: The processes and workflows involved in organizing, categorizing, and structuring information resources so that they can be stored, published, and reused in multiple ways. A content management system (CMS) is used to collect, manage and publish content, storing the content either as components or whole documents, while maintaining the links between components. It may also provide for content revision control.
Continuous Availability: A protocol, associated execution and ready state of functionality that virtually guarantees computing-system operational continuity in any downtime event. Continuous availability concerns itself with 1) the recovery of applications, data and data transactions committed up to the moment of system loss; and 2) seamless, 24x7 system availability that offsets any planned or unplanned downtime event.
Control Data: Data that guides a process. For example, indicators, flags, counters and parameters.
Cookies: Cookies are text files that are stored on the client's hard drive. When a browser requests a document, the Web server creates a fragment of data, which is sent to the browser and stored on the client's computer. Afterward, when the browser solicits another document, the cookie is sent with the request. Cookies are very similar to the caller ID boxes that have become so popular in that they provide telemarketers with such relevant information as the consumer's name, address, and previous purchase payment record.
Cooperative Processing: A style of computer application processing in which the presentation, business logic, and data management are split among two or more software services that operate on one or more computers. In cooperative processing, individual software programs (services) perform specific functions that are invoked by means of parameterized messages exchanged between them.
Copy Management: The process of taking all or a snapshot of data from a source environment and copying that data to a target environment.
Corporate Performance Management: An umbrella term used to describe the methodologies, metrics, processes and systems used to monitor and manage the business performance of an enterprise.
Cost Benefit Analysis: The analysis of the business benefit realized by the cost of expenditure on some resource, tool, or application development.
Critical Success Factors: Key areas of activity in which favorable results are necessary for a company to reach its goal.
CRM: Customer Relationship Management.
Crosstab: A process or function that combines and/or summarizes data from one or more sources into a concise format for analysis or reporting.
Cube: A data cube is a multidimensional structure that contains an aggregate value at each point, i.e., the result of applying an aggregate function to an underlying relation. Data cubes are used to implement online analytical processing (OLAP).
Currency Date: The date the data is considered effective. It is also known as the "as of" date or temporal currency.
Customer Relationship Management: The idea of establishing relationships with customers on an individual basis, then using that information to treat different customers differently. Customer buying profiles and churn analysis are examples of decision support activities that can affect the success of customer relationships.
Cyber Marketing: This term refers to any type of Internet-based promotion. This includes Web sites, targeted e-mail, Internet bulletin boards, sites where customers can dial in and download files, and sites that engage in Internet commerce by offering products for sale over the Internet. The term doesn't have a strict meaning, though, and many marketing managers use it to cover any computer-based marketing tools.

D
Dashboard: An application or custom user interface that organizes and presents information in a way that is easy to read. The information may be integrated from multiple components into a unified display. A dashboard helps monitor individual, business unit and organizational performance and processes for a greater understanding of the business.
Data: Items representing facts, text, graphics, bit-mapped images, sound, analog or digital live-video segments. Data is the raw material of a system supplied by data producers and is used by information consumers to create information.
Data Access Tools: An end-user oriented tool that allows users to build SQL queries by pointing and clicking on a list of tables and fields in the data warehouse.
Data Acquisition: Identification, selection and mapping of source data to target data. Detection of source data changes, data extraction techniques, timing of data extracts, data transformation techniques, frequency of database loads and levels of data summary are among the difficult data acquisition challenges.
Data Analysis and Presentation Tools: Software that provides a logical view of data in a warehouse. Some create simple aliases for table and column names; others create data that identify the contents and location of data in the warehouse.
Data Appliance: A combination of hardware, software, DBMSs and storage, all under one umbrella. A black box that yields high performance in both speed and storage, making the BI environment simpler and more useful to the users.
Data Consumer: An individual, group, or application that receives data in the form of a collection. The data is used for query, analysis, and reporting.
Data Custodian: The individual assigned the responsibility of operating systems, data centers, data warehouses, operational databases, and business operations in conformance with the policies and practices prescribed by the data owner.
Data Dictionary: A database about data and database structures. A catalog of all data elements, containing their names, structures, and information about their usage. A central location for metadata. Normally, data dictionaries are designed to store a limited set of available metadata, concentrating on the information relating to the data elements, databases, files and programs of implemented systems.
Data Directory: A collection of definitions, rules and advisories of data, designed to be used as a guide or reference with the data warehouse.
The directory includes definitions, examples, relations, functions and equivalents in other environments.
Data Element: The most elementary unit of data that can be identified and described in a dictionary or repository and that cannot be subdivided.
Data Extraction Software: Software that reads one or more sources of data and creates a new image of the data.
Data Flow Diagram: A diagram that shows the normal flow of data between services as well as the flow of data between data stores and services.
Data Integration: Pulling together and reconciling dispersed data for analytic purposes that organizations have maintained in multiple, heterogeneous systems. Data needs to be accessed and extracted, moved and loaded, validated and cleaned, and standardized and transformed.
Data Loading: The process of populating the data warehouse. Data loading is provided by DBMS-specific load processes, DBMS insert processes, and independent fastload processes.
Data Management: Controlling, protecting, and facilitating access to data in order to provide information consumers with timely access to the data they need. The functions provided by a database management system.
Data Management Software: Software that converts data into a unified format by taking derived data to create new fields, merging files, summarizing and filtering data; the process of reading data from operational systems. Data management software is also known as data extraction software.
Data Mapping: The process of assigning a source data element to a target data element.
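The Cartesian product entry in the glossary above (joining every row of one table to every row of another) can be illustrated with a short sketch. The two tables and their contents are hypothetical; `itertools.product` from the Python standard library builds the row pairs.

```python
from itertools import product

# Hypothetical tables: (customer_id, name) and (order_id, customer_id).
customers = [("c1", "Ann"), ("c2", "Bo")]
orders = [("o1", "c1"), ("o2", "c2"), ("o3", "c1")]

# Cartesian join: every customer row paired with every order row.
cartesian = list(product(customers, orders))

# An ordinary equi-join keeps only the pairs whose customer ids match.
joined = [(c, o) for c, o in cartesian if c[0] == o[1]]

print(len(cartesian), len(joined))
```

The row count of the Cartesian product is the product of the two table sizes, which is why an accidental Cartesian join on large tables is expensive.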
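Data Warehouse Technical Vocabulary

access: the operation of locating, reading, or writing data on a storage unit.
access method: a technique used to transfer a physical record to or from a mass storage device.
access pattern: the general sequence in which a data structure is accessed (for example, from tuple to tuple, from record to record, from segment to segment).
accuracy: a qualitative assessment of freedom from error, or a quantitative measure of the magnitude of error, expressed as a function of relative error.
ad hoc processing: processing that is executed only once, accesses data casually, and manipulates data with parameters never used before, usually in a heuristic, iterative manner.
after image: the snapshot of data placed in the log when a transaction completes.
agent of change: a driving force too large to resist, usually the aging of systems, changes in technology, fundamental changes in requirements, and so on.
algorithm: a series of statements organized to solve a problem in a finite number of steps.
analytical processing: using the computer to provide analysis for management decisions, usually including trend analysis, drill-down analysis, statistical analysis, profiling, and so on.
application: a group of interrelated algorithms and data that supports the needs of an organization or enterprise.
application database: a collection of data organized to support a specific application.
archival database: a collection of data that is historical in nature.
As a rule, archival data is not updated.
Each unit of archival data is associated with a past moment in time.
artifact: a design technique used to represent referential integrity in the DSS environment.
atomic: (1) data stored in a data warehouse; (2) the lowest level of process analysis.
atomic database: a database made up of primitive atomic data; a data warehouse; a DSS foundation database.
atomic-level data: data with the lowest level of granularity.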
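Common Data Analysis Terms and Their Definitions

1. Data Mining: Data mining is the process of discovering hidden patterns, associations, trends, and insights in large databases or data sets.
It commonly uses techniques such as association rule mining, cluster analysis, decision trees, and neural networks.
2. Database Query: A database query retrieves the required data from a database by means of specific commands.
This usually involves a query language such as SQL or the query interface of a NoSQL database.
3. Data Analysis: Data analysis is the process of collecting, processing, organizing, and mining data to discover its underlying regularities and relationships, and so provide support and insight for decisions.
4. Data Preprocessing: Data preprocessing cleans, organizes, and transforms raw data to fit the needs of later analysis.
It includes steps such as data cleaning, data transformation, and data normalization.
5. Feature Engineering: Feature engineering is a key step in data analysis. It extracts meaningful features from raw data to feed into a model for training.
These features may include numeric features, text features, image features, and so on.
6. Visualization Reports (Visualization): A visualization report presents the results of data analysis as graphics, images, and charts to help readers understand and interpret the data.
It helps reveal patterns and trends in the data.
7. Model Evaluation: Model evaluation is the process of assessing a trained model's performance and accuracy on a test data set.
It includes computing evaluation metrics such as accuracy, recall, and the F1 score.
8. Decision Tree: A decision tree is a supervised learning algorithm that repeatedly splits the data set into simpler subsets, producing a tree structure used to make classification or regression predictions.
9. Cluster Analysis: Cluster analysis is an unsupervised learning technique that divides the samples in a data set into groups, or clusters, according to some similarity measure, in order to discover patterns and structure in the data.
10. Principal Component Analysis (PCA): Principal component analysis is a dimensionality reduction algorithm. It projects the data onto a set of orthogonal directions chosen so that the projected data has maximum variance, which reduces the dimensionality while retaining the most important features.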
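The model evaluation entry above names accuracy, recall, and the F1 score. The sketch below computes them for a binary classifier from two label lists; the labels are made up for illustration, and precision is included only because the F1 score is defined as the harmonic mean of precision and recall.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice a library routine would normally be used instead; the hand-written version is shown only to make the definitions explicit.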
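Big Data Technology Terms

Some common big data technology terms:
1. Data warehouse: a system for storing and managing large amounts of structured data.
2. Data lake: a system that stores large amounts of raw data, including structured, semi-structured, and unstructured data.
3. Data mining: the process of extracting useful information and knowledge from large amounts of data.
4. Machine learning: methods that use algorithms to learn from data and make predictions or decisions.
5. Data analysis: the process of inspecting, transforming, cleaning, and modeling data to extract useful information and support decision making.
6. Data governance: a set of policies, processes, and technologies that ensure data quality, security, and compliance.
7. Data privacy: the practice of protecting personal data from access or use by unauthorized third parties.
8. Cloud computing: a model that provides computing resources (such as servers, storage, and applications) over the Internet.
9. Big data processing: the processing and analysis of large volumes of data, usually involving distributed computing and storage technologies.
10. Data science: an interdisciplinary field that combines statistics, computer science, and domain expertise to understand and analyze data.
These are only some of the common terms in the big data field; new terms and concepts keep emerging as the technology develops.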
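Data Management Terms

Data management terms include, but are not limited to:
1. Database: a collection used to store and manage large amounts of structured data.
2. Data Model: a conceptual tool that describes the structure, attributes, and relationships of data. Examples include the hierarchical, network, relational, and object-oriented models.
3. Data Processing: applying mathematical operations and statistical processing to existing data.
4. Data Management: the general term for the series of operations performed on data, including collection, sorting, organization, encoding, storage, retrieval, and transmission.
5. Data Centre: a physical site that houses the servers used to store data.
6. Data Custodian: the technical professional responsible for maintaining the technical environment that data storage requires.
7. Data Set: a collection of data, typically large.
8. Data Virtualization: a data integration approach that gives users a unified view of data from multiple sources so that more information can be drawn from it.
9. Copy Data Management: concerned with how acquired data copies can be better managed and used, and better combined with applications.
10. Change Data Capture (CDC): the process of identifying data that has changed and extracting those changes.
11. Data Warehouse: commonly abbreviated as DW or DWH.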
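The change data capture entry above (identify the changed data, then extract the changes) can be sketched by comparing two snapshots of a table. This is a deliberate simplification and an assumption of this example: production CDC tools more commonly read the source database's transaction log than diff snapshots, and the table contents and function name below are hypothetical.

```python
def capture_changes(before, after):
    """Compare two snapshots keyed by primary key and return the change set."""
    changes = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    for key, row in before.items():
        if key not in after:
            changes.append(("delete", key, row))
    return changes


# Hypothetical snapshots of a customer table, keyed by customer id.
before = {1: {"name": "Ann", "city": "Oslo"}, 2: {"name": "Bo", "city": "Rome"}}
after = {1: {"name": "Ann", "city": "Paris"}, 3: {"name": "Cy", "city": "Lima"}}

changes = capture_changes(before, after)
for change in changes:
    print(change)
```

Only the three changed rows would need to be sent to the warehouse, rather than the full table, which is the point of capturing changes.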
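Data Warehouse Terms Explained

A data warehouse is a subject-oriented, integrated, stable (non-volatile) collection of data that directly serves end users and supports enterprise decision making, analysis, and decision support systems.
It is an independent data storage and management system. Its goal is to integrate, clean, process, and model the data of an enterprise's departments, providing a consistent, trustworthy body of data that is easy to access and understand and that helps users with data analysis and business decisions.
Some important concepts and terms related to data warehouses are explained below:
1. Data integration: consolidating data from different sources, both internal and external, into the data warehouse.
2. Data cleaning: applying a series of operations to remove errors, duplicates, missing values, and inconsistencies from the data and so improve its quality.
3. Data processing: transforming, aggregating, calculating, and extracting data to meet users' specific needs and analysis goals.
4. Subject: a category or area of data organized around the enterprise's business needs, such as sales, human resources, or supply chain.
5. Metadata: data that describes data, including its source, structure, definition, and relationships.
Metadata is very important for managing and using a data warehouse.
6. Dimension: an attribute that describes a subject in the data warehouse, such as time, geographic location, product, or customer, and that is used for analysis and querying.
7. Measure: data in the data warehouse that can be quantified and compared, such as sales revenue, profit, or number of customers.
8. Star schema: a common data warehouse modeling technique in which a central table (the fact table) is joined to multiple surrounding dimension tables.
9. Granularity: the level of detail of the facts recorded in the data warehouse, for example daily, monthly, or yearly sales.
10. OLAP (online analytical processing): a technology for fast querying and analysis of multidimensional data, presenting the data through pivot tables, charts, and reports.
11. ETL (extract, transform, load): the core process of a data warehouse, which extracts data from source systems, transforms and processes it, and loads it into the data warehouse.
12. Decision support system: an information system that uses the data in the data warehouse together with analysis tools to help management make decisions.
Data warehouses play an important role in the enterprise: they provide consistent, accurate data that helps decision makers analyze data and make decisions.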
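The ETL and data cleaning entries above can be tied together in a small sketch: extract rows from a source, clean and standardize them, and load the result. The CSV text, column names, and cleaning rules here are assumptions chosen for illustration, and a plain Python list stands in for the warehouse table.

```python
import csv
import io

# Hypothetical export from a source system: stray spaces, a missing amount,
# a duplicated order and inconsistent country codes.
RAW = """order_id,amount,country
1, 19.90 ,no
2,,se
1, 19.90 ,no
3,5.00,NO
"""


def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete and duplicate rows, standardize types and codes."""
    cleaned, seen = [], set()
    for row in rows:
        amount = row["amount"].strip()
        order_id = row["order_id"].strip()
        if not amount or order_id in seen:
            continue  # missing measure or duplicate key
        seen.add(order_id)
        cleaned.append({
            "order_id": int(order_id),
            "amount": float(amount),
            "country": row["country"].strip().upper(),
        })
    return cleaned


def load(rows, warehouse):
    """Load: append the cleaned rows to the target table."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
print(loaded, warehouse)
```

Keeping the three stages as separate functions mirrors the E, T, and L steps and lets each one be tested on its own.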
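Common New Terms in Data Warehousing and Data Modeling
Data warehousing has introduced new terms and extended the vocabulary of data modeling.
To make this paper self-contained, I introduce the most commonly used terms below.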
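◆ Data warehouse
A data warehouse is a collection of data that supports management decisions.
The data is subject-oriented, integrated, non-volatile, and time-variant.
A data warehouse is a collection of snapshots of all operational environments and external data sources.
It does not need to be exact to the moment, because it is extracted from the operational environments at specific points in time.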
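◆ Data mart
A data mart is a data warehouse restricted to a single subject area, such as customer, department, or location.
A data mart can be dependent on the data warehouse, when it takes its data from the warehouse, or independent of it, when it takes its data directly from the operational systems.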
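◆ Fact
A fact is a unit of information in the data warehouse. It is also a cell in the multidimensional space, bounded by the unit of analysis.
A fact is stored in a table (when a relational database is used) or in a cell of a multidimensional database.
Each fact contains basic information about the fact (revenue, value, satisfaction score, and so on) and is related to dimensions.
In some cases, when all the necessary information is stored in the dimensions, the mere occurrence of the fact is sufficient information for the data warehouse.
Such factless facts are discussed later.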
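◆ Dimension
A dimension is an axis of the coordinate system that bounds the analysis space.
The coordinate system in a data warehouse defines the data cells, which contain the facts.
An example of a coordinate system is the Cartesian coordinate system, with its x dimension and y dimension.
In a data warehouse, time is always one of the dimensions.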
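◆ Data mining
Data mining is the process of discovering new information in the data of the data warehouse, information that cannot be obtained from the operational systems.
◆ Analysis space
The analysis space is the portion of the data in the data warehouse that is used for data mining, both to discover new information and to support management decisions.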
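◆ Slicing
A technique for restricting the analysis space in the data warehouse to a subset of the data along one dimension.
◆ Dicing
A technique for restricting the analysis space in the data warehouse to a subset of the data along several dimensions.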
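Slicing and dicing, as defined above, differ only in how many dimensions are restricted. The following sketch models the analysis space as a dictionary of fact cells addressed by dimension coordinates; the three dimensions, the sample cells, and the `restrict` helper are all hypothetical.

```python
# Each fact is a cell addressed by (year, region, product) holding one measure.
DIMENSIONS = ("year", "region", "product")
facts = {
    (2023, "north", "tea"): 10,
    (2023, "south", "tea"): 7,
    (2023, "north", "coffee"): 4,
    (2024, "north", "tea"): 12,
    (2024, "south", "coffee"): 9,
}


def restrict(cells, **allowed):
    """Keep only the cells whose coordinates fall in the allowed values per dimension."""
    def keep(coords):
        return all(coords[DIMENSIONS.index(dim)] in values
                   for dim, values in allowed.items())
    return {coords: measure for coords, measure in cells.items() if keep(coords)}


# Slice: restrict a single dimension (here, time).
sliced = restrict(facts, year={2023})

# Dice: restrict several dimensions at once.
diced = restrict(facts, year={2023, 2024}, region={"north"}, product={"tea"})

print(sum(sliced.values()), sum(diced.values()))
```

Time appears as one of the dimensions, in line with the statement in this section that time is always a dimension in a data warehouse.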
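◆ Star schema
A schema that implements a multidimensional analysis space using a relational database is called a star schema.
The star schema is discussed further later in this white paper.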
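As a rough illustration of a star schema in a relational database, the sketch below uses Python's built-in `sqlite3` module with one fact table and two dimension tables. The table names, columns, and rows are assumptions invented for this example, not taken from the white paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the axes of the analysis space.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Fact table: one row per cell, pointing at one member of each dimension.
CREATE TABLE fact_sales (
    date_id INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    revenue REAL
);

INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (1, 'green tea', 'tea'), (2, 'espresso', 'coffee');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 4.0), (2, 1, 12.0);
""")

# A typical star join: filter on one dimension, group by an attribute of another.
rows = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    WHERE d.year = 2024
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

The fact table sits at the center and every dimension is reached with a single join, which gives the schema its star shape.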
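◆ Snowflake schema
When, for whatever reason, the dimensions of a star schema need to be normalized, the star schema evolves into a snowflake schema.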
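Normalizing a dimension, the step that turns a star schema into a snowflake schema, can be sketched as moving a repeated attribute into its own lookup table. The product dimension, its rows, and the `snowflake` helper below are hypothetical.

```python
def snowflake(dimension_rows, attribute):
    """Move a repeated attribute of a dimension into its own lookup table."""
    lookup = {}  # attribute value -> surrogate key
    normalized = []
    for row in dimension_rows:
        key = lookup.setdefault(row[attribute], len(lookup) + 1)
        slim = {k: v for k, v in row.items() if k != attribute}
        slim[attribute + "_id"] = key  # foreign key into the new sub-dimension
        normalized.append(slim)
    sub_dimension = [{attribute + "_id": key, attribute: value}
                     for value, key in lookup.items()]
    return normalized, sub_dimension


# Hypothetical denormalized product dimension from a star schema.
dim_product = [
    {"product_id": 1, "name": "green tea", "category": "tea"},
    {"product_id": 2, "name": "black tea", "category": "tea"},
    {"product_id": 3, "name": "espresso", "category": "coffee"},
]

products, categories = snowflake(dim_product, "category")
print(products)
print(categories)
```

The category name is now stored once per category instead of once per product, at the cost of an extra join when querying.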