Data Warehousing Dictionary


A

Agent
A program to perform standard functions autonomously.  Typical uses are for automated data loading and exception reporting.  Not to be confused with data mining.

Aggregates
Tables (usually held in a relational database) which hold precomputed totals in a hierarchical multidimensional structure.

Aggregation
Accumulation of related data into a single pre-joined table.

A transformation function that groups and mathematically combines data from a set of detailed records.
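
As an illustration of the second sense, a minimal Python sketch that groups detail records and sums them. The record layout (region, product, amount) and the function name are assumptions made for the example, not part of any particular product.

```python
from collections import defaultdict

# Hypothetical detail records: (region, product, amount)
detail_records = [
    ("East", "Widget", 100.0),
    ("East", "Gadget", 50.0),
    ("West", "Widget", 75.0),
    ("East", "Widget", 25.0),
]

def aggregate(records):
    """Group detail records by (region, product) and sum their amounts."""
    totals = defaultdict(float)
    for region, product, amount in records:
        totals[(region, product)] += amount
    return dict(totals)

totals = aggregate(detail_records)
```

Here the two East/Widget records are combined into a single total, so four detail records yield three aggregated rows.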

Append
A complete load of the captured data into the target dataset, adding to any data that may already exist there.

Application-assisted Capture
A method, built into the source application, that preserves its data for immediate capture of incremental changes in that source.

Application-level Modeling
Data and process modeling aimed at developing a specific business function within the scope of a single application.

Apply
A component of data replication that applies the captured and transformed data to the target data store.

Architecture
Provides the planning, structure, and standardization necessary to ensure integration of multiple components, projects, and processes across time.

Archive and Retrieval
A function providing managed and efficient storage and retrieval of data between the different levels of a storage hierarchy.


B

BPR (Business Process Re-engineering)
A fundamental corporate reorganization based upon the processes that deliver value to customers.

Business Drivers
The forces and pressures that have significant influence on how the business performs and operates.

Business Goal
What the business needs or wants to achieve in response to the business drivers.

Business Information Interface
The user's means of selecting, manipulating, and displaying data in business terminology, both initially and for later reuse across the organization.

Business Information Needs
Information that the business needs in order to make its decisions.

Business Processes
Sets of activities performed by the business to produce products that have value to a customer.

Business Units
Organizational units that share common business drivers, goals, information needs and processes.


C

Capture
A component of data replication that interacts with a source data store to obtain a copy of some or all of the data contained therein, or a record of changes that have occurred there.

Cell
Data point defined by one member of each dimension of a multidimensional structure.   Often, potential cells in multidimensional structures are empty, leading to 'sparse' storage.
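
A minimal Python sketch of sparse cell storage, assuming a hypothetical three-dimension structure. Only occupied cells are stored, each keyed by one member per dimension; the dimension names and values are invented for the example.

```python
# Hypothetical dimensions and their members
dimensions = {
    "product": ["Widget", "Gadget"],
    "region": ["East", "West"],
    "month": ["Jan", "Feb", "Mar"],
}

# Sparse storage: a cell exists only where data exists.
# Each key holds one member from each dimension, in dimension order.
cells = {
    ("Widget", "East", "Jan"): 100.0,
    ("Gadget", "West", "Mar"): 40.0,
}

def potential_cell_count(dims):
    """Number of cells the structure could hold if every one were occupied."""
    count = 1
    for members in dims.values():
        count *= len(members)
    return count

def density(occupied, dims):
    """Fraction of potential cells that are actually occupied."""
    return len(occupied) / potential_cell_count(dims)
```

With 2 x 2 x 3 = 12 potential cells and 2 occupied, the density here is 2/12; the remaining ten cells take no storage.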

Conceptual Models
The product of analysis activities, describing requirements without consideration of computer implementation.

Constructive Merge
A method of applying captured changed data to a periodic dataset, which builds and maintains the historical record of changes to the source data.
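
A minimal Python sketch of the idea, assuming the periodic dataset is a simple list of change records. The field names (key, value, captured_at) are hypothetical.

```python
def constructive_merge(history, changes, captured_at):
    """Add every captured change to the history; existing rows are never overwritten."""
    for key, value in changes.items():
        history.append({"key": key, "value": value, "captured_at": captured_at})
    return history

history = []
constructive_merge(history, {"cust-1": "Boston"}, "2002-01-01")
constructive_merge(history, {"cust-1": "Chicago"}, "2002-02-01")
```

After both merges the history holds two rows for cust-1, so the earlier value remains available.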

Contextual Models
Describe the scope of requirements, establishing a context for analysis.

Conversion
A transformation function that converts the value in a single data field to some related form.
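
For example, a date field might be converted from one format to another. A minimal Python sketch, assuming the source stores dates as MM/DD/YYYY text (the function name and formats are chosen for illustration):

```python
from datetime import datetime

def convert_date(value):
    """Convert a single date field from MM/DD/YYYY to ISO YYYY-MM-DD."""
    return datetime.strptime(value, "%m/%d/%Y").strftime("%Y-%m-%d")

iso_date = convert_date("09/02/2002")
```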


D

Data
The computerized representation of business information.

Data Analysis-and-Prediction Tool
An end-user tool that provides a variety of ways of looking at data with a view to finding the right questions to ask.

Data Cleansing or Scrubbing
Statistical and other techniques used to eliminate variations in data content, to reduce redundancy, and generally to improve the consistency and usability of raw data.

Data Mart
A localized, single-purpose data warehouse implementation.

Typically a small data warehouse, or a subset of a data warehouse, designed and constructed to serve the needs of a single, homogeneous user group.

Data Mining
Delivers information for research and investigative uses of the data warehouse.   Explores the warehouse contents at a detail level to identify and report the patterns inherent in the data.

The process of using statistical techniques to discover subtle relationships between data items, and the construction of predictive models based on them.  The process is not the same as just using an OLAP tool to find exceptional items.  Generally, data mining is a very different and more specialist application than OLAP, and uses different tools from different vendors.  Normally the users are different, too.  OLAP vendors have had little success with their data mining efforts.

Data Replication
A set of techniques that provides comprehensive support for copying and transforming data from source to target location in a managed, consistent, repeatable, and well-understood manner.

Data Transfer
The data transport function that underlies data replication and data access, and is responsible for transferring data--ranging from messages to complete files--between data warehouse components in any combination of locations, hardware, or software.

Data Warehouse
A subject-oriented, integrated, non-volatile, time-variant, enterprise-wide collection of data organized to support management needs.

A single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.

Data Warehouse Pilot
The first development project of the warehouse implementation process, designed to prove the viability of the overall approach.

Data Warehousing
The coordinated, architected, and periodic copying of data from various sources, both inside and outside the enterprise, into an environment optimized for analytical and informational processing.

DBMS (Data Base Management System)
Used to store, process and manage data in a systematic way.  May use a variety of underlying storage methods, including relational, multidimensional, network and hierarchical.

Delayed Capture
Capture of changes at specified times, producing a complete record of changes only in periodic data.

Dense
The majority, or significant minority (at least ten percent), of potential data cells actually occupied in a multidimensional structure.

Dependent Data Mart
Also called an architected data mart.  Shares common business rules, semantics, and definitions.  A Dependent Data Mart reads metadata from a central metadata repository to define local metadata.  This ensures that all components of the architecture are linked via common metadata.

Derived Data
Point-in-time or periodic data, at a detailed or summary level, derived by some process from real-time data and used to manage the business.

Destructive Merge
A method of applying captured changed data to a target dataset that maintains an exact copy of the source data at the time of capture.
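
A minimal Python sketch, assuming the target dataset is a dictionary keyed by record identifier. The function and parameter names are hypothetical.

```python
def destructive_merge(target, changes, deleted_keys=()):
    """Overwrite changed records and remove deleted ones, so the target mirrors the source."""
    for key, value in changes.items():
        target[key] = value          # the previous value is lost
    for key in deleted_keys:
        target.pop(key, None)        # the record is physically removed
    return target

target = {"cust-1": "Boston", "cust-2": "Denver"}
destructive_merge(target, {"cust-1": "Chicago"}, deleted_keys=["cust-2"])
```

The target now matches the source at the time of capture, and no trace of the earlier values remains.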

Dimension
An index by which you can access facts according to the value (or values) wanted.

Dirty Data
Inconsistent, missing, incomplete, or erroneous data.  Source data often contains a high percentage of "dirty" data.

DSS (Decision Support Systems)
Delivers information to aid management decisions.

Application for analyzing large quantities of data and performing a wide variety of calculations and projections.


E

EIS (Executive Information Systems)
Delivers information targeted at understanding business trends.

Enrichment
A transformation function that combines data from two or more fields in one or more records to create a new field or fields in the output record.
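
A minimal Python sketch in which two fields of one record are combined into a new output field. The field names (quantity, unit_price, line_total) are assumptions made for the example.

```python
def enrich(record):
    """Return a copy of the record with a new field derived from two existing fields."""
    enriched = dict(record)
    enriched["line_total"] = record["quantity"] * record["unit_price"]
    return enriched

order_line = {"quantity": 3, "unit_price": 20}
enriched_line = enrich(order_line)
```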

Enterprise Modeling
Modeling whose focus is a complete and integrated view of all the data and processes in the business.

ETL (Extraction, Transformation, Load) Tool
Used to extract data from source environments, cleanse the data, transform the data to resolve logical inconsistencies, reorganize the data to make it understandable to end users, and load the data into target databases.

External Data
Data from outside the enterprise.

Extraction
Copy data from multiple sources, with resolution of data quality and consistency issues.


F

Fact
An instance of a particular occurrence or event, together with the properties of that event, stored in a database.

File Comparison
A method of capturing a possibly incomplete set of changed data by comparing two versions of the source data.
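
A minimal Python sketch, assuming each version of the source is available as a dictionary keyed by record identifier (the names are hypothetical). The result can be incomplete because any change made and then overwritten between the two versions is never seen.

```python
def compare_versions(old, new):
    """Derive inserts, updates, and deletes by comparing two versions of the source."""
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    updates = {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]}
    return inserts, updates, deletes

old_version = {"C1": "Boston", "C2": "Denver", "C3": "Austin"}
new_version = {"C1": "Chicago", "C3": "Austin", "C4": "Miami"}
inserts, updates, deletes = compare_versions(old_version, new_version)
```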


G

Granularity
Degree of summarization of data.


H

High-level Enterprise Model
A formal, high-level description in business terms of the information needed and used to manage the entire business, understood by business users and IS personnel.

HOLAP (Hybrid OLAP)
A product that can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in an RDBMS.  Becoming a popular architecture for server OLAP.

Hypercube
An OLAP product that stores all data in a single cube which has all the application dimensions applied to it.


I

Immediate Capture
Capture of changes at the time they occur, ensuring a complete record of changes in transient, semi-periodic, and periodic data.

Incremental Capture
A method of capturing a record of the changes that take place in a source dataset.

Incremental Implementation
Develops and delivers a subset of the total warehouse.  A pragmatic approach to building an enterprise level warehouse in a segmented, evolutionary fashion.


J

 


K

 


L

LAN (Local Area Network)
High speed connection between desktop PCs and server machines, including file servers, application servers, print servers and other services.

Load
A complete load of the captured data into the target dataset, irrespective of any existing data.

Logical Models
Design products, describing data from a computer system perspective, yet free of any implementation platform specifics.


M

MDB (Multidimensional Data Base)
A product that can store and process multidimensional data.

MDX (Multidimensional Expression Language)
The multidimensional equivalent of SQL.  The language used to define multidimensional data selections and calculations in Microsoft's OLE DB for OLAP API (Tensor).  It is also used as the calculation definition language in Microsoft's OLAP Services.

Metadata
Source definitions, data models, calculations, and transformation and conversion rules ("data about data").

Data that describes the meaning and structure of business data, as well as how it is created, accessed, and used.

Data about data.  How the structures and calculation rules are stored, plus, possibly, additional information on data sources, user privileges, etc.

Metadata exchange
Application program interface that enables products from multiple vendors to read and exchange metadata using a common format.  Microsoft's OLE DB for OLAP API has become the industry standard for exchanging metadata between data warehousing products.

Methodology
A general structure that describes the steps necessary to produce deliverables common to many environments.

Minicube
A subset of a hypercube, with fewer dimensions than the encompassing hypercube.   The hypercube will consist of a collection of logically similar minicubes.

Model-driven Replication
An approach to replication in which the data and mapping definitions start from a business viewpoint and lead to logical- and physical-level definitions based on a layered enterprise modeling approach.

Multicube
An OLAP product that can store data in the form of a number of multidimensional structures which together form an OLAP database.

Multidimensional
Data structure with three or more independent dimensions.


N

Normalization/Denormalization
A transformation function that splits or joins data records, while maintaining relationships between primary and secondary record keys.
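
A minimal Python sketch, assuming flat order records that repeat customer details. Normalizing splits them into customer and order records linked by customer_id; denormalizing joins them back. All names are hypothetical.

```python
flat_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Acme", "amount": 100},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Acme", "amount": 250},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Globex", "amount": 75},
]

def normalize(rows):
    """Split flat rows into customers (primary key) and orders (secondary key)."""
    customers = {}
    orders = []
    for row in rows:
        customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
        orders.append({
            "order_id": row["order_id"],
            "customer_id": row["customer_id"],
            "amount": row["amount"],
        })
    return customers, orders

def denormalize(customers, orders):
    """Join each order back to its customer using customer_id."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]

customers, orders = normalize(flat_orders)
```

Because the key relationship is preserved, joining the two sets reproduces the original flat records.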


O

OLAP (Online Analytical Processing)
Delivers information to meet the needs of business analysts.

OLE DB
New Microsoft database connectivity standard, replacing the older ODBC standard.

OLE DB for OLAP
Microsoft's OLAP API, likely to become an industry standard.  Used to link OLAP clients and servers using a standard multidimensional language, MDX.

OLTP (OnLine Transaction Processing)
Transaction-based systems that are central to operation of the business and collect large amounts of data at a highly detailed level.

Operational Data
Data in the enterprise's business applications.

Operational Data Store
Addresses operational needs in the areas of keyed retrieval, update, and access of non-integrated operational data.

A modeled store of detailed operational data designed and structured to allow better integration among such data when it is used for immediate inquiry.

A set of integrated, scrubbed data without history or summarization provided for tactical decision support.  ODSs typically support operational systems and provide a single view of operational data.  They are commonly used to populate data warehouses and data marts.

Operational Systems
Applications that run the business on a day-to-day basis using real-time data.


P

Partitioning
Splitting of target data into smaller units.
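
A minimal Python sketch, assuming the target rows are split by a month field. The partitioning key and field names are chosen only for illustration.

```python
from collections import defaultdict

def partition_by(rows, key):
    """Split rows into smaller units, one per distinct key value."""
    parts = defaultdict(list)
    for row in rows:
        parts[key(row)].append(row)
    return dict(parts)

sales_rows = [
    {"month": "Jan", "amount": 10},
    {"month": "Feb", "amount": 20},
    {"month": "Jan", "amount": 5},
]
partitions = partition_by(sales_rows, lambda row: row["month"])
```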

Periodic Data
Data recording the history of the business over a period of time by maintaining a complete record either of all statuses or of all events that have occurred.

Physical Models
Detailed design and specifications of data structures to be implemented with specific, known technology.

Process Management
A set of functions supporting the definition of inter-related process steps and the management of their execution across a variety of hardware and software platforms; used mainly by data replication.


Q

Query-and-Reporting Tool
An end-user tool designed to support the definition of queries and reports where the analysis need is well understood in business terms.

Normally used for list-oriented reporting from relational databases.


R

RDBMS (Relational Data Base Management System)
Used to store, process and manage data arranged in relational tables.  Often used for transaction processing and data warehouses.

Real-time Data
Up-to-the-second, detailed data used to run the business and accessed in read/write mode, usually through predefined transactions.

Reconciled Data
A special type of derived data that occurs at an historical, detailed level and is designed and used to ensure consistency of data across the entire enterprise.

Refresh
A replication mode based on bulk rewriting of the target data using static captured data.

Replication
Copy data from source databases, with minimal scrubbing or clean-up of the data.

ROLAP (Relational OLAP)
A product that provides multidimensional analysis of data, aggregates and metadata stored in an RDBMS.  The multidimensional processing may be done within the RDBMS, a mid-tier server or the client.  A 'merchant' ROLAP is one from an independent vendor which can work with any standard RDBMS.


S

Scope
Described in terms of what the project will produce and what the project will not produce.

Selection
A transformation function that selects all, part, or nothing of a single input record.
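
A minimal Python sketch covering the three outcomes (all, part, or nothing). The function signature is an assumption made for the example.

```python
def select(record, predicate, fields=None):
    """Return the whole record, a subset of its fields, or None if it is rejected."""
    if not predicate(record):
        return None                      # nothing
    if fields is None:
        return dict(record)              # all
    return {name: record[name] for name in fields}  # part

sale = {"id": 7, "region": "East", "amount": 120}
is_east = lambda r: r["region"] == "East"
is_west = lambda r: r["region"] == "West"
```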

Separation/Concatenation
A transformation function that splits or joins data records while maintaining the original relationship to the primary record key.

Snapshot Data
A point-in-time view of the business data showing its status at a particular time, which is then kept as a (potentially) permanent record of that state of the data.

Snowflake Schema
A variant of the star schema with normalized dimension tables.

Sparse
Only a small proportion (arbitrarily, less than 0.1 percent) of potential data cells actually occupied in a multidimensional structure.

Sponsor
A senior manager or executive who has a business stake in the results of the project.

SQL (Structured Query Language)
The standard data structuring and access language used by relational databases.   MDX is based loosely on SQL.

Staging Area
Buffer area used to accumulate data from multiple data sources prior to processing the data with an ETL tool.

Staging Data
Data packaged for delivery to a data warehouse or data mart.

Star Schema
A database design, particularly suitable for ad hoc queries, in which dimensional data (describing how data is commonly aggregated) is separated from fact or event data (describing individual transactions in the business).

A relational database schema for representing multidimensional data.  The data is stored in a central fact table, with one or more tables holding information on each dimension.  Dimensions have levels, and all levels are usually shown as columns in each dimension table.
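
A minimal sketch of a star schema using Python's built-in sqlite3 module and an in-memory database. The table and column names (fact_sales, dim_product, dim_date) and the sample rows are invented for the example; a real warehouse would have more dimensions and levels.

```python
import sqlite3

star_db = sqlite3.connect(":memory:")
star_db.executescript("""
    -- Dimension tables: one per dimension, with levels held as columns
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category     TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        month   TEXT,
        year    INTEGER
    );
    -- Central fact table: one row per event, keyed by the dimensions
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (1, 'Jan', 2002), (2, 'Feb', 2002);
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 20.0), (1, 2, 30.0);
""")

# An ad hoc query: total sales by product category and month
sales_by_category = star_db.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
```

The query joins the fact table to each dimension it needs and groups on dimension levels, which is the typical access pattern for this design.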

Static Capture
A method of capturing a time-independent or static view of all or a subset of the data in a source data set.

Steering Committee
An executive group, drawn from all parts of the business, responsible for prioritizing projects in the data warehouse.

Stovepipe Data Mart
Also called an independent data mart.  It is not integrated with central metadata, and shares no common business rules, semantics, or definitions.  Stovepipe data marts support the needs of individual business units, but do not support an enterprise data warehousing strategy.

Subject-oriented Data
Data structured around the major entities in an enterprise data model and largely independent of the creation and usage processes.

Summarization
Consolidated data summarized on one or more dimensions, or axes.

Summary Tables
Often used in RDBMSs to store pre-aggregated information, rather than holding it in the same table as the base data.  Used to improve responsiveness.


T

Target Data Base
Database for a data warehouse or data mart that is to be populated with data from one or more source systems.

Tensor
The internal name for Microsoft's OLAP API, a set of OLE COM objects and interfaces designed to add multidimensionality to OLE DB.  It seems set to become the de facto industry standard multidimensional API, being adopted both by most front-end OLAP tool vendors and by several other OLAP database vendors.  The official name will be OLE DB for OLAP, and the 1.0 specification was published in February 1998.

Transaction Processing
The operational systems used to collect and manage the base data of an organization.

Transformation
Data refinement through aggregation, summarization, partitioning, and reorganization to make it usable for Decision Support System analysis.

A component of data replication that converts data between different logical or physical structures according to predefined rules.

Transient Data
Real-time data in which changes to existing records overwrite the previous data, and deletions physically erase records, leading to a loss of the historical record of the changes that data has undergone.

Triggered Capture
A method for immediate capture of incremental data changes, dependent on the use of trigger technology in the source DBMS.
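
A minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for the source DBMS. The table, trigger, and column names are hypothetical; other DBMSs use their own trigger syntax.

```python
import sqlite3

source_db = sqlite3.connect(":memory:")
source_db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT);

    -- Change table populated by the trigger at the moment each update occurs
    CREATE TABLE customer_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        id        INTEGER,
        old_city  TEXT,
        new_city  TEXT
    );

    CREATE TRIGGER capture_customer_update AFTER UPDATE ON customer
    BEGIN
        INSERT INTO customer_changes (id, old_city, new_city)
        VALUES (OLD.id, OLD.city, NEW.city);
    END;

    INSERT INTO customer VALUES (1, 'Boston');
    UPDATE customer SET city = 'Chicago' WHERE id = 1;
    UPDATE customer SET city = 'Denver' WHERE id = 1;
""")

captured_changes = source_db.execute(
    "SELECT id, old_city, new_city FROM customer_changes ORDER BY change_id"
).fetchall()
```

Both updates are recorded, including the intermediate Chicago value that a later comparison of source versions would miss.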


U

Update
A replication mode using captured change data to maintain the target data incrementally.


V

Virtual Data Warehouse
Users are given direct access to source databases and data is not moved from the source databases.

Provides users with direct access to operational data.  Data is not moved from the source database and there is no target database.


W

Warehouse Architecture
Establishes the framework, standards, and procedures for the data warehouse at an enterprise level.

Warehouse Platforms
Provide hardware, software, and communications operating environments for warehouse tools.

Warehouse Standards
Provide consistency and quality of warehouse deliverables, and guide the expectations of warehouse developers and warehouse customers.

Warehouse Tools
Used to develop, implement, operate, support, and use the data warehouse.


X

 


Y

 


Z

 


Last Modified: Monday, September 2, 2002 11:40 AM