Data Warehousing Dictionary
- Agent
- A program to perform standard functions autonomously. Typical uses are for
automated data loading and exception reporting. Not to be confused with data mining.
- Aggregates
- Tables (usually held in a relational database) which hold precomputed totals in a
hierarchical multidimensional structure.
- Aggregation
- Accumulation of related data into a single pre-joined table.
A transformation function that groups and mathematically combines data from a set of
detailed records.
- Append
- A complete load of the captured data into the target dataset, adding to any data that
may already exist there.
- Application-assisted Capture
- A method, built into the source application, that preserves its data for immediate
capture of incremental changes in that source.
- Application-level Modeling
- Data and process modeling aimed at developing a specific business function within the
scope of a single application.
- Apply
- A component of data replication that applies the captured and transformed data to the
target data store.
- Architecture
- Provides the planning, structure, and standardization necessary to ensure integration of
multiple components, projects, and processes across time.
- Archive and Retrieval
- A function providing managed and efficient storage and retrieval of data between the
different levels of a storage hierarchy.
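As an illustration of the Aggregation entry above, the following is a minimal sketch of a transformation that groups detail records and combines them mathematically. The record layout (region, product, amount) and the function name aggregate are assumptions made for this example, not part of any particular product.

```python
from collections import defaultdict

# Hypothetical detail records: (region, product, amount).
detail_records = [
    ("East", "Widget", 100.0),
    ("East", "Gadget", 50.0),
    ("West", "Widget", 75.0),
    ("East", "Widget", 25.0),
]


def aggregate(records):
    """Group detail records by (region, product) and sum their amounts."""
    totals = defaultdict(float)
    for region, product, amount in records:
        totals[(region, product)] += amount
    return dict(totals)


# The precomputed totals play the role of an aggregate table.
aggregates = aggregate(detail_records)
```

Storing the result keyed by the grouping columns means a query for a total reads one precomputed value instead of rescanning the detail records.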
- BPR (Business Process Re-engineering)
- A fundamental corporate reorganization based upon the processes that deliver value to
customers.
- Business Drivers
- The forces and pressures that have significant influence on how the business performs
and operates.
- Business Goal
- What the business needs or wants to achieve in response to the business drivers.
- Business Information Interface
- The user's means of selecting, manipulating, and displaying data in business
terminology, both initially and for later reuse across the organization.
- Business Information Needs
- Information that the business needs to make its decisions.
- Business Processes
- Sets of activities performed by the business to produce products that have value to a
customer.
- Business Units
- Organizational units that share common business drivers, goals, information needs and
processes.
- Capture
- A component of data replication that interacts with a source data store to obtain a copy
of some or all of the data contained therein, or a record of changes that have occurred
there.
- Cell
- Data point defined by one member of each dimension of a multidimensional structure.
Often, potential cells in multidimensional structures are empty, leading to
'sparse' storage.
- Conceptual Models
- The product of analysis activities, describing requirements without consideration of
computer implementation.
- Constructive Merge
- A method of applying captured changed data to a periodic dataset, which builds and
maintains the historical record of changes to the source data.
- Contextual Models
- Describes the scope of requirements, establishing a context for analysis.
- Conversion
- A transformation function that converts the value in a single data field to some related
form.
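The Cell entry above can be made concrete with a small sketch. It assumes a three-dimensional structure with hypothetical dimension names and members, stores only the occupied cells in a dictionary, and computes what fraction of the potential cells hold data. The names dimensions, cells, density and read_cell are illustrative only.

```python
from math import prod

# Hypothetical dimensions and their members.
dimensions = {
    "product": ["Widget", "Gadget", "Manual"],
    "region": ["East", "West"],
    "month": ["Jan", "Feb", "Mar", "Apr"],
}

# Sparse storage: only occupied cells are kept, keyed by one member per dimension.
cells = {
    ("Widget", "East", "Jan"): 100.0,
    ("Gadget", "West", "Mar"): 40.0,
    ("Manual", "East", "Apr"): 15.0,
}


def density(cells, dimensions):
    """Fraction of potential cells that actually contain data."""
    potential = prod(len(members) for members in dimensions.values())
    return len(cells) / potential


def read_cell(cells, *members):
    """Return the value of a cell, or None if the cell is empty."""
    return cells.get(tuple(members))


occupied_fraction = density(cells, dimensions)
```

Here 3 of the 24 potential cells are occupied, so an empty cell costs no storage at all; a dense array would reserve space for all 24.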
- Data
- The computerized representation of business information.
- Data Analysis-and-Prediction Tool
- A tool that gives an end user a variety of ways of looking at data
with a view to finding the right questions to ask.
- Data Cleansing or Scrubbing
- Statistical and other techniques used to eliminate variations in data content, to reduce
redundancy, and generally to improve the consistency and usability of raw data.
- Data Mart
- A localized, single-purpose data warehouse implementation.
Typically a small data warehouse or subset of a data warehouse designed and constructed to
serve the needs of a single, homogeneous user group.
- Data Mining
- Delivers information for research and investigative uses of the data warehouse.
Explores the warehouse contents at a detail level to identify and report the patterns
inherent in the data.
The process of using statistical techniques to discover subtle relationships between data
items, and the construction of predictive models based on them. The process is not
the same as just using an OLAP tool to find exceptional items. Generally, data
mining is a very different and more specialist application than OLAP, and uses different
tools from different vendors. Normally the users are different, too. OLAP
vendors have had little success with their data mining efforts.
- Data Replication
- A set of techniques that provides comprehensive support for copying and transforming
data from source to target location in a managed, consistent, repeatable, and
well-understood manner.
- Data Transfer
- The data transport function that underlies data replication and data access, and is
responsible for transferring data--ranging from messages to complete files--between data
warehouse components in any combination of locations, hardware, or software.
- Data Warehouse
- A subject-oriented, integrated, non-volatile, time-variant, enterprise-wide collection
of data organized to support management needs.
A single, complete and consistent store of data obtained from a variety of sources and
made available to end users in a way they can understand and use in a business context.
- Data Warehouse Pilot
- The first development project of the warehouse implementation process, designed to prove
the viability of the overall approach.
- Data Warehousing
- The coordinated, architected, and periodic copying of data from various sources, both
inside and outside the enterprise, into an environment optimized for analytical and
informational processing.
- DBMS (Data Base Management System)
- Used to store, process and manage data in a systematic way. May use a variety of
underlying storage methods, including relational, multidimensional, network and
hierarchical.
- Delayed Capture
- Capture of changes at specified times, producing a complete record of changes only in
periodic data.
- Dense
- The majority, or significant minority (at least ten percent), of potential data cells
actually occupied in a multidimensional structure.
- Dependent Data Mart
- Also called an architected data mart. Shares common business rules, semantics, and
definitions. A Dependent Data Mart reads metadata from a central metadata repository
to define local metadata. This ensures that all components of the architecture are
linked via common metadata.
- Derived Data
- Point-in-time or periodic data, at a detailed or summary level, derived by some process
from real-time data and used to manage the business.
- Destructive Merge
- A method of applying captured changed data to a target dataset that maintains an exact
copy of the source data at the time of capture.
- Dimension
- An index by which you can access facts according to the value (or values) wanted.
- Dirty Data
- Inconsistent, missing, incomplete, or erroneous data. Source data often contains a
high percentage of "dirty" data.
- DSS (Decision Support Systems)
- Delivers information to aid management decisions.
Application for analyzing large quantities of data and performing a wide variety of
calculations and projections.
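The difference between the Destructive Merge and Constructive Merge entries can be sketched as follows. This is a simplified illustration under stated assumptions: the change-record layout (key, operation, value) and both function names are hypothetical, and real replication tools carry more detail such as timestamps.

```python
# Hypothetical change records captured from a source: (key, operation, value).
changes = [
    ("cust-1", "insert", "Bronze"),
    ("cust-2", "insert", "Silver"),
    ("cust-1", "update", "Gold"),
    ("cust-2", "delete", None),
]


def destructive_merge(target, changes):
    """Overwrite or remove rows so the target mirrors the source."""
    for key, op, value in changes:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = value
    return target


def constructive_merge(history, changes):
    """Append every change as a new row so earlier states are preserved."""
    for seq, (key, op, value) in enumerate(changes, start=len(history) + 1):
        history.append({"seq": seq, "key": key, "op": op, "value": value})
    return history


current = destructive_merge({}, changes)   # exact copy of the source as of capture
history = constructive_merge([], changes)  # full record of what happened
```

After the same four changes, the destructive target holds only the surviving row, while the constructive target keeps all four changes, including the deleted customer.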
- EIS (Executive Information Systems)
- Delivers information targeted at understanding business trends.
- Enrichment
- A transformation function that combines data from two or more fields in one or more
records to create a new field or fields in the output record.
- Enterprise Modeling
- Modeling whose focus is a complete and integrated view of all the data and processes in
the business.
- ETL (Extraction, Transformation, Load) Tool
- Used to extract data from source environments, cleanse the data, transform the data to
resolve logical inconsistencies, reorganize the data to make it understandable to end
users, and load the data into target databases.
- External Data
- Data from outside the enterprise.
- Extraction
- Copy data from multiple sources, with resolution of data quality and consistency issues.
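The steps named in the ETL entry above (extract, cleanse, transform, load) can be sketched with Python's standard sqlite3 module. The source rows, the sales table, and the rule of dropping rows that lack an amount are all assumptions chosen for the example; a real ETL tool would apply whatever rules the warehouse design specifies.

```python
import sqlite3

# Hypothetical raw source rows with inconsistent casing, stray spaces,
# and one missing amount.
source_rows = [
    {"customer": " alice ", "country": "us", "amount": "10.50"},
    {"customer": "BOB", "country": "US", "amount": ""},
    {"customer": "Carol", "country": "de", "amount": "7"},
]


def extract(rows):
    """Copy rows out of the source (here, simply a list)."""
    return [dict(row) for row in rows]


def cleanse(rows):
    """Strip whitespace and drop rows that have no amount (an assumed rule)."""
    stripped = [{k: v.strip() for k, v in row.items()} for row in rows]
    return [row for row in stripped if row["amount"]]


def transform(rows):
    """Resolve inconsistent casing and convert amounts to numbers."""
    return [
        {
            "customer": row["customer"].title(),
            "country": row["country"].upper(),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]


def load(rows, conn):
    """Write the prepared rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (:customer, :country, :amount)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(cleanse(extract(source_rows))), conn)
loaded = conn.execute(
    "SELECT customer, country, amount FROM sales ORDER BY customer"
).fetchall()
```

Keeping each step as a separate function mirrors the way the entry lists them and makes it easy to test or replace one step without touching the others.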
- Fact
- An instance of some particular occurrence or event and the properties of the event all
stored in a database.
- File Comparison
- A method of capturing a possibly incomplete set of changed data by comparing two
versions of the source data.
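The File Comparison entry above can be illustrated with a short sketch that compares two versions of a keyed dataset and derives the changes between them. The function name compare_versions and the sample data are hypothetical.

```python
def compare_versions(old, new):
    """Derive inserts, updates, and deletes from two keyed versions of the data."""
    changes = []
    for key, value in new.items():
        if key not in old:
            changes.append(("insert", key, value))
        elif old[key] != value:
            changes.append(("update", key, value))
    for key in old:
        if key not in new:
            changes.append(("delete", key, None))
    return changes


# Two hypothetical versions of the same source data.
monday = {"p1": 10, "p2": 20, "p3": 30}
friday = {"p1": 10, "p2": 25, "p4": 40}

changes = compare_versions(monday, friday)
```

The sketch also shows why the captured set may be incomplete: if p2 went from 20 to 22 and then to 25 between the two versions, only the net change to 25 is visible.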
- Granularity
- Degree of summarization of data.
- High-level Enterprise Model
- A formal, high-level description in business terms of the information needed and used to
manage the entire business, understood by business users and IS personnel.
- HOLAP (Hybrid OLAP)
- A product that can provide multidimensional analysis simultaneously of data stored in a
multidimensional database and in an RDBMS. Becoming a popular architecture for
server OLAP.
- Hypercube
- An OLAP product that stores all data in a single cube which has all the application
dimensions applied to it.
- Immediate Capture
- Capture of changes at the time they occur, ensuring a complete record of changes in
transient, semi-periodic, and periodic data.
- Incremental Capture
- A method of capturing a record of the changes that take place in a source dataset.
- Incremental Implementation
- Develops and delivers a subset of the total warehouse. A pragmatic approach to
building an enterprise-level warehouse in a segmented, evolutionary fashion.
- LAN (Local Area Network)
- High speed connection between desktop PCs and server machines, including file servers,
application servers, print servers and other services.
- Load
- A complete load of the captured data into the target dataset, irrespective of any
existing data.
- Logical Models
- Design products, describing data from a computer system perspective, yet free of any
implementation platform specifics.
- MDB (Multidimensional Data Base)
- A product that can store and process multidimensional data.
- MDX (Multidimensional Expression Language)
- The multidimensional equivalent of SQL. The language used to define
multidimensional data selections and calculations in Microsoft's OLE DB for OLAP API
(Tensor). It is also used as the calculation definition language in Microsoft's OLAP
Services.
- Metadata
- Source definitions, data models, calculations, transformation and conversion
rules ("data about data").
Data that describes the meaning and structure of business data, as well as how it is
created, accessed, and used.
Data about data. How the structures and calculation rules are stored, plus,
possibly, additional information on data sources, user privileges, etc.
- Metadata exchange
- Application program interface that enables products from multiple vendors to read and
exchange metadata using a common format. Microsoft's OLE DB for OLAP API has become
the industry standard for exchanging metadata between data warehousing products.
- Methodology
- A general structure that describes the steps necessary to produce deliverables common to
many environments.
- Minicube
- A subset of a hypercube, with fewer dimensions than the encompassing hypercube.
The hypercube will consist of a collection of logically similar minicubes.
- Model-driven Replication
- An approach to replication in which the data and mapping definitions start from a
business viewpoint and lead to logical- and physical-level definitions based on a layered
enterprise modeling approach.
- Multicube
- An OLAP product that can store data in the form of a number of multidimensional
structures which together form an OLAP database.
- Multidimensional
- Data structure with three or more independent dimensions.
- Normalization/Denormalization
- A transformation function that splits or joins data records, while maintaining
relationships between primary and secondary record keys.
- OLAP (Online Analytical Processing)
- Delivers information to meet the needs of business analysts.
- OLE DB
- New Microsoft database connectivity standard, replacing the older ODBC standard.
- OLE DB for OLAP
- Microsoft's OLAP API, likely to become an industry standard. Used to link OLAP
clients and servers using a standard multidimensional language MDX.
- OLTP (OnLine Transaction Processing)
- Transaction-based systems that are central to operation of the business and collect
large amounts of data at a highly detailed level.
- Operational Data
- Data in an enterprise's business applications.
- Operational Data Store
- Addresses operational needs in the areas of keyed retrieval, update, and access of
non-integrated operational data.
A modeled store of detailed operational data designed and structured to allow better
integration among such data when it is used for immediate inquiry.
A set of integrated, scrubbed data without history or summarization provided for tactical
decision support. ODSs typically support operational systems and provide a single
view of operational data. They are commonly used to populate data warehouses and
data marts.
- Operational Systems
- Applications that run the business on a day-to-day basis using real-time data.
- Partitioning
- Splitting of target data into smaller units.
- Periodic Data
- Data recording the history of the business over a period of time by maintaining a
complete record either of all statuses or of all events that have occurred.
- Physical Models
- Detailed design and specifications of data structures to be implemented with specific,
known technology.
- Process Management
- A set of functions supporting the definition of inter-related process steps and the
management of their execution across a variety of hardware and software platforms; used
mainly by data replication.
- Query-and-Reporting Tool
- An end-user tool designed to support the definition of queries and reports where the
analysis need is well understood in business terms.
Normally used for list-oriented reporting from relational databases.
- RDBMS (Relational Data Base Management System)
- Used to store, process and manage data arranged in relational tables. Often used
for transaction processing and data warehouses.
- Real-time Data
- Up-to-the-second, detailed data used to run the business and accessed in read/write
mode, usually through predefined transactions.
- Reconciled Data
- A special type of derived data that occurs at an historical, detailed level and is
designed and used to ensure consistency of data across the entire enterprise.
- Refresh
- A replication mode based on bulk rewriting of the target data using static captured
data.
- Replication
- Copy data from source databases, with minimal scrubbing or clean-up of the data.
- ROLAP (Relational OLAP)
- A product that provides multidimensional analysis of data, aggregates and metadata
stored in an RDBMS. The multidimensional processing may be done within the RDBMS, a
mid-tier server or the client. A 'merchant' ROLAP is one from an independent vendor
which can work with any standard RDBMS.
- Scope
- Described in terms of what the project will produce and what the project will not
produce.
- Selection
- A transformation function that selects all, part, or nothing of a single input record.
- Separation/Concatenation
- A transformation function that splits or joins data records while maintaining the
original relationship to the primary record key.
- Snapshot Data
- A point-in-time view of the business data showing its status at a particular time, which
is then kept as a (potentially) permanent record of that state of the data.
- Snowflake Schema
- A variant of the star schema with normalized dimension tables.
- Sparse
- Only a small proportion (arbitrarily, less than 0.1 percent) of potential data cells
actually occupied in a multidimensional structure.
- Sponsor
- A senior manager or executive who has a business stake in the results of the project.
- SQL (Structured Query Language)
- The standard data structuring and access language used by relational databases.
MDX is based loosely on SQL.
- Staging Area
- Buffer area used to accumulate data from multiple data sources prior to processing the
data with an ETL tool.
- Staging Data
- Data packaged for delivery to data warehouse or data mart.
- Star Schema
- A database design, particularly suitable for ad hoc queries, in which dimensional data
(describing how data is commonly aggregated) is separated from fact or event data
(describing individual transactions in the business).
A relational database schema for representing multidimensional data. The data is
stored in a central fact table, with one or more tables holding information on each
dimension. Dimensions have levels, and all levels are usually shown as columns in
each dimension table.
- Static Capture
- A method of capturing a time-independent or static view of all or a subset of the data
in a source data set.
- Steering Committee
- An executive group, drawn from all parts of the business, responsible for prioritizing
projects in the data warehouse.
- Stovepipe Data Mart
- Also called an independent data mart. Not integrated with central metadata.
There are no common business rules, semantics, and definitions. Stovepipe data marts
support the needs of individual business units, but do not support an enterprise data
warehousing strategy.
- Subject-oriented Data
- Data structured around the major entities in an enterprise data model and largely
independent of the creation and usage processes.
- Summarization
- Consolidated data summarized on one or more dimensions, or axes.
- Summary Tables
- Often used in RDBMSs to store pre-aggregated information, rather than holding it in the
same table as the base data. Used to improve responsiveness.
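The Star Schema entry above can be sketched with Python's standard sqlite3 module: one central fact table, one table per dimension, and a query that joins them to total the facts by dimension levels. All table names, column names, and sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: levels (such as category or year) are held as columns.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Central fact table: one row per transaction, keyed by the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO dim_date VALUES (1, 2002, 8), (2, 2002, 9);
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 2, 20.0);
""")

# An ad hoc query: total sales by product category and month.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
```

Saving the output of such a query in its own table would give a summary table in the sense of the Summary Tables entry, trading storage for faster responses.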
- Target Data Base
- Database for a data warehouse or data mart that is to be populated with data from one or
more source systems.
- Tensor
- The internal name for Microsoft's OLAP API, a set of OLE COM objects and interfaces
designed to add multidimensionality to OLE DB. It seems set to become the de facto
industry standard multidimensional API, being adopted both by most front-end OLAP tools
vendors and by several other OLAP database vendors. The official name will be OLE DB
for OLAP, and the 1.0 specification was published in February 1998.
- Transaction Processing
- The operational systems used to collect and manage the base data of an organization.
- Transformation
- Data refinement through aggregation, summarization, partitioning, and reorganization to
make it usable for Decision Support System analysis.
A component of data replication that converts data between different logical or physical
structures according to predefined rules.
- Transient Data
- Real-time data in which changes to existing records overwrite the previous data, and
deletions physically erase records, leading to a loss of the historical record of the
changes that data has undergone.
- Triggered Capture
- A method for immediate capture of incremental data changes, dependent on the use of
trigger technology in the source DBMS.
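The Triggered Capture entry above can be sketched using SQLite triggers through Python's standard sqlite3 module. SQLite stands in for the source DBMS here, and the account and account_changes tables and the trigger names are hypothetical. Each trigger writes a change record at the moment the source row is inserted, updated, or deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);

-- Change table that receives one row per captured change.
CREATE TABLE account_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT,
    id INTEGER,
    balance REAL
);

CREATE TRIGGER capture_insert AFTER INSERT ON account
BEGIN
    INSERT INTO account_changes (op, id, balance)
    VALUES ('insert', NEW.id, NEW.balance);
END;

CREATE TRIGGER capture_update AFTER UPDATE ON account
BEGIN
    INSERT INTO account_changes (op, id, balance)
    VALUES ('update', NEW.id, NEW.balance);
END;

CREATE TRIGGER capture_delete AFTER DELETE ON account
BEGIN
    INSERT INTO account_changes (op, id, balance)
    VALUES ('delete', OLD.id, NULL);
END;
""")

# Ordinary application activity against the source table.
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = 80.0 WHERE id = 1")
conn.execute("DELETE FROM account WHERE id = 1")

captured = conn.execute(
    "SELECT op, id, balance FROM account_changes ORDER BY seq"
).fetchall()
```

Because the triggers fire inside the DBMS, the application needs no changes; this is what distinguishes the approach from application-assisted capture.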
- Update
- A replication mode using captured change data to maintain the target data incrementally.
- Virtual Data Warehouse
- Users are given direct access to source databases and data is not moved from the source
databases.
Provides users with direct access to operational data. Data is not moved from the
source database and there is no target database.
- Warehouse Architecture
- Establishes the framework, standards, and procedures for the data warehouse at an
enterprise level.
- Warehouse Platforms
- Provides hardware, software, and communications operating environments for warehouse
tools.
- Warehouse Standards
- Provides consistency and quality of warehouse deliverables, and guides the expectations
of warehouse developers and warehouse customers.
- Warehouse Tools
- Used to develop, implement, operate, support, and use the data warehouse.
Last Modified:
Monday, September 2, 2002 11:40 AM