| Aggregate : |
Summarized data. For example, unit sales of a particular product could be aggregated by day, month, quarter, and year. |
|
| Aggregation : |
The process of consolidating data values into a single value. For example, sales data could be collected on a daily basis and then be aggregated to the week level, the week data could be aggregated to the month level, and so on. The data can then be referred to as aggregate data. Aggregation is synonymous with summarization, and aggregate data is synonymous with summary data. |
|
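The roll-up described under Aggregation can be sketched as follows. This is a minimal illustration, not a prescribed method: the `daily_sales` figures and the `aggregate` helper are hypothetical, and both the week and month totals are computed directly from the daily values because calendar weeks do not nest cleanly inside months.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily unit sales (illustrative data only).
daily_sales = {
    date(2024, 1, 1): 10,
    date(2024, 1, 2): 12,
    date(2024, 1, 8): 7,
    date(2024, 2, 5): 20,
}

def aggregate(values, key_fn):
    """Sum the values that share the same key, e.g. the same week or month."""
    totals = defaultdict(int)
    for day, units in values.items():
        totals[key_fn(day)] += units
    return dict(totals)

# Week level: the key is (ISO year, ISO week number).
weekly = aggregate(daily_sales, lambda d: tuple(d.isocalendar())[:2])

# Month level: the key is (year, month).
monthly = aggregate(daily_sales, lambda d: (d.year, d.month))

print(weekly)
print(monthly)
```

Each result is aggregate (summary) data in the sense used above: many daily values consolidated into one value per period.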
| Aggregations : |
Information stored in a data warehouse in a summarized form. |
|
| Algorithm : |
Any well-defined computational procedure that takes some value or set of values as input, and produces some value or set of values as output. |
|
| Analysis Services : |
Business Intelligence tools included with Microsoft SQL Server 2000. The same product in Microsoft SQL Server 7.0 was called OLAP Services. The name was changed because the Microsoft SQL Server 2000 version included data mining capabilities as well as the OLAP capabilities. |
|
| Association Rules : |
Rules that state a statistical correlation between the occurrences of certain attributes in a database table. The general form of an association rule is X1, X2, …, Xn => Y. This means that the attributes X1, X2, …, Xn predict Y. |
|
| Attribute : |
Additional information included with a dimension that is not used in defining the levels of the dimension. A descriptive characteristic of one or more levels. Attributes represent logical groupings that enable end users to select data based on like characteristics. In RDBMS, an attribute is a column in a dimension that characterizes elements of a single level. |
|
| Bit(s) : |
Short for binary digit: the smallest unit of computer storage, which holds one of two values, 0 or 1. |
|
| Business Intelligence Tools : |
Software that enables business users to see and use large amounts of complex data. |
|
| Byte : |
A standard unit of storage. A byte consists of 8 bits, which is enough to store a single character in a single-byte encoding such as ASCII. |
|
| Cell : |
A cell is a single point in a cube. Cubes have cells for all of the possible combinations of points from all of the cube's dimensions. |
|
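As a small illustration of the Cell definition, the cells of a cube can be enumerated as the Cartesian product of its dimension members. The dimension names and members below are hypothetical.

```python
from itertools import product

# Hypothetical dimensions and their members.
dimensions = {
    "product": ["widget", "gadget"],
    "region": ["north", "south", "west"],
    "month": ["Jan", "Feb"],
}

# One cell per combination of one member from every dimension.
cells = list(product(*dimensions.values()))

print(len(cells))   # 2 * 3 * 2 combinations
print(cells[0])     # first combination, in dimension order
```

The cell count grows multiplicatively with each added dimension, which is why many cells in a real cube hold no data.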
| Changed Data Capture : |
In database replication, changed data capture copies only the data that has changed since the previous replication. |
|
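A minimal sketch of changed data capture, assuming each source row carries a modification timestamp. The `modified_at` field, the integer timestamps, and the `capture_changes` helper are assumptions made for illustration; real replication tools often read a transaction log instead.

```python
def capture_changes(source_rows, last_replication):
    """Return only the rows modified after the previous replication."""
    return [row for row in source_rows if row["modified_at"] > last_replication]

# Hypothetical source table; timestamps are plain integers for simplicity.
source_rows = [
    {"id": 1, "value": "a", "modified_at": 100},
    {"id": 2, "value": "b", "modified_at": 205},
    {"id": 3, "value": "c", "modified_at": 310},
]

last_replication = 200
changed = capture_changes(source_rows, last_replication)
print([row["id"] for row in changed])
```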
| Changing Dimensions : |
A dimension that has level or attribute data that needs to be updated. |
|
| Classification : |
A subdivision of a set of examples into a number of classes. |
|
| Cleansing : |
The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process. |
|
| Clickstream Data : |
Data about web browsing, such as the sequence of pages a visitor requests. |
|
| Cluster : |
A tightly coupled group of SMP (symmetric multiprocessing) machines. |
|
| Coding : |
Operations on a database to transform or simplify data in order to prepare it for a machine-learning algorithm. |
|
| Cold Backup : |
A database backup taken while the database is shut down. |
|
| Common Warehouse Meta Data (CWM) : |
A repository standard used by Oracle data warehousing, decision support, and OLAP tools, including Oracle Warehouse Builder. The CWM repository schema is a standalone product that other products can share; each product owns only the objects within the CWM repository that it creates. |
|
| Confidence : |
Given the association rule X1, X2, …, Xn => Y, the confidence is the percentage of records for which Y holds, within the group of records for which X1, X2, …, Xn hold. |
|
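The Confidence definition can be worked through on a toy data set. In this sketch each record is a set of the attributes that hold for it; the records and the `confidence` helper are hypothetical.

```python
def confidence(records, antecedent, consequent):
    """Percentage of records containing `consequent` among those
    that contain every attribute in `antecedent`."""
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    hits = sum(1 for r in matching if consequent in r)
    return 100.0 * hits / len(matching)

# Hypothetical market-basket records.
records = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

# Rule: bread, butter => milk
rule_confidence = confidence(records, {"bread", "butter"}, "milk")
print(rule_confidence)
```

Four records satisfy the left-hand side and three of those also contain milk, so the rule holds for three quarters of the matching records.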
| Conformed Dimension : |
A dimension that is used in more than one cube. The use of conformed dimensions and shared measures is the primary way a set of data marts can be united into one consolidated data warehouse. |
|
| Cube (Also Known As Multidimensional Cube) : |
The fundamental structure for data in a multidimensional (OLAP) system. A cube contains dimensions, hierarchies, levels, and measures. Each individual point in a cube is referred to as a cell. |
|
| Data : |
The reality that a computer records, stores, and processes. Alternatively, any symbolic representation of facts or ideas from which information can potentially be extracted. |
|
| Data Cleansing : |
Removing errors and inconsistencies from data being imported into a data warehouse. |
|
| Data Mart : |
A data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, the data can be derived from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from sources. |
|
| Data Mart (Also Known As: Local Data Warehouse) : |
A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise. |
|
| Data Migration : |
The movement of data from one environment to another. This happens when data is brought from a legacy system into a data warehouse. |
|
| Data Mining : |
The actual discovery phase of a knowledge discovery process. |
|
| Data Quality Assurance (Also Known As: Data Cleansing or Data Scrubbing) : |
The process of checking the quality of the data being imported into the data warehouse. |
|
| Data Scrubbing : |
Removing errors and inconsistencies from data being imported into a data warehouse. |
|
| Data Selection : |
The stage of selecting the right data for a KDD process. |
|
| Agent : |
An application that searches the data and sends an alert when a particular pattern is found. |
|
| Data Source : |
A database, application, repository, or file that contributes data to a warehouse. |
|
| Data Transformation : |
The modification of data as it is moved into the data warehouse. |
|
| Data Warehouse : |
A relational database that is designed for query and analysis rather than transaction processing. A data warehouse usually contains historical data that is derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables a business to consolidate data from several sources. In addition to a relational database, a data warehouse environment often consists of an ETL solution, an OLAP engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users. |
|
| Data Warehousing : |
The process of visioning, planning, building, using, managing, maintaining, and enhancing data warehouses and/or data marts. |
|
| Data Warehousing Management : |
The ongoing supervision of the data warehousing process. Data warehousing is an ongoing process. All of the issues that need to be addressed when a data warehousing project is started also need to be addressed as the data warehouse is used and, most likely, expanded. |
|
| Data-Based Knowledge : |
Knowledge derived from data through the use of Business Intelligence Tools and the process of Data Warehousing. |
|
| Database Management System (DBMS) : |
The software that is used to store, access, and manage data. There are two main types of Database Management Systems used for business intelligence and data warehousing - specialized Multidimensional Database Management Systems (MDBMS) and the more widely used general purpose Relational Database Management Systems (RDBMS). |
|
| Decision Support System (DSS) : |
A computer system designed to assist an organization in making decisions. |
|
| Decision Trees : |
A decision tree consists of nodes and branches, starting from a single root node. Each node represents a test or decision. Depending on the outcome of the decision, one follows a certain branch, and when a terminal node (or leaf) is reached, a decision on a class assignment is made. |
|
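The traversal described under Decision Trees can be sketched with nested dictionaries. The tests, thresholds, and class labels here are invented for illustration, and the tree is written by hand rather than learned from data.

```python
# Each internal node holds a test plus one branch per outcome (True / False).
# Leaves are plain strings naming the assigned class.
tree = {
    "test": lambda record: record["income"] > 50000,
    True: {
        "test": lambda record: record["age"] > 30,
        True: "approve",
        False: "review",
    },
    False: "decline",
}

def classify(node, record):
    """Walk from the root, following the branch chosen by each test,
    until a leaf (class label) is reached."""
    while isinstance(node, dict):
        node = node[node["test"](record)]
    return node

print(classify(tree, {"income": 60000, "age": 45}))
print(classify(tree, {"income": 60000, "age": 25}))
print(classify(tree, {"income": 20000, "age": 45}))
```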
| Deep Knowledge : |
Knowledge that is hidden within a database and can only be recovered if one is given certain clues. |
|
| Denormalize : |
The process of allowing redundancy in a table so that it can remain flat. Contrast with normalize. |
|
| Derived Fact (or Measure) : |
A fact (or measure) that is generated from existing data using a mathematical operation or a data transformation. Examples include averages, totals, percentages, and differences. |
|
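The kinds of derived facts listed above (totals, averages, percentages, differences) can each be computed from stored measures. The figures below are hypothetical.

```python
# Hypothetical stored measure: unit sales for three stores.
unit_sales = [120, 80, 100]

total = sum(unit_sales)                          # derived total
average = total / len(unit_sales)                # derived average
share_of_first = 100 * unit_sales[0] / total     # derived percentage
difference = unit_sales[0] - unit_sales[1]       # derived difference

print(total, average, share_of_first, difference)
```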
| Dimension : |
A structure often composed of one or more hierarchies that categorize data. Several distinct dimensions, combined with measures, enable end users to answer business questions. Commonly used dimensions are customer, product, and time. |
|
| Dimension Data : |
Data stored externally to the fact data, which contains expanded information on specific dimensions of the fact data. |
|
| Dimension Table : |
In a star schema, a table which contains the data for one of the cube's dimensions. The dimension table has a primary key which is used to connect it to the fact table. The dimension table has one field for each level of each hierarchy contained in the dimension. The data values in these fields become the members of each of the dimension's levels. The dimension table has as many attribute fields as possible. These fields describe individual characteristics of the dimension. |
|
| Dimensionalization : |
The process of transforming data into a multidimensional (or star) schema. |
|
| Drill : |
To navigate from one item to a set of related items. Drilling typically involves navigating up and down through the levels in a hierarchy. When selecting data, you can expand or collapse a hierarchy by drilling down or up in it, respectively. See also drill down, drill up. |
|
| Drill Down : |
To expand the view to include child values that are associated with parent values in the hierarchy. |
|
| Drill Down and Drill Up : |
The ability to move between levels of the hierarchy when viewing data with an OLAP browser. Drill Down: Changing the view of the data to a greater level of detail. Drill Up: Changing the view of the data to a higher level of aggregation. |
|
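Drilling up and down can be pictured as re-aggregating the same facts at a different level of a hierarchy. The sketch below assumes a hypothetical Year > Quarter > Month hierarchy and a hypothetical `view_at` helper; an OLAP browser would do this interactively.

```python
LEVELS = ("year", "quarter", "month")

# Hypothetical facts keyed by their full path through the hierarchy.
facts = {
    ("2024", "Q1", "Jan"): 10,
    ("2024", "Q1", "Feb"): 15,
    ("2024", "Q2", "Apr"): 20,
}

def view_at(level, facts):
    """Total the facts at the requested level of the hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = {}
    for path, value in facts.items():
        key = path[:depth]
        totals[key] = totals.get(key, 0) + value
    return totals

by_year = view_at("year", facts)         # most aggregated view
by_quarter = view_at("quarter", facts)   # drill down: more detail
by_month = view_at("month", facts)       # drill down again

print(by_year)
print(by_quarter)
```

Moving from `by_year` to `by_quarter` is a drill down; moving back is a drill up.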
| Drill Up : |
To collapse the list of descendant values that are associated with a parent value in the hierarchy. |
|
| DTS (Data Transformation Services) : |
An ETL tool provided as a part of Microsoft SQL Server. DTS was first released with SQL Server 7.0. It provides a design environment for creating data transformation applications. |
|
| Enrichment : |
A stage of the KDD process in which new data is added to the existing selection. |
|
| ETL : |
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into a data warehouse. The order in which these processes are performed varies. Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL. |
|
| Extraction : |
The process of taking data out of a source as part of the initial phase of ETL. |
|
| Fact Data : |
The fact data is the basic core data that will be stored in the data warehouse. It is called fact data because it will correspond to business-related facts such as call record data for a Telco, or account transaction data for a bank. |
|
| Fact Table : |
In a star schema, the central table which contains the individual facts being stored in the database. There are two types of fields in a fact table: 1. The fields storing the foreign keys which connect each particular fact to the appropriate value in each dimension. 2. The fields storing the individual facts (or measures) – such as number, amount, or price. |
|
| File-to-table Mapping : |
Maps data from flat files to tables in the warehouse. |
|
| Genetic Algorithm : |
A class of machine-learning algorithm that is based on the theory of evolution. |
|
| Granularity : |
The level of detail of the facts stored in a data warehouse. |
|
| Hidden Knowledge : |
Knowledge that is hidden in a database and that cannot be recovered by a simple SQL query. More advanced machine-learning techniques involving many different SQL queries are necessary to find hidden knowledge. |
|
| Hierarchy : |
A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example, in a Time dimension, a hierarchy might be used to aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals. |
|
| Hybrid OLAP (HOLAP) : |
A combined use of Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP). In HOLAP, the source data is usually stored using a ROLAP strategy and aggregations are stored using a MOLAP strategy. This combination usually results in the least amount of storage space and the fastest cube processing. |
|
| Hyper-Cube : |
A cube-like structure with more than three dimensions. Strictly speaking, a cube is an object with three dimensions, but in the world of OLAP, hyper-cubes are nearly always simply referred to as cubes. |
|
| Information : |
Meaningful data. |
|
| KDD: Knowledge Discovery in Databases : |
The non-trivial extraction of implicit, previously unknown and potentially useful information from data. |
|
| Knowledge : |
A collection of interesting and useful patterns in a database. |
|
| Learning : |
An individual learns how to carry out a certain task by making a transition from a situation in which the task cannot be carried out to a situation in which the same task can be carried out under the same circumstances. |
|
| Legacy System : |
A computer system that's been around for a while. Sometimes organizations have several legacy systems that have been developed at different times by different people for a variety of purposes. The data in these systems is usually mutually incompatible and sometimes inaccurate. One of the biggest challenges of the data warehousing process is to bring data out of the variety of systems where it currently is located and organize it so it all fits together in the data warehouse. |
|
| Level : |
The hierarchies in dimensions have levels which can be used to view data at various levels of detail. A Time dimension could have levels for Year, Quarter, Month, and Day. |
|
| Local Cube : |
A cube contained in a file. Microsoft Analysis Services (OLAP Services) provides the ability to take all or a subset of a server cube and create a local cube file. The local cube can be used to analyze OLAP data while the user is disconnected from the network. |
|
| Materialized View : |
A pre-computed table comprising aggregated and/or joined data from fact and possibly dimension tables. Also known as a summary or aggregate table. |
|
| Measure : |
A numeric value stored in a fact table and in an OLAP cube. |
|
| Meta Data : |
Data about data. Meta data is data that describes the source, location and meaning of another piece of data. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. |
|
| Metadata : |
Data that describes the data in the warehouse. |
|
| Metric : |
Another term used for Measure. |
|
| Multi-dimensional Knowledge : |
A table with n independent attributes can be seen as an n-dimensional space. Some regularities represented in such a table cannot easily be detected when the table has a standard two-dimensional representation. Finding them requires exploring the relationships among the attributes. |
|
| Multidimensional Analysis : |
Also known as OLAP (On-Line Analytical Processing): a process of analysis that involves organizing and summarizing data across multiple dimensions. |
|
| Multidimensional Database Management System (MDBMS) : |
A database management system that organizes data multidimensionally. |
|
| Neural Networks : |
A class of learning algorithm consisting of multiple nodes that communicate through their connecting synapses. Neural networks imitate the structure of biological nervous systems. |
|
| Non-Volatile : |
Data that does not change. Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business. Non-volatility is one of the original defining characteristics of a data warehouse. |
|
| Normalization : |
The process of organizing data in accordance with the rules of a relational database. |
|
| Normalize : |
In a relational database, the process of removing redundancy in data by separating the data into multiple tables. Contrast with denormalize. |
|
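A small sketch of normalizing a flat table, using plain Python structures in place of database tables. The order data and column names are hypothetical; the point is that the customer's city, repeated on every flat row, ends up stored once.

```python
# Hypothetical flat (denormalized) table: the customer's city repeats per order.
flat_orders = [
    {"order_id": 1, "customer": "Acme", "city": "Oslo", "item": "bolt"},
    {"order_id": 2, "customer": "Acme", "city": "Oslo", "item": "nut"},
    {"order_id": 3, "customer": "Bolt Co", "city": "Rome", "item": "bolt"},
]

customers = {}   # one entry per customer, holding customer-level data
orders = []      # one entry per order, referring to the customer by key

for row in flat_orders:
    customers.setdefault(row["customer"], {"city": row["city"]})
    orders.append(
        {"order_id": row["order_id"], "customer": row["customer"], "item": row["item"]}
    )

print(customers)
print(orders)
```

Denormalizing is the reverse step: joining `orders` back to `customers` to rebuild the flat rows.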
| OLAP (On-Line Analytical Processing) : |
The use of computers to analyze an organization's data. OLAP is the most widely used term for multidimensional analysis software. The term, On-Line Analytical Processing, was developed to distinguish data warehousing activities from On-Line Transaction Processing, the use of computers to run the on-going operation of a business. |
|
| OLAP Services : |
Business Intelligence tools included with Microsoft SQL Server 7.0. OLAP Services was extended and renamed as Analysis Services in SQL Server 2000. |
|
| Online Backup : |
A database backup taken while the database is open and potentially in use. The RDBMS will need to have special facilities to ensure that data in the backup is consistent. |
|
| Operational Data Store (ODS) : |
The cleaned, transformed data from a particular source database. |
|
| Overnight Processing : |
Any processing that needs to be done to accomplish the daily loading, cleaning, aggregation, maintenance and backup of data, in order to make that new data available to the users for the following day. |
|
| Patterns : |
Structures in a database that are statistically relevant. |
|
| Prediction : |
The result of the application of a theory or a rule in a specific case. |
|
| Query Tools : |
Tools designed to query a database. |
|
| Replication : |
The physical copying of data from one database to another. In data warehousing replication takes place as data is moved from the on-line transaction processing system into the data warehouse. Replication also takes place if one or more data marts are being populated with data from the data warehouse. |
|
| Scale, Scalable, and Scalability : |
Having to do with the ability of a computer system or a database to operate efficiently with larger quantities of data. Scalability is often discussed in situations when multiple processors are joined together. The system scales well (or is scalable) if doubling the number of processors also doubles the speed at which the system performs its tasks. The extra work involved in coordinating larger systems usually prevents them from being fully scalable, so that going from one to two processors would increase the total speed by less than a factor of two. |
|
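The scalability claim above can be made concrete with a speedup calculation. The timings are hypothetical and the `scaling_efficiency` helper is only an illustration: an efficiency of 1.0 would mean the system is fully scalable.

```python
def scaling_efficiency(time_one_processor, time_n_processors, processors):
    """Return (speedup, efficiency) for a run on `processors` processors."""
    speedup = time_one_processor / time_n_processors
    return speedup, speedup / processors

# Hypothetical timings: 100 s on one processor, 62.5 s on two.
speedup, efficiency = scaling_efficiency(100.0, 62.5, 2)
print(speedup, efficiency)
```

Here the second processor raises the speed by a factor of 1.6 rather than 2, reflecting the coordination overhead described above.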
| Schema : |
The logical organization of data in a database. |
|
| Shallow Knowledge : |
The information stored in a database that can be retrieved with a single query. |
|
| Shared Dimension : |
A dimension used by more than one cube; in general, a dimension that is used by more than one cube is called a conformed dimension. |
|
| Slice and Dice : |
The ability to move between different combinations of dimensions when viewing data with an OLAP browser. |
|
| Slowly Changing Dimensions (SCD) : |
Dimensions that have levels or attributes that change on an occasional basis. |
|
| Snowflake Schema : |
A variant of the star schema where each dimension can have its own dimensions. |
|
| Sparsity and Density, Sparse and Dense : |
The degree to which the cells of a cube are filled with data. |
|
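One common way to quantify this is the fraction of possible cells that actually hold data. The sketch below assumes hypothetical dimension sizes and a hypothetical count of filled cells.

```python
from math import prod

def density(dimension_sizes, filled_cells):
    """Fraction of the cube's possible cells that contain data."""
    total_cells = prod(dimension_sizes.values())
    return filled_cells / total_cells

# Hypothetical cube: 4 products x 5 regions x 12 months, 60 cells filled.
cube_density = density({"product": 4, "region": 5, "month": 12}, 60)
cube_sparsity = 1 - cube_density
print(cube_density, cube_sparsity)
```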
| Star Schema : |
A logical structure that has a fact table in the centre with dimension tables radiating from it. The fact table will generally be extremely large in comparison to its dimension tables. There will be a foreign key in the fact table for each dimension table. |
|
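A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_time`), columns, and rows are all hypothetical; the sketch only shows the shape: one fact table holding measures plus a foreign key for each dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one primary key each, plus descriptive fields.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: one foreign key per dimension, then the measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        time_id INTEGER REFERENCES dim_time(time_id),
        units INTEGER,
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
    INSERT INTO dim_time VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales VALUES (1, 1, 5, 50.0), (1, 2, 3, 30.0), (2, 1, 2, 20.0);
""")

# Join the fact table to its dimensions and aggregate a measure.
rows = conn.execute("""
    SELECT p.category, t.year, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY p.category, t.year
    ORDER BY p.category
""").fetchall()

print(rows)
conn.close()
```

Queries against a star schema typically follow this pattern: filter or group by dimension fields, then sum the measures from the fact table.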
| Summary Tables : |
Tables used to store summarized or aggregated data. |
|
| Supervised Algorithms : |
Algorithms that need the control of a human operator during their execution. |
|
| Time-variant data : |
Data that is identified with a particular time period. Time-variant is one of the original defining characteristics of a data warehouse. |
|
| Transportation : |
The process of moving copied or transformed data from a source to a data warehouse. |
|
| Unsupervised Algorithms : |
The opposite of supervised algorithms. |
|
| Validation : |
The process of verifying metadata definitions and configuration parameters. |
|
| Verification : |
The validation of a theory on the basis of a finite number of examples. |
|
| Versioning : |
The ability to create new versions of a data warehouse project for new requirements and changes. |
|
| Virtual Cube : |
The term used in Microsoft's Analysis Services (OLAP Services) for a cube that is created from portions of one or more base cubes. A virtual cube is similar to a view in a relational database. It can be used for security purposes, giving users access to only some of the dimensions and measures. It can also be used to show data from separate cubes at the same time. Virtual cubes are much more useful when you have shared dimensions and measures that are common to all the base cubes that are used. |
|
| Virtual Dimension : |
The term used in Microsoft's Analysis Services (OLAP Services) for a dimension that is created from one or more member properties in another dimension. |
|
| Visualization Techniques : |
A class of graphic techniques used to visualize the contents of a database. |
|
| XML : |
Extensible Markup Language. A text-based markup format used to share data between disparate data systems without needing a direct connection between them. |
|
| XML for Analysis Services : |
An XML schema that can be used to communicate with a Microsoft Analysis Server. |
|