Data Warehouse and Data Mining Technologies

Information Technology Resources

A data warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. This makes it much easier and more efficient to run queries over data that originally came from different sources.

Another definition of a data warehouse is: "A data warehouse is a logical collection of information gathered from many different operational databases, used to create business intelligence that supports business analysis activities and decision-making tasks; primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP)." Generally, data warehousing is not meant for current "live" data, although 'virtual' or 'point-to-point' data warehouses can access operational data. A 'real' data warehouse is generally preferred to a virtual one because its stored data has been validated and is set up to provide reliable results for the common types of queries used in a business.

History of data warehousing

In the 1990s, as organizations of scale began to need more timely data about their business, they found that traditional information systems technology was simply too cumbersome to provide relevant data efficiently and quickly. Completing reporting requests could take days or weeks using antiquated reporting tools that were designed more or less to 'execute' the business rather than 'run' the business.

From this idea, the data warehouse was born as a place where relevant data could be held for completing strategic reports for management. The key here is the word 'strategic', as most executives were less concerned with day-to-day operations than with a broader view of the business model and business functions.

As with all technology, the latter half of the 20th century saw an increase in the number and types of databases. Many large businesses found themselves with data scattered across multiple platforms and variations of technology, making it almost impossible for any one individual to use data from multiple sources. A key idea within data warehousing is to take data from multiple platforms and technologies (as varied as spreadsheets, DB2 databases, IDMS records, and VSAM files) and place it in a common location that uses a common querying tool. In this way, operational databases could be held on whatever system was most efficient for the operational business, while the reporting and strategic information could be held in a common location using a common language. Data warehouses take this a step further by giving the data itself commonality: they define what each term means and keep it standard. (An example would be gender, which can be recorded in many ways in source systems but should be standardized in a data warehouse with one common way of referring to each sex.)
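The standardization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source systems, the raw values, the target codes ("M", "F", and "U" for unknown) and the function name standardize_gender are all hypothetical, not taken from any particular warehouse product.

```python
# Hypothetical mapping from source-specific gender codes to one warehouse standard.
GENDER_MAP = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}


def standardize_gender(raw):
    """Return the warehouse code for a raw source value, or "U" if unrecognized."""
    if raw is None:
        return "U"
    key = str(raw).strip().lower()
    return GENDER_MAP.get(key, "U")


# Rows as they might arrive from three different operational systems (made-up data).
source_rows = [
    {"source": "spreadsheet", "customer": "A100", "gender": "Male"},
    {"source": "db2", "customer": "B200", "gender": "f"},
    {"source": "vsam", "customer": "C300", "gender": 1},
]

# Every row keeps its other fields; only the gender value is rewritten.
warehouse_rows = [
    {**row, "gender": standardize_gender(row["gender"])} for row in source_rows
]
```

Mapping unrecognized values to an explicit "unknown" code, rather than dropping the row, is one possible design choice; a real warehouse might instead reject or quarantine such rows.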

All of this was designed to make decision support more readily available without affecting day-to-day operations. One aspect of a data warehouse that should be stressed is that it is NOT a location for ALL of a business's data, but rather a location for data that is 'interesting'. Data that is interesting will assist decision makers in making strategic decisions relative to the organization's overall mission.

Design of data warehouses

Data warehouses often hold large amounts of information, which is sometimes subdivided into smaller logical units called dependent data marts. Dependent data marts allow for easier reporting by keeping relevant data together in one location.
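As a rough illustration of a dependent data mart, the sketch below uses Python's built-in sqlite3 module and models the mart as a view over a warehouse table. The table name warehouse_sales, the view name retail_mart, the columns and the sample rows are all assumptions made for this example; real data marts are often separate physical tables or databases fed from the warehouse.

```python
import sqlite3

# In-memory database standing in for the warehouse (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(region TEXT, department TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Retail", "2001-01-15", 120.0),
        ("South", "Retail", "2001-01-16", 80.0),
        ("North", "Wholesale", "2001-01-17", 500.0),
    ],
)

# The dependent data mart draws only from the warehouse and keeps
# the rows relevant to one department together.
conn.execute(
    "CREATE VIEW retail_mart AS "
    "SELECT region, sale_date, amount "
    "FROM warehouse_sales WHERE department = 'Retail'"
)

# A report for the retail department queries the mart, not the whole warehouse.
retail_totals = conn.execute(
    "SELECT region, SUM(amount) FROM retail_mart GROUP BY region ORDER BY region"
).fetchall()
```

Because the mart is derived from the warehouse rather than from the operational systems directly, it inherits whatever validation and standardization the warehouse has already applied.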

Usually, two basic ideas guide the creation of a data warehouse:

1. Integration of data from distributed and differently structured databases, which facilitates a global overview and comprehensive analysis in the data warehouse.

2. Separation of data used in daily operations from data used in the data warehouse for purposes of reporting, decision support, analysis and controlling.

Since OLTP databases contain large volumes of data, it is critical to unload data quickly without adding significant overhead to the production database. Periodically, data is imported from enterprise resource planning (ERP) systems and other related business software systems into the data warehouse for further processing. It is common practice to "stage" data prior to merging it into a data warehouse. In this sense, to "stage data" means to queue it for preprocessing, usually with an ETL (extract, transform, load) tool. The preprocessing program reads the staged data (often drawn from a business's primary OLTP databases), performs qualitative preprocessing or filtering (including denormalization, if deemed necessary), and writes it into the warehouse.
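The staging flow described above can be sketched with Python's built-in sqlite3 module. Everything here is a simplified assumption for illustration: the two in-memory databases stand in for an OLTP system and a warehouse, and the table names (customers, orders, stage_customers, stage_orders, fact_orders) and sample rows are hypothetical. A production ETL tool would do considerably more.

```python
import sqlite3

# Hypothetical OLTP source with a normalized schema.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 2, NULL), (12, 2, 75.5);
""")

# Hypothetical warehouse with staging tables and one denormalized target table.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE stage_customers (id INTEGER, name TEXT);
    CREATE TABLE stage_orders (id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE fact_orders (order_id INTEGER, customer_name TEXT, amount REAL);
""")

# Extract: plain table reads, so the production database does no joins or filtering.
customers = oltp.execute("SELECT id, name FROM customers").fetchall()
orders = oltp.execute("SELECT id, customer_id, amount FROM orders").fetchall()

# Stage: queue the raw rows inside the warehouse environment for preprocessing.
warehouse.executemany("INSERT INTO stage_customers VALUES (?, ?)", customers)
warehouse.executemany("INSERT INTO stage_orders VALUES (?, ?, ?)", orders)

# Transform and load: filter out rows with a missing amount and denormalize
# by copying the customer name onto each order row.
warehouse.execute("""
    INSERT INTO fact_orders (order_id, customer_name, amount)
    SELECT o.id, c.name, o.amount
    FROM stage_orders AS o
    JOIN stage_customers AS c ON c.id = o.customer_id
    WHERE o.amount IS NOT NULL
""")
warehouse.commit()

loaded = warehouse.execute(
    "SELECT order_id, customer_name, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Doing the join and the filtering against the staged copies, rather than against the OLTP database, is one way to keep the extraction step cheap for the operational system.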


ITtoolbox Data Warehousing: Provides a knowledge network and support environment for the IT industry. As IT professionals or business decision-makers need information to complete a task or make a decision, they turn to ITtoolbox to quickly get specific answers to their unique questions. A very useful collection of articles, news items and features on data warehousing and related topics. For the most up-to-date news and views on business intelligence, check out our Industry News Site. Updated daily to give you the most current insight, our Industry News Site includes news articles and editorials on a wide range of data warehousing-related topics.

SAS Data Warehousing - SAS Data Warehousing enables you to bring together a 360-degree view of your suppliers, organization, customers and enterprise and serves as the foundation for obtaining insights in a low-risk manner. And because it's part of an integrated suite of data quality applications from SAS, you know that the information you capture is of the highest quality.

Data Warehousing Information Center: This site's aim is to help readers learn about data warehousing and decision support (i.e., business intelligence) systems. The site publishes its author's essays about data warehousing and decision support, and points the reader to external publications: some of the better articles and white papers that are web accessible, books, technical evaluations, periodicals, and other non-vendor sources of information.

Data Warehousing Knowledge Center - Knowledge Centers is currently in the process of updating our web site to provide more value to our members.

DM Review Magazine - None of our readers report that their business intelligence/data warehousing implementation is a failure, due, I'm sure, at least in part to the knowledge they've garnered by reading the expert-written articles and columns in DM Review. I'm sure you'll join me in thanking our regular columnists for sharing their expertise with all of you each month.

RedGate Consultants - RedGate was founded in 1998 and offers its services in the area of business consulting aimed at the achievement of a set goal. It provides Information Technology services to businesses in the areas of e-commerce and Web development, GUI and client/server application development, and RDBMS (Oracle, MS SQL, Sybase), including the design, installation, support and fine-tuning of mid-size to very large relational database systems and data warehousing (OLAP) systems.

First Net UK - Information solutions for insurance, banking, government, business intelligence, data warehousing, financial management, managed IT, and enterprise systems. Rid yourself of the hassle and expense of installing and maintaining your office hardware and software - let us do it for you. You just pay a monthly fixed fee for each user ... that's it, we do the rest. Your software and data is always available and you can access it from anywhere. It is secure and will be supported 24 hours a day by dedicated staff within our facilities. Add new users ... instantly. Don't pay for hardware or software that is not in use.

Network Centric Consultants - Oracle database consultants for installation, migration, upgrade, diagnostics, application development, and data warehousing. Offers information on technical support and training courses.

Transformation Systems, Inc. - TranSys offers year 2000 remediation, contract consulting, e-commerce, re-engineering, data warehousing, ERP, application development, maintenance, and internet solutions.

Data Warehousing and OLAP - Papers on data warehousing and online analytical processing available online.

Alta Plana: Online Analytical Processing (OLAP) - A collection of information related to Online Analytical Processing and selected related disciplines including Data Warehousing (DW), Decision Support Systems (DSS), and Executive Information Systems (EIS). Maintained as a public service by Alta Plana Corporation.

DM Review Magazine - Online magazine containing original articles, columns and product reviews. Includes DM Direct, a bi-monthly e-zine, and large, focused sections on topics from analytic applications to data warehousing.

Mariner - An information technology consulting firm delivering Internet eBusiness and business intelligence / data warehousing software solutions - Charlotte and Raleigh, North Carolina.

SAS Institute Home Page - SAS software is an integrated suite of information delivery software for business decision making. From balanced scorecard, data warehousing, data mining, financial consolidation and knowledge management to Web enablement, SAS software empowers your organization.

Informix to map data plans - The fierce turf battle in the database market is stretching into data warehousing territory.

Titan Data - Specializing in document imaging and conversion, data mining and warehousing, and web development and hosting.

Information Resource Management Association of Canada - Association for exchanging knowledge and experiences involving data administration, data warehousing and all aspects of information resource management.

Cognizant Technology Solutions - Provider of e-business solutions and application management services, across a wide range of technologies including internet, data warehousing and object-oriented software development, as well as legacy and client-server applications.

Echo Data Services, Inc. - CD-ROM replication, disk duplication, fulfillment, packaging and assembly, warehousing, and electronic commerce solutions.

Intelligent Enterprise Magazine | Data Webhouse - Great collection of articles by Ralph Kimball about building a web-based data warehouse. As Kimball is widely regarded as one of the pioneers of data warehousing, these are great articles for data architecture and database design.

Nutech Systems - AS/400 Barcode Software - Barcode data collection software written for the AS/400 platform featuring warehousing and time & attendance. Interfaces written for ERP solutions such as JBA and JDEdwards.

SoftLink - SoftLink develops file transfer technology and solutions for distributed heterogeneous environments, operating across UNIX, Windows NT, Windows 95, and OpenVMS platforms. Our FASTCopy file transfer technology for TCP/IP networks fits mission critical business needs such as data distribution, replication, synchronization and data warehousing.

Business Intelligence Advisor - Monthly newsletter that provides in-depth assessments on the products, vendors, technologies, and trends that are transforming data warehousing and business intelligence.

Hyperion Solutions Corporation - Develops high performance, OLAP software for business planning, analysis, management reporting, and data warehousing applications.

Data Mining and Warehousing for Financial Services - This conference offers you the tools & tactics to successfully create, develop, implement, & manage data mining & warehousing technology. Learn from over 30 of the industry's leading experts!

BI Solutions Ltd. - We provide Business Intelligence consulting services in the UK. We are specialists in SAS software and are a SAS Alliance partner. Typical services are Data Warehousing, MIS and CRM. Our expertise encompasses the full project lifecycle.

IBM adds to data warehousing line - IBM is letting users shine some rays on their data warehouses thanks to technology acquired from Tanning Technology.

Teknion Consulting Services - Services include application development, data warehousing, e-business, project management, imaging, supplemental staffing, web design, LAN/network support and IT assessments.

ABC-US, Inc. - Data management solutions offering data warehousing and management devices and services. Specializing in archive and retrieval.

Generation Consulting Ltd - E-commerce consulting practice specialising in ERP, e-solutions and data mining, and data warehousing.

Soffia Software Limited - Chennai-based software company; activities include e-commerce, web hosting, web design and maintenance, data warehousing, customer relationship management, supply chain management, and enterprise resource planning.

CDA Data Business Ltd. - CD and DVD replication and duplication, manufacturing, print, packaging, warehousing and logistics.

Data Warehousing and OLAP Research Bibliography - List of links to online research papers on data warehousing and OLAP.

Infocube - Consultants in business intelligence, data warehousing and performance management. Offers information on Web-enabled software, as well as training programs.

Rey Consulting, Inc. - Software development and information technology consulting firm providing services to the financial services, banking, pharmaceutical, telecommunications and entertainment industries. We specialize in decision support systems, data warehousing applications and financial systems development.

Nykredit Center for Database Research - Performs research in temporal databases, data warehousing, spatio-temporal databases, and world-wide-web data management.

The Pythian Group - Oracle consulting firm specializing in data warehousing, data mining, Designer and Developer 2000. Serving clients in the United States and Mexico.

Essential Strategies, Inc. - Consulting firm specializing in requirements analysis, data modeling, process modeling, data warehousing, architecture and other data related issues. Also sells Data Model Patterns, a packaged data model.

Informix to buy Red Brick Systems - Database software maker Informix has signed an agreement to buy Red Brick Systems, a data warehousing provider, for about $35 million in stock.

Vincent Associates Inc. - Services covering data analysis, data warehousing, data mining, loyalty programs, predictive modeling, lettershop and fulfillment, customer acquisition, fund raising systems, and mapping.

Db-ology, Inc. - Offers database consulting, data warehousing, statistical analysis, e-commerce services, and data management solutions. Features news and company profile.

Datajump Inc. - Provider of fixed-time, fixed-price IT solutions, specializing in data warehousing, e-commerce, networking, data management, and system integration.

Database and Programming Technologies - Aalborg University: data warehousing, database integrity, and temporal, spatial and spatio-temporal databases.

German OLAP and Data Warehouse Forum - Comprehensive list of links to white papers, (research) organizations, companies and products related to Data Warehousing, Online Analytical Processing (OLAP), and Data Mining. Content provided in both English and German.

DM Review Magazine - Monthly issues and solutions publication that focuses on data warehousing and business intelligence for the enterprise.

Informix lays brick for new unit - Informix is laying a new brick foundation for its data warehousing business.

PPD Informatics: Software and data solutions - PPD provides software solutions for the pharmaceutical and biotechnology industries. Our expertise is focused on technology and systems for clinical data management, coding and drug safety. Clinical data warehousing gives you the most for your clinical data investment by integrating your clinical data into a repository that is optimized for easier analysis and reporting. PPD Informatics offers customized and specialized software for clinical research.

The International Data Warehouse Association - Independent, non-profit organization dedicated to advancing the knowledge, theory, and applications of data warehousing and is open to all qualified professionals.

Princetec - Software development and IT consulting company in Princeton, NJ, providing e-business solutions, e-commerce, client-server applications, data warehousing, networking, and telecommunications.

Erlangen University Department of Database Systems - Research projects include data warehousing, scientific database management, workflow management, and distributed data.

Kairon Ltd - Provides independent consultancy and training services using software products in the business intelligence, OLAP and data warehousing marketplace.

Torrent Systems - Torrent's Orchestrate is a parallel programming environment used by data warehousers to process large volumes of data. Orchestrate provides the ETL and analytical components used in most data warehousing applications. Acquired by Ascential Software.


Attribution: Information for this article has been adapted from Wikipedia under the GNU licence.
