

▷▶ The Multitiered Architecture for a DW

☆ The multitiered approach to data warehousing recognizes that data needs come in many different forms.

→ provides a comprehensive solution for managing data for decision support.

 

▶ The major components of this architecture

▷ Source systems

※ where the data comes from.

▷ Data transport and cleansing

※ moves data between different data stores.

▷ Central repository

※ the main store for the DW.

▷ Metadata

※ describes what's available and where.

▷ Data marts

※ provide fast, specialized access for end users and applications.

▷ Operational feedback

※ integrates decision support back into the operational systems.

▷ End users

※ the reason for developing the warehouse in the first place.

☆ One or more of these components exist in virtually every system called a DW.

: consisting of hardware, software, and networks.

 

▶ Source systems

▷ the operational systems inside the organization, providing the lowest level of data.

※ designed for operational use, not for decision support.

▷ Examples

※ a retail point-of-sale system:
the "returned dollars" field
→ purchases and returns at the same time
→ no value
→ a negative
(the amount was subtracted from the sales total.)

※ a telephone switching system

☆ Each system has its own peculiarities. Although well designed when built, operational systems respond to changes in the business environment.
▷ When dealing with multiple systems, the data issues multiply quickly.
▷ Four deposit systems have four different, incompatible codes for account status.
▷ Product descriptions depend on which system contains the code.
▷ Different systems use their own versions of account numbers and customer IDs.
▷ Example: "shipping date"

※ planned shipping date.
※ actual shipping date.

▷ Bringing the data together in a consistent format is almost always the most expensive part of implementing a data warehousing solution.
▷ Source systems run on a wide range of hardware and software.

※ mainframes and midrange systems generally use complicated, proprietary file structures.
※ mainframes: hold and process data
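
The problem of incompatible account status codes can be sketched as a lookup from each system's codes to one shared vocabulary. This is only a minimal illustration: the system names, the codes, and the helper `canonical_status` are all hypothetical and do not come from any real deposit system.

```python
# Hypothetical per-system status codes; none of these values come from the notes.
STATUS_MAPS = {
    "deposits_a": {"A": "open", "C": "closed", "D": "dormant"},
    "deposits_b": {"1": "open", "2": "closed", "3": "dormant"},
    "deposits_c": {"ACT": "open", "CLS": "closed", "DRM": "dormant"},
    "deposits_d": {"O": "open", "X": "closed", "I": "dormant"},
}


def canonical_status(system: str, code: str) -> str:
    """Translate a system-specific status code into the shared vocabulary."""
    try:
        return STATUS_MAPS[system][code.strip().upper()]
    except KeyError:
        # Unknown systems or codes are flagged rather than guessed.
        return "unknown"
```

Returning "unknown" instead of guessing keeps bad codes visible, which matters because cleaning up such cases is where much of the integration cost goes.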

 

▶ Data transport and cleansing

▷ Data transport and cleansing tools

※ the software used to move data from the source systems to the warehouse or analysis environment.
※ traditionally the responsibility of programmers, who wrote special-purpose code as the need for data arose.

▷ Several products

※ offer graphical user interfaces for describing the mapping from one source system to another, and offer error checking and verification of out-of-bounds conditions on the data.

▷ Transformation rules

※ expressed in COBOL, RPG, or some scripting language.

▷ The goal of these tools

※ describe where data comes from and what happens to it.
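
A transformation rule of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not how any particular product works: the class `FieldRule`, the function `apply_rule`, and the field names `SALE_AMT` and `sale_amount` are all made up. It records where a value comes from, where it goes, how it is converted, and which bounds it must satisfy.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class FieldRule:
    source_field: str                  # where the value comes from
    target_field: str                  # where it lands in the warehouse
    transform: Callable[[str], float]  # what happens to it on the way
    low: float                         # smallest acceptable value
    high: float                        # largest acceptable value


def apply_rule(rule: FieldRule, record: dict) -> Tuple[Optional[dict], Optional[str]]:
    """Return (target_row, error); exactly one of the two is None."""
    raw = record.get(rule.source_field)
    if raw is None or raw.strip() == "":
        return None, f"{rule.source_field}: missing value"
    try:
        value = rule.transform(raw)
    except ValueError:
        return None, f"{rule.source_field}: cannot parse {raw!r}"
    if not (rule.low <= value <= rule.high):
        return None, f"{rule.source_field}: {value} out of bounds"
    return {rule.target_field: value}, None


# Hypothetical rule: a sale amount must parse as a number in [0, 100000].
rule = FieldRule("SALE_AMT", "sale_amount", float, 0.0, 100_000.0)
```

Calling `apply_rule(rule, {"SALE_AMT": "19.99"})` yields a target row, while a negative or unparseable amount yields an error message instead of being loaded silently.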

 

▶ Central repository

※ the most technically advanced part of the DW.

※ the database containing the data.

▷ Three key features

※ scalable hardware.
※ relational database system.
※ logical data model.

▷ Scalable hardware

※ the ability of the hardware to grow virtually without limit.
※ supports more users, more data, more processing power. → made possible by parallel technology.
※ parallel machines

☆ grow by adding more disks, more memory, more processing units, and more bandwidth between processors.
☆ This is particularly important for a DW, where the amount of data can quickly grow into hundreds of gigabytes or terabytes, requiring dozens of processors and hundreds of disks for the system.

▷ Relational databases

※ have matured to take advantage of scalable hardware platforms for all the data-intensive operations: loading data, building indexes, backing up the database, and processing queries.

▷ A logical data model

※ a data model describes the structure of the data inside a database.

※ often confused with the physical layout of the database.
→ the purpose of the physical layout is to maximize performance and to inform database administrators.
→ the purpose of the logical data model is to communicate the contents of the database to a wider audience.

※ the business user must be able to understand the logical data model: its entities, attributes, and relationships.

 

▶ Metadata

※ an often overlooked component of the DW environment.

※ in the narrowest sense, "data about data".

※ the basic metadata is the database schema: the physical layout of the data in tables.

※ a good metadata system includes

→ the annotated logical data model (the annotations explain the entities and attributes, including valid values).
→ mapping from the logical data model to the source systems.
→ physical schema.
→ mapping from the logical model to the physical schema.
→ common views and formulas for accessing the data.
→ load and update information.
→ security and access information.

※ metadata makes this information (found in scripts written by the DBA, in email messages, in documentation, in the system tables in the database) available to the users in a format they can readily understand.
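
One entry in such a metadata system might look like the sketch below. It is a toy illustration only: the class `AttributeMetadata`, the `CATALOG` dictionary, the `describe` helper, and every value in the sample entry are invented. It covers several of the items listed above: the annotation, the valid values, the source mapping, the physical mapping, and the load information.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttributeMetadata:
    logical_name: str        # name in the logical data model
    description: str         # annotation for business users
    valid_values: List[str]  # values the attribute may take
    source: str              # source system and field it is loaded from
    physical: str            # table.column in the physical schema
    last_loaded: str         # load and update information


# Hypothetical catalog with a single entry; all values are invented.
CATALOG = {
    "account_status": AttributeMetadata(
        logical_name="account_status",
        description="Current state of a deposit account.",
        valid_values=["open", "closed", "dormant"],
        source="deposits_a.STAT_CD",
        physical="dw_account.status",
        last_loaded="nightly batch",
    )
}


def describe(name: str) -> str:
    """Render one catalog entry in a form an end user can read."""
    m = CATALOG[name]
    return (
        f"{m.logical_name}: {m.description}\n"
        f"  valid values: {', '.join(m.valid_values)}\n"
        f"  loaded from:  {m.source} ({m.last_loaded})\n"
        f"  stored in:    {m.physical}"
    )
```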

 

▶ Data marts

※ there are typically hundreds or thousands of users who can make use of the data inside a DW.

※ different users have different needs.

※ problems with a single centralized system

→ it centralizes control over the data and the systems, while end users may want to exert more control over their information environment.
→ it is subject to delays and preemption when new data sources and new capabilities are put online.
→ to satisfy a wide range of requests, the design has to sacrifice performance in all areas.
→ breaking out the costs for a centralized system supporting many different departments is difficult from a charge-back and accounting perspective.

※ the solution to these problems is the data mart (departmental data warehouse).

※ a data mart is a specialized system that brings together the data needed for a department or related application.

※ Kinds

→ data marts can be implemented within the central repository by creating special, application-specific views on the data in the base tables.
→ instantiated view

▽ an optimization of the view in which the data for a particular view is placed into another table and kept up to date with the original data.
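
The difference between a plain view and an instantiated view can be sketched with Python's built-in `sqlite3` module. The table, view, and function names below are hypothetical. SQLite has no native instantiated (materialized) views, so the copy is rebuilt by hand here; a warehouse database would typically keep it up to date itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, dept TEXT, amount REAL);
INSERT INTO sales VALUES
  ('east', 'toys', 10.0), ('east', 'toys', 5.0),
  ('west', 'toys', 7.0),  ('east', 'garden', 3.0);

-- Plain view: an application-specific window on the base table,
-- recomputed every time it is queried.
CREATE VIEW toys_by_region AS
  SELECT region, SUM(amount) AS total
  FROM sales WHERE dept = 'toys' GROUP BY region;
""")


def refresh_instantiated_view(conn: sqlite3.Connection) -> None:
    """Copy the view's current result into its own table."""
    conn.execute("DROP TABLE IF EXISTS toys_by_region_mat")
    conn.execute("CREATE TABLE toys_by_region_mat AS SELECT * FROM toys_by_region")
    conn.commit()


refresh_instantiated_view(conn)
view_rows = conn.execute(
    "SELECT region, total FROM toys_by_region ORDER BY region").fetchall()
mat_rows = conn.execute(
    "SELECT region, total FROM toys_by_region_mat ORDER BY region").fetchall()
```

Queries against `toys_by_region_mat` read precomputed rows, but the table goes stale when `sales` changes until `refresh_instantiated_view` runs again. That is the trade-off behind "kept up to date with the original data".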

