Info > Data Mining > 3-1. DM and DW > The Multitiered Architecture for a DW
▷▶ The Multitiered Architecture for a DW
☆ The multitiered approach to data warehousing recognizes that data needs come in many different forms.
→ It provides a comprehensive solution for managing data for decision support.
▶ The major components of this architecture
▷ Source systems
※ where the data comes from
▷ Data transport and cleansing
※ move data between different data stores
▷ Central repository
※ the main store for the DW
▷ Metadata
※ describes what is available and where
▷ Data marts
※ provide fast, specialized access for end users and applications
▷ Operational feedback
※ integrates decision support back into the operational systems
▷ End users
※ the reason for developing the warehouse in the first place
☆ One or more of these components exist in virtually every system called a DW.
: consisting of hardware, software, and networks.
▶ Source systems
▷ The operational systems inside the organization, which provide the lowest level of data.
※ Designed for operational use, not for decision support.
▷ Examples
※ A retail point-of-sale system: the "returned dollars" field
→ purchases and returns at the same time
→ no value
→ a negative value (the amount was subtracted from the sales total)
※ A telephone switching system
☆ Each system has its own peculiarities. Although well designed when built, operational systems evolve in response to changes in the business environment.
▷ When dealing with multiple systems, the data issues multiply quickly.
▷ Four deposit systems have four different, incompatible codes for account status.
▷ Product descriptions depend on which system contains the code.
▷ Different systems use their own versions of account numbers and customer IDs.
▷ Example: "shipping date"
※ planned shipping date
※ actual shipping date
▷ Bringing the data together in a consistent format is almost always the most expensive part of implementing a data warehousing solution.
▷ Source systems run on a wide range of hardware and software.
※ Mainframe and midrange systems generally use complicated and proprietary file structures.
※ Mainframes: hold and process data
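The incompatible account-status codes mentioned above can be reconciled with a per-system lookup table. The following is only a minimal sketch: the four system names, their codes, and the shared vocabulary ("open", "closed", "frozen") are all hypothetical, since the notes do not give the real values.

```python
# Hypothetical status codes for four deposit systems (not taken from the notes).
STATUS_MAPS = {
    "checking": {"A": "open", "C": "closed", "F": "frozen"},
    "savings": {"1": "open", "2": "closed", "9": "frozen"},
    "cd": {"ACT": "open", "CLS": "closed"},
    "money_market": {"O": "open", "X": "closed"},
}


def normalize_status(system: str, code: str) -> str:
    """Translate a system-specific status code into the shared vocabulary."""
    try:
        return STATUS_MAPS[system][code]
    except KeyError:
        # Surface unmapped systems or codes instead of guessing a meaning.
        return "unknown"


records = [("checking", "A"), ("savings", "2"), ("cd", "ACT"), ("money_market", "?")]
normalized = [normalize_status(system, code) for system, code in records]
print(normalized)
```

Returning "unknown" for unmapped codes, rather than raising or guessing, keeps the load running while making the gaps visible for later review.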
▶ Data transport and cleansing
▷ Data transport and cleansing tools
※ The software used to move data from the source systems to the warehouse or analysis environment.
※ This used to be the responsibility of programmers, who wrote special-purpose code as the need for data arose.
▷ Several products
※ Offer graphical user interfaces to describe the mapping from one source system to another, and offer error checking and verification of out-of-bounds conditions on the data.
▷ Transformation rules
※ COBOL, RPG, or some scripting language.
▷ The goal of these tools
※ Describe where data comes from and what happens to it.
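A transformation rule of the kind these tools manage can be sketched as a small declarative record: where a value comes from, where it goes, what happens to it, and which bounds it must satisfy. This is an illustrative sketch, not how any particular product works. The field names, the bounds, and the assumption that one source stores returns as negative numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    source_field: str                     # where the value comes from
    target_field: str                     # where it lands in the warehouse
    transform: Callable[[float], float]   # what happens to it on the way
    minimum: Optional[float] = None       # out-of-bounds checks (optional)
    maximum: Optional[float] = None


# Hypothetical rules: returns arrive as negatives and are stored as magnitudes.
RULES = [
    Rule("sale_amt", "sales_dollars", float, minimum=0.0),
    Rule("returned_dollars", "returns_dollars", lambda v: abs(float(v)), maximum=10_000.0),
]


def apply_rules(row, rules):
    """Return (clean_row, errors) after mapping and bounds-checking one record."""
    clean, errors = {}, []
    for rule in rules:
        if row.get(rule.source_field) is None:
            errors.append(f"{rule.source_field}: missing value")
            continue
        value = rule.transform(row[rule.source_field])
        if rule.minimum is not None and value < rule.minimum:
            errors.append(f"{rule.source_field}: {value} is below {rule.minimum}")
        elif rule.maximum is not None and value > rule.maximum:
            errors.append(f"{rule.source_field}: {value} is above {rule.maximum}")
        else:
            clean[rule.target_field] = value
    return clean, errors


good, good_errors = apply_rules({"sale_amt": "120.00", "returned_dollars": -20}, RULES)
bad, bad_errors = apply_rules({"sale_amt": -5, "returned_dollars": None}, RULES)
print(good, good_errors)
print(bad, bad_errors)
```

Keeping the rules as data rather than burying them in procedural code is what lets a tool document where each warehouse field came from.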
▶ Central repository
※ The most technically advanced part of the DW.
※ The database containing the data.
▷ Three key features
※ Scalable hardware
※ Relational database system
※ Logical data model
▷ Scalable hardware
※ The ability of the hardware to grow virtually without limit.
※ Supports more users, more data, and more processing power. → Made possible by parallel technology.
※ Parallel machines
☆ Grow by adding more disks, more memory, more processing units, and more bandwidth between processors.
☆ This is particularly important for a DW, where the amount of data can quickly grow into the hundreds of gigabytes or terabytes, requiring dozens of processors and hundreds of disks for the system.
▷ Relational databases
※ Have matured to take advantage of scalable hardware platforms for all the data-intensive operations: loading data, building indexes, backing up the database, and processing queries.
▷ A logical data model
※ A data model describes the structure of the data inside a database.
※ Often confused with the physical layout of the database.
→ The purpose of the physical layout is to maximize performance and to inform database administrators.
→ The purpose of the logical data model is to communicate the contents of the database to a wider audience.
※ The business user must be able to understand the logical data model: entities, attributes, and relationships.
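To make the idea of a logical data model concrete, the sketch below records entities, attributes, and relationships as plain data and renders them in business language, with no mention of tables, indexes, or storage. The entity names and descriptions are hypothetical examples, not part of the notes.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    description: str
    attributes: dict = field(default_factory=dict)  # attribute name -> plain meaning


@dataclass
class Relationship:
    source: str
    verb: str
    target: str


def describe(entities, relationships):
    """Render the model as lines a business user could read."""
    lines = []
    for entity in entities:
        lines.append(f"{entity.name}: {entity.description}")
        for attribute, meaning in entity.attributes.items():
            lines.append(f"  - {attribute}: {meaning}")
    for rel in relationships:
        lines.append(f"{rel.source} {rel.verb} {rel.target}")
    return lines


# Hypothetical two-entity model.
customer = Entity("Customer", "a person or company that holds accounts",
                  {"customer id": "unique identifier across all systems"})
account = Entity("Account", "a deposit account",
                 {"status": "open, closed, or frozen"})
model_lines = describe([customer, account], [Relationship("Customer", "owns", "Account")])
print("\n".join(model_lines))
```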
▶ Metadata
※ An often overlooked component of the DW environment.
※ In the narrowest sense, "data about data".
※ The basic metadata is the database schema: the physical layout of the data in tables.
※ A good metadata system includes
→ The annotated logical data model (the annotations explain the entities and attributes, including valid values).
→ Mapping from the logical data model to the source systems.
→ Physical schema.
→ Mapping from the logical model to the physical schema.
→ Common views and formulas for accessing the data.
→ Load and update information.
→ Security and access information.
※ Metadata makes this information (found in scripts written by the DBA, in email messages, in documentation, and in the system tables in the database) available to the users in a format they can readily understand.
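Several of the items listed above (annotations with valid values, the mapping to the physical schema and to the source system, and load information) can be gathered into a single catalog entry per logical attribute. The following is a rough sketch under assumed names: the attribute, table, column, source system, and date are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeMetadata:
    logical_name: str                 # name in the logical data model
    description: str                  # annotation for business users
    valid_values: Optional[frozenset]  # None means "no fixed value list"
    physical_column: str              # mapping to the physical schema
    source_system: str                # mapping to the source system
    last_loaded: str                  # load and update information


# Hypothetical catalog with one entry.
CATALOG = {
    "account status": AttributeMetadata(
        logical_name="account status",
        description="Whether the account can currently be used",
        valid_values=frozenset({"open", "closed", "frozen"}),
        physical_column="dw.account.status_cd",
        source_system="deposit systems",
        last_loaded="2000-01-01",
    ),
}


def where_is(logical_name: str) -> str:
    """Answer 'what is available and where' for one logical attribute."""
    entry = CATALOG[logical_name]
    return f"{entry.physical_column} (from {entry.source_system})"


def is_valid(logical_name: str, value: str) -> bool:
    """Check a value against the documented valid values, if any."""
    entry = CATALOG[logical_name]
    return entry.valid_values is None or value in entry.valid_values


print(where_is("account status"))
print(is_valid("account status", "open"), is_valid("account status", "pending"))
```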
▶ Data mart
※ Typically, hundreds or thousands of users can make use of the data inside a DW.
※ Different users have different needs.
※ Problems with a single centralized system
→ A single centralized system centralizes control over the data and the systems. End users may want to exert more control over their information environment.
→ A single centralized system is subject to delays and preemption when new data sources and new capabilities are put online.
→ To satisfy a wide range of requests, the design has to sacrifice performance in all areas.
→ Breaking out the costs for a centralized system supporting many different departments is difficult from a charge-back and accounting perspective.
※ The solution to these problems is the data mart (departmental data warehouse).
※ A data mart is a specialized system that brings together the data needed for a department or related application.
※ Kinds
→ Data marts can be implemented within the central repository by creating special, application-specific views on the data in the base tables.
→ Instantiated view
▽ An optimization of the view in which the data for a particular view is placed into another table and kept up to date with the original data.
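The difference between an ordinary view and an instantiated view can be illustrated with Python's built-in sqlite3 module. This is a sketch only: the table and view names are hypothetical, and because SQLite has no built-in materialized views, the instantiated view is emulated here with a table that is rebuilt on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('east', 'widget', 100.0),
        ('east', 'gadget', 50.0),
        ('west', 'widget', 70.0);
    -- Ordinary view: an application-specific window onto the base table.
    CREATE VIEW east_sales_v AS
        SELECT product, SUM(amount) AS total
        FROM sales WHERE region = 'east'
        GROUP BY product;
    """
)


def refresh_instantiated_view(connection):
    """Copy the view's current rows into a separate table (hypothetical name)."""
    with connection:  # commit on success, roll back on error
        connection.execute("DROP TABLE IF EXISTS east_sales_m")
        connection.execute(
            "CREATE TABLE east_sales_m AS SELECT product, total FROM east_sales_v"
        )


def totals(connection, relation):
    """Read product totals from a view or table whose name is fixed in this script."""
    rows = connection.execute(f"SELECT product, total FROM {relation} ORDER BY product")
    return dict(rows.fetchall())


refresh_instantiated_view(conn)
conn.execute("INSERT INTO sales VALUES ('east', 'widget', 25.0)")
conn.commit()

live = totals(conn, "east_sales_v")    # the view sees the new sale immediately
stale = totals(conn, "east_sales_m")   # the copy does not, until refreshed
refresh_instantiated_view(conn)
fresh = totals(conn, "east_sales_m")
print(live, stale, fresh)
```

The instantiated copy answers queries without recomputing the aggregate, at the cost of a refresh step that must run whenever the base tables change.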