Info > Data Mining > 3-1. DM and DW > Lots of Data


▷▶ Lots of Data

 

▶ traditional approach to data analysis

→ reduce the size of the data

※ three common ways

→ summarizing detailed transactions
→ taking a subset of the data
→ looking only at certain attributes
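
The three reductions above can be sketched on a toy transaction list. The records, field names, and the sample size of two customers are invented for illustration; they are not taken from any real warehouse.

```python
import random
from collections import defaultdict

# Hypothetical detailed transactions: (customer_id, market, product, amount)
transactions = [
    ("c1", "east", "tea", 4.0),
    ("c1", "east", "milk", 2.5),
    ("c2", "west", "tea", 4.0),
    ("c3", "west", "bread", 3.0),
    ("c3", "west", "milk", 2.5),
]

# 1. Summarize: one total per customer; product-level detail is lost.
totals = defaultdict(float)
for customer, _market, _product, amount in transactions:
    totals[customer] += amount
summary = dict(totals)

# 2. Subset: keep full detail, but only for a random sample of customers.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
customers = sorted(totals)
sampled = set(rng.sample(customers, k=2))
subset = [t for t in transactions if t[0] in sampled]

# 3. Select attributes: keep every row but drop the market and product columns.
projected = [(customer, amount) for customer, _m, _p, amount in transactions]
```

Each result is smaller than the original list, and each throws away something different: the summary loses which products were bought, the subset loses whole customers, and the projection loses whole columns.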

 

▶ DM

※ searching for trends in the data and for valuable anomalies.
※ what product is this customer most likely to purchase next?
※ when the traditional approach of summarizing data hits the DW, it often results in the following hybrid structure.

→ summarized data for all customers.
→ detailed data for customers in one market.
→ detailed data for a random subset of customers.
☆ it incurs additional overhead in deciding which detailed data to keep and which to discard.

※ the biggest drawback

→ the summary level for all customers.
→ 6~12 months.

※ another problem

→ the detailed transaction data may contain actionable and profitable information on trends or customer segmentation.
→ unfortunately, that detail is available for only a few customers.

※ fortunately, DM algorithms are often able to take advantage of large amounts of data.

※ CART: an algorithm for producing decision trees.

→ handles dozens or hundreds of attributes on each record.

※ Neural networks can train on millions of records at a time.
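
As a rough illustration of why a tree algorithm such as CART copes with many attributes, the sketch below shows only the split search: it tries every attribute and every observed threshold and keeps the split with the lowest weighted Gini impurity. This is a simplified sketch, not the full CART procedure (no recursion, no pruning), and the records and attribute meanings are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (attribute index, threshold, weighted impurity) of the best binary split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):  # every attribute is a candidate
        for threshold in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[j] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Hypothetical records: (age, purchases last year); label = bought the product or not
rows = [(25, 1), (32, 7), (47, 2), (51, 9), (38, 8)]
labels = ["no", "yes", "no", "yes", "yes"]
split = best_split(rows, labels)
```

Adding more attributes only adds more candidates to the outer loop, so there is no need to decide in advance which columns to look at; the search itself picks the informative ones.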

 

● On Sampling and Summarization in the DW

▶ this approach introduces more problems than it solves

▷ summarizations and subsets of the data that seem useful today probably will not meet future business needs.
▷ summarizations of the data introduce dependencies, such as on the definition of a customer or the regional sales hierarchy, that make the DW brittle.
▷ many analyses of the data thrive on detail.

▶ summarizations and subsets of data still play an important role in exploiting data.

→ best used in the data mart.
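
The brittleness point can be sketched with a small example: a regional summary is tied to the hierarchy in force when it was built, whereas the detail can be re-summarized under any later hierarchy. The store names, regions, and amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical detailed sales: (store, amount)
sales = [("s1", 10.0), ("s2", 20.0), ("s3", 5.0), ("s4", 15.0)]


def summarize(sales, store_to_region):
    """Total sales per region under a given store-to-region hierarchy."""
    totals = defaultdict(float)
    for store, amount in sales:
        totals[store_to_region[store]] += amount
    return dict(totals)


# Hierarchy when the summary was first built, and a later reorganization.
old_hierarchy = {"s1": "north", "s2": "north", "s3": "south", "s4": "south"}
new_hierarchy = {"s1": "north", "s2": "central", "s3": "central", "s4": "south"}

old_summary = summarize(sales, old_hierarchy)  # north 30.0, south 20.0
new_summary = summarize(sales, new_hierarchy)  # north 10.0, central 25.0, south 15.0
```

If only `old_summary` had been kept, the `central` total could not be derived from it; it is recoverable here only because the store-level detail is still available.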

