Data mining capstone
Snippets from the data mining class.
Techniques
- Frequent itemset mining
- Association rule discovery
- Cluster analysis
- Outlier detection
- Classifier building
Association Rules Discovery
Example of an association rule setting: a supermarket and the purchase history of its customers.
- A large set of items: the things sold.
- A large set of baskets: each basket is a subset of the items.
id | items |
---|---|
1 | Bread, Coke, Milk |
2 | Beer, Bread |
3 | Beer, Coke, Diaper, Milk |
Association rules: people who bought {x, y, z} tend to also buy {v, w} with very high probability.
This is essentially what a recommendation system in machine learning does.
Rule Measures: Support and Confidence
- Support, s: the probability that a transaction contains {X & Y & Z}.
- Confidence, c: the conditional probability that a transaction containing {X & Y} also contains Z.
Low support means that only a small number of transactions contain {X & Y & Z}.
A good rule has both high support and high confidence.
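The two measures can be sketched on the three baskets from the table above. This is a minimal illustration, not a full rule miner: the names `baskets`, `support`, and `confidence` are made up for this example, and probabilities are estimated as simple fractions of the baskets.

```python
# Baskets from the table above, keyed by transaction id.
baskets = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
}

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in baskets.values() if itemset <= items)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    antecedent, consequent = set(antecedent), set(consequent)
    base = support(antecedent, baskets)
    if base == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return support(antecedent | consequent, baskets) / base

print(support({"Coke", "Milk"}, baskets))       # share of baskets with both items
print(confidence({"Coke"}, {"Milk"}, baskets))  # rule Coke -> Milk
print(confidence({"Bread"}, {"Milk"}, baskets)) # rule Bread -> Milk
```

With only three baskets the numbers are coarse, but the pattern is visible: every basket with Coke also has Milk, while only half of the baskets with Bread do.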
Interesting Association rules
- Not all high confidence rules are interesting.
- The rule X -> Milk may have high confidence for many itemsets X simply because milk is purchased very often, regardless of what X is.
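One way to check for this is to compare a rule's confidence with how often the consequent shows up anyway. The sketch below uses lift (confidence divided by the support of the consequent). Lift is an assumption here, since these notes do not name a specific interestingness measure, and the helper names are made up for the example.

```python
# Baskets from the table above, keyed by transaction id.
baskets = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
}

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in baskets.values() if itemset <= items) / len(baskets)

def lift(antecedent, consequent, baskets):
    """Confidence of antecedent -> consequent divided by support(consequent).

    Assumed measure, not from the notes: values above 1 suggest the antecedent
    raises the chance of the consequent, values near or below 1 suggest the
    rule adds little beyond the consequent's overall popularity.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    s_a = support(antecedent, baskets)
    s_c = support(consequent, baskets)
    if s_a == 0 or s_c == 0:
        raise ValueError("lift is undefined when either side never occurs")
    conf = support(antecedent | consequent, baskets) / s_a
    return conf / s_c

print(lift({"Coke"}, {"Milk"}, baskets))  # above 1: Coke buyers buy Milk more than average
print(lift({"Beer"}, {"Milk"}, baskets))  # below 1: Beer says little about Milk here
```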
Good clustering
- High intra-class similarity
- Low inter-class similarity
Distance functions: Euclidean, Cosine, Jaccard, Edit distance.
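The four distance functions can be sketched in plain Python as follows. Some choices are assumptions, because the notes only list the names: cosine distance is taken as 1 minus cosine similarity (some texts use the angle instead), and edit distance is the Levenshtein variant where insertions, deletions, and substitutions each cost 1.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_a * norm_b)

def jaccard_distance(s, t):
    """1 - |intersection| / |union| for two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1 - len(s & t) / len(s | t)

def edit_distance(s, t):
    """Levenshtein distance (assumed variant), one row of the table at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

print(euclidean((0, 0), (3, 4)))
print(cosine_distance((1, 0), (0, 1)))
print(jaccard_distance({"Bread", "Coke", "Milk"}, {"Beer", "Coke", "Diaper", "Milk"}))
print(edit_distance("kitten", "sitting"))
```

Which one fits depends on the data: Euclidean and cosine for numeric vectors, Jaccard for sets such as baskets, edit distance for strings.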
Clustering algorithms
- Partitioning, hierarchical, density-based, grid-based
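As one illustration of the partitioning family, here is a minimal k-means sketch. The notes do not name k-means, so it is picked here only as a well-known example; the function name, the starting centroids, and the sample points are all made up, and `math.dist` assumes Python 3.8 or newer.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The result ties back to the "good clustering" criteria: points within a cluster are close to each other, and the two clusters are far apart.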
Random stuff
Roy goes bowling in vivo.
My very educated mother just served us nine pizzas.