
Notes from a data mining class.

Techniques

  • Frequent itemset mining
  • Association rule discovery
  • Cluster analysis
  • Outlier detection
  • Classifier building

Association Rules Discovery

Example of association rules: a supermarket's purchase history.

  • A large set of items: the things sold.
  • A large set of baskets: each basket is a subset of the items.

id  items
1   Bread, Coke, Milk
2   Beer, Bread
3   Beer, Coke, Diaper, Milk

Association rules: people who bought {x, y, z} tend to also buy {v, w} with high probability.

Essentially, this is the idea behind recommendation systems in machine learning.

Rule Measures: Support and Confidence

  • Support, s: the probability that a transaction contains {X & Y & Z}.
  • Confidence, c: the conditional probability that a transaction containing {X & Y} also contains Z.

Low support means that few transactions contain {X & Y & Z}.

A good rule has both high support and high confidence.
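To make the two measures concrete, here is a minimal Python sketch that computes them on the three supermarket baskets from the example above. The function names support and confidence are just illustrative choices, not from any particular library.

```python
# Baskets from the supermarket example (ids 1 to 3).
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent in basket | antecedent in basket)."""
    antecedent, consequent = set(antecedent), set(consequent)
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return support(antecedent | consequent, baskets) / antecedent_support


# Rule {Coke} -> {Milk}: Coke and Milk appear together in baskets 1 and 3.
coke_milk_support = support({"Coke", "Milk"}, baskets)
coke_milk_confidence = confidence({"Coke"}, {"Milk"}, baskets)

# Rule {Beer} -> {Bread}: only one of the two Beer baskets has Bread.
beer_bread_confidence = confidence({"Beer"}, {"Bread"}, baskets)

print(coke_milk_support, coke_milk_confidence, beer_bread_confidence)
```

On this tiny data set, {Coke} -> {Milk} has support 2/3 and confidence 1, while {Beer} -> {Bread} has confidence 1/2. With only three baskets these numbers mean very little, which is one reason support matters.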

Interesting Association rules

  • Not all high confidence rules are interesting.
    • The rule X -> Milk may have high confidence for many itemsets X simply because milk is purchased very often, regardless of what else is in the basket.
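A small sketch of the milk effect, using made-up baskets (not from the class) in which Milk is in 5 of 6 baskets. It compares the confidence of X -> Milk with the baseline rate of Milk purchases. The helper confidence_for_milk is a hypothetical name used only for this illustration.

```python
# Hypothetical baskets: Milk appears in 5 of the 6.
baskets = [
    {"Milk", "Bread"},
    {"Milk", "Bread", "Beer"},
    {"Milk", "Beer", "Coke"},
    {"Milk", "Coke"},
    {"Milk", "Bread", "Coke"},
    {"Bread", "Beer", "Coke"},
]


def confidence_for_milk(item, baskets):
    """Share of the baskets containing `item` that also contain Milk."""
    with_item = [basket for basket in baskets if item in basket]
    return sum("Milk" in basket for basket in with_item) / len(with_item)


# Baseline: how often Milk is bought, whatever else is in the basket.
milk_baseline = sum("Milk" in basket for basket in baskets) / len(baskets)

for item in ("Bread", "Beer", "Coke"):
    conf = confidence_for_milk(item, baskets)
    print(f"{{{item}}} -> Milk: confidence {conf:.2f}, baseline {milk_baseline:.2f}")
```

Here every rule has confidence between 0.67 and 0.75, which looks high, yet each one is below the 0.83 baseline, so none of these items actually makes a Milk purchase more likely. The ratio of confidence to baseline is commonly called lift, and is one way to filter out such rules.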

Good clustering

  • High intra-class similarity
  • Low inter-class similarity
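One rough way to check these two properties is to compare average distances within a cluster against average distances between clusters, treating small distance as high similarity. The sketch below assumes Euclidean distance and uses invented 2-D points that were grouped by hand. The function names are illustrative only.

```python
import math
from itertools import combinations, product

# Hypothetical 2-D points, assigned to two clusters by hand.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def mean_intra_distance(cluster):
    """Average Euclidean distance over all pairs of points in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(first, second):
    """Average Euclidean distance over all cross-cluster pairs of points."""
    pairs = list(product(first, second))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)

# For a good clustering, both intra values should be well below inter_ab.
print(f"intra A: {intra_a:.2f}, intra B: {intra_b:.2f}, inter: {inter_ab:.2f}")
```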

Distance functions: Euclidean, Cosine, Jaccard, Edit distance.
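Plain-Python sketches of the four distance functions are below. Two assumptions to flag: cosine distance is taken as 1 minus cosine similarity (some texts use the angle itself), and edit distance is the Levenshtein variant that allows substitutions (some courses count only insertions and deletions).

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a, b):
    """1 minus (size of intersection / size of union) for two sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)


def edit_distance(s, t):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    previous = list(range(len(t) + 1))  # distances from "" to prefixes of t
    for i, s_char in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to ""
        for j, t_char in enumerate(t, start=1):
            substitution_cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,                      # delete s_char
                current[j - 1] + 1,                   # insert t_char
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(euclidean_distance((0, 0), (3, 4)))
print(jaccard_distance({"Bread", "Coke", "Milk"}, {"Beer", "Bread"}))
print(edit_distance("kitten", "sitting"))
```

Euclidean and cosine apply to numeric vectors, Jaccard to sets such as baskets, and edit distance to strings, so the choice depends on how the data is represented.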

Clustering algorithms

  • Partitioning, hierarchical, density-based, grid-based

Random stuff

Roy goes bowling in vivo.

My very educated mother just served us nine pizzas.