Special Topics in Computer Science:

Probabilistic Data Mining

339.327 1KV Csato Block Begin: 16.11.2010

Data mining is defined as the process of "information extraction" from a set of data. In formal terms, all data mining problems can be restated as inferring model parameters. Consequently, we will discuss both the choice of good (or less adequate) models for a given set of observations and the problems of estimating the parameters once the model is given.

We will use the framework provided by machine learning methodology, where the emphasis is on the models as well as on the type of data and the observation process at hand. The methods are illustrated with real data and realistic observation models.
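As a rough illustration of what "inferring model parameters" can mean, the sketch below estimates the mean of a Gaussian from noisy observations in two ways: by maximum likelihood and by maximum a-posteriori with a Gaussian prior. It is not taken from the course material. The function names, the synthetic data, and the assumption of a known noise variance are all chosen here for illustration only.

```python
import random
import statistics


def ml_mean(samples):
    """Maximum likelihood estimate of a Gaussian mean: the sample average."""
    return statistics.fmean(samples)


def map_mean(samples, noise_var, prior_mean, prior_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior.

    Assumes the noise variance is known. The result is a weighted
    combination of the data and the prior mean.
    """
    n = len(samples)
    return (prior_var * sum(samples) + noise_var * prior_mean) / (
        n * prior_var + noise_var
    )


if __name__ == "__main__":
    # Synthetic data (illustrative assumption): true mean 2.0, unit noise.
    random.seed(0)
    data = [random.gauss(2.0, 1.0) for _ in range(20)]
    print("ML estimate: ", ml_mean(data))
    print("MAP estimate:", map_mean(data, 1.0, prior_mean=0.0, prior_var=1.0))
```

With few observations the MAP estimate is pulled toward the prior mean. As the number of observations grows, or as the prior variance becomes large, it approaches the maximum likelihood estimate.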

Lecturer

Dr. Lehel Csato
Faculty of Mathematics and Informatics
Babes-Bolyai University, Cluj-Napoca
www.cs.ubbcluj.ro/~csatol

Dates

Date           Time         Room
Tu 16.11.2010  15:30-18:00  MT 132
We 17.11.2010  15:30-18:00  K 033C
Th 18.11.2010  15:30-17:00  MT 132
Fr 19.11.2010  13:45-15:15  BA 9908

Contents

  1. Modeling Data
    - Machine Learning
    - Latent variable models
  2. Estimation
    - Maximum Likelihood
    - Maximum a-posteriori
    - Bayesian Estimation
    - Examples
  3. Unsupervised Estimation models
    - General concepts
    - Principal Components
    - Examples for PCA
    - Independent Components
    - Examples for ICA
    - Mixture Models
    - Examples

Handouts

The handouts of this course can be downloaded from here.

Exam

Students will have to do the following project and send it to the lecturer no later than December 11, 2010. The marks for this course will be based on the project.

Literature

  1. J. M. Bernardo and A. F. M. Smith (1994) Bayesian Theory, Wiley & Sons.
  2. C. M. Bishop (2006) Pattern Recognition and Machine Learning, Springer Verlag.
  3. T. M. Cover and J. A. Thomas (1991) Elements of Information Theory, Wiley & Sons.
  4. A. P. Dempster, N. M. Laird, and D. B. Rubin (1977) Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B, 39:1-38.
  5. T. Hastie, R. Tibshirani, and J. Friedman (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Verlag.