Wine recognition data set

 

Task: classification

Number of instances: 178

Number of attributes: 13 (numerical)

Type of attribute to be predicted: discrete with 3 classes

Download the data: DataWine

 

These data come from the chemical analysis of 178 wines from 3 different producers in the same area of Italy. The objective is to extract models that identify the producer from the content of the following components: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, Proline.

Sources: Forina, M. et al., PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy. Data found in the UCI Machine Learning Repository.

 

Model with 1 variable

The most accurate model that uses a single explanatory variable is based on the Flavanoids content:

* If (Flavanoids is lower than 1) then (Class is rather 3)

* If (Flavanoids is higher than 2.5) then (Class is rather 1)

* Otherwise (Class is rather 2)
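The three rules above can be sketched as a small Python function. This is only a crisp reading of the model: the original rules say "rather", which suggests graded conclusions, so hard thresholds are an assumption. The names `classify_by_flavanoids` and `accuracy` are hypothetical helpers introduced for illustration.

```python
def classify_by_flavanoids(flavanoids: float) -> int:
    """Crisp reading of the one-variable model; returns the class (1, 2 or 3)."""
    if flavanoids < 1.0:
        return 3  # low Flavanoids content: rather class 3
    if flavanoids > 2.5:
        return 1  # high Flavanoids content: rather class 1
    return 2      # everything in between: rather class 2


def accuracy(predictions, labels):
    """Fraction of predictions that match the true class labels."""
    pairs = list(zip(predictions, labels))
    return sum(p == t for p, t in pairs) / len(pairs)
```

Applying `classify_by_flavanoids` to the Flavanoids column of the data file and passing the result to `accuracy` together with the true classes gives the proportion of correctly classified wines.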

 

This model correctly classifies 148 of the 178 wines in the sample (83%). It can be represented graphically (red curve) together with the experimental data (green points):

 

 

Model with 2 variables

This model involves a second variable: Proline. It is similar to the first model, but the rule for class 1 now relies on Proline instead of Flavanoids:

* If (Flavanoids is lower than 1) then (Class is rather 3)

* If (Proline is higher than 800) then (Class is rather 1)

* Otherwise (Class is rather 2)
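A minimal sketch of these rules in Python follows. It assumes hard thresholds and that the rules are checked in the order listed, with the first matching rule deciding the class; the original text does not say how a wine matching both conditions is handled, so this precedence is an assumption. The function name `classify_two_variables` is hypothetical.

```python
def classify_two_variables(flavanoids: float, proline: float) -> int:
    """Crisp reading of the two-variable model; returns the class (1, 2 or 3)."""
    # Assumption: rules are evaluated top to bottom and the first match wins.
    if flavanoids < 1.0:
        return 3
    if proline > 800.0:
        return 1
    return 2
```

With this ordering, a wine with low Flavanoids and high Proline is assigned to class 3. Swapping the two tests would give the opposite precedence.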

 

This model correctly classifies 163 of the 178 wines (92%). The following graph is obtained with Proline = 600 ± 200:

 

Model with 3 variables

* If (Flavanoids is lower than 1) then (Class is rather 3)

* If (Proline is higher than 800) then (Class is rather 1)

* If (Color intensity is lower than 2) then (Class is rather 2)
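Unlike the previous models, this one has no "otherwise" rule, so a crisp reading may fire zero, one or several rules for a given wine. The sketch below stores the rules as data and returns every class whose rule fires. How the original model combines these conclusions is not stated, so no tie-breaking is attempted here. The names `RULES_3VAR` and `fired_classes` and the dictionary keys are hypothetical.

```python
# Each rule is a (condition, class) pair; thresholds are taken from the rules above.
RULES_3VAR = [
    (lambda s: s["flavanoids"] < 1.0, 3),
    (lambda s: s["proline"] > 800.0, 1),
    (lambda s: s["color_intensity"] < 2.0, 2),
]


def fired_classes(sample: dict) -> list:
    """Return the classes suggested by every rule whose condition holds."""
    return [cls for condition, cls in RULES_3VAR if condition(sample)]
```

An empty result means that no rule applies to the wine, and a result with several classes means that the rules disagree. Both cases need a combination strategy that this sketch leaves open.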

 

This model correctly classifies 175 of the 178 wines (98%). The following graph is obtained with Proline = 600 ± 200 and Flavanoids = 2 ± 0.8:

 

Model with 4 variables (full classification)

The following model correctly classifies all 178 instances of the dataset:

 

* If (Flavanoids is lower than 0.5) and (Color intensity is higher than 4) then (Class is rather 3)

* If (Alcohol is higher than 12.5) and (Color intensity is higher than 4) and (Proline is higher than 600) then (Class is rather 1)

* If (Alcohol decreases) then (Class is rather 2)

 

 

 
 

© 2007-2008 BLIASOLUTIONS - All rights reserved