Data Overview

wine_dataset.csv

Data Cleaning

data_cleaning

Create new relevant variables and drop the variables we will not use:

new variables
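As a rough illustration of this step, the sketch below derives two new variables from a single record and drops the raw inputs they replace. The column names follow the public UCI wine-quality files, the definition of total acidity as the sum of the three acid measurements, and the quality cut-offs are all assumptions made for this example; the report does not state them.

```python
def add_derived_variables(row):
    """Return a copy of the record with derived variables and without the raw inputs."""
    out = dict(row)
    # Assumed definition: total acidity as the sum of the acid measurements.
    out["total_acidity"] = (
        row["fixed acidity"] + row["volatile acidity"] + row["citric acid"]
    )
    # Assumed (hypothetical) cut-offs for the quality categories.
    score = row["quality"]
    if score >= 7:
        out["quality_category"] = "Excellent"
    elif score >= 5:
        out["quality_category"] = "Good"
    else:
        out["quality_category"] = "Not Good"
    # Drop the raw columns that the derived variables replace.
    for name in ("fixed acidity", "volatile acidity", "citric acid", "quality"):
        del out[name]
    return out


# Made-up record for illustration only (not a row from the dataset).
sample = {
    "fixed acidity": 7.0,
    "volatile acidity": 0.25,
    "citric acid": 0.5,
    "quality": 6,
    "alcohol": 10.0,
}
cleaned = add_derived_variables(sample)
```

Returning a new dictionary rather than modifying the input keeps the raw record available in case a different definition needs to be tried later.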

Let us try to classify our observations into ‘red’ or ‘white’ wines using Decision Trees.

decision tree selection
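To give a feel for what a decision tree does at each node, here is a minimal sketch of a single split chosen by Gini impurity. It is a one-level "stump" written in plain Python, not the tree fitted in this report, and the feature values are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Made-up toy values for illustration only (not from the wine dataset).
chlorides = [0.03, 0.04, 0.05, 0.08, 0.09, 0.10]
style = ["white", "white", "white", "red", "red", "red"]
threshold, impurity = best_split(chlorides, style)
```

A full tree repeats this search over every feature and then recurses into the two resulting groups until a stopping rule is met.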

Decision Tree

From the decision tree on the left, we see that:

full decision tree

Random Forest

The performance of dtree on the predicted data is:

We use random forest to try and increase the accuracy of our model.

From the forest performance we see that:

Our model is now 99.12% accurate. We can thus use the random forest to classify a given wine observation as red or white.

random forest
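When reading an accuracy figure like this, it can help to compare it against the accuracy of always predicting the most common class, since the two wine styles are not equally represented. The sketch below shows how accuracy, a confusion table and that baseline could be computed; the labels are made up for illustration and are not the report's results.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of observations whose predicted label matches the actual one."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def majority_baseline(actual):
    """Accuracy obtained by always predicting the most common class."""
    return Counter(actual).most_common(1)[0][1] / len(actual)


# Made-up labels for illustration only (not the report's predictions).
actual = ["white"] * 6 + ["red"] * 2
predicted = ["white"] * 6 + ["red", "white"]

acc = accuracy(actual, predicted)
base = majority_baseline(actual)
table = confusion_counts(actual, predicted)
```

A model is only informative to the extent that `acc` exceeds `base`; the confusion table shows which class the remaining errors fall on.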

Visualization for Chlorides for each Wine Style

chlorides_by_wine code

From the density graph for chlorides, we see that chloride concentration is significantly lower in white wines than in red wines.

What does this imply?

The amount of chloride in wine is influenced by both the terroir and type of grape, and the importance of quantification lies in the fact that wine flavor is strongly impacted by this particular ion, which, in high concentration, gives the wine an undesirable salty taste and significantly decreases its market appeal.

Visualization between Residual Sugar by Total Acidity

sugar_by_acidity code

All wines are inherently acidic, with a pH between 2 and 4.

Residual Sugar (or RS) is the natural grape sugar left over in a wine after alcoholic fermentation finishes. It is measured in grams per liter.

We see that residual sugar is higher for white wines than for red wines, so white wines are relatively sweeter and contain more carbohydrates.

We also see that total acidity is higher for red wines than for white wines.

Visualization between Alcohol and Density

alcohol_by_density code

The graph above shows the alcohol content (by volume) of each wine against its density.

We see that alcohol and density are inversely related: wines with a higher volume of alcohol tend to have a lower density.
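One way to put a number on an inverse relationship like this is the Pearson correlation coefficient, which is negative when one variable falls as the other rises. The sketch below computes it in plain Python; the alcohol and density values are made up for illustration and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only (not from the wine dataset).
alcohol = [9.0, 10.0, 11.0, 12.0, 13.0]
density = [0.998, 0.996, 0.995, 0.993, 0.991]
r = pearson(alcohol, density)
```

A value of `r` close to -1 indicates a strong inverse linear relationship, while a value near 0 would indicate none.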

This can be attributed to the fact that alcohol is less dense than water. Thus, the larger the share of alcohol relative to the other liquids in a wine, the lower its overall density should be.

Further, we see that white wines have a slightly higher alcohol percentage, and hence lower densities, than red wines.
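The direction of this effect can be checked with a back-of-the-envelope calculation. The sketch below treats wine as an ideal mixture of ethanol and water by volume, using approximate densities at about 20 °C. This is a deliberate simplification: real ethanol-water mixtures contract slightly on mixing, and sugars and other dissolved solids raise a wine's density, so the numbers only show the trend.

```python
# Approximate densities in g/mL at about 20 °C.
ETHANOL_DENSITY = 0.789
WATER_DENSITY = 0.998


def ideal_mixture_density(abv_percent):
    """Density of an ethanol-water mix, assuming volumes add up ideally."""
    fraction = abv_percent / 100.0
    return fraction * ETHANOL_DENSITY + (1.0 - fraction) * WATER_DENSITY


# Two hypothetical alcohol levels chosen for illustration.
low_alcohol = ideal_mixture_density(9.0)
high_alcohol = ideal_mixture_density(14.0)
```

Under these assumptions `high_alcohol` comes out below `low_alcohol`, which matches the inverse relationship seen in the graph.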

Visualization between the proportion of free SO2 by volume of Alcohol

so2_by_alcohol code

Sulphur dioxide (SO2) is the most widely used and controversial additive in winemaking. Its main functions are to inhibit or kill unwanted yeasts and bacteria, and to protect wine from oxidation.

‘Free’ SO2 is not bound to compounds in the wine and is therefore able to exert an antioxidant/preservative action. ‘Bound’ SO2 has already complexed with other compounds in the wine (such as sugars) and has essentially been quenched, so it no longer has antioxidant/preservative activity. Total SO2 is the sum of both forms.

We see that red wines in general have a higher proportion of free SO2 than white wines.

We also see that there is less variation in alcohol volume in red wine than in white wine. This could also be due to the smaller number of observations for red wine.
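Since total SO2 is the sum of the free and bound forms, the bound amount and the free proportion follow directly from the two measured values. The sketch below assumes that "proportion of free SO2" means free divided by total and that both values share the same unit; the function name and inputs are hypothetical.

```python
def so2_breakdown(free_so2, total_so2):
    """Split total SO2 into its bound part and the proportion that is free."""
    if total_so2 <= 0 or free_so2 < 0 or free_so2 > total_so2:
        raise ValueError("expected 0 <= free_so2 <= total_so2 and total_so2 > 0")
    return {
        "bound": total_so2 - free_so2,             # total = free + bound
        "free_proportion": free_so2 / total_so2,   # assumed definition
    }


# Made-up measurements for illustration only.
result = so2_breakdown(30.0, 120.0)
```

The validation step rejects records where free SO2 exceeds total SO2, which would indicate a data-entry or unit problem.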

Is white wine better or red?

Now that we have seen the relationships between the variables that affect the classification of wines, let us use the quality variable to see whether one wine style is of better quality than the other.

Let us compare the ‘Excellent’ and ‘Not Good’ categories to better understand what influences wine quality.

Visualization between total acidity and product category

acidity_by_quality code

From the graph we see that:

Visualization between Alcohol/Density and Quality Category

alcohol_by_quality density_by_quality code

As we saw earlier, alcohol content and density are inversely related.

From the graphs we see that:

Visualization of Residual Sugar by Quality Category

sugar_by_quality code

From the graph we see that:

Visualization between Total Acidity and Quality Category

acidity_by_quality code

The acidity of a wine is one of its most appealing characteristics, enhancing its refreshing, crisp qualities as well as enabling wines to be paired with foods so successfully. Acidity complements foods in a palate-cleansing, refreshing manner.

An important point to remember is that the perception of acidity, as with other flavor components in wine, should not be considered independently. Sweetness and acidity, for example, balance each other. A wine high in acidity that also has a bit of sweetness will seem less acidic.

From the graph we see that:

Insights/Hypotheses

However, more analysis would be required to confirm the above insights/hypotheses. Another thing to keep in mind is that the data contained relatively more observations for white wines than for red wines. There were also more observations for the “Good” category than for the “Not Good” and “Excellent” quality categories. This imbalance might introduce some discrepancies into our analysis.