Logistic Regression on the Iris Dataset

Figure (interactive visualization): the model's computation for one example input, flowing Inputs → Scaled Inputs → Logits → Probabilities → Prediction.

Input           Value    μ        σ        Scaled
Bias            1.00     0.000    1.000     1.00
Sepal Length    5.10     5.852    0.714    -1.05
Sepal Width     3.50     3.050    0.168     2.68
Petal Length    1.40     3.779    3.199    -0.74
Petal Width     0.20     1.196    0.605    -1.65

Weights (scaled input → class logit):
Input           Setosa   Versicolor   Virginica
Bias            -0.187    1.757       -1.570
Sepal Length    -1.015    0.618        0.398
Sepal Width      0.958   -0.358       -0.600
Petal Length    -1.729   -0.253        1.982
Petal Width     -1.594   -0.784        2.378

Class          Logit    Probability
Setosa          7.36    1.00
Versicolor      1.63    0.00
Virginica      -8.99    0.00

Prediction: Setosa

The Iris dataset appears throughout the literature on classification methods and is widely used in statistics and machine learning. It has four features: sepal length, sepal width, petal length, and petal width. The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant: Iris Setosa, Iris Versicolour, or Iris Virginica.

The visualization on this page shows a logistic regression model trained on the Iris dataset. A given input passes through four steps:

  1. Scaling: Each input is normalized to a mean of 0 and a standard deviation of 1 (z-score normalization) by subtracting the mean μ and dividing by the σ shown for that input.
  2. Computing logits: For each class, we compute a logit as a linear combination (weighted sum) of all the scaled inputs, using a separate set of weights per class.
  3. Softmax: The logits are unbounded scores that do not sum to 1, so they are hard to interpret on their own; the softmax function converts them into a probability distribution over the three classes.
  4. Prediction: The predicted class is the one with the largest probability.
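The steps above can be sketched in plain Python as follows. This is a minimal sketch, not the code behind the visualization: the parameter values are copied from the figure (rounded to three decimals), and the names `MU`, `SIGMA`, `WEIGHTS`, `scale`, `logits`, `softmax`, and `predict` are hypothetical.

```python
import math

# Parameters copied from the figure; all variable names here are hypothetical.
# Index 0 is the constant bias input, then sepal length/width, petal length/width.
MU = [0.000, 5.852, 3.050, 3.779, 1.196]
# Used exactly as displayed. The petal-length value (3.199) is larger than any
# standard deviation of petal lengths in [1.0, 6.9] cm could be, so these may be
# variances; dividing by them still reproduces the figure's scaled values.
SIGMA = [1.000, 0.714, 0.168, 3.199, 0.605]
CLASSES = ["Setosa", "Versicolor", "Virginica"]
# WEIGHTS[j][k] is the weight of scaled input j in the logit of class k.
WEIGHTS = [
    [-0.187, 1.757, -1.570],   # bias
    [-1.015, 0.618, 0.398],    # sepal length
    [0.958, -0.358, -0.600],   # sepal width
    [-1.729, -0.253, 1.982],   # petal length
    [-1.594, -0.784, 2.378],   # petal width
]


def scale(x):
    """Step 1: z-score each input as (x - mu) / sigma."""
    return [(xi - m) / s for xi, m, s in zip(x, MU, SIGMA)]


def logits(z):
    """Step 2: one logit per class, a weighted sum of the scaled inputs."""
    return [sum(z[j] * WEIGHTS[j][k] for j in range(len(z)))
            for k in range(len(CLASSES))]


def softmax(scores):
    """Step 3: exponentiate and normalize; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(measurements):
    """Step 4: prepend the constant bias input, run steps 1-3, take the argmax."""
    x = [1.0] + list(measurements)
    probs = softmax(logits(scale(x)))
    best = max(range(len(CLASSES)), key=lambda k: probs[k])
    return CLASSES[best], probs


label, probs = predict([5.10, 3.50, 1.40, 0.20])
print(label, [round(p, 2) for p in probs])  # prints: Setosa [1.0, 0.0, 0.0]
```

Subtracting the largest logit before exponentiating does not change the softmax result but keeps `math.exp` from overflowing when logits are large.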

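Notice the extra fifth input, Bias, which is always 1. Its weights act as the per-class bias (intercept) terms, which gives us a convenient way to encode them in the same linear combination as the other weights. Its scaling parameters are μ = 0 and σ = 1, so scaling leaves it at 1.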
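Folding the bias into the weights works because appending a constant 1 to the input turns w · x + b into a single dot product. A minimal check, using the Setosa weights and the rounded scaled inputs shown in the figure (the helper name `dot` is illustrative):

```python
# Setosa weights and bias from the figure; scaled inputs rounded as displayed.
w = [-1.015, 0.958, -1.729, -1.594]  # sepal length, sepal width, petal length, petal width
b = -0.187                           # Setosa bias weight
z = [-1.05, 2.68, -0.74, -1.65]      # scaled inputs for the example flower


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Explicit form: weighted sum plus a separate bias term.
explicit = dot(w, z) + b
# Folded form: bias becomes the first weight, paired with a constant input of 1.
folded = dot([b] + w, [1.0] + z)
print(round(explicit, 2), round(folded, 2))
```

Both forms give the Setosa logit (about 7.36, up to rounding of the scaled inputs), which is why the model can treat the bias as just another input.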