Using Logistic Regression for Image Classification

Machine Learning Project


Before anything else, add these lines of code at the top of your file:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

If any of these packages is missing, you can install it with the command "pip install package_name" to set up your work environment. Reading .xlsx files with pandas also requires the openpyxl package.


The idea of this project is to develop and train a model that takes the pixel values of a digit and identifies whether or not it is an image of the digit one.


Here you can download the dataset

Each row of the dataset holds the flattened pixel values of one digit. The digits go from zero to nine, but we will not distinguish between all of them, because the goal of the binary classification model is to return 1 if the digit is one and 0 otherwise.

To retrieve the digits data (X) and labels (Y) we must add in our code the following lines:

# Read the pixel values from sheet 'X' and the labels from sheet 'Y'
df_x = pd.read_excel('dataset.xlsx', sheet_name='X', header=None)
df_y = pd.read_excel('dataset.xlsx', sheet_name='Y', header=None)

Because this model will identify the digit 1 only, we need to convert every label that is not one to zero.

# Copy the label column so the original DataFrame is left untouched
y = df_y[0].copy()
# Set every label that is not 1 to 0
y[y != 1] = 0
y = pd.DataFrame(y)
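The relabelling step can be checked on a handful of values before running it on the full dataset. The following is a minimal sketch on a small made-up Series; the names labels and binary_labels are illustrative and not part of the project code.

```python
import pandas as pd

# Hypothetical stand-in for the label column read from the 'Y' sheet
labels = pd.Series([0, 1, 2, 1, 9, 7, 1])

# True where the digit is 1, then cast the booleans to 0/1 integers
binary_labels = (labels == 1).astype(int)

print(binary_labels.tolist())  # [0, 1, 0, 1, 0, 0, 1]
```

Only the positions that held the digit 1 keep the value 1; every other digit becomes 0.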

From the 5000 rows of data, 4000 will be used to train the model and the remaining 1000 will be used to test it, because it is important for the model to be tested on unseen data. To do this we must add the following lines to our code:

x_train = df_x.iloc[0:4000].T
x_test = df_x.iloc[4000:].T
x_train = np.array(x_train)
x_test = np.array(x_test)
y_train = y.iloc[0:4000].T
y_test = y.iloc[4000:].T
y_train = np.array(y_train)
y_test = np.array(y_test)

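The training and test sets start out as DataFrames, so we convert them to NumPy arrays to make the calculations easier. It is also worth noting that we use .T to take the transpose of each dataset, so that every column holds one example: x_train has one row per pixel and one column per image, and y_train is a single row of labels.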
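To see what the split and the transpose do to the shapes, here is a small sketch that applies the same pattern to made-up data. The toy_ names, the 10 rows of 4 "pixels" and the 8/2 split are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 10 "images" of 4 pixels each, one image per row
toy_x = pd.DataFrame(np.arange(40).reshape(10, 4))
toy_y = pd.DataFrame(np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1]))

# Same slicing and transposing pattern, with an 8/2 split instead of 4000/1000
toy_x_train = np.array(toy_x.iloc[0:8].T)
toy_x_test = np.array(toy_x.iloc[8:].T)
toy_y_train = np.array(toy_y.iloc[0:8].T)
toy_y_test = np.array(toy_y.iloc[8:].T)

print(toy_x_train.shape, toy_y_train.shape)  # (4, 8) (1, 8)
print(toy_x_test.shape, toy_y_test.shape)    # (4, 2) (1, 2)
```

After the transpose, the second dimension is the number of examples, which is why the later functions read it with X.shape[1].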

Once the training and test datasets are ready to be used it is time to focus on the model itself. To give it form we must define some formulas.


We start from the basic linear regression formula:

Formula 1: $Y = AX + B$

Here Y is the output, X is the independent variable, A is the slope and B is the intercept. In logistic regression the variables are expressed differently, with the weights w in place of the slope and the bias b in place of the intercept:

Formula 2: $z = w^T x + b$


z is then passed through a non-linear function. Here we use the sigmoid function, which returns a value between 0 and 1.

Formula 3: $a = \sigma(z) = \frac{1}{1 + e^{-z}}$

It is worth saying that this a is the model's final output: a value between 0 and 1 that is compared with the true label in y_train or y_test.

To define the sigmoid function in our code we add at the top:

def sigmoid(z):
    # Squash z into the range (0, 1)
    s = 1/(1 + np.exp(-z))
    return s
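A few sample values make the behaviour of the sigmoid easier to picture. This is a small sketch that redefines sigmoid so it runs on its own; the inputs in z are arbitrary illustrative numbers.

```python
import numpy as np

def sigmoid(z):
    # Squash z into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# Arbitrary sample inputs, from strongly negative to strongly positive
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
a = sigmoid(z)

# Large negative inputs land near 0, zero lands exactly on 0.5,
# and large positive inputs land near 1
print(np.round(a, 4))
```

Because sigmoid(0) is exactly 0.5, a model whose w and b are all zeros outputs 0.5 for every input.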

Since w and b are initialized as zeros, let's define a function to do that:

def initialize_with_zeros(dim):
    # One weight per input feature (pixel), stored as a column vector
    w = np.zeros(shape=(dim, 1))
    b = 0
    return w, b

Cost function and Gradient descent

The cost function measures how much the predicted output differs from the original output. The model aims to lower the cost function value, and for that we use the following formula, averaged over all the rows.

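Formula 4: $J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$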
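To get a feel for the cost, it helps to evaluate it on a few hand-made predictions. The sketch below assumes the cost is the average cross-entropy, the same expression used later in the propagate function; cross_entropy_cost and the sample arrays are illustrative names and values.

```python
import numpy as np

def cross_entropy_cost(A, Y):
    """Average cross-entropy between predictions A and labels Y, both of shape (1, m)."""
    m = Y.shape[1]
    return -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

Y = np.array([[1, 0, 1, 0]])
good = np.array([[0.9, 0.1, 0.8, 0.2]])  # predictions close to the labels
bad = np.array([[0.1, 0.9, 0.2, 0.8]])   # predictions far from the labels
unsure = np.full((1, 4), 0.5)            # what a model with w = 0, b = 0 outputs

cost_good = cross_entropy_cost(good, Y)
cost_bad = cross_entropy_cost(bad, Y)
cost_unsure = cross_entropy_cost(unsure, Y)

print(cost_good, cost_unsure, cost_bad)
```

The cost is small when the predictions agree with the labels and grows as they disagree. With every prediction at 0.5 the cost equals ln 2 (about 0.693), so that is the value to expect on the first iteration when w and b start as zeros.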

Furthermore, we need to update the values of w and b in Formula 2. These values are initialized as zeros, but they need more appropriate values, and gradient descent will help us find them.

In Formula 4 the cost function is expressed as a function of a and y, but since a depends on w and b, it can also be expressed as a function of w and b.

The gradients dw and db are derived by taking the partial derivatives of the cost function with respect to w and b.

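Formula 5: $dw = \frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$
Formula 6: $db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( a^{(i)} - y^{(i)} \right)$

Here J denotes the cost function, and A and Y hold the predictions and labels for all m rows.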


Once all the formulas are set, let's put them all together in a function called propagate by adding the next lines to our code:

def propagate(w, b, X, Y):
    # Number of training examples (one example per column of X)
    m = X.shape[1]
    # Predicted output: sigmoid of the linear combination w.T X + b
    A = sigmoid(np.dot(w.T, X) + b)
    # Cost function (average cross-entropy)
    cost = -1/m * np.sum(Y*np.log(A) + (1-Y) * np.log(1-A))
    # Gradients of the cost with respect to w and b
    dw = 1/m * np.dot(X, (A-Y).T)
    db = 1/m * np.sum(A-Y)

    grads = {"dw": dw, "db": db}
    return grads, cost

This function calculates the cost and the gradients dw and db for the current values of w and b.

With these gradients we are now able to update w and b in Formula 2.
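One way to gain confidence in the gradient expressions is to compare them with a numerical estimate. The sketch below is not part of the original project: it assumes the gradients are dw = 1/m * X (A - Y)^T and db = 1/m * sum(A - Y), uses random made-up data, and checks them against central finite differences. The helper cost_and_grads and all variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    # Same computations as a forward and backward pass of logistic regression
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    return cost, dw, db

# Made-up data: 3 features, 20 examples, random binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 20))
Y = (rng.random((1, 20)) > 0.5).astype(float)
w = rng.normal(size=(3, 1)) * 0.1
b = 0.2

_, dw, db = cost_and_grads(w, b, X, Y)

# Central finite differences: nudge one parameter at a time by eps
eps = 1e-6
num_dw = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    c_plus, _, _ = cost_and_grads(w + step, b, X, Y)
    c_minus, _, _ = cost_and_grads(w - step, b, X, Y)
    num_dw[j, 0] = (c_plus - c_minus) / (2 * eps)

c_plus, _, _ = cost_and_grads(w, b + eps, X, Y)
c_minus, _, _ = cost_and_grads(w, b - eps, X, Y)
num_db = (c_plus - c_minus) / (2 * eps)

# Both differences should be tiny if the analytical gradients are right
print(np.max(np.abs(dw - num_dw)), abs(db - num_db))
```

If the two printed differences are close to zero, the analytical gradients agree with the slope of the cost function measured directly.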

Optimize the parameters

To best fit the training data we will update the parameters which are the core of this model. The propagate function will be run through a number of iterations. In each iteration, w and b will be updated.

def optimize(w, b, X, Y, num_iterations, learning_rate):
    costs = []
    # Run the propagate function for a number of iterations
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]

        # Update w and b by subtracting the gradients dw and db,
        # scaled by the learning rate, from the previous w and b
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record the cost function value every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
    # The final updated parameters
    params = {"w": w, "b": b}
    # The gradients from the last iteration
    grads = {"dw": dw, "db": db}

    return params, grads, costs

The term learning rate was introduced in this function. It is not a calculated value but a setting you choose yourself: it controls how large a step each update of w and b takes, and a suitable value depends on the algorithm and the data. Later, in the conclusions, I will show the impact it has.
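The effect of the learning rate is easiest to see on a much simpler problem than our model. The following sketch is a deliberately simplified stand-in, not the logistic regression itself: it runs gradient descent on the toy function f(w) = w^2, whose gradient is 2w, with three arbitrary learning rates. The function name descend and the chosen values are assumptions for illustration.

```python
def descend(learning_rate, start=5.0, num_iterations=50):
    """Gradient descent on the toy function f(w) = w**2, whose gradient is 2*w."""
    w = start
    for _ in range(num_iterations):
        dw = 2 * w
        w = w - learning_rate * dw
    return w

# The minimum of f is at w = 0, so a final value closer to 0 is better
results = {lr: descend(lr) for lr in (0.01, 0.1, 1.1)}
for lr, final_w in results.items():
    print("learning rate {}: final w = {}".format(lr, final_w))
```

With the small learning rate the value moves toward 0 but is still far away after 50 steps, with the medium one it gets very close, and with the large one every step overshoots and the value grows instead of shrinking. The same trade-off applies when choosing lr for the model.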

Predict the output

By now we can optimize the parameters, reduce the cost function and update the values of the gradients, so it is time to predict the output.

def predict(w, b, X):
    m = X.shape[1]
    w = w.reshape(X.shape[0], 1)
    # Initialize an array of zeros with one entry per input example;
    # these zeros will be replaced by the predicted output
    Y_prediction = np.zeros((1, m))

    # Calculate the predicted output using the sigmoid function;
    # this returns values between 0 and 1
    A = sigmoid(np.dot(w.T, X) + b)
    # Iterate through A and predict 1 if the value of A
    # is greater than 0.5, and 0 otherwise
    for i in range(A.shape[1]):
        Y_prediction[:, i] = (A[:, i] > 0.5) * 1
    return Y_prediction
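The thresholding loop in predict can also be written as a single array comparison. This sketch compares both forms on a few made-up sigmoid outputs; the array A and the names looped and vectorized are illustrative.

```python
import numpy as np

# Made-up sigmoid outputs for four examples
A = np.array([[0.1, 0.5, 0.51, 0.99]])

# Loop version, one column at a time
looped = np.zeros((1, A.shape[1]))
for i in range(A.shape[1]):
    looped[:, i] = (A[:, i] > 0.5) * 1

# Vectorized version: compare the whole array at once
vectorized = (A > 0.5).astype(float)

print(vectorized)  # [[0. 0. 1. 1.]]
```

Both versions give the same result. Note that a value of exactly 0.5 is classified as 0, because the comparison is strictly greater than.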


By putting all the functions together the final model will look like this.

def model(X_train, Y_train, X_test, Y_test, num_iterations, learning_rate):
    # Initialize w and b as zeros
    w, b = initialize_with_zeros(X_train.shape[0])
    # Best fit the training data
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
    w = parameters["w"]
    b = parameters["b"]
    # Predict the output for both the test and training sets
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # Calculate the training and test set accuracy by comparing
    # the predicted output and the original output
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
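The accuracy expression in the print statements may look unusual at first. Since predictions and labels are both 0 or 1, the absolute difference is 1 for every wrong prediction and 0 for every correct one, so its mean is the error rate. The sketch below checks this on made-up values; Y_true and Y_pred are illustrative names.

```python
import numpy as np

# Made-up labels and predictions: 4 of the 5 predictions match
Y_true = np.array([[1, 0, 0, 1, 0]])
Y_pred = np.array([[1.0, 0.0, 1.0, 1.0, 0.0]])

# Expression used in the model function: 100 minus the error rate in percent
accuracy_abs = 100 - np.mean(np.abs(Y_pred - Y_true)) * 100

# Equivalent formulation: percentage of predictions equal to the label
accuracy_eq = np.mean(Y_pred == Y_true) * 100

print(accuracy_abs, accuracy_eq)
```

Both expressions report the same percentage of correct predictions.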

Using the model

To use the model and see how well it works, let us pass in the data that we prepared at the beginning by adding the next lines of code:

ni = 500 # num iterations
lr = 0.005 # learning rate
d = model(x_train, y_train, x_test, y_test, ni, lr)

Once the code is executed, something like this will appear on the console:

train accuracy: XX%
test accuracy: XX%

Percentages may vary depending on the number of iterations and learning rate you use.
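If you want to try the whole training loop without the dataset file, the following compact sketch runs the same kind of gradient descent on made-up data. Everything in it is an assumption for illustration: the synthetic two-feature data, the labelling rule, the helper names train_logistic and accuracy, and the chosen number of iterations and learning rate.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logistic(X, Y, num_iterations, learning_rate):
    """Gradient descent for logistic regression; X is (n, m), Y is (1, m)."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_iterations):
        A = sigmoid(np.dot(w.T, X) + b)
        dw = 1 / m * np.dot(X, (A - Y).T)
        db = 1 / m * np.sum(A - Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b

def accuracy(w, b, X, Y):
    # Threshold the sigmoid output at 0.5 and compare with the labels
    predictions = (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)
    return 100 - np.mean(np.abs(predictions - Y)) * 100

# Synthetic data: the label is 1 when the two features sum to more than 0
rng = np.random.default_rng(42)
X_all = rng.normal(size=(2, 500))
Y_all = (X_all.sum(axis=0, keepdims=True) > 0).astype(float)

# 400 examples for training, 100 for testing (one example per column)
X_tr, X_te = X_all[:, :400], X_all[:, 400:]
Y_tr, Y_te = Y_all[:, :400], Y_all[:, 400:]

w, b = train_logistic(X_tr, Y_tr, num_iterations=500, learning_rate=0.1)
train_acc = accuracy(w, b, X_tr, Y_tr)
test_acc = accuracy(w, b, X_te, Y_te)
print("train accuracy: {} %".format(train_acc))
print("test accuracy: {} %".format(test_acc))
```

Because this toy problem is much easier than recognizing digits, the accuracies it prints say nothing about what to expect on the real dataset; it only shows the workflow end to end.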

Plotting data

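It is worth noting that the model function returns a dictionary that contains the recorded costs, the predictions for the test and training sets, the final w and b, and the learning rate and number of iterations that were used.

So, let us see how the cost function changed as w and b were updated by adding the next lines of code: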

# Plot how the cost function changed as w and b were updated
plt.scatter(x=range(len(d['costs'])), y=d['costs'], color='black')
plt.title('Scatter Plot of Cost Functions', fontsize=18)
plt.xlabel('Recorded step (one per 100 iterations)', fontsize=12)
plt.ylabel('Costs', fontsize=12)
plt.show()

After running some tests with different values, I got the following results.

Scatter Plot of Cost Functions with different learning rates and iterations
Multiple values of test and train accuracy depending on learning rate and iterations




