Linear regression is a method for finding the linear function that best fits a supplied set of data. Once the function has been found, it can be used to make predictions for other, unseen input values. This form of regression is useful when both the input and the output are continuous, since the function found from the data places no limits on its domain or range.

```
training_data = [
[1, 7],
[2, 10],
[3, 15],
[4, 20],
[5, 22]
]
```

Using this training data, we should be able to find a linear function that fits it with the minimum possible error. First, it will help us understand the problem if we can visualise the training vectors, so below I have used a scatter plot of the training data.

```
import matplotlib.pyplot as plt # Import the plotting library
import numpy as np # Import numpy to easily filter the data
%matplotlib inline
training_data = np.array(training_data) # Assign the training data to a numpy matrix
plt.scatter(training_data[:,0], training_data[:,1]) # Plot the x and y values
plt.xlabel("x") # Assign a label to the x axis
plt.ylabel("f(x)") # Assign a label to the y axis
plt.show() # Show the graph
```

From this data it’s easy to form a mental estimate of a line of best fit. However, we need an algorithm to do this for us: one that works on larger datasets and finds the optimal function more accurately, and in less time, than doing it manually.

We already know that the function we are trying to find will be linear, so we can write it in the following form.

*y = mx + c*

Here m is the coefficient of x, denoting the gradient of the function, and the constant c is often referred to as the bias. To make this apply to our generalised algorithm, with an unknown number of dimensions, it is easier to write the function in the following way.

*f(x) = θ^{T}x + b*

Here, we are clearly writing a function (shown by f(x)) and the bias (b) is still present in the same form. The main change is that the coefficients and values (originally mx) are now written as θ^{T}x, where θ is a vector of all of the coefficients and x is the feature vector. θ^{T}x is therefore the dot product: each coefficient multiplied by the corresponding feature from x, with the results summed.
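As a small illustration of this notation, here is a sketch with hypothetical values for θ, x and b (these numbers are made up purely to show the arithmetic, not taken from our training data):

```
import numpy as np
theta = np.array([2.0, 0.5]) # Hypothetical coefficient vector, one entry per feature
x = np.array([3.0, 4.0]) # Hypothetical feature vector
b = 1.0 # Hypothetical bias
f = np.dot(theta, x) + b # (2.0 * 3.0) + (0.5 * 4.0) + 1.0 = 9.0
```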

In order to construct this function, we will need to find appropriate values for the bias and the coefficients. In this case we only need one coefficient, as we are operating in 2 dimensions, but it may be necessary to accommodate further dimensions, each of which will need its own coefficient.
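One simple way to find these values for a single feature is an ordinary least-squares fit, which NumPy can do for us. Below is a sketch using `np.polyfit` on our training data; take it as one possible approach rather than the only way to fit the line:

```
import numpy as np
training_data = np.array([[1, 7], [2, 10], [3, 15], [4, 20], [5, 22]])
# Fit a degree-1 polynomial (a straight line) by least squares
coefficient, bias = np.polyfit(training_data[:, 0], training_data[:, 1], 1)
print(coefficient, bias) # coefficient = 4.0, bias = 2.8 for this data
```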

```
bias = 2.8 # The bias found
coefficient = 4 # The coefficient found
x_values = np.arange(1, 6) # An array of [1, 2, 3, 4, 5]
y_values = bias + coefficient*x_values # The prediction for each x value
plt.scatter(training_data[:,0], training_data[:,1]) # Plot the original data
plt.plot(x_values, y_values, c='r') # Plot the new line
plt.xlabel("x") # Assign a label to the x axis
plt.ylabel("f(x)") # Assign a label to the y axis
plt.show() # Display the graph
```

This is a graph of the function that I found for the training data used earlier. I have also plotted the original data to show that some error remains: the training data cannot be perfectly modelled by a linear function. This is common with real-world data, which is rarely uniform enough to fit a linear function perfectly.
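One common way to quantify this remaining error is the mean squared error: the average of the squared differences between the line's predictions and the true values. A short sketch using the bias and coefficient above:

```
import numpy as np
training_data = np.array([[1, 7], [2, 10], [3, 15], [4, 20], [5, 22]])
predictions = 2.8 + 4 * training_data[:, 0] # f(x) for each training x
residuals = training_data[:, 1] - predictions # How far each point is from the line
mse = np.mean(residuals ** 2) # Mean squared error: 0.56 for this data
```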

## Making a prediction

Now that we’ve got our optimised parameters, we can make predictions for x values whose outputs we don’t already know. These predictions will likely also contain some error, but they give us a good estimate of the value, and we can use the error on the training data to gauge how much variance to expect. Because it handles continuous data so naturally, linear regression can be an incredibly effective predictor.
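For example, with the bias and coefficient found earlier, predicting the output for an unseen input such as x = 6 is just a matter of evaluating the function:

```
bias = 2.8 # The bias found earlier
coefficient = 4 # The coefficient found earlier
x_new = 6 # An input that was not in the training data
prediction = bias + coefficient * x_new # 2.8 + 4 * 6 = 26.8
```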