Regression finder


7 Nov 2011 Code on GitHub

Introduction

I've been working my way through Stanford's online Machine Learning course recently and I thought I should put some of what I've learnt to use.

The program

The program I made reads a file of tab-delimited data, assuming the first column is the independent values (x) and the second is the dependent values (y). It then gives some options:

  1. Simple summary
  2. Plot data
  3. Find linear regression
  4. Find polynomial fit
  5. Exit
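For illustration, reading the two columns could look something like the sketch below. This is not necessarily how the program does it: the helper name load_xy and the sample values are made up here, and it assumes the file holds only numeric rows with no header line.

```python
import io
import numpy as np

def load_xy(source):
    """Read two tab-separated numeric columns and return them as (x, y).

    `source` can be a filename or an open file-like object.
    """
    # ndmin=2 keeps the result two-dimensional even for a single-row file.
    data = np.loadtxt(source, delimiter="\t", ndmin=2)
    return data[:, 0], data[:, 1]

# An in-memory stand-in for a small data file (made-up values).
sample = io.StringIO("1.0\t2.1\n2.0\t3.9\n3.0\t6.2\n")
x, y = load_xy(sample)
```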

Simple summary currently just gives the mean of x and y, but I'll probably add standard deviation and maybe some other simple statistics.
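A rough sketch of what that summary might look like once standard deviation is added. The function name summarise and the dictionary keys are hypothetical, and the choice of the sample standard deviation (ddof=1) is an assumption rather than something the program already does.

```python
import numpy as np

def summarise(x, y):
    """Return the mean and sample standard deviation of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return {
        "mean_x": x.mean(),
        "mean_y": y.mean(),
        # ddof=1 gives the sample (n - 1) standard deviation.
        "std_x": x.std(ddof=1),
        "std_y": y.std(ddof=1),
    }

stats = summarise([1, 2, 3], [2, 4, 6])
```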

Plot data does just that using matplotlib. I might change this to use an SVG graph drawer I've been working on, but I wanted to try out matplotlib.
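A minimal sketch of a scatter plot with matplotlib, assuming x and y are plain sequences of numbers. The function name plot_data is hypothetical, and the non-interactive Agg backend is chosen here only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; must be set before importing pyplot
import matplotlib.pyplot as plt

def plot_data(x, y):
    """Draw a scatter plot of y against x and return the figure."""
    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    return fig

fig = plot_data([1, 2, 3], [2.1, 3.9, 6.2])
```

Calling fig.savefig("data.png") would then write the plot to a file; with an interactive backend, plt.show() would open a window instead.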

The function "find linear regression" initially worked by explicitly using the normal equation:

p = (X.T * X).I * X.T * y

But then I found, unsurprisingly, that NumPy's linear algebra module already has a function to do this, which also returns extra information:

from numpy.linalg import lstsq
(p, residuals, rank, s) = lstsq(X, y)
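To make the comparison concrete, here is a small sketch with made-up data lying exactly on y = 1 + 2x. It assumes X is the design matrix with a column of ones for the intercept, and it solves the normal equation with numpy.linalg.solve rather than an explicit inverse, which is a substitution for the matrix expression above and usually better behaved numerically.

```python
import numpy as np

# Made-up data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones (intercept) next to the x values.
X = np.column_stack([np.ones_like(x), x])

# Normal equation, written as the linear system (X^T X) p = X^T y.
p_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The library routine; rcond=None selects NumPy's current default cutoff.
p_lstsq, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
```

Both approaches should give an intercept of 1 and a slope of 2 here. lstsq also reports the rank of X and its singular values, which the normal equation alone does not.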

Similarly, I was going to write my own function to calculate a polynomial fit of a given degree, but it makes much more sense to use the function already there, namely polyfit:

p = numpy.polyfit(x, y, degree)
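As a quick illustration with made-up data, polyfit returns the coefficients with the highest power first, and numpy.polyval evaluates the fitted polynomial using that same ordering:

```python
import numpy as np

# Made-up data lying exactly on y = x^2 - 3x + 2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x**2 - 3 * x + 2

degree = 2
p = np.polyfit(x, y, degree)   # coefficients, highest power first
fitted = np.polyval(p, x)      # the fitted polynomial evaluated at x
```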

Most of my code is to display the resulting vector in a nice way.
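One possible way to turn a coefficient vector into something readable is sketched below. The function format_polynomial is hypothetical and not taken from the program; it assumes the coefficients are ordered highest power first, as polyfit returns them.

```python
def format_polynomial(coeffs, precision=3):
    """Format coefficients (highest power first) as e.g. 'y = 1x^2 - 3x + 2'."""
    degree = len(coeffs) - 1
    terms = []
    for i, c in enumerate(coeffs):
        power = degree - i
        c = round(float(c), precision)
        if c == 0:
            continue  # skip terms that vanish at this precision
        sign = "-" if c < 0 else "+"
        mag = f"{abs(c):g}"
        if power == 0:
            body = mag
        elif power == 1:
            body = f"{mag}x"
        else:
            body = f"{mag}x^{power}"
        terms.append((sign, body))
    if not terms:
        return "y = 0"
    # The leading term only shows its sign when it is negative.
    first_sign, first_body = terms[0]
    text = ("-" if first_sign == "-" else "") + first_body
    for sign, body in terms[1:]:
        text += f" {sign} {body}"
    return "y = " + text

line = format_polynomial([1, -3, 2])
```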

There's still lots to add, including working with multivariate data and adding regularisation, but I hope it will already be a useful program.

Comments (1)

Pedro Fonseca on 21 Dec 2011, 10:26 a.m.

Hi, like it so far, I'm curious to see what it will look like when you've added to it. I'm working on a program to find interesting correlations from apparently random data (still a bit of a newb though)

Update me with your progress, if you like

Cheers
Pedro