Introduction
I've been working my way through Stanford's online Machine Learning course recently and I thought I should put some of what I've learnt to use.
The program
The program I made reads a file of tab-delimited data, assuming the first column holds the independent values (x) and the second the dependent values (y). It then gives some options:
- Simple summary
- Plot data
- Find linear regression
- Find polynomial fit
- Exit
Simple summary currently just gives the mean of x and y, but I'll probably add standard deviation and maybe some other simple statistics.
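As a rough sketch of the loading and summary steps, something like the following would work. It is not the program's actual code: the sample data, the helper name simple_summary and the use of numpy.loadtxt are all assumptions made for illustration, and the standard deviation is included only because it is mentioned as a likely addition.

```python
import io

import numpy as np

# Hypothetical sample standing in for the tab-delimited input file.
sample = io.StringIO("1.0\t2.1\n2.0\t3.9\n3.0\t6.2\n4.0\t7.8\n")

# loadtxt splits each row on tabs; unpack=True returns one array per column,
# so the first column becomes x and the second becomes y.
x, y = np.loadtxt(sample, delimiter="\t", unpack=True)


def simple_summary(x, y):
    """Return the mean and sample standard deviation of each column."""
    return {
        "mean_x": float(np.mean(x)),
        "mean_y": float(np.mean(y)),
        # ddof=1 gives the sample (n - 1) standard deviation.
        "std_x": float(np.std(x, ddof=1)),
        "std_y": float(np.std(y, ddof=1)),
    }


summary = simple_summary(x, y)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Passing a real file path to loadtxt in place of the StringIO object should behave the same way.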
Plot data does just that using matplotlib. I might change this to use an SVG graph drawer I've been working on, but I wanted to try out matplotlib.
The "Find linear regression" option initially worked by explicitly using the normal equation, with X and y held as NumPy matrices:
p = (X.T * X).I * X.T * y
But then I found, unsurprisingly, that there is a function to do this in NumPy's linear algebra module (numpy.linalg), which also returns extra information:
from numpy.linalg import lstsq
(p, residuals, rank, s) = lstsq(X, y)
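To make the comparison concrete, here is a minimal sketch that fits the same straight line both ways. The data and variable names are made up for illustration, and plain arrays are used instead of NumPy matrices, so the normal equation is written with the @ operator and solved as a linear system rather than through an explicit inverse.

```python
import numpy as np

# Illustrative data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones for the intercept, then the x values.
X = np.column_stack([np.ones_like(x), x])

# Normal equation (X^T X) p = X^T y, solved without forming the inverse.
p_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Library least-squares solver; rcond=None selects NumPy's current default cutoff.
p_lstsq, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)

print("normal equation:", p_normal)
print("lstsq:          ", p_lstsq)
print("rank of X:      ", rank)
```

Both approaches should agree on well-conditioned data. The extra values from lstsq (the residual sum of squares, the rank of X and its singular values) are handy for spotting a degenerate design matrix.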
Similarly, I was going to write my own function to calculate a polynomial fit of a given degree, but it makes much more sense to use the function already there, namely polyfit:
p = numpy.polyfit(x, y, degree)
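A small worked example of polyfit, using made-up data generated from a known quadratic so the recovered coefficients can be checked by eye. Note that polyfit returns the coefficients with the highest power first.

```python
import numpy as np

# Illustrative data generated from y = 3x^2 - 2x + 1.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 3.0 * x**2 - 2.0 * x + 1.0

degree = 2
# Coefficients come back highest power first: [a2, a1, a0].
p = np.polyfit(x, y, degree)

# polyval evaluates the fitted polynomial at the given points.
fitted = np.polyval(p, x)

print("coefficients:", p)
print("max abs error:", np.max(np.abs(fitted - y)))
```

With noisy data the fit will not be exact, and raising the degree too far tends to overfit, which is where regularisation would come in.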
Most of my code is there to display the resulting vector in a nice way.
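One possible way to do that display step is sketched below. The function name format_polynomial and its output format are hypothetical, not the program's actual code. It assumes the coefficients are ordered highest power first, as polyfit returns them, so an intercept-first vector would need reversing before being passed in.

```python
def format_polynomial(coeffs, precision=3):
    """Render coefficients (highest power first) as a readable equation."""
    degree = len(coeffs) - 1
    terms = []
    for i, c in enumerate(coeffs):
        power = degree - i
        if abs(c) < 10 ** -precision:
            continue  # skip terms that are effectively zero
        sign = "-" if c < 0 else "+"
        magnitude = f"{abs(c):.{precision}g}"
        if power == 0:
            body = magnitude
        elif power == 1:
            body = f"{magnitude}x"
        else:
            body = f"{magnitude}x^{power}"
        terms.append((sign, body))

    if not terms:
        return "y = 0"

    # The leading term carries its sign without surrounding spaces.
    first_sign, first_body = terms[0]
    text = ("-" if first_sign == "-" else "") + first_body
    for sign, body in terms[1:]:
        text += f" {sign} {body}"
    return "y = " + text


print(format_polynomial([3.0, -2.0, 1.0]))
```

The precision argument controls the number of significant figures, and the same value sets the threshold below which a term is dropped.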
There's still lots to add, including working with multivariate data and adding regularisation, but I hope it will already be a useful program.
Comments (1)
Pedro Fonseca on 21 Dec 2011, 10:26 a.m.
Hi, like it so far, I'm curious to see what it will look like when you've added to it. I'm working on a program to find interesting correlations from apparently random data (still a bit of a newb though)
Update me with your progress, if you like
Cheers
Pedro