Wednesday, 8th February 2012
Experimenting with the Khan Academy API
In the last month or so, I've managed to more than triple the number of Earth badges I have on Khan Academy just by focussing on getting the Sub-light Speed and 299,792,458 Meters per Second badges, particularly on the simple exercises. I worked out on which exercises to focus by creating a spreadsheet of all the exercises and all the badges I had for each exercise. I filled out the spreadsheet with a combination of screen-scraping and manual data-entry.
And after that, I wondered if there was an API for the Khan Academy, and of course there is. It is described here. Using the API you can find out all sorts of useful information. Using the OAuth example, you can get your personal information to find out, for example, how many hours of video you've watched in total and how many questions you've attempted.
The text below is a now-out-of-date description of what I'd done when I wrote this blog post. I now have a still ugly app at http://whatbadgenext.appspot.com. I've left the text below as a record of what I did and what the stats at the time were.
Using the API, you can get a list of all the videos in each playlist and how long each one lasts, so you can work out how long each playlist is. You can also add all these together to get the total length of videos on Khan Academy (which at present is 19 days, four and a half hours), although it's a slight overestimate as videos that appear in more than one playlist are counted more than once.
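As a rough illustration of the arithmetic, here is a minimal Python sketch of the totals described above. It doesn't call the real API: the sample data, the video ids and the function names are all made up for the example, and I'm assuming each video comes with an id and a duration in seconds. It also shows how counting each video once removes the overestimate.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: playlist title -> list of (video_id, duration_in_seconds).
# The real API response will look different; this is only for illustration.
Playlists = Dict[str, List[Tuple[str, int]]]


def playlist_totals(playlists: Playlists) -> Dict[str, int]:
    """Total running time of each playlist, in seconds."""
    return {title: sum(duration for _, duration in videos)
            for title, videos in playlists.items()}


def naive_total(playlists: Playlists) -> int:
    """Sum of all playlist totals; videos in several playlists count several times."""
    return sum(playlist_totals(playlists).values())


def deduplicated_total(playlists: Playlists) -> int:
    """Total running time with each video id counted only once."""
    durations: Dict[str, int] = {}
    for videos in playlists.values():
        for video_id, duration in videos:
            durations[video_id] = duration
    return sum(durations.values())


def format_duration(seconds: int) -> str:
    """Render a number of seconds as days, hours and minutes."""
    days, remainder = divmod(seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes = remainder // 60
    return f"{days}d {hours}h {minutes}m"


# Made-up sample data: the video "la1" appears in both playlists.
sample: Playlists = {
    "Linear Algebra": [("la1", 600), ("la2", 900)],
    "Algebra": [("a1", 300), ("la1", 600)],
}

print(playlist_totals(sample))
print(format_duration(naive_total(sample)))
print(format_duration(deduplicated_total(sample)))
```

With the sample data, the naive total counts the shared video twice, while the deduplicated total counts it once.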
I've also been experimenting with Google App Engine, and so far I've managed to make a very ugly table of Khan Academy playlist times, which you can see at: http://whatbadgenext.appspot.com/. It's not a hugely useful table, but it's interesting to see that there are 16 playlists for which you can get the Ludicrous Listener badge (assuming you don't rewatch any videos). I'm quite pleased to see that Linear Algebra is the longest playlist because it was the first playlist I watched all the way through. It's slightly scary to think that it's nearly a day and a half of video.
The page may take a while to load because it's quite inefficient and makes a new API call every time you visit the page. The reason for the app name is that I'm hoping to make an app that can tell you which badges you are closest to achieving, but first I need to sort out how to get OAuth to work on Google App Engine.
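One way to avoid calling the API on every visit would be to keep the last response for a while and reuse it. The sketch below is not what the app does; it's a minimal illustration in plain Python, and the class name, the `fetch` callable and the one-hour lifetime are all assumptions for the example.

```python
import time
from typing import Any, Callable, Optional


class CachedFetch:
    """Remember the result of an expensive call for ttl_seconds (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a function that calls the API
        self._ttl = ttl_seconds      # how long a stored result stays fresh
        self._clock = clock          # injectable so the cache can be tested
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        """Return the stored result, refetching only if it is missing or stale."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


def fetch_playlists() -> dict:
    """Stand-in for the real API call; returns made-up data."""
    return {"Linear Algebra": 1500}


playlist_cache = CachedFetch(fetch_playlists, ttl_seconds=3600)
print(playlist_cache.get())  # first call fetches
print(playlist_cache.get())  # second call reuses the stored result
```

A cache like this lives in the memory of the server process, so it is lost whenever that process restarts. That would probably still be an improvement for a page whose data changes slowly.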
Using the OAuth example manually, I was able to find in which playlist I'm closest to getting a new Listener badge. I also discovered that, at the time of writing, I've watched nearly a week of videos on Khan Academy and done just over 28000 exercises. Which seems a lot.
One question I've often wondered about is how fast I need to answer exercises on Khan Academy to get one of the speed badges (Picking Up Steam etc.). The answer is in the API; you can see it at: http://whatbadgenext.appspot.com/exercise_times. This information will be essential for working out which badges are easily achievable, and which require more work. The most time-consuming questions are the rate and kinematic problems, and the various exercises on solving systems of equations.
It's also interesting to note that the shortest times, as I write, are for the exercises Proportions 1 and Proportions 2, which are currently the newest exercises. This confirms what I had suspected, which is that it's almost impossible to get the speed badges for new exercises. Presumably, they wait until a sufficient number of people have tried the exercises so they can work out a suitable time. So many times, I've tried in vain to get a speed badge on a new, easy exercise, but failed even when answering as quickly as I could type.
Something else I noticed after getting this information is that you don't necessarily get the speed badges even if you have completed the exercises within the time: you need to get them all in a single session. For example, even though I had previously done 60 exercises within the time limit, I still have to do another 75 before I get the 299,792,458 Meters per Second badge. I suspect this has something to do with when the program checks to see whether you've completed enough exercises, or maybe the streak is reset if the time limits are reset.