N-grams

For now, I've skipped the most frequent individual words Sal uses and jumped straight to the most frequent combinations of words.

Bigrams

Bigrams are pairs of consecutive words. The five most common bigrams Sal uses are (with counts in parentheses):

  • going to (11939)
  • this is (11182)
  • equal to (10925)
  • of the (8462)
  • is equal (8068)
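The post doesn't say how these counts were produced, so here is a minimal sketch of one way to do it. It assumes the transcripts are plain text and that lowercasing and keeping runs of letters and apostrophes is good enough tokenization. Keeping the apostrophe matters, otherwise "let's" would be split into "let" and "s". The `tokenize` and `count_bigrams` helpers are illustrative names of my own.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes so "let's" stays one token.
    # This tokenizer is an assumption, not necessarily the one behind the counts above.
    return re.findall(r"[a-z']+", text.lower())

def count_bigrams(tokens):
    # Pair each token with the one that follows it and count the pairs.
    # Pairs that straddle a sentence boundary are counted too.
    return Counter(zip(tokens, tokens[1:]))

transcript = "So this is going to be equal to x. This is equal to the same thing as y."
bigrams = count_bigrams(tokenize(transcript))
for (w1, w2), n in bigrams.most_common(3):
    print(f"{w1} {w2} ({n})")
```

On a real transcript you would feed the whole text through `tokenize` once and read the top five off `most_common(5)`.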

Before running my analysis, I had assumed the most frequent bigrams would all be common English word combinations. In fact, I think only "this is" and "of the" are common in most text (though I'd like to check this). The bigram "going to" is probably quite common in English, but I suspect not as common as it is here, probably because Sal frequently announces what he is about to do before doing it. The bigrams "equal to" and "is equal" are probably relatively rare in English outside of mathematical discussions.
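One way to check this hunch would be to compare each bigram's relative frequency in Sal's transcripts with its relative frequency in a general-English reference corpus. The sketch below assumes both corpora have already been reduced to `Counter`s of bigram tuples; the reference corpus is an assumption, and the toy counts are placeholders, not measurements. `relative_rate` is a hypothetical helper.

```python
from collections import Counter

def relative_rate(target, reference, bigram):
    """Ratio of a bigram's relative frequency in `target` to that in `reference`.

    Both arguments are Counters of bigram tuples. One is added to each
    bigram count (crude smoothing) so a bigram missing from `reference`
    doesn't cause a division by zero.
    """
    target_rate = (target[bigram] + 1) / sum(target.values())
    reference_rate = (reference[bigram] + 1) / sum(reference.values())
    return target_rate / reference_rate

# Toy counts for illustration only; they are not taken from any real corpus.
sal = Counter({("equal", "to"): 50, ("of", "the"): 40, ("in", "a"): 10})
reference = Counter({("equal", "to"): 1, ("of", "the"): 60, ("in", "a"): 39})

for bigram in sal:
    print(" ".join(bigram), round(relative_rate(sal, reference, bigram), 2))
```

A ratio well above 1 would suggest a bigram is characteristic of Sal's speech rather than of English in general; a ratio near or below 1 would suggest it is just common English.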

Trigrams

When we extend our search to trigrams, we see the most common bigrams being extended. The five most frequent are:

  • is equal to (7990)
  • going to be (5101)
  • is going to (3108)
  • so this is (2137)
  • this is the (1958)

You can see that the bigram "going to" is extended in both directions to "(is) going to (be)", while "this is" is extended to "(so) this is (the)". When we move to 4-grams, we can see how "is equal to" fits in.
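Spotting these extensions can be made mechanical: count n-grams for any n, then look among the (n+1)-grams for those that start with a given n-gram (extensions to the right) or end with it (extensions to the left). This is a sketch on a toy token list; `count_ngrams` and `extensions` are illustrative names, not from any library.

```python
from collections import Counter

def count_ngrams(tokens, n):
    # Slide a window of n tokens over the list and count each window.
    # For n=3 this is zip(tokens, tokens[1:], tokens[2:]).
    return Counter(zip(*(tokens[i:] for i in range(n))))

def extensions(ngram, longer_counts, k=3):
    # Split the longer n-grams containing `ngram` into those that extend it
    # to the right (ngram is a prefix) and to the left (ngram is a suffix).
    m = len(ngram)
    right = Counter({g: c for g, c in longer_counts.items() if g[:m] == ngram})
    left = Counter({g: c for g, c in longer_counts.items() if g[-m:] == ngram})
    return right.most_common(k), left.most_common(k)

tokens = "so this is going to be equal to what this is going to be".split()
trigrams = count_ngrams(tokens, 3)
right, left = extensions(("going", "to"), trigrams)
print(right)  # [(('going', 'to', 'be'), 2)]
print(left)   # [(('is', 'going', 'to'), 2)]
```

The same two functions work one level up, for example passing a trigram and the 4-gram counts to see how "is equal to" grows.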

4-grams

More of the same, with the sentence fragments expanding further.

  • is going to be (2395)
  • to be equal to (1148)
  • is equal to the (1096)
  • the same thing as (1064)
  • going to be equal (1020)

You can see how these could be pieced together to form the fragment "is going to be equal to the same thing as". Other interesting 4-grams are "x is equal to" (1015), "the square root of" (691), "in the last video" (379) and "with respect to x" (309).
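Piecing fragments together like this amounts to chaining n-grams whose words overlap: the end of one has to match the start of the next. Below is a minimal sketch that merges two word sequences on their longest overlap. `stitch` is a hypothetical helper; a real chaining procedure would also have to choose between several overlapping candidates, which this doesn't attempt.

```python
def stitch(left, right):
    # Merge two word sequences on the longest suffix of `left` that is also
    # a prefix of `right`; return None if they don't overlap at all.
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None

fragments = [
    ("is", "going", "to", "be"),
    ("going", "to", "be", "equal"),
    ("to", "be", "equal", "to"),
]
merged = fragments[0]
for fragment in fragments[1:]:
    merged = stitch(merged, fragment)
print(" ".join(merged))  # is going to be equal to
```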

5-grams

You can probably see where this is going now.

  • going to be equal to (932)
  • is going to be equal (681)
  • is the same thing as (647)
  • this is going to be (635)
  • let's see if we can (348)

Clearly, the clause "this is going to be equal to" is very common. Interestingly, "is the same thing as" is also very common, and means essentially the same thing as "is equal to". The fragment "let's see if we can" is very much part of Sal's inclusive style of talking. Other 5-grams include "both sides of this equation" (209), "so let's say I have" (130), "the limit as x approaches" (109) and "let's say I have a" (107).

To be continued...
