N-Gram Video Tour

Ben: Hello there, fellow datavizzers! In this segment we'll be demonstrating our n-gram tool. Don't know what an n-gram is? Well, don't worry, because neither do we! But I think it might have something to do with charting the relative frequency of individual words contained within a corpus. It could be a really useful tool for teasing out comparisons and correlations over time. So let's get oriented with this tool, shall we? First of all, the vertical axis indicates the relative word frequency of a term, and that's represented in parts per million. The horizontal axis is a timeline, and it's broken down into the decades contained within our corpus. On the far left of the graph are the various terms that we selected to chart. Clicking on a term will turn a line on and off, and you can click on multiple terms as well. If you choose to click on all the terms at once, it's a really hot mess, but if you're able to make sense of any of that, please get in touch with us and let us know. But for now, we're gonna zoom in a little closer and highlight a few of the more noteworthy stories contained in this tool. So let's just dive right in.

[harp transition]

Ben: Welcome! We're gonna be playing around with the n-gram viewer. You might be asking yourself, why even use an n-gram viewer? You have other, more visually dynamic ways of playing with data, like the word clouds, but the word clouds show everything kind of crunched together, and it's hard to disentangle or pull things out. An n-gram viewer lets you isolate certain terms and hold them up to others in gestures of comparison, or to tease out correlations. So let's dig in, shall we? The first thing I want to do is look at a comparison between two media forms. Let's look at radio and television, because as Jason and I like to joke, TV killed the radio star, and let's see if there's any empirical evidence to back that up. Here's radio. Here's television.
And you can see that somewhere between the 40s and the 50s there was a marked decline in radio and an ascendancy of television, and so they kind of crossed paths. Of course, by the time we reach the end of our corpus, both of these have kind of fallen out of favor, but this reflects one of our central claims: media have lifespans. They emerge, they grow in popularity, they become cultural stalwarts, and then eventually they grow long in the tooth and sunset.

Jason: Yeah, I think you're definitely right about that, but the lifespans look different. So let's take a look here at picture.

Ben: Wow.

Jason: Yeah, so the still image really didn't have a long lifespan, but now let's look at film. You know, film crests here in the 40s, but it really keeps going strong. No longer called a motion picture, clearly, as time goes on. And you know, it only starts a bit of a decline in more recent years.

Ben: This makes me think, too, given the comment about motion picture, that this doesn't tell the entire story. If we were able to disambiguate between picture and motion picture, that decline might be even more pronounced.

Jason: Right, so basically if we knew what the hell we were doing with corpus linguistics, this would be better.

Ben: Okay, fair enough. Now let's reset. Let's look at computer. It's an interesting timeline in and of itself. We've talked about this elsewhere in the book, but in the 1960s you see a small blip when people are talking about using mainframes and punch-card-related kinds of machines, and then it kind of disappears in the 70s. And then with the rise of the PC, you see a massive explosion. So this follows a different kind of trendline from the other media forms. There's almost a false start, and then the real show begins. This doesn't actually tell the whole story, though. Because into the 90s, into the 2000s, we're obviously still very much invested in computers in the English classroom, and the corpus reflects that.
But the use of the word "computer" is the sticking point, because as we know, other words tend to take the place of it--proxy words, words that kind of get at the digital. So you see this ascendancy of the term "technology" over this time as well.

Jason: Yeah. I mean, I notice technology actually becomes more popular than computer in our current century, and yet so often that term is restrictively used to refer only to the digital.

Ben: Right. So let's look at something else now--the relationship between reading and writing. We've talked about this elsewhere in our project, but here we can see a fairly interesting story emerging. If we compare this hot pink and this brown line--I have to say, is there any reason why we picked the colors we did for this chart, Jason?

Jason: Because writing is fabulous! [laugh] It needed the most fabulous color--clearly!

Ben: Wonderful. So you see an interesting shift that happens in between the 60s and the 70s. I don't know if it correlates, but you know, composition as a discipline kind of emerging into its own, the emphasis on expressivist pedagogies, on process pedagogy--all of this is kind of building this momentum. Maybe there's a correlation there that's worth exploring further. But it's interesting to note that the English Journal corpus we deal with is predominantly K-12. Regardless, reading is the thing up until this moment, and then they switch places, but ultimately arrive at roughly equivalent statuses.

Jason: Yeah, but this gets a lot more exciting when you put writing up against computer: boom! Green computer line, pink writing line. They meet together. Both have their high mark by far in the corpus in the 1980s. Coincidentally, that's really the moment when computers and writing begins to coalesce as an important subfield in English studies.

Ben: There's a small handful of moments in this research project that made me go, "Aha!" and this is one of them.
This is where I see--and granted, this doesn't tell the entire story, but there's a good case to be made that the computer was instrumental (pun intended) as a tool for promoting writing. So what else do we want to look at?

Jason: Yeah, you know, I also think about some of the positive affect words. Like, here's one of my favorites from when I was playing around with it. So you've got "good"--you know, that's pretty persistent. And then here in the pink you have "like," but there's just this modest little thing that you see somewhere in the 60s. New media seem to become less good, but people liked it way more. [both laugh] Okay, I'm probably making a lot out of a small distinction, but still, I think the broader point here is that positive affect words definitely tended to correlate more than negative affect words when we were doing this analysis.

Ben: And this is absolutely the point of using tools like this. They don't let you arrive at dead-certain conclusions, but they open the door to ask questions and to see big bodies of texts in new ways. Anything else?

Jason: You know, there's this one other that was on here. I just noted "time." It was appearing quite frequently throughout all the decades. And of course "time" is a really ambiguous word. Like, is it referring to the time the radio or TV program is on? Is it referring to the time pedagogy takes? Or in my most fanciful way of thinking, I think it refers to the sheer amount of time it took us to prepare this full-text corpus for reading. [laughs] In any sense, it also just points to some of these everyday words that show up a bit more than we might otherwise expect. The only other one out there is "process," and if you put that up against writing, I think one of the things that's sort of interesting about this is that process is the sort of huge god term for us, and yes, you see here in the 1980s it has its apex when writing has its apex, but it clearly wasn't at the same level of frequency as the notion of writing.
And you know, I'm still pondering what that means, but I'm intrigued by it.

Ben: Right. And again, this is part of the strengths and the limitations of these types of methods and this particular type of methodology. But well said, Jason. And I guess with that I would say that we are out of time. So we invite you to, yes, go and play with our n-gram viewer. Maybe you'll see things that we haven't noticed here. So go have fun.
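
The parts-per-million measure described at the start of the tour can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the n-gram viewer: the function names, the naive tokenizer, the (year, text) input shape, and the toy sample are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def decade_of(year):
    """Map a year to the decade it falls in, e.g. 1948 -> 1940."""
    return year - year % 10


def relative_frequency_ppm(docs, terms):
    """Return {term: {decade: occurrences per million words}}.

    docs is an iterable of (year, text) pairs (a hypothetical input shape).
    """
    wanted = {t.lower() for t in terms}
    totals = Counter()            # all tokens seen in each decade
    hits = defaultdict(Counter)   # hits[term][decade] = count of that term
    for year, text in docs:
        decade = decade_of(year)
        tokens = re.findall(r"[a-z']+", text.lower())  # naive tokenizer
        totals[decade] += len(tokens)
        for token in tokens:
            if token in wanted:
                hits[token][decade] += 1
    return {
        term: {
            decade: hits[term][decade] / total * 1_000_000
            for decade, total in sorted(totals.items())
            if total  # skip decades that contain no words at all
        }
        for term in sorted(wanted)
    }


# Toy sample, invented purely for illustration.
sample = [
    (1948, "Radio radio television"),
    (1955, "Television television radio film"),
]
ppm = relative_frequency_ppm(sample, ["radio", "television"])
for term, series in ppm.items():
    print(term, series)
```

Because the sketch counts single words only, it has the same blind spot Ben points out: it cannot tell "picture" apart from "motion picture." Handling that would mean counting word pairs as well as single words.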