Sorting through the noise: Using computational methods to isolate key themes of online discussions

In the following blog post, Simon Rodier (pictured above) describes his time working with Project Someone on Words in Context and how this work relates to his Master’s thesis.

By Simon Rodier

I came to Project Someone with a background in computer science and an interest in using computational methods to analyze online speech about controversial issues. 

Discussions about contentious issues in online fora are particularly fascinating to me. Anecdotally, we have all seen the stark divisions that form when people debate hot topics online. We often see proponents become entrenched in their own positions, essentially preventing any productive dialogue or negotiation from occurring. To complicate matters, online discussion about any given issue often unfolds across many platforms, among many different people, and in large volumes of text. My goal was to dig deeper into such discussions by sorting through them computationally in order to identify the major themes at play. This approach is useful for community stakeholders seeking to intervene in the larger debate, as it gives them a sense of how the public understands and interprets the issues at stake.

My work with Project Someone unfolded along two major axes. In the first, I worked on the Words in Context project, where my major contribution was to develop a database and website that would house and display the results of the team’s research into online speech about contentious issues in Canada and Lebanon. Together, we created a website that allows users to dig into how specific language is used in the corpora we analyzed. The site is divided into thematic sections, each represented by several quotes and keywords drawn from the source texts.

Out of this project I developed the basis of my M.A. thesis in Educational Technology, for which I designed a methodology that uses topic models to computationally identify thematic groupings in text. The method leverages the statistical co-occurrence of words within documents to group those words into latent “topics.” An early version of the methodology was applied to a subset of a corpus used for Words in Context, and it was ultimately refined for use in my thesis to identify some of the major themes of the controversy surrounding Brett Kavanaugh’s nomination to the U.S. Supreme Court in 2018.
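
For readers curious how word co-occurrence can be turned into "topics," here is a minimal sketch. The post does not say which topic model or software the thesis used, so this example assumes latent Dirichlet allocation (LDA), one common topic model, fitted with a toy collapsed Gibbs sampler written in plain Python. The function names (`fit_lda`, `top_words`), the hyperparameter values, and the four tiny documents are all invented for illustration and are not drawn from the thesis or from Words in Context.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents.

    Illustrative only: names and default values are assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]   # tokens in doc d given topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                 # tokens given topic k overall

    # Start by giving every token a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Take this token out of the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # A topic is likelier if this document already uses it and
                # if the topic already contains this word. This is where
                # co-occurrence within documents drives the grouping.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]

                # Draw a new topic in proportion to the weights.
                threshold = rng.random() * sum(weights)
                cumulative = 0.0
                k = num_topics - 1
                for t, weight in enumerate(weights):
                    cumulative += weight
                    if threshold < cumulative:
                        k = t
                        break

                # Put the token back with its new topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


def top_words(topic_word, topic, n=3):
    """Return up to n words most often assigned to the given topic."""
    ranked = sorted(topic_word[topic].items(), key=lambda item: -item[1])
    return [word for word, count in ranked[:n] if count > 0]


# Invented toy corpus: two documents about a hearing, two about a hockey game.
docs = [
    "senate hearing vote nomination hearing".split(),
    "nomination vote senate court".split(),
    "goalie puck ice goal puck".split(),
    "ice goal goalie skate".split(),
]

doc_topic, topic_word = fit_lda(docs)
for k in range(len(topic_word)):
    print("topic", k, top_words(topic_word, k))
print("tokens per topic in each document:", doc_topic)
```

Words that tend to appear in the same documents are pulled toward the same topic, and each document ends up described by how many of its tokens fall under each topic. Because the sampler is random and the corpus is tiny, the exact groupings depend on the seed; real analyses rely on much larger corpora and established libraries.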

Taken together, the two axes of my work encourage readers and communities to engage more critically with trends in online discussions about contentious issues, an important step towards more effective interventions in public debate.