Topic modeling the Supreme Court opinions in my sample yielded few concrete results. Without cleaning the data before entering it into MALLET, the program returned topics that consisted mostly of legal terms and some case-specific words. The topic frequencies the program reported for each justice’s sample seemed to be driven by those case-specific words. For example, Justice Scalia wrote the majority opinion, on a writ of certiorari, for a case involving the EPA, and Topic Five from Figure 1 appears most frequently in his dataset. Likewise, Justice Sotomayor wrote the majority opinion, on a writ of certiorari, for a case involving cocaine, and her dataset is heavily composed of Topic Three, according to MALLET.
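For reference, the basic workflow behind this uncleaned run can be scripted. The sketch below is a minimal version in Python, assuming MALLET is installed at a local path and that each justice’s opinions have been concatenated into a single plain-text file under an opinions/ directory; the paths and file layout are illustrative assumptions, not the project’s actual setup.

```python
import subprocess

# Assumptions (not from the original project): MALLET lives at
# ./mallet-2.0.8/bin/mallet and each justice's opinions have been concatenated
# into one plain-text file under opinions/ (e.g. opinions/scalia.txt).
MALLET = "./mallet-2.0.8/bin/mallet"

# Convert the raw text files into MALLET's binary format, stripping only
# MALLET's default English stopword list (the "uncleaned" run described above).
subprocess.run([
    MALLET, "import-dir",
    "--input", "opinions",
    "--output", "opinions.mallet",
    "--keep-sequence",
    "--remove-stopwords",
], check=True)

# Train a five-topic model and write out the top words per topic and the
# topic proportions per document (i.e. per justice's file).
subprocess.run([
    MALLET, "train-topics",
    "--input", "opinions.mallet",
    "--num-topics", "5",
    "--output-topic-keys", "topic_keys.txt",
    "--output-doc-topics", "doc_topics.txt",
], check=True)
```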
Attempting to clean this data of legal terms and some case-related words only seemed to exacerbate these differences. For example, Topic Five from the table below contained many words related to environmental cases, which Justice Scalia wrote about, and MALLET showed his dataset as the only one in which that topic appeared frequently. After cleaning, the topics contained even more words tied to the facts of the cases at hand, and they appeared even less frequently in the other justices’ samples.
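One way to attempt this cleaning is to strip a hand-built list of legal terms from each file before re-importing it into MALLET. The sketch below assumes such a list and the same per-justice text files in an opinions/ directory; the specific stopwords shown are illustrative guesses, not the list actually used in this project. (MALLET’s --extra-stopwords option offers an equivalent route.)

```python
import re
from pathlib import Path

# Assumption: a hand-built list of high-frequency legal terms; the words shown
# here are illustrative, not the list actually used in this project.
LEGAL_STOPWORDS = {
    "court", "case", "opinion", "judgment", "petitioner", "respondent",
    "statute", "certiorari", "amicus", "appeal", "district", "circuit",
}

def clean_opinion(text: str) -> str:
    """Lowercase an opinion and drop the legal boilerplate terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in LEGAL_STOPWORDS)

# Write cleaned copies of each per-justice file for re-import into MALLET.
Path("cleaned").mkdir(exist_ok=True)
for path in Path("opinions").glob("*.txt"):
    cleaned = clean_opinion(path.read_text(encoding="utf-8"))
    Path("cleaned", path.name).write_text(cleaned, encoding="utf-8")
```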
To account for the case-issue variable, the data was narrowed to First Amendment cases only. The sample size was even smaller, and after MALLET analyzed the data and produced three topics, most of their words also appeared to be legal or case-specific terms. Of course, the data was not cleaned and the sample size was extremely small, so this was not surprising. Once again, the case topics a justice wrote about appeared most frequently in that justice’s dataset. The word “video” in Topic 3 in the figure below could relate to Scalia’s case on regulating the sale of video games, and the data supports this assertion: that topic appears frequently in Scalia’s dataset.
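Subsetting by issue can be scripted in the same way. The sketch below assumes the opinions are also stored one file per case under a cases_text/ directory and that a hypothetical cases.csv records each case’s issue area in "filename" and "issue_area" columns; neither the file nor the column names come from the original project.

```python
import csv
import shutil
import subprocess
from pathlib import Path

MALLET = "./mallet-2.0.8/bin/mallet"  # assumed install path, as above

# Assumption: one text file per case in cases_text/, and a hypothetical
# cases.csv mapping each file to its issue area.
Path("first_amendment").mkdir(exist_ok=True)
with open("cases.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if row["issue_area"] == "First Amendment":
            shutil.copy(Path("cases_text", row["filename"]), "first_amendment")

# The same MALLET pipeline as before, now asking for only three topics.
subprocess.run([MALLET, "import-dir", "--input", "first_amendment",
                "--output", "first_amendment.mallet",
                "--keep-sequence", "--remove-stopwords"], check=True)
subprocess.run([MALLET, "train-topics", "--input", "first_amendment.mallet",
                "--num-topics", "3",
                "--output-topic-keys", "fa_topic_keys.txt",
                "--output-doc-topics", "fa_doc_topics.txt"], check=True)
```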
Few concrete conclusions can be drawn from the results at hand, because none of them relate to the individual language of the justices or to how they decided the cases before them. Instead, the topics revealed the general use of legal terminology and details specific to the facts of each case. So, one cannot determine much about the differences in rhetoric between liberal and conservative justices by looking at this data.
Perhaps this illustrates that the Court’s justices truly do decide cases by interpreting the law, or, more cynically, that they are good at using the law to justify their decisions. They stick to the law and the facts of the case to decide the issues at hand, or use them to mask their own political beliefs.
With a larger sample size, topic modeling and digital content analysis could produce more conclusive results. Using the Supreme Court Database, one could pull the text of all the cases it categorizes from the Internet and divide it into text files by justice, as sketched below. Then, MALLET could perform a topic modeling analysis. Because the sample would be larger, one could look for more topics. This could show how the issues the Court deals with, and the language it uses, have changed over time. With the framework for pulling text online already built, it would be easier to write additional code to pull older cases from the Internet and add them to the same text files.
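A minimal sketch of that scaled-up preparation step follows. It assumes a case-level export of the Supreme Court Database saved as scdb.csv; the column names "caseId" and "majOpinWriter" follow SCDB’s documented variables but should be checked against the release used, and fetch_opinion_text is a placeholder for whatever scraping code actually pulls the opinion text from the Internet.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Assumption: scdb.csv is a case-level export of the Supreme Court Database.
# "caseId" and "majOpinWriter" follow SCDB's documented variable names, but the
# exact columns should be verified against the release being used.
def fetch_opinion_text(case_id: str) -> str:
    # Placeholder for the scraping code that pulls an opinion's text from the
    # Internet; the real implementation depends on the source being scraped.
    raise NotImplementedError

# Group case identifiers by the justice who wrote the majority opinion.
by_justice = defaultdict(list)
with open("scdb.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        by_justice[row["majOpinWriter"]].append(row["caseId"])

# One text file per justice, ready for the same MALLET pipeline as before,
# this time with a larger --num-topics value.
Path("all_opinions").mkdir(exist_ok=True)
for justice, case_ids in by_justice.items():
    texts = [fetch_opinion_text(cid) for cid in case_ids]
    Path("all_opinions", f"{justice}.txt").write_text("\n".join(texts), encoding="utf-8")
```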
Although this project did not lead to any conclusive results, it did lay the groundwork for more of this kind of work in the field. In no way did it exhaust the possibilities of topic modeling judicial texts. Rather, it took a small step forward, stumbled, but did not fall. Topic modeling is useful for large-scale analysis of judicial opinions, and it will be interesting to see how researchers use this technique in the future to answer further questions, such as how the Court has changed over time.