Text Mining Discussions of Drug Abuse on the Social Web

In light of the recent experiments with marijuana legalization, illicit drugs have become a subject of much public discussion. Seeing a disconnect between the growing push for legalization and the lack of research on the effects of drug abuse, my colleagues Elizabeth Jones, Cameron Randall, Nicole White, and I turned to the large (~20GB of text) collection of subjective drug experiences stored on the Erowid.org website.

We gathered the data using Python with the Beautiful Soup library. Primary analysis and data cleaning were done with R, and textual sentiment analysis was done with LightSIDE. Gephi was used to visualize a graph of relationships between abused drugs:
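As a rough sketch of how a drug relationship graph could be prepared for Gephi, the Python below counts how often two drugs are mentioned in the same report and writes the counts as a Source,Target,Weight edge list, a CSV layout Gephi can import. The co-occurrence definition of a "relationship", the `DRUGS` vocabulary, the sample reports, and both function names are assumptions made for illustration; they are not taken from the project itself.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical drug vocabulary, for illustration only.
DRUGS = {"cannabis", "alcohol", "lsd", "mdma"}


def cooccurrence_edges(reports, drugs=DRUGS):
    """Count how often each pair of drugs is mentioned in the same report."""
    edges = Counter()
    for text in reports:
        # Naive whitespace tokenization; punctuation is not stripped here.
        words = set(text.lower().split())
        mentioned = sorted(words & drugs)
        for a, b in combinations(mentioned, 2):
            edges[(a, b)] += 1
    return edges


def edges_to_csv(edges):
    """Serialize edges as a Source,Target,Weight CSV edge list."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


# Made-up reports, not real data.
sample_reports = [
    "Mixed cannabis and alcohol at a party",
    "Took lsd with cannabis later that night",
    "Only alcohol this time",
    "cannabis and alcohol again",
]
edges = cooccurrence_edges(sample_reports)
print(edges_to_csv(edges))
```

Sorting the drugs within each report keeps every pair in one canonical order, so the resulting edges are undirected and a pair is never counted under two different keys.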

Using R's ggplot2 library, we visualized how common the target words were in the discussions of each drug:

Another visualization with ggplot2 shows the percentage of drug abuse reports mentioning 'sleep' and 'dream' by drug:
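The percentages behind a chart like this could be computed along the following lines. This is a minimal Python sketch rather than the R code used in the project: the `(drug, text)` input shape, the `mention_rates` function name, the prefix-matching rule, and the sample reports are all assumptions for illustration.

```python
import re
from collections import defaultdict


def mention_rates(reports, targets):
    """Given (drug, text) pairs, return {drug: {target: percent of reports}}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    # Prefix match at a word boundary, so "sleep" also matches "sleepy";
    # it does not match "asleep" or "slept".
    patterns = {
        t: re.compile(r"\b" + re.escape(t), re.IGNORECASE) for t in targets
    }
    for drug, text in reports:
        totals[drug] += 1
        for target, pattern in patterns.items():
            if pattern.search(text):
                # Each report counts at most once per target word.
                hits[drug][target] += 1
    return {
        drug: {t: 100.0 * hits[drug][t] / n for t in targets}
        for drug, n in totals.items()
    }


# Made-up reports, not real data.
sample = [
    ("cannabis", "I could not sleep at all"),
    ("cannabis", "Vivid dreams followed"),
    ("cannabis", "Nothing unusual"),
    ("cannabis", "Sleepy and hungry"),
    ("lsd", "No rest, only colors"),
    ("lsd", "Had a strange dream"),
]
rates = mention_rates(sample, ["sleep", "dream"])
print(rates)
```

Counting each report once per target word, rather than counting every occurrence, is what makes the result a percentage of reports rather than a word frequency.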

Using Python, we captured text snippets discussing target words, and then, using LightSIDE, we determined whether these text snippets showed positive or negative sentiment:
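One simple way to capture such snippets is to keep a fixed window of words around each occurrence of a target word. The sketch below assumes that approach; the `capture_snippets` name, the window size, and the sample sentence are hypothetical, and the project's actual snippet boundaries may have been chosen differently.

```python
def capture_snippets(text, target, window=5):
    """Return snippets of up to `window` words on each side of `target`."""
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing, case-insensitively.
        if word.strip(".,;:!?\"'()").lower() == target.lower():
            start = max(0, i - window)
            snippets.append(" ".join(words[start:i + window + 1]))
    return snippets


# Made-up sentence, not real data.
report = "After an hour I tried to sleep but my mind kept racing."
print(capture_snippets(report, "sleep", window=2))
```

Each snippet could then be written out, one per row, for labeling in a sentiment classifier such as LightSIDE.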

More visualizations and a complete discussion of the findings can be seen in the full report.