July 17th, 2013


In this presentation, Ken will describe a portion of an early-phase project that uses social media data (tweets, Facebook posts, etc.) from service personnel to predict suicide rates. There’s a lot of motivation to provide better data for military psychologists, since more military personnel take their own lives than are killed in the line of duty. By combining social media data that personnel voluntarily provide with a predictive analytics system, we can provide assessments that help mental health workers focus their time and energy on the most at-risk individuals. This project uses Cassandra as the scalable storage system for this social media data, which is then analyzed in a distributed environment using Hadoop. The project also uses the Solr search support from DataStax Enterprise to provide ways for users to dig into the underlying data, which is critical for understanding the assigned risk levels.



Ken Krugler is President of Scale Unlimited, a consulting and training company for big data processing and web mining problems using Hadoop, Cascading, and Solr.


Presentation Covers

  • Using Cassandra to store social media content
  • Combining Hadoop workflows with Cassandra
  • Leveraging Solr search support with DataStax Enterprise
  • Doing good with big data


Key Points of Using Cassandra

  • A repository for social media data
  • The data source for workflows
  • A search index, via Solr integration


Interesting Facts

  • More soldiers die from suicide than from combat
  • There are more suicides than homicides every year in the US
  • The suicide rate has climbed 80% since 2002
  • Indicator of suicide: often an event (a person crashes their car, hurts themselves, etc.), by which point it is frequently too late to save them