
6/29/2022


Cassandra Lunch #75: Getting Started with DataStax Enterprise (DSE) on Docker - Business Platform Team


This resource is based on an article originally published here.

In Apache Cassandra Lunch #75: Getting Started with DataStax Enterprise (DSE) on Docker, we discussed some of the applications that make up the DataStax ecosystem. In the process, we pulled Docker images of the applications we are interested in and worked with DSE Search, DSE Analytics with Spark, and DSE Graph on Docker Desktop, learning about these tools and their strengths along the way. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

Getting Started with DataStax Enterprise (DSE) on Docker

In this article, we are going to look at getting started with DataStax Enterprise on Docker. Recently, there has been a shift toward solving data problems with NoSQL databases, because some problems are very difficult to solve with relational databases.

One of the interesting features of NoSQL databases is the ability to scale out when a heavy workload might otherwise take applications offline. Customers are not willing to wait long for a response, so your application must be available at all times, with no downtime.

Because Cassandra is known to be scalable and highly available, many companies use Apache Cassandra as their application database. DataStax has done a great deal of work building transformational data architectures for applications, microservices, and experiences that require data sovereignty, availability, scale, agility, and accessibility by any user. It has leveraged Apache Cassandra to build enterprise products that make application deployment much easier.

One of these products is DataStax Enterprise (DSE). It is built on Apache Cassandra, which is well known for 100% uptime and unmatched low latency, and it can handle massive data at a planetary scale.

A diagram showing DSE Analytics, DSE Search and DSE Graph feature enabled in DSE

Several packages and capabilities have been introduced into the DataStax Enterprise ecosystem, and we are going to look at some of them. We will provision this software in Docker containers and work with DSE Search, DSE Analytics (Spark), and DSE Graph to demonstrate handling data workloads.

Diagram showing some of the packages that have been introduced into the DataStax ecosystem

DataStax Enterprise (DSE)

DSE offers different capabilities that you can leverage to handle different data problems. We are going to look at DataStax Studio and the DataStax Enterprise server, which comes with DSE Search, DSE Analytics with Spark, and DSE Graph.

DataStax Enterprise Search

DSE Search allows you to quickly find data and provide a modern search experience for your users, helping you create features like product catalogs, document repositories, ad-hoc reporting engines, and more. One of the restrictions of Apache Cassandra is that it is hard to query tables in ways they were not designed to support beforehand. Cassandra offers materialized views and secondary indexes as solutions; however, they are not flexible, because managing these views and indexes requires some tricks, and indexing data types like tuples and user-defined types is a lot of work. Let's get straight to it.

Install DataStax Enterprise server and enable Search capability

  1. Before you start installing the packages using Docker, you must download Docker Desktop from here.
  2. After installing Docker Desktop, you can pull all the DataStax Docker images we will be working with from Docker Hub. Make sure Docker Desktop is running, then open PowerShell and run the following commands to pull the DataStax images.
$ docker pull datastax/ddac:5.1.17
$ docker pull datastax/dse-server:6.8.16
$ docker pull datastax/dse-opscenter:6.8.15
$ docker pull datastax/dse-studio:6.8.15

With all the images pulled, we are going to create and start a container from each image using the commands below. Each command names the same tag that was pulled, since an untagged image name makes Docker look for the latest tag instead. The --link option must also come before the image name, because anything after the image name is passed to the container as arguments.

$ docker run -e DS_LICENSE=accept --name my-ddac -d datastax/ddac:5.1.17
$ docker run -e DS_LICENSE=accept -p 7080:7080 -p 7081:7081 --name datastax_server -d datastax/dse-server:6.8.16 -k -s -g
$ docker run -e DS_LICENSE=accept -p 8888:8888 --name my-opscenter -d datastax/dse-opscenter:6.8.15
$ docker run -e DS_LICENSE=accept --link datastax_server --name my-studio -p 9091:9091 -d datastax/dse-studio:6.8.15

Use the command below to get into the running containers.  

$ docker exec -it <container name> /bin/bash

Use the command below to check the status of the node; it tells you whether your node is running.

$ dsetool status
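
A freshly started DSE container usually needs some time before the node accepts connections, so it can help to poll for readiness instead of checking by hand. The function below is a minimal sketch, not part of the original demo: the name wait_for_node and its retry defaults are our own, and it assumes that nodetool is available inside the container and marks a healthy node with a line starting with UN (Up/Normal).

```shell
# Poll `nodetool status` inside a container until a node reports UN (Up/Normal).
# Usage: wait_for_node <container name> [attempts] [delay in seconds]
# The function name and the default values are illustrative assumptions.
wait_for_node() {
    container="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Healthy nodes appear as lines starting with "UN" in the status table.
        if docker exec "$container" nodetool status 2>/dev/null | grep -q '^UN'; then
            echo "node in $container is up"
            return 0
        fi
        echo "attempt $i/$attempts: node not ready yet"
        i=$((i + 1))
        sleep "$delay"
    done
    echo "node in $container did not come up" >&2
    return 1
}
```

For example, running wait_for_node datastax_server after the docker run command blocks until the node is ready or the attempts run out.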

In my case, I am going to start working with the DSE server. Notice that the run command, repeated below, enables DSE Analytics with Spark (-k), DSE Search (-s), and DSE Graph (-g) through the flags at the end of the line.

$ docker run -e DS_LICENSE=accept -p 7080:7080 -p 7081:7081 --name datastax_server -d datastax/dse-server:6.8.16 -k -s -g
$ docker exec -it datastax_server /bin/bash

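Use the cqlsh command to get into the CQL shell, then use CQL to create the search index. Without a column list, the index covers all columns in the table, and it is created on all the Search nodes in the datacenter. The first statement shows the general form and the second applies it to the voting_system.voters table. The third is an alternative that limits the index to the listed columns.

$ cqlsh
cqlsh> CREATE SEARCH INDEX ON keyspace.table;
cqlsh> CREATE SEARCH INDEX ON voting_system.voters;
cqlsh> CREATE SEARCH INDEX ON voting_system.voters WITH COLUMNS column1, column2, column3, ...;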
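
The statements above assume that the voting_system.voters table already exists. For readers following along from scratch, here is a hedged sketch of one way to create it from the host. The column names come from the Spark query later in this article, but the column types, the replication settings, the sample row, and the helper names voters_schema_cql and run_cql are assumptions made for illustration only.

```shell
# Print CQL for a sample schema. Column names follow the article's Spark
# query; the types, replication settings, and sample row are made up.
voters_schema_cql() {
    cat <<'CQL'
CREATE KEYSPACE IF NOT EXISTS voting_system
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS voting_system.voters (
  voterid int PRIMARY KEY,
  city text,
  state text,
  supporting text
);

INSERT INTO voting_system.voters (voterid, city, state, supporting)
  VALUES (1, 'Springfield', 'IL', 'A');

CREATE SEARCH INDEX IF NOT EXISTS ON voting_system.voters;
CQL
}

# Pipe CQL from stdin into cqlsh inside the DSE container
# (the container name defaults to datastax_server).
run_cql() {
    docker exec -i "${1:-datastax_server}" cqlsh
}

# Usage:
#   voters_schema_cql | run_cql datastax_server
#   echo "SELECT voterid, city FROM voting_system.voters WHERE solr_query = 'supporting:A';" | run_cql
```

Once the index exists, the solr_query column can be used in a WHERE clause as in the usage comment. Newly written rows may take a moment to become searchable.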

DataStax Enterprise with Analytics using Spark

DataStax Enterprise (DSE) integrates real-time and batch operational analytics capabilities with an enhanced version of Apache Spark. With DSE Analytics you can easily generate ad-hoc reports, target customers with personalization, and process real-time streams of data. The analytics toolset lets you write code once and then use it for both real-time and batch workloads.

Diagram showing the combination of DSE and Spark in a node

Use DSE Analytics to analyze huge databases. DSE Analytics includes integration with Apache Spark, the framework that supports our analytics applications. Spark is a distributed computation engine designed for big data and in-memory processing. According to the Apache Spark project, Spark supports interactive and batch analytics and can be up to 100 times faster than Hadoop MapReduce. Spark also requires 5 to 10 times less code than Hadoop while remaining efficient and scalable, and it is fault tolerant.

This is a diagram showing two datacenter Cassandra deployment with an external Spark cluster setup.

Some of the advantages of using DSE Analytics include the following:

  • No single point of failure
  • Spark Master management
  • Ability to perform analytics without ETL
  • There is no need to configure a separate file system such as S3 or Azure Blob, because the DataStax Enterprise file system (DSEFS) is readily available
  • DSE Analytics Solo
  • Integrated security
  • AlwaysOn SQL

Use the commands below to get to the Spark shell.

$ docker exec -it <container name> /bin/bash
$ dse spark

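Use the Spark Scala commands below to read the Cassandra table into a DataFrame and filter it with a search query. The filtered DataFrame is assigned first and displayed in a separate step, because show() returns nothing that is worth keeping in a variable.

scala> val table = spark.read.format("org.apache.spark.sql.cassandra").options(Map("keyspace" -> "voting_system", "table" -> "voters")).load()
scala> val voters_table = table.select("voterid", "city", "state", "supporting").where("solr_query='supporting:A'")
scala> voters_table.show()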

DataStax Enterprise Graph

DSE Graph is a distributed graph database that is optimized for fast data storage and traversals. It is designed for zero downtime and for real-time analysis of complex, disparate, and related datasets. DSE Graph can scale to massive datasets and execute both transactional and analytical workloads (OLTP and OLAP). It incorporates all of the enterprise-class functionality found in DataStax Enterprise, including advanced security protection, built-in DSE Analytics, DSE Search functionality, visual management and monitoring, and development tools including DataStax Studio.

A diagram showing the DataStax Enterprise (DSE) Graph database setup.

DSE Graph is built on top of Apache TinkerPop, Apache Cassandra, Apache Solr, and Apache Spark. It uses Apache TinkerPop standards for data and traversal, Apache Cassandra for scalable storage and retrieval, Apache Solr for search and indexing capabilities, and Apache Spark for fast analytic traversal. All these components are integrated into DSE Graph to form a real-time graph database management system.

DSE Graph supports both transactional and analytic workloads, using two different engines. The analytic engine relies solely on Spark, which comes as part of the DSE product.

Get into the Gremlin console using the below command.

$ docker exec -it <container name> /bin/bash
$ dse gremlin-console

Check if a graph called voting_system exists in the system.

gremlin> system.graph("voting_system").exists()

Create a new graph using the below command.

gremlin> system.graph("graph_testing").create()

Get a list of graphs available in the system.

gremlin> system.graphs()

I find it very easy to work with graph and CQL commands in DataStax Studio; the interactive notebook makes things a lot easier. We are going to link DataStax Studio with the DSE server and work on the graph database from Studio. Open the command line interface, use the command below to link the Studio container with the DSE server container, and then access Studio at http://localhost:9091/

$ docker run -e DS_LICENSE=accept --link <datastax_server_container_name> --name <my-studio-name> -p 9091:9091 -d datastax/dse-studio:6.8.15
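
Studio may take a little while to start serving its web interface after the container is created. As a rough sketch, assuming curl is installed on the host, the helper below polls the URL until it responds. The name wait_for_http and its defaults are hypothetical and not part of any DataStax tooling.

```shell
# Poll an HTTP endpoint until it answers, e.g. the Studio UI on port 9091.
# Usage: wait_for_http <url> [attempts] [delay in seconds]
# The function name and the default values are illustrative assumptions.
wait_for_http() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-5}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: quiet, -f: treat HTTP errors as failures, -o: discard the body.
        if curl -sf -o /dev/null "$url"; then
            echo "$url is reachable"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$url did not respond" >&2
    return 1
}
```

For example, wait_for_http http://localhost:9091/ returns once the Studio page can be opened in the browser.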

With DataStax Studio and the DSE server connected, we can now start defining our graph vertices, edges, and properties.

To recap what we covered in this article: we looked into various packages and capabilities of DSE, created search indexes on our Cassandra table, queried and worked with the table in Spark, and then proceeded to create a graph database that we can work on from DataStax Studio.

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!
