Setup Guide

Setting up Cassandra with Docker

Clone the source code of the article from setting-up-cassandra-with-docker-on-windows

Introduction

Almost every application needs a database, and different use cases often call for different databases. Installing all of them on a development machine is rarely ideal. Containers make it easy to provision the database of our choice for testing without actually installing it.

In this article, we shall provision an Apache Cassandra Docker container and walk through the steps for using it.

Why use Apache Cassandra?

Apache Cassandra is an open-source, distributed, wide-column NoSQL database that delivers always-on availability with no single point of failure (SPOF).

Below are some key features of Cassandra. Refer to the tutorial for more detailed insights into its architecture, data model, and cqlsh.

  • Elastic scalability − Cassandra is highly scalable; it allows adding more hardware to accommodate more customers and more data as required.

  • Always on architecture − Cassandra has no single point of failure and it is continuously available for business-critical applications that cannot afford a failure.

  • Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your throughput as you increase the number of nodes in the cluster. Therefore it maintains a quick response time.

  • Flexible data storage − Cassandra accommodates all possible data formats including: structured, semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your need.

  • Easy data distribution − Cassandra provides the flexibility to distribute data where you need by replicating data across multiple data centers.

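  • Transaction support − Cassandra does not provide full ACID transactions in the relational sense. Writes are atomic and isolated at the partition level and durable once acknowledged, consistency is tunable per query, and lightweight transactions offer compare-and-set semantics.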

  • Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.

What are we going to do?

In this article, we shall use Docker to explore Cassandra. Below is the series of things we will be doing:

  • Provision a Cassandra container
  • Connect to Cassandra from the container shell
  • Create a keyspace and perform some table operations
  • Connect to the database from a Cassandra GUI client
  • Initialize the Cassandra Docker container with a keyspace and data

Prerequisites

Docker must be installed on your machine to try out provisioning the Cassandra container.

Install Docker Desktop to get started with Docker on Windows.

After installation, run the commands below in a command prompt to verify that docker and docker-compose are installed successfully or, if Docker was installed previously, that they still work as expected.

Docker Version
λ docker -v

Docker version 20.10.8, build 3967b7d
Docker Compose Version
λ docker-compose -v

Docker Compose version v2.0.0
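
The same check can be scripted when working from a POSIX shell (for example under WSL). The sketch below is only an illustration: the helper name require_tool is made up here and is not part of Docker.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the named command is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage (run manually): check both tools used in this article.
# require_tool docker && require_tool docker-compose
```

The function only checks that the command exists; it says nothing about whether the Docker daemon is actually running.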

Cassandra Docker Image

In this article, we shall use the bitnami/cassandra Docker image as opposed to the official cassandra image.

Below are a few salient reasons for opting for Bitnami images:

  • Bitnami closely tracks upstream source changes and promptly publishes new versions of this image using its automated systems.
  • With Bitnami images the latest bug fixes and features are available as soon as possible.
  • Bitnami containers, virtual machines and cloud images use the same components and configuration approach - making it easy to switch between formats based on your project needs.
  • Bitnami container images are released daily with the latest distribution packages available.

Provision Cassandra Container

Orchestrating Cassandra provisioning with docker-compose is preferable to running docker run by hand. Both options are provided below for reference.

Using docker

λ docker run ^
    --name cassandra ^
    -p 7000:7000 ^
    -p 9042:9042 ^
    -v %cd%:/bitnami ^
    -d bitnami/cassandra:latest

Unable to find image 'bitnami/cassandra:latest' locally
latest: Pulling from bitnami/cassandra
....
....
Digest: sha256:30f76ea2e81f5379e1ba4c31c0f8b0c5481cd379b01113e3375ecfd6d5961862
Status: Downloaded newer image for bitnami/cassandra:latest
d67aca9ca1a79109b1ccf3010b3b549c9f80c876c981406c8a169ab23ddb74a8

Options passed:

  • --detach, -d - Run container in background and print container ID
  • --publish, -p - Publish a container’s port(s) to the host
  • --name - Assign a name to the container
  • --volume, -v - Mount the current directory as a volume. This should create a cassandra folder in the location where docker run is executed

Verify that the container is provisioned with the list containers command. Check the logs to follow what is happening within the container.
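
The docker run command above uses Windows Command Prompt syntax (^ for line continuation, %cd% for the current directory). On Linux, macOS, or WSL, a rough equivalent might look like the sketch below. The wrapper name start_cassandra is only illustrative; the options are the same ones listed above.

```shell
#!/bin/sh
# Sketch: the article's docker run options in POSIX shell syntax.
# start_cassandra is an illustrative wrapper name, not part of Docker.
start_cassandra() {
    docker run \
        --name cassandra \
        -p 7000:7000 \
        -p 9042:9042 \
        -v "$(pwd)":/bitnami \
        -d bitnami/cassandra:latest
}

# Usage (run manually from the directory that should hold the data):
# start_cassandra
```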

List containers
λ docker ps
CONTAINER ID   IMAGE                      COMMAND                  CREATED         STATUS         PORTS                                            NAMES
1d897306c76c   bitnami/cassandra:latest   "/opt/bitnami/script…"   8 seconds ago   Up 5 seconds   0.0.0.0:7000->7000/tcp, 0.0.0.0:9042->9042/tcp   cassandra
View logs
λ docker logs -f cassandra
cassandra 20:23:34.21
cassandra 20:23:34.21 Welcome to the Bitnami cassandra container
cassandra 20:23:34.21 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-cassandra
cassandra 20:23:34.21 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-cassandra/issues
cassandra 20:23:34.21
cassandra 20:23:34.22 INFO  ==> ** Starting Cassandra setup **
cassandra 20:23:34.24 INFO  ==> Validating settings in CASSANDRA_* env vars..
....
....
....

Clean up by stopping and removing the container before trying out provisioning with docker-compose.

Cleanup
λ docker stop cassandra

λ docker rm cassandra

Using docker-compose

Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, we use a YAML file to configure our application’s services. Then, with a single command, we create and start all the services from our configuration.

To start with, let’s create docker-compose.yml with the below configuration.

docker-compose.yml
version: '3'

services:
  cassandra:
    image: docker.io/bitnami/cassandra:latest
    ports:
      - '7000:7000'
      - '9042:9042'
    volumes:
      - 'cassandra_data:/bitnami'
    healthcheck:
      test: [ "CMD", "/opt/bitnami/cassandra/bin/cqlsh", "-u cassandra", "-p cassandra" ,"-e \"describe keyspaces\"" ]
      interval: 15s
      timeout: 10s
      retries: 10
    environment:
      - CASSANDRA_SEEDS=cassandra
      - CASSANDRA_PASSWORD_SEEDER=yes
      - CASSANDRA_PASSWORD=cassandra
volumes:
  cassandra_data:
    driver: local

As can be observed, the cassandra service is defined with several settings that help orchestrate and configure it. The healthcheck verifies that the instance is provisioned and running by listing its keyspaces.

Run the below series of commands from the path where docker-compose.yml resides.

Validate yaml file and list services
λ docker-compose config --services
Build and start services
λ docker-compose up -d
[+] Running 2/2
 - Network cassandratest_default        Created                                                                      
 - Container cassandratest-cassandra-1  Started
List containers
λ docker-compose ps
NAME                                  COMMAND                  SERVICE             STATUS               PORTS
cassandratest-cassandra-1   "/opt/bitnami/script…"   cassandra           running (starting)   0.0.0.0:7000->7000/tcp, 0.0.0.0:9042->9042/tcp
Display the running processes
λ docker-compose top
cassandratest-cassandra-1
UID    PID     PPID    C    STIME   TTY   TIME       CMD
1001   25498   25477   21   20:48   ?     00:00:22   /opt/bitnami/java/bin/java -ea -da:net.openhft
Stop and start services
λ docker-compose stop

λ docker-compose start
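
The docker-compose ps output above shows the status as running (starting) until the healthcheck passes. When scripting against the container, it can help to wait for that transition first. The sketch below polls docker inspect for the health status; the function name wait_for_healthy and its retry and delay defaults are illustrative choices, not values from this article.

```shell
#!/bin/sh
# Sketch: poll a container's health status until it reports "healthy".
# wait_for_healthy and its defaults (20 tries, 15 s apart) are illustrative.
wait_for_healthy() {
    container="$1"
    retries="${2:-20}"
    delay="${3:-15}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
        if [ "$status" = "healthy" ]; then
            echo "$container is healthy"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$container did not become healthy" >&2
    return 1
}

# Usage (run manually, with the name printed by docker-compose ps):
# wait_for_healthy cassandratest-cassandra-1
```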

Run the command below to clean up. It stops and removes the containers and networks created by up; the -v flag also removes the named volumes declared in the compose file.

Cleanup
λ docker-compose down -v
[+] Running 3/3
 - Container cassandratest-cassandra-1  Removed  
 - Volume cassandratest_cassandra_data  Removed  
 - Network cassandratest_default        Removed

Note: If the file name is something other than docker-compose.yml, pass the file with -f as shown below.

provide file
λ docker-compose -f service-compose\services.yml up -d

λ docker-compose -f service-compose\services.yml ps

λ docker-compose -f service-compose\services.yml down -v

Connecting to the database from container shell

To interact with the database and execute statements, we need to connect to the container shell and log in with cqlsh.

Run the commands below after provisioning the container instance, either through docker or docker-compose as described above.

Provision container
λ docker-compose up -d

λ docker-compose ps
NAME                                  COMMAND                  SERVICE             STATUS               PORTS
cassandratest-cassandra-1   "/opt/bitnami/script…"   cassandra           running (starting)   0.0.0.0:7000->7000/tcp, 0.0.0.0:9042->9042/tcp

Connect to the container shell using docker exec, passing the name listed in the docker-compose ps output. After connecting, running ls -ltrs should list something like the following.

Connect to container shell
λ docker exec -it cassandratest-cassandra-1 bash

I have no name!@12102dc0991f:/$ ls -ltrs
total 76
4 drwxr-xr-x   1 root root 4096 Sep 25  2017 lib
4 drwxr-xr-x   1 root root 4096 Sep 27 16:53 var
....
....
4 drwxr-xr-x   1 root root 4096 Oct 18 09:48 sbin
4 drwxr-xr-x   1 root root 4096 Oct 18 09:48 bin
0 lrwxrwxrwx   1 root root   44 Oct 18 09:49 entrypoint.sh -> /opt/bitnami/scripts/cassandra/entrypoint.sh
0 lrwxrwxrwx   1 root root   37 Oct 18 09:49 run.sh -> /opt/bitnami/scripts/cassandra/run.sh
4 drwxrwxr-x   2 root root 4096 Oct 18 09:49 docker-entrypoint-initdb.d
4 drwxr-xr-x   1 root root 4096 Oct 18 23:27 etc
4 drwxr-xr-x   3 root root 4096 Oct 18 23:27 bitnami
....
....

Perform database operations

Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra database. Its syntax is similar to SQL.

To start with, we need to create a keyspace in Cassandra, which is similar to a database in SQL.

For the purpose of this article, we shall create a sample database to manage cycling races, as documented in the DataStax CQL documentation.

Though Cassandra is a NoSQL database, we can model the domain with normalized tables and foreign-key-style references.

Below is the high-level entity diagram of the database structure that we shall create under the cycling keyspace.

[Entity diagram: tables cyclist_name, cyclist_races, race_times, rank_by_year_and_name; relationship labels races, race-timings, ranking]

As observed, the database tracks a cyclist’s ranking in one or more races conducted as part of an event.

The model design includes a couple of joins to mimic relationships, but Cassandra does not enforce them, as there are no join queries and no referential integrity. Instead, we need to design the data model in a denormalized way, so that each query reads from a single table and performs well even over years’ worth of data.
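
To make the denormalized approach concrete, here is a sketch of a query-first table. The column names are assumptions modeled on the DataStax cycling sample and are not taken from this article's schema files; the helper name rank_table_cql is also made up. The function only prints the CQL, which can then be passed to cqlsh once the cycling keyspace exists.

```shell
#!/bin/sh
# Sketch: a query-first (denormalized) table definition.
# Column names are assumed from the DataStax cycling sample, not from
# this article's schema; rank_table_cql is an illustrative helper name.
rank_table_cql() {
    cat <<'CQL'
CREATE TABLE IF NOT EXISTS cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  cyclist_name text,
  rank int,
  PRIMARY KEY ((race_year, race_name), rank)
);
CQL
}

# Usage (run manually, after the cycling keyspace has been created):
# docker exec cassandratest-cassandra-1 /opt/bitnami/cassandra/bin/cqlsh \
#     -u cassandra -p cassandra -e "$(rank_table_cql)"
```

The partition key (race_year, race_name) keeps all rankings for one race in a single partition, so a lookup of rankings for a given race and year needs no join. The cyclist name is stored directly in the row rather than referenced by id.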

To start creating objects, we need to connect to cqlsh, which is the CLI for interacting with Cassandra using CQL.

Connect to Cqlsh
λ /opt/bitnami/cassandra/bin/cqlsh -u cassandra -p cassandra
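
For scripted checks it can be handy to skip the interactive shell altogether. The sketch below combines docker exec with the cqlsh -e option that the healthcheck already uses. The function name run_cql is hypothetical, and the container name in the usage line is the one printed by docker-compose ps.

```shell
#!/bin/sh
# Sketch: run one CQL statement inside the container, non-interactively.
# run_cql is an illustrative name; the credentials are the ones used
# throughout this article (cassandra / cassandra).
run_cql() {
    container="$1"
    statement="$2"
    docker exec "$container" /opt/bitnami/cassandra/bin/cqlsh \
        -u cassandra -p cassandra -e "$statement"
}

# Usage (run manually):
# run_cql cassandratest-cassandra-1 "DESCRIBE KEYSPACES"
```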

Create keyspace and list them

Create keyspace
cassandra@cqlsh> CREATE KEYSPACE IF NOT EXISTS cycling
  WITH REPLICATION = {
   'class' : 'SimpleStrategy',
   'replication_factor' : 1
  };

cassandra@cqlsh> DESCRIBE keyspaces;

cycling  system_auth         system_schema  system_views
system   system_distributed  system_traces  system_virtual_schema

Switch to cycling keyspace to start creating tables, types or any other database objects

Switch to cycling keyspace
cassandra@cqlsh> use cycling;
cassandra@cqlsh:cycling>

Below are a few create and insert statements for reference. Refer to the schema for the complete set of queries.

Cyclist Names
DROP TABLE IF EXISTS cycling.cyclist_name;

CREATE TABLE cycling.cyclist_name (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text,
  details_ map<text,text>
);

INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', 'Anna', {'details_age':'35', 'details_bday':'27/07/1980', 'details_nation':'AUSTRALIA'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (e7ae5cf3-d358-4d99-b900-85902fda9bb0, 'FRAME', 'Alex', {'details_age':'54', 'details_bday':'27/07/1961', 'details_nation':'ITALY'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (220844bf-4860-49d6-9a4b-6b5d3a79cbfb, 'TIRALONGO', 'Paolo', {'details_age':'23', 'details_bday':'27/07/1992', 'details_nation':'CANADA'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'KRUIKSWIJK', 'Steven', {'details_age':'23', 'details_bday':'27/07/1992', 'details_nation':'GERMANY'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (fb372533-eb95-4bb4-8685-6ef61e994caa, 'MATTHEWS', 'Michael', {'details_age':'28', 'details_bday':'27/07/1987', 'details_nation':'NETHERLANDS'});
Cyclist Races
DROP TABLE IF EXISTS cycling.cyclist_races;
DROP TYPE IF EXISTS cycling.race;

CREATE TYPE cycling.race (
   race_title text,
   race_date timestamp,
   race_time time);

CREATE TABLE cycling.cyclist_races (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text,
  races list<FROZEN <race>> );

INSERT INTO cycling.cyclist_races (id,races) VALUES (
   e7ae5cf3-d358-4d99-b900-85902fda9bb0,
   [ { race_title:'17th Santos Tour Down Under Aalburg', race_date:'2017-04-14',race_time:'07:00:00' },
     { race_title:'17th Santos Tour Down Under Gelderland', race_date:'2017-04-14', race_time:'08:00:00' } ]);

INSERT INTO cycling.cyclist_races (id, lastname, firstname, races) 
VALUES (
	e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', 'Anna', 
	[ {race_title:'Festival Luxembourgeois du cyclisme feminin Elsy Jacobs - Prologue - Garnich > Garnich',race_date:'2017-04-14',race_time:'08:00:00'},
		{race_title:'Festival Luxembourgeois du cyclisme feminin Elsy Jacobs - Stage 2 - Garnich > Garnich',race_date:'2017-04-14',race_time:'06:00:00'},
		{race_title:'Festival Luxembourgeois du cyclisme feminin Elsy Jacobs - Stage 3 - Mamer > Mamer',race_date:'2017-04-14',race_time:'10:00:00'} ]);

UPDATE cycling.cyclist_races
	SET 
    lastname = 'FRAME',
    firstname = 'Alex',
    races[1] = { race_time:'06:00:00'}
		WHERE id = e7ae5cf3-d358-4d99-b900-85902fda9bb0;

The above queries should create two tables and one user-defined type.

Database objects
cassandra@cqlsh:cycling> DESCRIBE TYPES;

race

cassandra@cqlsh:cycling> DESCRIBE TABLES;

cyclist_name  cyclist_races

Play around with the select statements below.

Select statements

cassandra@cqlsh:cycling> SELECT * from cycling.cyclist_name;

cassandra@cqlsh:cycling> SELECT lastname, races FROM cycling.cyclist_races WHERE id = e7cd5752-bc0d-4157-a80f-7523add8dbcd;

Connecting to the database from GUI Client

I found RazorSQL Cassandra Database Browser to be a good option when there is a need to explore a Cassandra database using a GUI client.

RazorSQL is a licensed product that provides a 30-day trial.

With cqlsh and the wide range of Cassandra drivers available, we wouldn’t require a GUI client for basic usage.

Initializing Database during container provisioning

For a quick development and testing setup, we often need to initialize the database as part of container provisioning.

A minor change in docker-compose.yml, mounting the init scripts under volumes, will do the trick.

docker-compose.yml
version: '3'

services:
  cassandra:
    image: docker.io/bitnami/cassandra:latest
    ports:
      - '7000:7000'
      - '9042:9042'
    volumes:
      - ./schema/cassandra:/docker-entrypoint-initdb.d
      - 'cassandra_data:/bitnami'
    healthcheck:
      test: [ "CMD", "/opt/bitnami/cassandra/bin/cqlsh", "-u cassandra", "-p cassandra" ,"-e \"describe keyspaces\"" ]
      interval: 15s
      timeout: 10s
      retries: 10
    environment:
      - CASSANDRA_SEEDS=cassandra
      - CASSANDRA_PASSWORD_SEEDER=yes
      - CASSANDRA_PASSWORD=cassandra
volumes:
  cassandra_data:
    driver: local

As highlighted, a couple of cql files are created in order under the schema/cassandra folder. During container startup, these scripts are executed in sequence and the database objects are created for use.
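
Since the order of the scripts matters (the keyspace must exist before its tables), it can be useful to preview the sequence before starting the container. The sketch below assumes the scripts are picked up in plain lexical order of their file names, which is why a numeric prefix is a common convention. The helper name list_init_scripts and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: print the .cql files of a folder in byte-wise lexical order.
# Assumption: the container runs init scripts in lexical file-name order.
# list_init_scripts is an illustrative helper name.
list_init_scripts() {
    dir="${1:-./schema/cassandra}"
    set -- "$dir"/*.cql
    # If the glob matched nothing, "$1" is the literal pattern: print nothing.
    [ -e "$1" ] || return 0
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Usage (run manually from the repo root); with hypothetical files
# 01-keyspace.cql and 02-tables.cql, the keyspace script is listed first:
# list_init_scripts ./schema/cassandra
```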

Repo structure

Clone the repo and run the commands below to orchestrate container creation, which will initialize the configured schema.

Database initialization
λ git clone https://github.com/2much2learn/article-oct192021-cassandra-with-docker.git

λ cd article-oct192021-cassandra-with-docker

λ docker-compose up -d
Connect to shell
λ docker exec -it article-oct192021-cassandra-with-docker-cassandra-1 bash
Connect to cqlsh and verify initialized database objects
I have no name!@d2fc96819e4b:/$ /opt/bitnami/cassandra/bin/cqlsh -u cassandra -p cassandra

cassandra@cqlsh> DESCRIBE keyspaces;

cycling  system_auth         system_schema  system_views
system   system_distributed  system_traces  system_virtual_schema

cassandra@cqlsh> use cycling;

cassandra@cqlsh:cycling>
cassandra@cqlsh:cycling> DESCRIBE tables;

cyclist_name  cyclist_races  race_times  rank_by_year_and_name

cassandra@cqlsh:cycling> DESCRIBE types;

race

Conclusion

This article is a little hack for making application development faster by quickly setting up the database of our choice without the annoying steps of installing and configuring it. As Cassandra is one of the most commonly used NoSQL databases, having it provisioned with Docker and connected to our local application is quite simple.

author

Madan Narra

Software developer, Consultant & Architect

Madan is a software developer, writer, and ex-failed-startup co-founder. He has over 10 years of experience building scalable and distributed systems using Java, JavaScript, and Node.js. He writes about software design and architecture best practices with Java and is especially passionate about Microservices, API Development, Distributed Applications and Frontend Technologies.
