Setting up Cassandra with Docker
Last modified: 19 Oct, 2021
Introduction
Almost every application needs a database, and different use cases often call for different databases. Installing all of them on a development machine is rarely ideal. With containers, we can provision the database of our choice for testing without actually installing it.
In this article, we shall provision an Apache Cassandra Docker container and go through the steps for using it.
Why use Apache Cassandra?
Apache Cassandra is an open source, distributed, wide-column store, NoSQL database which delivers always-on availability with no single point of failure (SPOF).
Below are some key features of Cassandra. Refer to the tutorial for more detailed insights on architecture, data model and cqlsh.
Elastic scalability − Cassandra is highly scalable; it allows adding more hardware to accommodate more customers and more data as required.
Always on architecture − Cassandra has no single point of failure and it is continuously available for business-critical applications that cannot afford a failure.
Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your throughput as you increase the number of nodes in the cluster. Therefore it maintains a quick response time.
Flexible data storage − Cassandra accommodates structured, semi-structured, and unstructured data. It can dynamically accommodate changes to your data structures as your needs change.
Easy data distribution − Cassandra provides the flexibility to distribute data where you need by replicating data across multiple data centers.
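Transaction support − Cassandra does not provide full ACID transactions in the relational sense. Writes are atomic, isolated and durable at the level of a single partition, consistency is tunable per query, and lightweight transactions cover compare-and-set cases.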
Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.
What are we going to do?
In this article, we shall use Docker to explore Cassandra. Below is the series of things we will be doing:
- Provision the Cassandra container
- Connect to Cassandra from the container shell
- Create a keyspace and perform some table operations
- Connect to the database from a Cassandra GUI client
- Initialize the Cassandra Docker container with a keyspace and data
Prerequisites
Docker must be installed on your machine to try out provisioning the Cassandra container.
Install Docker Desktop to get started with Docker on Windows.
After installation, run the below commands in the command prompt to verify that docker and docker-compose are installed and working as expected. The same check applies if Docker was installed earlier.
λ docker -v
Docker version 20.10.8, build 3967b7d
λ docker-compose -v
Docker Compose version v2.0.0
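If this check needs to be scripted, for example in a project setup script, a minimal Python sketch could look like the following. The function names `parse_version` and `installed_version` are made up for illustration; only the standard library is used, and the parsing assumes the output format shown above.

```python
import re
import subprocess
from typing import Tuple


def parse_version(output: str) -> Tuple[int, ...]:
    """Extract the first dotted version number from CLI output as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_version(*command: str) -> Tuple[int, ...]:
    """Run a version command such as ("docker", "-v") and parse what it prints."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_version(result.stdout)
```

Calling `installed_version("docker", "-v")` raises an error if the command is missing or fails, which is usually the behaviour wanted in a setup script.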
Cassandra Docker Image
In this article, we shall use the bitnami/cassandra docker image as opposed to the official cassandra image.
Below are a few salient reasons for opting for Bitnami images:
- Bitnami closely tracks upstream source changes and promptly publishes new versions of this image using its automated systems.
- With Bitnami images the latest bug fixes and features are available as soon as possible.
- Bitnami containers, virtual machines and cloud images use the same components and configuration approach - making it easy to switch between formats based on your project needs.
- Bitnami container images are released daily with the latest distribution packages available.
Provision Cassandra Container
Orchestrating Cassandra provisioning with docker-compose is preferable to running docker run directly. Both options are provided below for reference.
Using docker
λ docker run ^
--name cassandra ^
-p 7000:7000 ^
-p 9042:9042 ^
-v %cd%:/bitnami ^
-d bitnami/cassandra:latest
Unable to find image 'bitnami/cassandra:latest' locally
latest: Pulling from bitnami/cassandra
....
....
Digest: sha256:30f76ea2e81f5379e1ba4c31c0f8b0c5481cd379b01113e3375ecfd6d5961862
Status: Downloaded newer image for bitnami/cassandra:latest
d67aca9ca1a79109b1ccf3010b3b549c9f80c876c981406c8a169ab23ddb74a8
Options passed:
- --detach, -d: Run container in background and print container ID
- --publish, -p: Publish a container’s port(s) to the host
- --name: Assign a name to the container
- -v: Mount current directory path as volume. This should create a cassandra folder in the location where docker run is executed
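The command above uses Windows command prompt syntax (`^` for line continuation, `%cd%` for the current directory). As a portable alternative, here is a small Python sketch that builds the same argument list. The helper name `docker_run_command` is an assumption for illustration, not part of any Docker tooling.

```python
import os
from typing import List


def docker_run_command(name: str = "cassandra",
                       image: str = "bitnami/cassandra:latest",
                       host_dir: str = "") -> List[str]:
    """Build the `docker run` argument list used in this article."""
    host_dir = host_dir or os.getcwd()  # stands in for the Windows-only %cd%
    return [
        "docker", "run",
        "--name", name,          # assign a name to the container
        "-p", "7000:7000",       # publish the inter-node port
        "-p", "9042:9042",       # publish the CQL native transport port
        "-v", f"{host_dir}:/bitnami",  # mount the host directory as a volume
        "-d", image,             # run detached and print the container ID
    ]
```

Passing the returned list to `subprocess.run(..., check=True)` would start the container. Building the list separately keeps quoting problems out of the picture, since no shell is involved.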
Verify that the container is provisioned with the list containers command. Follow the logs to trace what is happening within the container.
λ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d897306c76c bitnami/cassandra:latest "/opt/bitnami/script…" 8 seconds ago Up 5 seconds 0.0.0.0:7000->7000/tcp, 0.0.0.0:9042->9042/tcp cassandra
λ docker logs -f cassandra
cassandra 20:23:34.21
cassandra 20:23:34.21 Welcome to the Bitnami cassandra container
cassandra 20:23:34.21 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-cassandra
cassandra 20:23:34.21 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-cassandra/issues
cassandra 20:23:34.21
cassandra 20:23:34.22 INFO ==> ** Starting Cassandra setup **
cassandra 20:23:34.24 INFO ==> Validating settings in CASSANDRA_* env vars..
....
....
....
Clean up by stopping and removing the container before trying out provisioning with docker-compose.
λ docker stop cassandra
λ docker rm cassandra
Using docker-compose
Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, we use a YAML file to configure our application’s services. Then, with a single command, we create and start all the services from our configuration.
To start with, let’s create docker-compose.yml with the below configuration.
version: '3'
services:
  cassandra:
    image: docker.io/bitnami/cassandra:latest
    ports:
      - '7000:7000'
      - '9042:9042'
    volumes:
      - 'cassandra_data:/bitnami'
    healthcheck:
      test: [ "CMD", "/opt/bitnami/cassandra/bin/cqlsh", "-u cassandra", "-p cassandra" ,"-e \"describe keyspaces\"" ]
      interval: 15s
      timeout: 10s
      retries: 10
    environment:
      - CASSANDRA_SEEDS=cassandra
      - CASSANDRA_PASSWORD_SEEDER=yes
      - CASSANDRA_PASSWORD=cassandra
volumes:
  cassandra_data:
    driver: local
As can be observed, the cassandra service is defined with settings for ports, volumes and environment variables, plus a healthcheck that verifies the instance is provisioned and running by listing keyspaces.
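To get a feel for the healthcheck numbers (interval 15s, timeout 10s, retries 10), the arithmetic below estimates how long it takes before Docker would mark a container that never becomes healthy as unhealthy. This is a rough sketch based on Docker's documented behaviour that a check runs `interval` after the previous one completes and that `retries` consecutive failures are needed. It ignores `start_period`, which is not set in the file above, and the function name is made up for illustration.

```python
from typing import Tuple


def unhealthy_window(interval_s: float, timeout_s: float, retries: int) -> Tuple[float, float]:
    """Return (fastest, slowest) seconds until `retries` consecutive failures."""
    fastest = retries * interval_s                 # every check fails immediately
    slowest = retries * (interval_s + timeout_s)   # every check runs into the timeout
    return fastest, slowest


print(unhealthy_window(15, 10, 10))  # (150, 250)
```

So with these settings the container has roughly two and a half to four minutes to start answering `describe keyspaces` before it is reported as unhealthy.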
Run the below series of commands from the path where docker-compose.yml resides.
λ docker-compose config --services
λ docker-compose up -d
[+] Running 2/2
- Network cassandratest_default Created
- Container cassandratest-cassandra-1 Started
λ docker-compose ps
NAME COMMAND SERVICE STATUS PORTS
cassandratest-cassandra-1 "/opt/bitnami/script…" cassandra running (starting) 0.0.0.0:7000->7000/tcp, 0.0.0.0:9042->9042/tcp
λ docker-compose top
cassandratest-cassandra-1
UID PID PPID C STIME TTY TIME CMD
1001 25498 25477 21 20:48 ? 00:00:22 /opt/bitnami/java/bin/java -ea -da:net.openhft
λ docker-compose stop
λ docker-compose start
Run the below command to clean up. It stops and removes the containers and networks created by up, and the -v flag also removes the named volumes. Images are left in place unless --rmi is passed.
λ docker-compose down -v
[+] Running 3/3
- Container cassandratest-cassandra-1 Removed
- Volume cassandratest_cassandra_data Removed
- Network cassandratest_default Removed
Note: If the file name is other than docker-compose.yml, then run the commands by providing the file as below
λ docker-compose -f service-compose\services.yml up -d
λ docker-compose -f service-compose\services.yml ps
λ docker-compose -f service-compose\services.yml down -v
Connecting to the database from container shell
In order to interact with the database and execute statements, we need to connect to the container shell and log in with cqlsh.
Run the below commands after provisioning the container instance through either docker or docker-compose as described above.
λ docker-compose up -d
λ docker-compose ps
NAME COMMAND SERVICE STATUS PORTS
cassandratest-cassandra-1 "/opt/bitnami/script…" cassandra running (starting) 0.0.0.0:7000->7000/tcp, 0.0.0.0:9042->9042/tcp
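The status `running (starting)` above means the container is up but the healthcheck has not passed yet, so a connection attempt made right away may be refused. A minimal sketch for waiting until the published CQL port accepts TCP connections is shown below. The helper name `wait_for_port` and its default timings are assumptions. An open port only suggests that Cassandra is listening, not that authentication and the schema are fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 120.0, poll_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect is closed immediately; we only probe reachability.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            time.sleep(poll_s)  # refused or unreachable: wait and retry
    return False
```

For the setup in this article the call would be `wait_for_port("localhost", 9042)`, since 9042 is the port published for CQL clients.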
Connect to the container shell using docker exec by passing the name as listed in the docker-compose ps output. After connecting to the shell, running ls -ltrs should list something like the below.
λ docker exec -it cassandratest-cassandra-1 bash
I have no name!@12102dc0991f:/$ ls -ltrs
total 76
4 drwxr-xr-x 1 root root 4096 Sep 25 2017 lib
4 drwxr-xr-x 1 root root 4096 Sep 27 16:53 var
....
....
4 drwxr-xr-x 1 root root 4096 Oct 18 09:48 sbin
4 drwxr-xr-x 1 root root 4096 Oct 18 09:48 bin
0 lrwxrwxrwx 1 root root 44 Oct 18 09:49 entrypoint.sh -> /opt/bitnami/scripts/cassandra/entrypoint.sh
0 lrwxrwxrwx 1 root root 37 Oct 18 09:49 run.sh -> /opt/bitnami/scripts/cassandra/run.sh
4 drwxrwxr-x 2 root root 4096 Oct 18 09:49 docker-entrypoint-initdb.d
4 drwxr-xr-x 1 root root 4096 Oct 18 23:27 etc
4 drwxr-xr-x 3 root root 4096 Oct 18 23:27 bitnami
....
....
Perform database operations
Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra database. Its syntax is similar to SQL.
To start with, we need to create a keyspace in Cassandra, which is similar to a database in SQL.
For the purpose of this article, we shall create a sample database to manage Cycling Races, as documented in the DataStax CQL Documentation.
Though Cassandra is a NoSQL database, we can model the domain by creating normalized tables that reference each other by key, much like foreign keys.
Below is the high-level entity diagram of the database structure that we shall create under the cycling keyspace.
As observed, the database tracks a cyclist’s ranking in one or more races that are conducted as part of an event.
The model design includes a couple of joins to mimic relationships, but Cassandra does not enforce them, as there are no join queries. Instead, we need to create the data model in a denormalized way: there is no referential integrity, and each table is shaped around the queries it serves, which gives the performance that joins over years’ worth of data could not.
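To make the idea of denormalization concrete, here is a plain-Python sketch that has nothing to do with any Cassandra API. It takes relational-style data and performs the "join" once, at write time, producing one self-contained row per cyclist id, similar in spirit to a table keyed by cyclist. The sample ids and race titles are shortened, hypothetical values.

```python
from collections import defaultdict

# Normalized, relational-style data (hypothetical sample values).
cyclists = {
    "e7ae5cf3": {"firstname": "Alex", "lastname": "FRAME"},
    "e7cd5752": {"firstname": "Anna", "lastname": "VAN DER BREGGEN"},
}
race_entries = [
    {"cyclist_id": "e7ae5cf3", "race_title": "Tour Down Under Aalburg"},
    {"cyclist_id": "e7ae5cf3", "race_title": "Tour Down Under Gelderland"},
    {"cyclist_id": "e7cd5752", "race_title": "Elsy Jacobs - Prologue"},
]


def denormalize(cyclists, race_entries):
    """Build one row per cyclist id that already contains the races."""
    races_by_cyclist = defaultdict(list)
    for entry in race_entries:
        races_by_cyclist[entry["cyclist_id"]].append(entry["race_title"])
    return {
        cyclist_id: {**names, "races": races_by_cyclist[cyclist_id]}
        for cyclist_id, names in cyclists.items()
    }


cyclist_races = denormalize(cyclists, race_entries)
# Reading is now a single lookup by key, with no join at query time.
print(cyclist_races["e7ae5cf3"]["races"])
```

The price is duplication: if a name changes, every row that embeds it has to be rewritten, which is the trade-off the paragraph above describes.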
To start creating objects, we need to connect to cqlsh, which is the CLI for interacting with Cassandra using CQL.
λ /opt/bitnami/cassandra/bin/cqlsh -u cassandra -p cassandra
Create keyspace and list them
cassandra@cqlsh> CREATE KEYSPACE IF NOT EXISTS cycling
WITH REPLICATION = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};
cassandra@cqlsh> DESCRIBE keyspaces;
cycling system_auth system_schema system_views
system system_distributed system_traces system_virtual_schema
Switch to cycling keyspace to start creating tables, types or any other database objects
cassandra@cqlsh> use cycling;
cassandra@cqlsh:cycling>
Below are a few create and insert statements for reference. Refer to the schema for the complete set of queries.
DROP TABLE IF EXISTS cycling.cyclist_name;
CREATE TABLE cycling.cyclist_name (
id UUID PRIMARY KEY,
lastname text,
firstname text,
details_ map<text,text>
);
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', 'Anna', {'details_age':'35', 'details_bday':'27/07/1980', 'details_nation':'AUSTRALIA'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (e7ae5cf3-d358-4d99-b900-85902fda9bb0, 'FRAME', 'Alex', {'details_age':'54', 'details_bday':'27/07/1961', 'details_nation':'ITALY'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (220844bf-4860-49d6-9a4b-6b5d3a79cbfb, 'TIRALONGO', 'Paolo', {'details_age':'23', 'details_bday':'27/07/1992', 'details_nation':'CANADA'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'KRUIKSWIJK', 'Steven', {'details_age':'23', 'details_bday':'27/07/1992', 'details_nation':'GERMANY'});
INSERT INTO cycling.cyclist_name (id, lastname, firstname, details_) VALUES (fb372533-eb95-4bb4-8685-6ef61e994caa, 'MATTHEWS', 'Michael', {'details_age':'28', 'details_bday':'27/07/1987', 'details_nation':'NETHERLANDS'});
DROP TABLE IF EXISTS cycling.cyclist_races;
DROP TYPE IF EXISTS cycling.race;
CREATE TYPE cycling.race (
race_title text,
race_date timestamp,
race_time time);
CREATE TABLE cycling.cyclist_races (
id UUID PRIMARY KEY,
lastname text,
firstname text,
races list<FROZEN <race>> );
INSERT INTO cycling.cyclist_races (id,races) VALUES (
e7ae5cf3-d358-4d99-b900-85902fda9bb0,
[ { race_title:'17th Santos Tour Down Under Aalburg', race_date:'2017-04-14',race_time:'07:00:00' },
{ race_title:'17th Santos Tour Down Under Gelderland', race_date:'2017-04-14', race_time:'08:00:00' } ]);
INSERT INTO cycling.cyclist_races (id, lastname, firstname, races)
VALUES (
e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', 'Anna',
[ {race_title:'Festival Luxembourgeois du cyclisme feminin Elsy Jacobs - Prologue - Garnich > Garnich',race_date:'2017-04-14',race_time:'08:00:00'},
{race_title:'Festival Luxembourgeois du cyclisme feminin Elsy Jacobs - Stage 2 - Garnich > Garnich',race_date:'2017-04-14',race_time:'06:00:00'},
{race_title:'Festival Luxembourgeois du cyclisme feminin Elsy Jacobs - Stage 3 - Mamer > Mamer',race_date:'2017-04-14',race_time:'10:00:00'} ]);
UPDATE cycling.cyclist_races
SET
lastname = 'FRAME',
firstname = 'Alex',
races[1] = { race_time:'06:00:00'}
WHERE id = e7ae5cf3-d358-4d99-b900-85902fda9bb0;
The above queries should create two tables and one user-defined type.
cassandra@cqlsh:cycling> DESCRIBE TYPES;
race
cassandra@cqlsh:cycling> DESCRIBE TABLES;
cyclist_name cyclist_races
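One detail of the UPDATE statement above is easy to miss: `races[1]` addresses the second element, because list indexes are zero-based, and since the elements are frozen the whole element is replaced rather than only its `race_time`. The plain-Python model below mimics that behaviour under the assumption that fields omitted from a UDT literal become null. It is an illustration of the semantics, not Cassandra code, and the helper names are invented.

```python
RACE_FIELDS = ("race_title", "race_date", "race_time")


def make_race(**fields):
    """Mimic a CQL UDT literal: fields left out of the literal become None (null)."""
    return {name: fields.get(name) for name in RACE_FIELDS}


def set_list_element(races, index, **fields):
    """Mimic `races[index] = {...}` on list<frozen<race>>: replace the whole element."""
    updated = list(races)
    updated[index] = make_race(**fields)  # IndexError if the index is out of range
    return updated


races = [
    make_race(race_title="17th Santos Tour Down Under Aalburg",
              race_date="2017-04-14", race_time="07:00:00"),
    make_race(race_title="17th Santos Tour Down Under Gelderland",
              race_date="2017-04-14", race_time="08:00:00"),
]
races = set_list_element(races, 1, race_time="06:00:00")
print(races[1])
```

Under this model the second race keeps only its new `race_time`. To preserve the title and date, the UPDATE would have to repeat them in the literal.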
Play around with the below SELECT query statements
cassandra@cqlsh:cycling> SELECT * from cycling.cyclist_name;
cassandra@cqlsh:cycling> SELECT lastname, races FROM cycling.cyclist_races WHERE id = e7cd5752-bc0d-4157-a80f-7523add8dbcd;
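The same statements can also be run without an interactive session by handing them to cqlsh through its `-e` option via `docker exec`. The sketch below builds such a command from Python. The helper names are hypothetical, the container name and credentials are the ones used in this article, and the cqlsh path is the one shown for the Bitnami image above.

```python
import subprocess
from typing import List

CQLSH = "/opt/bitnami/cassandra/bin/cqlsh"  # path inside the Bitnami container


def cqlsh_exec_command(container: str, statement: str,
                       user: str = "cassandra", password: str = "cassandra") -> List[str]:
    """Build a `docker exec` call that runs one CQL statement through cqlsh -e."""
    return ["docker", "exec", container, CQLSH,
            "-u", user, "-p", password, "-e", statement]


def run_cql(container: str, statement: str) -> str:
    """Run the statement in the container and return what cqlsh printed."""
    result = subprocess.run(cqlsh_exec_command(container, statement),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_cql("cassandratest-cassandra-1", "SELECT * FROM cycling.cyclist_name;")` would return the table contents as text, which is handy for quick smoke tests.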
Connecting to the database from GUI Client
I found RazorSQL Cassandra Database Browser to be a good option if there is a need to explore a Cassandra database using a GUI client.
RazorSQL is a licensed product which provides a 30-day trial.
With cqlsh and the wide range of Cassandra drivers available, we would not require a GUI client for basic usage.
Initializing Database during container provisioning
For a quick development and testing setup, we often need to initialize the database as part of container provisioning.
A minor change in docker-compose.yml, mounting the init scripts under volumes, will do the trick.
version: '3'
services:
  cassandra:
    image: docker.io/bitnami/cassandra:latest
    ports:
      - '7000:7000'
      - '9042:9042'
    volumes:
      - ./schema/cassandra:/docker-entrypoint-initdb.d
      - 'cassandra_data:/bitnami'
    healthcheck:
      test: [ "CMD", "/opt/bitnami/cassandra/bin/cqlsh", "-u cassandra", "-p cassandra" ,"-e \"describe keyspaces\"" ]
      interval: 15s
      timeout: 10s
      retries: 10
    environment:
      - CASSANDRA_SEEDS=cassandra
      - CASSANDRA_PASSWORD_SEEDER=yes
      - CASSANDRA_PASSWORD=cassandra
volumes:
  cassandra_data:
    driver: local
As configured under volumes, a couple of cql files are created, in order, under the schema/cassandra folder. During container startup, these scripts will be executed in sequence and the database objects will be created for usage.
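Because the scripts run in sequence, their file names decide what happens first, so it helps to preview the order before starting the container. The sketch below lists a directory in sorted name order. It assumes that the image picks up `.sh`, `.cql` and `.cql.gz` files and runs them in sorted order, which should be verified against the image documentation. The function name is invented, and the container's own sort may differ from Python's for unusual file names.

```python
from pathlib import Path
from typing import List

# Assumption: these are the file types the image treats as init scripts.
INIT_SUFFIXES = (".sh", ".cql", ".cql.gz")


def init_script_order(directory: str) -> List[str]:
    """Return init script file names in the sorted order they are assumed to run."""
    names = [
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.name.endswith(INIT_SUFFIXES)
    ]
    return sorted(names)
```

Numeric prefixes such as `01_` and `02_` are a simple way to make the intended order explicit, for example creating the keyspace before the tables that live in it.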
Clone the Repo and run the below command to orchestrate container creation which will initialize the configured schema.
λ git clone https://github.com/2much2learn/article-oct192021-cassandra-with-docker.git
λ cd article-oct192021-cassandra-with-docker
λ docker-compose up -d
λ docker exec -it article-oct192021-cassandra-with-docker-cassandra-1 bash
I have no name!@d2fc96819e4b:/$ /opt/bitnami/cassandra/bin/cqlsh -u cassandra -p cassandra
cassandra@cqlsh> DESCRIBE keyspaces;
cycling system_auth system_schema system_views
system system_distributed system_traces system_virtual_schema
cassandra@cqlsh> use cycling;
cassandra@cqlsh:cycling>
cassandra@cqlsh:cycling> DESCRIBE tables;
cyclist_name cyclist_races race_times rank_by_year_and_name
cassandra@cqlsh:cycling> DESCRIBE types;
race
Conclusion
This article is a little hack for making application development faster: we quickly set up the database of our choice without the annoying steps of installing and configuring it. As Cassandra is one of the most commonly used NoSQL databases, having it provisioned with Docker and connecting to it from a local application is quite simple.