Setup Guide

Configuring Centralized logging with Kafka and ELK stack

Clone the source code of the article from centralized-logging-with-kafka-and-elk-stack

Introduction

In the real world, any production incident quickly turns scary when we have to dig through a huge pile of logs.

Sometimes it is very difficult to replicate an issue when the data to analyze is scattered across different log files. This is where Centralized Logging comes to the rescue, stitching all the logs together in one place.

But this comes with a challenge if the centralized logging tool is not configured to handle log-bursting situations. A log-bursting event causes saturation, log transmission latency and sometimes log loss, which should never occur on production systems.

To handle such situations, we can publish logs to Kafka, which acts as a buffer in front of Logstash to ensure resiliency. This is one of the best ways to deploy the ELK Stack and reduce log overload.

In this article, we shall orchestrate a complete solution using Docker to configure Kafka with the ELK stack and enable centralized logging capabilities.

IMG 2

Technology stack

  • Apache Kafka - collects logs from the application and queues them.
  • Kafka Manager - a web-based management tool for Kafka, developed at Yahoo.
  • Logstash - aggregates the data from the Kafka topics, processes it and ships it to Elasticsearch.
  • Elasticsearch - indexes the data.
  • Kibana - for analyzing the data.

Prerequisites

The solution in this article is set up and tested on Mac OS and Ubuntu. If you are on Windows and would like to get your hands dirty with Unix, I would recommend going through the Configure development environment on Ubuntu 19.10 article, which has detailed steps on setting up a development environment on Ubuntu.

At the very least, you should have the below system requirements and tools installed to try out the application; a quick way to check the resources allocated to Docker is shown right after the list:

  • Docker
  • Docker Compose
  • System Requirements
    • Processor: 4 Core
    • Memory: 6 GB RAM
    • Swap: 2 GB.
    • Disk image size: 100 GB
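
You can quickly confirm how much CPU and memory your Docker engine has been allocated before bringing the stack up; the exact figures in the output will of course depend on your machine.

Check resources allocated to Docker
$ docker info | grep -E 'CPUs|Total Memory'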

Configuring Kafka & Kafka Manager

Kafka Docker images are published by multiple authors.

For this article, we are using the image published by Spotify - spotify/kafka

Why did we choose the spotify/kafka image?

The main hurdle of running Kafka in Docker is that it depends on Zookeeper. Compared to other Kafka Docker images, this image runs both Zookeeper and Kafka in the same container.

This results in:

  • No dependency on an external Zookeeper host, or linking to another container
  • Zookeeper and Kafka are configured to work together out of the box

For seamless orchestration, we shall bootstrap all the required containers with Docker Compose for this setup.

Below are the snapshots for configuring Kafka & Kafka Manager in docker-compose.yml

Configure Spotify Kafka image in docker-compose.yml
services:
  # Kafka Server & Zookeeper Docker Image
  kafkaserver:
    image: "spotify/kafka:latest"
    container_name: kafka
    # Configures docker image to run in bridge mode network
    hostname: kafkaserver
    networks:
      - kafkanet
    # Make a port available to services outside of Docker
    ports:
      - 2181:2181
      - 9092:9092
    environment:
      ADVERTISED_HOST: kafkaserver
      ADVERTISED_PORT: 9092

Kafka Manager is a tool from Yahoo for managing an Apache Kafka cluster. Below is a snapshot of its configuration in docker-compose.yml.

Configure Kafka Manager image in docker-compose.yml
  # Kafka Manager docker image, it is a web based tool for managing Apache Kafka.
  kafka_manager:
    image: "mzagar/kafka-manager-docker:1.3.3.4"
    container_name: kafkamanager
    #configures the kafka manager docker image to run in bridge mode network
    networks:
      - kafkanet
    # Make a port available to services outside of Docker
    ports:
      - 9000:9000
    # It Links kafka_manager container to kafkaserver container to communicate.
    links:
      - kafkaserver:kafkaserver
    environment:
      ZK_HOSTS: "kafkaserver:2181"

Configuring Elasticsearch

Below is a simple configuration for Elasticsearch in docker-compose.yml:

Configure Elasticsearch image in docker-compose.yml
  # Elasticsearch Docker Image
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
    container_name: elasticsearch
    # Make a port available to services outside of Docker
    ports:
      - 9200:9200
      - 9300:9300
    # Configures docker image to run in bridge mode network
    networks:
      - kafkanet

Configuring Kibana

Below is a simple configuration for Kibana in docker-compose.yml:

Configure Kibana image in docker-compose.yml
  # Kibana Docker Image
  kibana:
    image: docker.elastic.co/kibana/kibana:6.4.0
    container_name: kibana
    # Make a port available to services outside of Docker
    ports:
      - 5601:5601
    # It Links kibana container & elasticsearch container to communicate
    links:
      - elasticsearch:elasticsearch
    # Configures docker image to run in bridge mode network
    networks:
      - kafkanet
    # You can control the order of service startup and shutdown with the depends_on option.
    depends_on: ['elasticsearch']

Configuring Logstash

Below is a simple configuration for Logstash in docker-compose.yml:

Configure Logstash image in docker-compose.yml
  # Logstash Docker Image
  logstash:
    image: docker.elastic.co/logstash/logstash:6.4.0
    container_name: logstash
    # It Links elasticsearch container & kafkaserver container  & logstash container to communicate
    links:
      - elasticsearch:elasticsearch
      - kafkaserver:kafkaserver
    # Configures docker image to run in bridge mode network
    networks:
      - kafkanet
    # You can control the order of service startup and shutdown with the depends_on option.
    depends_on: ['elasticsearch', 'kafkaserver']
    # Mount host volumes into docker containers to supply logstash.config file
    volumes:
      - '/private/config-dir:/usr/share/logstash/pipeline/'

Next, we will configure a Logstash pipeline that pulls our logs from the Kafka topics, processes them and ships them on to Elasticsearch for indexing.

Setup Logstash pipeline

Pipelines are configured in logstash.conf. Below are brief notes on what we are configuring:

  • Accept logs from the Kafka topics of the configured Kafka cluster
  • Filter messages and apply transformation rules as required. For this setup, we convert the UTC datetime to a specific timezone when persisting to the index.
  • Log messages captured from the SIT & UAT environments are routed to their own individual index.
logstash.conf
# Plugin Configuration. This input will read events from a Kafka topic.
# Ref Link - https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html

input {
  kafka {
    bootstrap_servers => "kafkaserver:9092"
    topics => ["sit.catalogue.item","uat.catalogue.item"]
    auto_offset_reset => "earliest"
    decorate_events => true
  }
}

# Filter Plugin. A filter plugin performs intermediary processing on an event.
# Ref Link - https://www.elastic.co/guide/en/logstash/current/filter-plugins.html

filter {
  json {
    source => "message"
  }
  mutate {
    remove_field => [
      "[message]"
    ]
  }
  if (![latency] or [latency]=="") {
    mutate {
      add_field => {
        latency => -1
      }
    }
  }
  date {
    match => [ "time_stamp", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" ]
    timezone => "Europe/London"
    target => [ "app_ts" ]
    remove_field => ["time_stamp"]
  }
  if ([@metadata][kafka][topic] == "uat.catalogue.item") {
    mutate {
      add_field => {
        indexPrefix => "uat-catalogue-item"
      }
    }
  }else if ([@metadata][kafka][topic] == "sit.catalogue.item") {
    mutate {
      add_field => {
        indexPrefix => "sit-catalogue-item"
      }
    }
  }else{
    mutate {
      add_field => {
        indexPrefix => "unknown"
      }
    }
  }
}

#An output plugin sends event data to a particular destination. Outputs are the final stage in the event pipeline.
# Ref Link - https://www.elastic.co/guide/en/logstash/current/output-plugins.html
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "%{[indexPrefix]}-logs-%{+YYYY.MM.dd}"
  }
}

Below is a detailed explanation of the different bits configured in the logstash.conf file:

  • decorate_events: Metadata is only added to the event if the decorate_events option is set to true (it defaults to false).

Please note that @metadata fields are not part of any of your events at output time. If you need this information to be inserted into your original event, you’ll have to use the mutate filter to manually copy the required fields into your event.

decorate_events => true
  • mutate: The mutate filter allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events. Below is an example of a mutation on the JSON field latency: if latency is not provided by the source application, we set it to a default of -1.
if (![latency] or [latency]=="") {
  mutate {
    add_field => {
      latency => -1
    }
  }
}
  • remove_field => message: Logstash output produces a message field. Storing this raw message field in Elasticsearch only adds unused or redundant data, so we can remove the message field using the below filter:
mutate {
  remove_field => [
    "[message]"
  ]
}
  • Date Filter plugin: The date filter is used for parsing dates from fields, and then using that date or timestamp as the Logstash timestamp for the event. In the below example we use the time_stamp field from the JSON (which is in UTC format as provided by the source application) and convert it to the "Europe/London" timezone.
date {
  match => [ "time_stamp", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" ]
  timezone => "Europe/London"
  target => [ "app_ts" ]
  remove_field => ["time_stamp"]
}
  • index-prefix field: We create the index prefixes "uat-catalogue-item" & "sit-catalogue-item" based on the metadata provided by the Kafka broker. As creating multiple clusters for a POC would be a bit heavy, we use the same cluster to demonstrate the two environments, and use the below meta field to identify the Kafka topic for each environment.

[@metadata][kafka][topic] : the original Kafka topic name from which the message was consumed.

# If message consumed from uat kafka topic, add prefix "uat-catalogue-item"
if ([@metadata][kafka][topic] == "uat.catalogue.item") {
  mutate {
    add_field => {
      indexPrefix => "uat-catalogue-item"
    }
  }
}

# If message consumed from sit kafka topic, add prefix "sit-catalogue-item"
else if ([@metadata][kafka][topic] == "sit.catalogue.item") {
  mutate {
    add_field => {
      indexPrefix => "sit-catalogue-item"
    }
  }
}

# If message consumed from any other kafka topic, add index prefix "unknown"
else{
  mutate {
    add_field => {
      indexPrefix => "unknown"
    }
  }
}

Note: Update volumes in docker-compose.yml

Configure Logstash pipeline volume
volumes:
  - 'LOCATION_OF_LOG_STASH_CONF_FILE:/usr/share/logstash/pipeline/'

Syntax for LOCATION_OF_LOG_STASH_CONF_FILE varies based on OS.

MAC/Linux OS
volumes:
  - '/private/config-dir:/usr/share/logstash/pipeline/'
WINDOWS
volumes:
  - '//C/Users/config-dir:/usr/share/logstash/pipeline/'
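
Whichever path you use, make sure the host directory exists and contains the logstash.conf file before the stack is started; otherwise Logstash will start with an empty pipeline directory. A minimal sketch for Mac/Linux, assuming logstash.conf sits next to your docker-compose.yml (adjust the paths for your setup):

Prepare the Logstash pipeline directory
$ mkdir -p /private/config-dir
$ cp logstash.conf /private/config-dir/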

A bit of insight on Docker networks

Docker networking utilizes existing Linux kernel networking features like iptables, namespaces and bridges. With Docker networking, we can connect various containers running on the same host or across multiple hosts. By default, three network modes are available in Docker.

  1. Bridge
  2. Host
  3. Null

The Bridge network driver provides single-host networking capabilities. By default, containers connect to the bridge network. Whenever a container starts, it is assigned an internal IP address. All the containers connected to the internal bridge can communicate with one another, but they can’t communicate outside the bridge network.

# Use the bridge network for all containers; keeping them on the same network simplifies communication between them.
networks:
  kafkanet:
    driver: bridge
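
Once the services are up, you can verify that all containers have joined the same bridge network. Note that Docker Compose prefixes the network name with the project (folder) name, so the exact name on your machine may differ from the sketch below:

Inspect the Docker network
$ docker network ls

# Replace <PROJECT_NAME> with your compose project/folder name
$ docker network inspect <PROJECT_NAME>_kafkanet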

Updating the hosts file

The final step is to make the hostnames registered as links between the containers in docker-compose.yml resolvable from the host machine. As we are setting up everything on the same system, we need to resolve these names to localhost, i.e. 127.0.0.1.

127.0.0.1 kafkaserver
127.0.0.1 elasticsearch
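
On Mac/Linux the hosts file is /etc/hosts (on Windows it is C:\Windows\System32\drivers\etc\hosts). A quick way to append the entries on Mac/Linux:

Update /etc/hosts
$ sudo sh -c 'printf "127.0.0.1 kafkaserver\n127.0.0.1 elasticsearch\n" >> /etc/hosts'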

Putting it all together

docker-compose.yml
version: "2"
services:
  # Kafka Server & Zookeeper Docker Image
  kafkaserver:
    image: "spotify/kafka:latest" 
    container_name: kafka
    # Configures docker image to run in bridge mode network
    hostname: kafkaserver
    networks:
      - kafkanet
    # Make a port available to services outside of Docker
    ports:
      - 2181:2181
      - 9092:9092
    environment:
      ADVERTISED_HOST: kafkaserver
      ADVERTISED_PORT: 9092
  
  # Kafka Manager docker image, it is a web based tool for managing Apache Kafka.
  kafka_manager:
    image: "mzagar/kafka-manager-docker:1.3.3.4"
    container_name: kafkamanager
    #configures the kafka manager docker image to run in bridge mode network
    networks:
      - kafkanet
    # Make a port available to services outside of Docker
    ports:
      - 9000:9000
    # It Links kafka_manager container to kafkaserver container to communicate.
    links:
      - kafkaserver:kafkaserver
    environment:
      ZK_HOSTS: "kafkaserver:2181"
  
  # Elasticsearch Docker Image
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
    container_name: elasticsearch
    # Make a port available to services outside of Docker
    ports:
      - 9200:9200
      - 9300:9300
    # Configures docker image to run in bridge mode network
    networks:
      - kafkanet
  
  # Kibana Docker Image
  kibana:
    image: docker.elastic.co/kibana/kibana:6.4.0
    container_name: kibana
    # Make a port available to services outside of Docker
    ports:
      - 5601:5601
    # It Links kibana container & elasticsearch container to communicate
    links:
      - elasticsearch:elasticsearch
    # Configures docker image to run in bridge mode network
    networks:
      - kafkanet
    # You can control the order of service startup and shutdown with the depends_on option.
    depends_on: ['elasticsearch']
  
  # Logstash Docker Image
  logstash:
    image: docker.elastic.co/logstash/logstash:6.4.0
    container_name: logstash
    # It Links elasticsearch container & kafkaserver container  & logstash container to communicate
    links:
      - elasticsearch:elasticsearch
      - kafkaserver:kafkaserver
    # Configures docker image to run in bridge mode network
    networks:
      - kafkanet
    # You can control the order of service startup and shutdown with the depends_on option.
    depends_on: ['elasticsearch', 'kafkaserver']
    # Mount host volumes into docker containers to supply logstash.config file
    volumes:
      - '/private/config-dir:/usr/share/logstash/pipeline/'

# Use the bridge network for all containers; keeping them on the same network simplifies communication between them.
networks: 
  kafkanet:
    driver: bridge

Launching the services

Once the docker-compose.yml file is ready, open your favorite terminal in the folder which contains it and run:

Launch services
$ docker-compose up

This should produce output similar to the below in your console.

IMG 14
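
If you would rather keep the terminal free, the same services can also be started in the background and their logs followed per container:

Launch services in detached mode
$ docker-compose up -d

# Follow logs of a specific service, e.g. logstash
$ docker-compose logs -f logstash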

Fire up a new terminal & check status of the containers

Check status
$ docker-compose ps

This should produce output similar to the below in your console.

IMG 8

Play around with the provisioned services to configure Centralized Logging Capabilities

  • Access http://localhost:9200/ in browser to verify if Elasticsearch is up & running

IMG 24
Elasticsearch is UP
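
The same check can also be done from a terminal; the cluster health endpoint should report a green or yellow status for this single-node setup:

Check Elasticsearch from the terminal
$ curl http://localhost:9200
$ curl 'http://localhost:9200/_cluster/health?pretty'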

  • Access http://localhost:5601/ in browser to verify if Kibana is up & running

IMG 25
Kibana is UP
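
Kibana also exposes a status endpoint that can be queried from the terminal; the overall state should report green once Kibana has connected to Elasticsearch:

Check Kibana from the terminal
$ curl 'http://localhost:5601/api/status'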

  • Access http://localhost:9000 in browser to manage Kafka Cluster

  • Follow the series of steps below to configure the cluster

IMG 15
Kafka Manager is UP

IMG 16
Click on Cluster & Add cluster

IMG 17
Configure cluster & click on `Save`

IMG 18
Click on Go to cluster view

IMG 19
Click on kafka_cluster

IMG 21
Navigate to Create a Topic

IMG 22
Create topic for SIT environment - `sit.catalogue.item`

IMG 35
Create topic for UAT environment - `uat.catalogue.item`
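
If you prefer the command line over Kafka Manager, the same topics can also be created with the scripts shipped inside the Kafka container; below is a sketch, assuming the container name kafka and the Kafka installation path used earlier in this article:

Create topics from the command line
$ docker exec -it kafka /bin/bash
$ cd /opt/kafka_2.11-0.10.1.0/bin
$ ./kafka-topics.sh --create --zookeeper kafkaserver:2181 --replication-factor 1 --partitions 1 --topic sit.catalogue.item
$ ./kafka-topics.sh --create --zookeeper kafkaserver:2181 --replication-factor 1 --partitions 1 --topic uat.catalogue.item
$ ./kafka-topics.sh --list --zookeeper kafkaserver:2181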

Publishing log message to kafka topics

It’s time to start preparing messages to be published to the Kafka topics. Log messages can be defined in any format, but if they are in JSON format we can capture many details, which provides additional capabilities when viewing the entries in Kibana.

Below is the sample log message prepared for this article. As you can see, we have tried to capture the details needed for our requirement.

Log Message in JSON format
{
  # Application Name
  "app":"catalogue-item",

  # Client Request Id
  "crid":"1BFXCD",

  # Unique Id
  "uuid":"ec25cdc8-a336-11ea-bb37-0242ac130002",

  # Message
  "msg":"Create Catlogue PayLoad - {'sku':'CTLG-123-0001','name':'The Avengers','description':'Marvel's The Avengers Movie','category':'Movies','price':0,'inventory':0}",

  # Status Code - SUCCESS, ERROR, WARN, INFO
  "status":"SUCCESS",

  # Time that passes between a user action and the resulting response
  "latency":"1",

  # Source of message
  "source":"Postman API Client",

  # Destination of message
  "destination":"Catalogue DataBase",

  # Application Time Stamp
  "time_stamp":"2020-05-31T13:22:10.120Z"
}

Start a Kafka producer using the scripts available in the provisioned Docker container. Run the below commands to start the producer for the given topic.

Login to container shell using below command
$ docker ps

# CONTAINER_NAME - kafka
$ docker exec -it <CONTAINER_NAME> /bin/bash
Start producer
$ cd /opt/kafka_2.11-0.10.1.0/bin

$ ./kafka-console-producer.sh --broker-list kafkaserver:9092 --topic uat.catalogue.item

Copy and paste the below messages one by one in the console to publish the messages to the provided topic.

Sample messages
## Message published to `uat.catalogue.item` kafka topic

## SUCCESS Message
{"app":"catalogue-item","crid":"1BFXCD","uuid":"ec25cdc8-a336-11ea-bb37-0242ac130002","msg":"Create Catlogue PayLoad - {'sku':'CTLG-123-0001','name':'The Avengers','description':'Marvel's The Avengers Movie','category':'Movies','price':0,'inventory':0}","status":"SUCCESS","latency":"1","source":"Postman API Client","destination":"Catalogue DataBase","time_stamp":"2020-05-31T13:22:10.120Z"}

## WARN Message
{"app":"catalogue-item","crid":"1BFXCD","uuid":"ec25cdc8-a336-11ea-bb37-0242ac130002","msg":"Create Catlogue PayLoad - {'sku':'CTLG-123-0001','name':'The Avengers','description':'Marvel's The Avengers Movie','category':'Movies','price':0,'inventory':0}","status":"WARN","latency":"1","source":"Postman API Client","destination":"Catalogue DataBase","time_stamp":"2020-05-31T13:22:10.120Z"}

## ERROR Message
{"app":"catalogue-item","crid":"1BFXCD","uuid":"ec25cdc8-a336-11ea-bb37-0242ac130002","msg":"Create Catlogue PayLoad - {'sku':'CTLG-123-0001','name':'The Avengers','description':'Marvel's The Avengers Movie','category':'Movies','price':0,'inventory':0}","status":"ERROR","latency":"1","source":"Postman API Client","destination":"Catalogue DataBase","time_stamp":"2020-05-31T13:22:10.120Z"}

IMG 26
Messages published to UAT topic
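
Once Logstash has consumed these messages, an index matching the pattern uat-catalogue-item-logs-<date> (as per the output section of logstash.conf) should appear in Elasticsearch. You can verify this from the terminal before moving on to Kibana:

Verify the index in Elasticsearch
$ curl 'http://localhost:9200/_cat/indices?v'
$ curl 'http://localhost:9200/uat-catalogue-item-logs-*/_search?pretty&size=1'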

Configuring Kibana

  • Access http://localhost:5601/ in a browser to open the Kibana UI and follow the below series of steps to create an index pattern and start visualizing the messages pushed to the Kafka topics.
  • Click Management on side navigation bar.
  • Click on Index Patterns

IMG 27

  • It will navigate to the below screen

IMG 28

  • Create index pattern for sit logs

IMG 29

  • Configure Settings

IMG 30

  • If the index is created successfully, it will navigate to the below screen

IMG 31

  • Click Discover on side navigation bar

  • Logs available in kibana with the index created

IMG 33

  • Visualize Message in tabular format

IMG 34

Conclusion

If you have made it to this point 😄 congratulations !!! You have successfully dockerized the ELK stack with Kafka.


Some useful commands for reference

Docker Commands

# Builds, (re)creates, starts, and attaches to containers for a service
$ docker-compose up

# Stops containers and removes containers, networks, volumes, and images created by up
$ docker-compose down

# Lists all running containers in docker engine
$ docker ps

# List docker images
$ docker image ls

# List all docker networks
$ docker network ls

Kafka Commands

Producers

You can produce messages from standard input as follows:

$ ./kafka-console-producer.sh --broker-list kafkaserver:9092 --topic uat.catalogue.item

Consumers

You can begin a consumer from the beginning of the log as follows:

$ ./kafka-console-consumer.sh --bootstrap-server kafkaserver:9092 --topic uat.catalogue.item --from-beginning

You can consume a single message as follows:

$ ./kafka-console-consumer.sh --bootstrap-server kafkaserver:9092 --topic uat.catalogue.item  --max-messages 1

You can consume and specify a consumer group as follows:

$ ./kafka-console-consumer.sh --topic uat.catalogue.item --new-consumer --bootstrap-server kafkaserver:9092 --consumer-property group.id=elk-group