It's a new year, but the work has never stopped. I have been trying, without success, to use Docker's internal service discovery to identify the hostnames or VIPs of the replicated containers of a Docker service deployed on a Docker Swarm cluster.

Knowing the VIPs of the replicated containers is mandatory in order to use a cluster of containers to launch automation tests. As I'm focusing on Apache JMeter in this project, we need to pass a list of worker nodes (containers, in our case) as a parameter in order to work with a distributed load testing environment.
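
For reference, this is roughly what the master-side call looks like in JMeter's distributed (remote) mode; the comma-separated list passed to -R is exactly what has to be discovered (the plan path and IPs below are placeholders):

# Run a test plan in non-GUI mode against a list of remote worker nodes;
# the list after -R is what needs to be built dynamically.
jmeter -n -t /test/plan.jmx -R 10.0.0.5,10.0.0.6,10.0.0.7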

First attempt: Docker internal DNS service-discovery

Docker runs an internal DNS server that keeps an up-to-date registry of its containers.

The lab used was:

  • Two services
    • Master with only 1 replica (jd_master)
    • Slaves/workers with several replicas under the service name (jd_slave)
  • Docker Compose
  • Docker Swarm cluster (with at least 3 machines in it so Raft consensus works; see the sketch after this list)
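
A minimal sketch of how such a cluster can be brought up, assuming three reachable machines (the IP below is a placeholder):

# On the first machine, initialise the Swarm:
docker swarm init --advertise-addr 192.168.1.10

# Print the manager join command and run it on the other two machines,
# so the three managers can form a working Raft quorum:
docker swarm join-token manager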

The content of the docker-compose.yml:

version: "3.3"

services:
  master:
    image: mtenrero/jmeter
    tty: true
    networks: 
      - distributed
    volumes:
      - ./test:/test
    environment:
      - MODE=master
      - TEST_NAME=$TEST_NAME
      - REMOTES=slave
    expose:
      - 6666
    depends_on:
      - slave
    links:
      - slave

  slave:
    image: mtenrero/jmeter
    tty: true
    networks:
      - distributed
    environment:
      - MODE=node
    expose:
      - 7777
      - 1099
      - 4445
    deploy:
      replicas: 3

networks:
  distributed:
    driver: overlay
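
The compose file is then deployed as a Swarm stack; the stack name jd matches the service names (jd_master, jd_slave) that show up later in docker ps:

# Deploy the compose file as a stack named "jd" on the Swarm:
docker stack deploy -c docker-compose.yml jd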

I tried querying the Docker DNS server for jd_slave from the jd_master container with dig jd_slave, but the answer only contains an auto-load-balanced virtual IP pointing to the living jd_slave containers.

dig jd_slave

jd_slave.		600	IN	A	10.0.0.2

So this approach doesn't fit our needs.
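
As a side note for anyone reproducing the query above, dig is not part of the Alpine base image; it can be installed from the bind-tools package:

# dig is provided by the bind-tools package on Alpine-based images:
apk add --no-cache bind-tools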

Investigating Docker overlay network

Under the same scenario, I ran less /etc/hosts inside a container, expecting to obtain its VIP inside the overlay network.

This attempt was a success:

docker ps

CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                          NAMES
bf19063d7113        mtenrero/jmeter:latest   "/bin/ash /script/..."   11 hours ago        Up 11 hours         1099/tcp, 4445/tcp, 7777/tcp   jd_master.1.fzvsb1bh31a5po2xxm1ixko9w
1a0bdbc45513        mtenrero/jmeter:latest   "/bin/ash /script/..."   38 hours ago        Up 38 hours         1099/tcp, 4445/tcp, 7777/tcp   jd_slave.3.8mkyudv6oi16l9fsig1ia20bc
659576c67323        mtenrero/jmeter:latest   "/bin/ash /script/..."   38 hours ago        Up 38 hours         1099/tcp, 4445/tcp, 7777/tcp   jd_slave.1.j3rre3kpm00utiwsx7i9v2vj6
da4bab1f1dcc        mtenrero/jmeter:latest   "/bin/ash /script/..."   38 hours ago        Up 38 hours         1099/tcp, 4445/tcp, 7777/tcp   jd_slave.2.kkmupae4vp6mdlxml8at9iqt3
docker exec -ti bf1 /bin/ash
less /etc/hostname

bf19063d7113
less /etc/hosts

127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
10.0.0.7	bf19063d7113
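
Since the container's own hostname is mapped to its overlay-network address in /etc/hosts, a container can extract its own VIP with something as simple as the following sketch (not taken from the project scripts):

# Match this container's hostname against /etc/hosts and keep the IP column:
OWN_VIP=$(grep "$(hostname)" /etc/hosts | awk '{print $1}')
echo "$OWN_VIP"    # prints 10.0.0.7 in the example above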

Designing the external service-discovery

We not only need the service-discovery function, we also need to know the state of each container:

  • Container image name
  • Container test execution status: WAITING FOR TEST / TESTING / FINISHED
  • Container health checker & monitoring
  • Test watcher and coordinator based on test execution status
  • A master coordinator which must give orders to the docker.socket or the Docker API on the master node

Taking these requirements into account, I've decided that we need a custom service discovery that can handle all of them.

(Diagram: ATQ_docker-design)

FLIGHTCONTROLLER

By default, the Automation Test Queue starts in FlightController mode.
It will listen on a predefined port for an HTTP REST request which all containers call in order to join the test cluster.

From then on, the FlightController will manage and monitor all the containers' lifecycles and statuses, paying close attention to their health in order to maintain an up-to-date registry.
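
Purely as an illustration of that join call (the port, path and payload are hypothetical placeholders, not a finished API), the request could be as simple as:

# Hypothetical join request sent by a container on startup:
curl -X POST "http://flightcontroller:8080/register" \
     -H "Content-Type: application/json" \
     -d "{\"name\": \"$(hostname)\", \"vip\": \"10.0.0.7\", \"mode\": \"node\"}"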

WORKER CONTAINER (Alpine)

Once the controller is up, the first script to be run is a call to the FLIGHTCONTROLLER container/host, announcing the worker's own VIP, which is available at:

/etc/hosts —> VIP and CONTAINER NAME

The VIP is reachable from any container in the same overlay network.
Test retry policies could be applied in order to retry a test in case of failure.
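
Putting the pieces together, a worker's entrypoint could look roughly like this sketch (the controller address, port and endpoint are assumptions, and the final command depends on the image):

# Extract this container's overlay VIP from /etc/hosts:
VIP=$(grep "$(hostname)" /etc/hosts | awk '{print $1}')

# Announce ourselves to the FlightController, retrying until it answers:
until curl -fs -X POST "http://flightcontroller:8080/register" \
        -H "Content-Type: application/json" \
        -d "{\"name\": \"$(hostname)\", \"vip\": \"$VIP\", \"mode\": \"$MODE\"}"; do
    sleep 2
done

# Hand over to the JMeter RMI server so the master can drive the tests
# (the exact command depends on the image):
exec jmeter-server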

MASTER CONTAINER

Same as the worker container, but launched with a different flag.
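
On the master side the same announcement would be sent with a master flag, and, once the registry knows the workers, their VIPs could be fed straight into JMeter's remote host list (again a sketch; the /workers endpoint, its comma-separated output and the paths are assumptions):

# Announce this container as the master:
VIP=$(grep "$(hostname)" /etc/hosts | awk '{print $1}')
curl -fs -X POST "http://flightcontroller:8080/register" \
     -H "Content-Type: application/json" \
     -d "{\"name\": \"$(hostname)\", \"vip\": \"$VIP\", \"mode\": \"master\"}"

# Ask the FlightController for the current worker VIPs and start the
# distributed run, reusing the TEST_NAME variable from the compose file:
WORKERS=$(curl -fs "http://flightcontroller:8080/workers")
jmeter -n -t "/test/$TEST_NAME" -R "$WORKERS"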