It's a new year, but the work has never stopped. I have been trying, without success, to use Docker's internal service discovery to identify the hostnames or VIPs of the replicated containers of a Docker service deployed on a Docker Swarm cluster.
Knowing the VIPs of the replicated containers is mandatory in order to use a container cluster to launch automation tests. As I'm focusing on Apache JMeter in this project, we need to pass a list of worker nodes (containers, in our case) as a parameter in order to run a Distributed Load Testing Environment.
First attempt: Docker internal DNS service-discovery
Docker runs an internal DNS server that keeps an updated registry of its containers.
The lab setup was:
- Two services:
- Master with only 1 replica (jd_master)
- Slaves/Workers with several replicas under the service name (jd_slave)
- Docker Compose
- Docker Swarm cluster (with at least 3 machines in it so the Raft consensus works)
The content of the docker-compose.yml:

```yaml
version: "3.3"
services:
  master:
    image: mtenrero/jmeter
    tty: true
    networks:
      - distributed
    volumes:
      - ./test:/test
    environment:
      - MODE=master
      - TEST_NAME=$TEST_NAME
      - REMOTES=slave
    expose:
      - 6666
    depends_on:
      - slave
    links:
      - slave
  slave:
    image: mtenrero/jmeter
    tty: true
    networks:
      - distributed
    environment:
      - MODE=node
    expose:
      - 7777
      - 1099
      - 4445
    deploy:
      replicas: 3
networks:
  distributed:
    driver: overlay
```
I tried querying the Docker DNS server for jd_slave from the jd_master container:

```
$ dig jd_slave

jd_slave.    600    IN    A    10.0.0.2
```

The answer only contains an auto load-balanced VIP pointing at the living jd_slave containers.
So this approach doesn't fit our needs.
Investigating the Docker overlay network
Under the same scenario, I ran `less /etc/hosts` inside a container, expecting to find the VIP of the container inside the overlay network.
This attempt was a success!
```
$ docker ps
CONTAINER ID    IMAGE                    COMMAND                  CREATED         STATUS         PORTS                          NAMES
bf19063d7113    mtenrero/jmeter:latest   "/bin/ash /script/..."   11 hours ago    Up 11 hours    1099/tcp, 4445/tcp, 7777/tcp   jd_master.1.fzvsb1bh31a5po2xxm1ixko9w
1a0bdbc45513    mtenrero/jmeter:latest   "/bin/ash /script/..."   38 hours ago    Up 38 hours    1099/tcp, 4445/tcp, 7777/tcp   jd_slave.3.8mkyudv6oi16l9fsig1ia20bc
659576c67323    mtenrero/jmeter:latest   "/bin/ash /script/..."   38 hours ago    Up 38 hours    1099/tcp, 4445/tcp, 7777/tcp   jd_slave.1.j3rre3kpm00utiwsx7i9v2vj6
da4bab1f1dcc    mtenrero/jmeter:latest   "/bin/ash /script/..."   38 hours ago    Up 38 hours    1099/tcp, 4445/tcp, 7777/tcp   jd_slave.2.kkmupae4vp6mdlxml8at9iqt3
```
```
$ docker exec -ti bf1 /bin/ash

$ less /etc/hostname
bf19063d7113

$ less /etc/hosts
127.0.0.1       localhost
::1             localhost ip6-localhost ip6-loopback
fe00::0         ip6-localnet
ff00::0         ip6-mcastprefix
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters
10.0.0.7        bf19063d7113
```
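A container can extract its own overlay-network VIP from this file programmatically. A minimal Python sketch, assuming the `/etc/hosts` layout shown above (`get_own_vip` is a hypothetical helper name, not part of any Docker tooling):

```python
import socket

def get_own_vip(hosts_path="/etc/hosts", hostname=None):
    """Return the overlay-network VIP for this container by matching
    its hostname against /etc/hosts (illustrative helper)."""
    hostname = hostname or socket.gethostname()
    with open(hosts_path) as f:
        for line in f:
            parts = line.split()
            # Match "<ip> <hostname>" entries, skipping blanks/comments
            if len(parts) >= 2 and parts[1] == hostname:
                # Ignore loopback entries; keep the overlay VIP
                if not parts[0].startswith("127.") and parts[0] != "::1":
                    return parts[0]
    return None
```

With the `/etc/hosts` contents above, `get_own_vip(hostname="bf19063d7113")` would return `"10.0.0.7"`.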
Designing the external service-discovery
We don't only need the service-discovery function; we also need to know the state of each container:
- Container image name
- Container test execution status: WAITING FOR TEST / TESTING / FINISHED
- Container health checker & monitoring
- Tests watcher and coordinator based on test execution status
- Master coordinator, which must give orders to docker.socket or the Docker API on the master
Given these requirements, I've decided we need a custom service-discovery component that can handle all of them.
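One way to model the registry such a component must keep is sketched below. This is only an illustration of the requirements listed above; the class and field names are my own, not from any existing tool:

```python
from dataclasses import dataclass
from enum import Enum

class TestStatus(Enum):
    # The three test execution statuses from the requirements
    WAITING_FOR_TEST = "WAITING FOR TEST"
    TESTING = "TESTING"
    FINISHED = "FINISHED"

@dataclass
class ContainerEntry:
    name: str       # e.g. "jd_slave.1.j3rre3kpm00utiwsx7i9v2vj6"
    image: str      # container image name
    vip: str        # overlay-network VIP announced by the container
    status: TestStatus = TestStatus.WAITING_FOR_TEST
    healthy: bool = True

class Registry:
    """In-memory registry the service discovery keeps up to date."""
    def __init__(self):
        self._entries = {}

    def join(self, entry: ContainerEntry):
        self._entries[entry.name] = entry

    def workers_ready(self):
        # VIPs JMeter could use as remote worker nodes right now
        return [e.vip for e in self._entries.values()
                if e.healthy and e.status is TestStatus.WAITING_FOR_TEST]
```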
By default, the Automation Test Queue starts in FlightController mode.
It will listen on a predefined port for the HTTP REST requests that every container calls in order to join the Test Cluster.
From then on, the Flight Controller will manage and monitor all the containers' lifecycles and statuses, keeping close attention to their health in order to maintain an updated registry.
WORKER CONTAINER (Alpine)
Once the controller is up, the first script that must run is a call to the FLIGHTCONTROLLER container/host announcing the container's own VIP, available at:
/etc/hosts —> VIP and CONTAINER NAME
The VIP is reachable from any container attached to the same overlay network.
Retry policies could be applied in order to re-run a test in case of failure.
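Such a retry policy could be as simple as the sketch below; the function name and the `max_retries`/`delay_seconds` knobs are illustrative, not part of the original tooling:

```python
import time

def run_with_retries(run_test, max_retries=2, delay_seconds=0):
    """Run a test callable, retrying a bounded number of times on failure."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return run_test()
        except Exception:
            if attempts > max_retries:
                raise  # give up: exhausted the retry budget
            time.sleep(delay_seconds)  # optional pause between retries
```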
MASTER CONTAINER (Alpine)
Same as the Worker Container, but launched with a different flag.