Attending meetups is one of the best parts of working at Bytemark. Companies and individuals come together to share experiences, ideas and new technologies, inspiring us to learn more.
One of the most exciting technologies to emerge in recent years is Kubernetes – and with over 100 attendees filling the room at the first ever Kubernetes Manchester meetup, many people agree!
The meetup was separated into three talks:
- Kubernetes 101 – Dave Lund, Booking.com
- MoneySuperKubernetes: Navigating K8s at MoneySuperMarket – Jim Davies & David Stockton, MoneySuperMarket
- Scaling Prometheus on Kubernetes – Tom Riley, Booking.com
Below are my notes from the talks, with some additional detail added for readability.
Kubernetes 101
Kubernetes is fast becoming a standard for deploying and managing Cloud Native applications, due to the DevOps benefits it provides:
- Self Healing Capabilities: Containerised ephemeral applications can be replaced and restarted automatically upon failure.
- Automated Deployment: Kubernetes provides workflows for rolling out updates – and rolling back effortlessly if required, restoring applications to their previous state.
- Scalability: Kubernetes can scale applications based on load, memory usage or other factors.
- Service Discovery: Kubernetes provides a networking layer between containers, allowing dependent containers to automatically rediscover each other upon reboot or failure.
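The self-healing behaviour above can be pictured as a control loop that keeps comparing what should be running with what actually is. The sketch below is a toy model, not Kubernetes code: `reconcile`, `make_namer` and the pod dictionaries are made-up names for illustration only.

```python
import itertools


def make_namer(prefix):
    """Return a function that hands out names like web-1, web-2, ..."""
    counter = itertools.count(1)
    return lambda: f"{prefix}-{next(counter)}"


def reconcile(desired_replicas, running_pods, new_name):
    """Return the pod list after one pass of a toy control loop.

    Unhealthy pods are dropped, then replacements are started until
    the number of pods matches the desired replica count.
    """
    healthy = [pod for pod in running_pods if pod["healthy"]]
    while len(healthy) < desired_replicas:
        healthy.append({"name": new_name(), "healthy": True})
    # Scale down if there are more pods than requested.
    return healthy[:desired_replicas]


if __name__ == "__main__":
    namer = make_namer("web")
    pods = [{"name": namer(), "healthy": True} for _ in range(3)]
    pods[1]["healthy"] = False  # simulate a crashed container
    pods = reconcile(3, pods, namer)
    print([pod["name"] for pod in pods])  # ['web-1', 'web-3', 'web-4']
```

The failed pod is not repaired in place; it is thrown away and a fresh one takes its slot, which is why ephemeral, replaceable containers suit this model.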
Yet, with all this functionality comes a learning curve. Dave Lund’s talk focused on the basic functionality of Kubernetes and what vocabulary you’ll need to know to get started.
- Nodes – Nodes are the machines that form your cluster. A Node is likely to be a virtual machine hosted by a Cloud Provider or a physical machine in a data centre. It is simpler to think of nodes as the CPU/RAM resources to be used by your Kubernetes cluster, rather than unique devices.
- Cluster – Think about the ‘Borg’ from Star Trek. Nodes join together to pool their resources to make a more powerful cluster. Kubernetes moves work around the cluster when nodes are added or removed.
- Persistent Volumes – Things that run on your cluster will not always run on the same node, so they need somewhere to store data that outlives any single node. Persistent Volumes provide storage that is made available to the cluster as a whole rather than tied to one machine.
- Containers – Programs that run on Kubernetes are packaged as ‘Containers’. Everything needed to run the program is in the container image.
- Pods – A ‘pod’ is a family of whales. The whale is the mascot of Docker, a tool for creating and running containers. So in Kubernetes language, a ‘Pod’ is a family of containers. The containers in a pod always run together on the same node, share resources and a network, and can communicate with each other directly. Pods on separate nodes can still reach each other over the cluster network.
- Deployment – A deployment defines the desired state of an application on your cluster – for example, how many replicas of a pod should be running. When the deployment is added to a cluster, Kubernetes will automatically create the correct number of pods and monitor them. Should a pod fail, Kubernetes will re-create it, following the ‘deployment’ criteria.
- Ingress – The ‘door’ to your Kubernetes cluster, to allow external traffic in. For example, you could have a web ingress, allowing port 80 traffic to your application. This can be handled by an Ingress controller or a LoadBalancer.
- Probes – Probes check if the container is alive and functioning. This enables exciting DevOps improvements – such as zero downtime deploys, preventing broken deploys and self-recovering containers.
- Operators – Operators are deployed to a Kubernetes cluster and enable you to write an application to fully manage another. An operator can monitor, change pods/services, scale the cluster up/down and even call endpoints in running applications. For example, an operator could detect if a pod is running at above 90% CPU and provision more resource automatically to keep it running.
- StatefulSets – Stateless applications make Kubernetes a lot easier. But with StatefulSets, available since Kubernetes 1.5, you can assign a pod a stable number, and assign resources to that number – such as volumes, network IDs and other indexes. So if a pod fails, it’ll be restored with the same data it had previously.
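To tie a few of these terms together, here is a rough sketch of what a Deployment with probes looks like when written down. It builds the manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The function name `deployment_manifest`, the `web` name, the `nginx:stable` image and the `/` probe path are placeholders chosen for illustration, not anything from the talk.

```python
import json


def deployment_manifest(name, image, replicas=3, port=80):
    """Build a minimal apps/v1 Deployment as a plain dictionary."""
    labels = {"app": name}
    # The same HTTP check is reused for liveness and readiness here;
    # real applications often use different endpoints for each.
    probe = {"httpGet": {"path": "/", "port": port}, "periodSeconds": 10}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # how many pods should be running
            "selector": {"matchLabels": labels},
            "template": {  # the pod that each replica is stamped from
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                            "livenessProbe": probe,
                            "readinessProbe": probe,
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment_manifest("web", "nginx:stable"), indent=2))
```

Saving the output to a file and applying it should give three pods, each restarted if its liveness check fails and only sent traffic once its readiness check passes.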
MoneySuperKubernetes: Navigating K8s at MoneySuperMarket
Jim & David focused on the operational side of using Kubernetes in production. Here are some of the main takeaways.
The key benefits of Kubernetes are:
- Enabling product teams to go faster – but also safer. If a cluster can automatically recover from errors, moving quickly is far less likely to break things.
- Developers can deploy on day one to production with little risk.
- Operational changes (such as domains and DNS) can be made via a pull request (GitOps) – no politics or conversations are required to get code to production.
- Disaster recovery is an everyday occurrence. MoneySuperMarket destroys cluster resources when not in use to save money – so all their applications can recover easily by design. Kubernetes enables this.
- They use Istio as a service mesh, which provides many benefits around controlling and observing traffic flows. Examples include infrastructure-level API monitoring, limits on concurrent connections that the development team can define, monitoring external service provider performance against SLAs, and prohibiting external access to internal services.
- Security scanning of images by klar -> clair -> clair-db, both on build and on a schedule. Some vulnerabilities are not relevant but are continually flagged, so a whitelist of ‘ignored vulns’ is kept at a company-wide level.
- Kubernetes can be tied into your SSO provider, so permissions update automatically when employees join, leave or move squads.
- Horizontal Pod Autoscaler: If a metric from Prometheus (a monitoring solution) exceeds X, add more replicas of the pod. Scaling can also be driven by business metrics: for example, if average user search time exceeds 0.5s, scale up the search API pods.
- Cluster Autoscaler: If the available nodes do not have enough capacity for the pods that need to run, get more nodes from the Cloud Provider.
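The search-time example can be worked through with the scaling rule described in the Kubernetes documentation for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric ÷ target metric). The sketch below applies that rule; the function name and the minimum/maximum bounds are illustrative, and it leaves out details of the real controller such as its tolerance band and stabilisation windows.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10):
    """Apply the basic autoscaling ratio, clamped to the allowed range."""
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 search API pods, average search time 0.75s against a 0.5s target.
    print(desired_replicas(4, 0.75, 0.5))  # 6
    # Load drops and searches average 0.25s, so fewer pods are needed.
    print(desired_replicas(4, 0.25, 0.5))  # 2
```

If the cluster has no room for the extra pods, that is the point at which the Cluster Autoscaler would ask the Cloud Provider for more nodes.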
Scaling Prometheus on Kubernetes
Tom covered the specifics of scaling Prometheus on Kubernetes with Thanos and has kindly shared his slides.
- What is Prometheus? An open-source metrics monitoring solution, started at SoundCloud in 2012 and widely used with Kubernetes. It is an essential part of any Kubernetes Cluster with scale aspirations.
- How do I deploy Prometheus? Use the Prometheus Operator and its Custom Resource Definitions to specify how services should be monitored and how Prometheus should scale.
- How do I HA Prometheus? Traditional scaling doesn’t work (across load balancers or multiple instances) as it leads to inconsistent logging and metrics. See slides 29 to 33 for complete details. Federating via Thanos provides HA (slide 47).
- How do I get a single pane of glass for Prometheus? Federate and add Grafana to the main Prometheus instance.
- How do we retain Prometheus data for days/months/years? Prometheus was designed for short-term usage but does have remote storage APIs for connecting to third-party storage. Thanos was the solution – detailed in slides 40 – 52.
- What cultural impact does this have? Development teams can observe the performance of their application and therefore work to improve it as part of the application development lifecycle.
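For anyone wondering what a development team actually has to provide: Prometheus collects metrics by scraping an HTTP endpoint on each application, which returns them in a simple text format. The sketch below renders that format by hand to show its shape. `render_metric` is a made-up helper, the metric is an invented example, and in practice you would normally use an official client library, which also handles escaping of label values (skipped here).

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{val}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metric(
        "http_requests_total",
        "Total HTTP requests.",
        "counter",
        [({"method": "get", "code": "200"}, 1027),
         ({"method": "post", "code": "500"}, 3)],
    ))
```

Each line is one time series: the metric name, a set of labels, and the current value. Prometheus records these on every scrape, and Grafana can then graph them.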
We’re looking forward to more Kubernetes meetups in the future. If you’re looking for other events to attend, check out our Top 10 list of Cloud Native Events in 2019.
There’s also some exciting news that Bytemark will be launching a fully managed Kubernetes service later this year! Join the waitlist now to receive a £1,000 free credit.