A good question came in for the Kubernetes course:
“How to delete logs in ElasticSearch after a certain period?”
A good one this. The questioner was aware that you can issue a curl command to ElasticSearch, specifying the name of an index to delete, but this doesn’t feel very “kubernetes”. So how to do this in an elegant way – or failing that, a simple way?
Elastic do have a product that can do this, called “Curator”, but as always with Elastic’s products, I’m never sure about the licensing. Also, it seems a bit complex.
I reckon the questioner was on the right lines – with a bit of scripting and a Kubernetes cron job, we can achieve this easily.
For the answer you can jump to the end, where there’s some yaml for a cronjob, but I’m going to show my working in the next few steps…
1: The ElasticSearch API
We can indeed tell ElasticSearch to delete an index for a particular day. There’s a new index for each day. You can see your existing indexes on the Kibana “Manage Index Patterns” page.
For example, I have an index for a while back I’d like to delete called “logstash-2019.04.04”.
As ElasticSearch is running inside my cluster, I need to exec into a container to reach it with curl – and that container needs curl installed. (If you don’t have one, don’t worry: I’m only doing this step to illustrate the principle. In step 3 I’ll run the command from its own pod.) Then I can do:
curl -XDELETE http://elasticsearch-logging.kube-system:9200/logstash-2019.04.04
Note my ElasticSearch service is called “elasticsearch-logging” and it is in the kube-system namespace, hence the DNS name.
2: Deleting logs from 90 days ago
Ok, but the requirement is to purge old logs. So how could we get rid of an index from “90 days ago”? Sounds like something we could do with a bit of shell scripting…
date -d"90 days ago" +"%Y.%m.%d"
This gives the output in just the format we need – for me today the output was “2019.03.28”. (Note: not all distributions ship the version of “date” which supports the fancy -d syntax. More on this shortly).
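If your image’s date command (BusyBox’s, for example) doesn’t understand -d "90 days ago", one workaround is to do the arithmetic on Unix timestamps yourself. This is only a sketch – the "@SECONDS" form is supported by GNU date and by recent BusyBox builds, but do check your own image:

```shell
# Compute "90 days ago" by epoch arithmetic rather than GNU date's
# friendly -d"90 days ago" syntax.
NOW=$(date +%s)                   # current time as seconds since the epoch
CUTOFF=$(( NOW - 90 * 86400 ))    # subtract 90 days' worth of seconds
# "@SECONDS" tells date to treat the value as a Unix timestamp;
# -u keeps the result in UTC, which is typically how daily indices are named.
date -u -d "@$CUTOFF" +"%Y.%m.%d"
```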
So now we’re getting somewhere – we can embed this into the API call:
curl -XDELETE http://elasticsearch-logging.kube-system:9200/logstash-`date -d"90 days ago" +"%Y.%m.%d"`
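Spelling the one-liner out as a tiny script makes it easier to tweak. Everything here is an assumption you’d adjust for your cluster: ES_HOST is the service DNS name from this article, and RETENTION_DAYS is the 90-day window. I’ve used the $(...) form of command substitution, which is equivalent to the backticks above:

```shell
#!/bin/sh
# Sketch of the purge call with the moving parts named.
ES_HOST="http://elasticsearch-logging.kube-system:9200"  # your service/namespace may differ
RETENTION_DAYS=90

# Build the index name, e.g. logstash-2019.03.28
INDEX="logstash-$(date -d "${RETENTION_DAYS} days ago" +"%Y.%m.%d")"

# Delete that day's index
curl -XDELETE "${ES_HOST}/${INDEX}"
```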
Ugly but as we say round my way, “handsome is as handsome does”. I don’t know what that means.
Alright, so how do we apply this to Kubernetes? We can write a CronJob which triggers this command every day. It’s common to use a minimal distro for these kinds of jobs, so I’m using Alpine here. The drawback is that minimal distros are just that: Alpine comes with neither curl nor the advanced date command. So I’ve done an ugly “apk add” in the command.
The curl package is obviously “curl” but “coreutils” gives the enhanced date command.
It would be much better to build your own image from a Dockerfile but I’ll leave that as a future enhancement. For now the following should work:
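For the record, that future enhancement is only a couple of lines. Here’s a hypothetical Dockerfile (the Alpine tag is an assumption – pin whichever version suits you) that bakes the two packages into the image, so the CronJob doesn’t reinstall them on every run:

```dockerfile
# Hypothetical image for the purge job: Alpine plus curl and the
# coreutils version of date.
FROM alpine:3.9
RUN apk add --no-cache curl coreutils
```

With an image like that pushed to a registry, the CronJob’s args would shrink to just the curl command.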
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-job
spec:                    # CronJob
  schedule: "0 0 * * *"
  jobTemplate:
    spec:                # Job
      backoffLimit: 2
      template:
        spec:            # Pod
          containers:
          - name: logging-purger
            image: alpine
            command: ["/bin/sh", "-c"]
            args: ["apk add curl && apk add coreutils && curl -XDELETE http://elasticsearch-logging.kube-system:9200/logstash-`date -d'90 days ago' +'%Y.%m.%d'`"]
          restartPolicy: Never
This will run at midnight each day and delete the index from 90 days ago.
If you’re not familiar with CronJobs in Kubernetes, you’ve obviously not bought my enhanced, all-singing-and-dancing Kubernetes course! Check out a preview below, then head over here to get the full 20+ hour course with a discount code!