08.05.2018

Data retention for ClickHouse persistent data storage

Please use this script only if your installation uses per-day partitions (the default after version 2.0.87). If you installed FastNetMon before that version, please raise a ticket with the support team.
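If you are unsure which layout you have, one option is to look at the partition values ClickHouse reports in system.parts (for example with SELECT DISTINCT partition FROM system.parts WHERE active AND database = 'fastnetmon'). The minimal sketch below assumes you have already fetched those values; the helper name uses_per_day_partitions is hypothetical and not part of FastNetMon. It treats the layout as per-day only if every value (with quotes stripped and the 0000-00-00 placeholder ignored) parses as a YYYY-MM-DD date.

```python
import datetime


def uses_per_day_partitions(partition_names):
    """Hypothetical helper: True if every partition value looks like a single day.

    partition_names are values of the `partition` column of system.parts,
    possibly wrapped in single quotes.
    """
    # Strip quotes and ignore the zero-date placeholder partition
    names = [name.replace("'", "") for name in partition_names]
    names = [name for name in names if name != "0000-00-00"]
    if not names:
        return False
    for name in names:
        try:
            datetime.datetime.strptime(name, "%Y-%m-%d")
        except ValueError:
            # Anything that is not a YYYY-MM-DD date means a different layout
            return False
    return True


print(uses_per_day_partitions(["'2018-05-07'", "'2018-05-08'"]))  # True
print(uses_per_day_partitions(["201805"]))  # False
```

If this returns False for your tables, the retention script below does not apply and the support team should be contacted as described above.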

ClickHouse does not have a built-in capability to remove old data, but you can implement this task with a script run by cron.

Please install the ClickHouse client library for Python (clickhouse-driver):

sudo apt-get install python3-pip
sudo pip3 install clickhouse-driver

Then create the following script in /usr/local/bin/clickhouse_retention.py and set max_days to the number of days of data you want to keep:

#!/usr/bin/env python3

import datetime

import clickhouse_driver

# This script removes all per-day partitions older than max_days days
max_days = 7

database = 'fastnetmon'

# Connect to the local ClickHouse server and work in the FastNetMon database
client = clickhouse_driver.Client('127.0.0.1', database=database)

tables = client.execute('SHOW TABLES')

now = datetime.datetime.now()

for (table_name,) in tables:
    print("Processing table: %s" % table_name)

    # Select all active partitions of this table in the FastNetMon database only
    partitions = client.execute(
        "SELECT DISTINCT partition FROM system.parts "
        "WHERE active AND database = %(database)s AND table = %(table)s",
        {'database': database, 'table': table_name})

    for (partition,) in partitions:
        # The partition value may be wrapped in quotes, e.g. '2018-05-08'
        partition_name = partition.replace("'", "")

        # Never drop the zero-date partition
        if partition_name == "0000-00-00":
            continue

        # Parse date in format 2018-05-08; skip anything that is not a per-day partition
        try:
            partition_date = datetime.datetime.strptime(partition_name, "%Y-%m-%d")
        except ValueError:
            print("Skipping unexpected partition: %s" % partition_name)
            continue

        delta = now - partition_date
        if delta.days > max_days:
            print("Removing partition %s from table %s" % (partition_name, table_name))

            # partition_name has been validated as a date above, so it is safe to embed
            client.execute("ALTER TABLE %s.`%s` DROP PARTITION '%s'"
                           % (database, table_name, partition_name))
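To check which partitions would be dropped before running anything against the server, the date check can be pulled out into a pure function. This is a sketch only: partitions_to_drop is a hypothetical helper, not part of FastNetMon. It follows the retention rule described here (strip quotes, skip 0000-00-00, drop a partition when it is more than max_days whole days older than now) and additionally leaves alone any name that is not a YYYY-MM-DD date.

```python
import datetime


def partitions_to_drop(partition_names, now, max_days):
    """Hypothetical helper: per-day partitions more than max_days days older than now."""
    result = []
    for raw_name in partition_names:
        name = raw_name.replace("'", "")
        # The zero-date partition is never dropped
        if name == "0000-00-00":
            continue
        try:
            partition_date = datetime.datetime.strptime(name, "%Y-%m-%d")
        except ValueError:
            # Not a per-day partition; leave it alone
            continue
        # Same comparison as the retention script: whole days elapsed > max_days
        if (now - partition_date).days > max_days:
            result.append(name)
    return result


now = datetime.datetime(2018, 5, 16, 2, 0)
names = ["'2018-05-07'", "'2018-05-08'", "'2018-05-09'", "'2018-05-16'", "'0000-00-00'"]
print(partitions_to_drop(names, now, max_days=7))  # ['2018-05-07', '2018-05-08']
```

Feeding it the partition values of a real table gives a dry-run view of what a retention run at that moment would remove.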

Set the executable bit on the script:

sudo chmod +x /usr/local/bin/clickhouse_retention.py

Finally, add the following cron entry to the file /etc/cron.d/clickhouse_data_cleanup:

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
0 2 * * * root /usr/local/bin/clickhouse_retention.py

Cron will run the script every day at 2 AM, and each run removes all partitions older than max_days days.
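Because per-day partitions are dated at midnight and the job runs at 2 AM, the rule "drop when the age in whole days exceeds max_days" keeps today's partition plus the max_days previous days. A small sketch of that arithmetic, using a hypothetical helper oldest_kept_partition (not part of FastNetMon):

```python
import datetime


def oldest_kept_partition(now, max_days):
    """Hypothetical helper: date of the oldest per-day partition surviving a run at `now`.

    A partition dated d (midnight) is dropped when (now - d).days > max_days,
    so every partition from now.date() - max_days onwards is kept.
    """
    return now.date() - datetime.timedelta(days=max_days)


run_time = datetime.datetime(2018, 5, 16, 2, 0)  # the daily 2 AM cron run
print(oldest_kept_partition(run_time, max_days=7))  # 2018-05-09
```

With max_days = 7, a run on 2018-05-16 therefore keeps the partitions from 2018-05-09 through 2018-05-16, i.e. eight daily partitions.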