How to stream from Kafka to Cassandra and increment counters

I have an Apache access log file and I want to store the access counts (total/daily/hourly) of each page in a Cassandra table.

I am trying to do this by using Kafka Connect to stream from the log file to a Kafka topic. Can I use Kafka Connect again to increment the metric counters in Cassandra, or should another tool be used here, e.g. Kafka Streams, Spark, Flink, etc.?
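
For reference, the source side is roughly the stock FileStreamSource connector that ships with Kafka Connect, something like this (file path and topic name are just placeholders):

    # illustrative standalone Connect config; path and topic name are placeholders
    name=access-log-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    file=/var/log/apache2/access.log
    topic=page_clicks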

1 answer

  • answered 2017-10-11 10:05 Robin Moffatt

    You're talking about doing stream processing, which Kafka can do - either with Kafka Streams or KSQL. KSQL runs on top of Kafka Streams, and gives you a very simple way to build the kind of aggregations you're talking about.

    Here's an example of aggregating a stream of data in KSQL:

        SELECT PAGE_ID, COUNT(*) FROM PAGE_CLICKS WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY PAGE_ID;
    

    See more at: https://www.confluent.io/blog/using-ksql-to-analyse-query-and-transform-data-in-kafka
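
    To have that aggregate continuously maintained and written to its own Kafka topic (rather than just returned to your CLI session), you can wrap it in a CREATE TABLE ... AS statement - something like this (the table name and column alias are just illustrative):

        -- table name and alias are illustrative
        CREATE TABLE HOURLY_PAGE_COUNTS AS
          SELECT PAGE_ID, COUNT(*) AS PAGE_COUNT
          FROM PAGE_CLICKS WINDOW TUMBLING (SIZE 1 HOUR)
          GROUP BY PAGE_ID;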

    You can take the output of KSQL, which is actually just a Kafka topic, and stream that through Kafka Connect to e.g. Elasticsearch, Cassandra, and so on.
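
    As a rough sketch of the Cassandra side, using one of the community Cassandra sink connectors (the DataMountaineer/Landoop one here) - exact property names vary between connectors and versions, so treat this as illustrative and check your connector's documentation:

        # illustrative sink config; keyspace, table, and property names depend on your connector version
        name=cassandra-sink
        connector.class=com.datamountaineer.streamreactor.connect.cassandra.sink.CassandraSinkConnector
        topics=HOURLY_PAGE_COUNTS
        connect.cassandra.contact.points=localhost
        connect.cassandra.port=9042
        connect.cassandra.key.space=metrics
        connect.cassandra.kcql=INSERT INTO page_counts SELECT * FROM HOURLY_PAGE_COUNTS

    Note that with this pattern the counting happens in KSQL and the sink just writes the latest count per page and window, rather than incrementing Cassandra counter columns from the raw events.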

    You mention other stream processing tools, and they're valid too - it depends in part on existing skills and language preferences (e.g. Kafka Streams is a Java library, KSQL is … KSQL, Spark Streaming has Python as well as Java, etc.), but also on deployment preferences: Kafka Streams is just a Java library to deploy within your existing application, KSQL is deployable in a cluster, and so on.