How to stream from Kafka to Cassandra and increment counters

I have an Apache access log file, and I want to store the access counts (total, daily, and hourly) of each page in a Cassandra table.

I am trying to do this by using Kafka Connect to stream from the log file to a Kafka topic. To increment the metrics counters in Cassandra, can I use Kafka Connect again? Otherwise, which other tool should be used here, e.g. Kafka Streams, Spark, Flink, Kafka Connect, etc.?
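For reference, the mapping from one log line to the three counters can be sketched in plain Java. This is a minimal sketch assuming the Apache Common Log Format; the class name `AccessLogKeys`, the `page|bucket` key format, dropping the query string, and bucketing in UTC are all illustrative assumptions, not anything Kafka Connect or Cassandra requires.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns one Apache Common Log Format line into the
// three counter keys the question asks for (total, daily, hourly per page).
class AccessLogKeys {
    // host ident authuser [timestamp] "METHOD path PROTOCOL" status bytes
    private static final Pattern CLF = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+)[^\"]*\" (\\d{3}) (\\S+)");
    private static final DateTimeFormatter CLF_TIME =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH");

    /** Returns [totalKey, dailyKey, hourlyKey], or empty if the line does not parse. */
    static Optional<List<String>> keysFor(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        // Assumption: the query string is not part of the "page".
        String page = m.group(6).split("\\?", 2)[0];
        // Assumption: buckets are computed in UTC, not the server's local offset.
        OffsetDateTime utc = OffsetDateTime.parse(m.group(4), CLF_TIME)
                .withOffsetSameInstant(ZoneOffset.UTC);
        return Optional.of(List.of(
                page,
                page + "|" + utc.toLocalDate(),
                page + "|" + utc.format(HOUR)));
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(keysFor(line).orElseThrow());
    }
}
```

A single request therefore contributes one hit to each of three rows: the page's total, its day, and its hour.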

  • answered 2017-10-11 10:05 Robin Moffatt

    You're talking about doing stream processing, which Kafka can do, either with Kafka Streams or with KSQL. KSQL runs on top of Kafka Streams and gives you a very simple way to build the kind of aggregations you're describing.
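    As a rough illustration of what such an aggregation does, here is a plain-Java stand-in for an hourly tumbling-window count. It is not the Kafka Streams or KSQL API (in Kafka Streams the equivalent would be built from `groupByKey()`, `windowedBy(...)` and `count()`); the class `TumblingCounter` and its methods are hypothetical, and its windows are aligned to the epoch, i.e. to UTC hour boundaries.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in (NOT the Kafka Streams API) for a tumbling-window count:
// every incoming page view updates the count for (page, window start) and the
// updated row is returned, like a changing row in an aggregation's output.
class TumblingCounter {
    static final class Update {
        final String page;
        final Instant windowStart;
        final long count;

        Update(String page, Instant windowStart, long count) {
            this.page = page;
            this.windowStart = windowStart;
            this.count = count;
        }

        @Override
        public String toString() {
            return page + " " + windowStart + " -> " + count;
        }
    }

    private final long windowMillis;
    private final Map<String, Long> counts = new HashMap<>();

    TumblingCounter(Duration window) {
        this.windowMillis = window.toMillis();
    }

    Update onEvent(String page, Instant timestamp) {
        long ts = timestamp.toEpochMilli();
        // Align the event to the start of its window (e.g. the top of the hour).
        Instant start = Instant.ofEpochMilli(ts - Math.floorMod(ts, windowMillis));
        long updated = counts.merge(page + "@" + start, 1L, Long::sum);
        return new Update(page, start, updated);
    }

    public static void main(String[] args) {
        TumblingCounter hourly = new TumblingCounter(Duration.ofHours(1));
        System.out.println(hourly.onEvent("/index.html", Instant.parse("2017-10-11T10:05:00Z")));
        System.out.println(hourly.onEvent("/index.html", Instant.parse("2017-10-11T10:59:59Z")));
        System.out.println(hourly.onEvent("/index.html", Instant.parse("2017-10-11T11:00:00Z")));
    }
}
```

    Passing `Duration.ofDays(1)` gives daily buckets instead, and the total count is the same idea without a window.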

    Here's an example of doing aggregations of streams of data in KSQL


    The output of KSQL is just a Kafka topic, so you can stream it through Kafka Connect to Elasticsearch, Cassandra, and so on.
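    One consequence of sinking aggregated counts rather than raw hits: each output record carries the latest count for its key, so the sink can simply overwrite the row (a Cassandra `INSERT` is an upsert), and a redelivered record rewrites the same value. Incrementing a counter once per raw hit, by contrast, counts a redelivered hit twice. The class below is a hypothetical plain-Java model of those two write semantics, not the behaviour of any particular Cassandra connector; the `page|hour` key format is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of two ways the last hop could write counts.
// Neither method is a real connector API; they only contrast the semantics.
class UpsertSinkModel {
    // One record from the aggregation's output topic: the latest count for a key.
    static final class CountRecord {
        final String key;
        final long count;

        CountRecord(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    // Upsert semantics: the last record seen for a key overwrites the row.
    static Map<String, Long> applyAsUpserts(List<CountRecord> records) {
        Map<String, Long> table = new HashMap<>();
        for (CountRecord r : records) {
            table.put(r.key, r.count);
        }
        return table;
    }

    // Counter semantics: every delivered raw hit adds one, redeliveries included.
    static Map<String, Long> applyAsIncrements(List<String> hitKeys) {
        Map<String, Long> table = new HashMap<>();
        for (String key : hitKeys) {
            table.merge(key, 1L, Long::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        String key = "/index.html|2017-10-11T10";
        // Two real hits; the second one is delivered twice (e.g. after a restart).
        List<String> rawHits = List.of(key, key, key);
        // The same two hits as aggregated records; the latest one is redelivered.
        List<CountRecord> aggregated = List.of(
                new CountRecord(key, 1), new CountRecord(key, 2), new CountRecord(key, 2));
        System.out.println("increments: " + applyAsIncrements(rawHits));
        System.out.println("upserts:    " + applyAsUpserts(aggregated));
    }
}
```

    For the two real hits in `main`, the increment version ends at 3 while the upsert version ends at 2.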

    You mention other stream processing tools; they're valid too. The choice depends partly on existing skills and language preferences (e.g. Kafka Streams is a Java library, KSQL is … KSQL, Spark Streaming has Python as well as Java APIs, etc.), but also on deployment preferences. Kafka Streams is just a Java library that you deploy within your existing application, KSQL is deployable in a cluster, and so on.