Apache Flink

Needs advice on Apache Flink and Kafka Streams

We currently have two Kafka topics with records coming in continuously. We're looking into joining the two streams on a key, within a 5-minute window based on the record timestamps.

Should I consider a KStream-KStream join or Apache Flink window joins? Or is there a better way to achieve this?
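
To make the question concrete, here is a minimal sketch of the matching rule behind a 5-minute windowed join, written in plain Python rather than with the Kafka Streams or Flink APIs. Both frameworks offer built-in joins of this kind; this only illustrates which records end up paired. The record layout `(key, timestamp, value)` and the `orders`/`payments` names are assumptions made up for the example, and the sketch works on finite lists instead of unbounded streams.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # join window from the question


def windowed_join(left, right, window=WINDOW):
    """Pair left and right records that share a key and whose
    timestamps differ by at most `window`.

    Each record is a (key, timestamp, value) tuple (hypothetical layout).
    """
    # Index the right-hand records by key so each lookup is cheap.
    right_by_key = defaultdict(list)
    for key, ts, value in right:
        right_by_key[key].append((ts, value))

    joined = []
    for key, ts, value in left:
        for other_ts, other_value in right_by_key.get(key, []):
            if abs(ts - other_ts) <= window:
                joined.append((key, value, other_value))
    return joined


t0 = datetime(2022, 1, 12, 4, 0, 0)
orders = [
    ("a", t0, "order-1"),
    ("b", t0 + timedelta(minutes=1), "order-2"),
]
payments = [
    ("a", t0 + timedelta(minutes=3), "payment-1"),   # 3 minutes after order-1
    ("b", t0 + timedelta(minutes=10), "payment-2"),  # 9 minutes after order-2
]

result = windowed_join(orders, payments)
print(result)  # [('a', 'order-1', 'payment-1')]
```

A real streaming job also has to buffer records for the length of the window and deal with late or out-of-order events, which is the part the frameworks handle for you.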

4 upvotes·198K views
Replies (2)
Recommends Apache Flink

Hi!

I would say it depends on three things:

  • the volume (number of events per second)
  • the versatility you need
  • the vendor lock-in

If volume is an issue, then Flink may be the better option. Flink is distributed and is architected as such.

If you are OK with having everything based on Kafka, then Kafka Streams may be the better option, but if at some point you need to integrate with several systems, you should consider Flink.

If you are okay with sticking with Confluent tools, that's fine, but be wary of tying yourself to one technology. Flink is more general-purpose than Kafka Streams.

5 upvotes·2 comments·3.6K views
Eran Levy · January 12th 2022 at 4:58AM

Hi, if you are already working with Kafka Streams, I suggest continuing with it, unless there is a capability that does not exist there or that does not fit your needs.

S C · January 12th 2022 at 6:32PM

Thank you for your input. We're talking about ~6 million records per minute, so a significant amount of data. We also want to join records within a small window and push them to a different topic. We're okay with something outside of Confluent, so it looks like Flink is the better option here.
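
As a rough back-of-the-envelope check on that volume, the arithmetic below turns the per-minute figure into a per-second rate and estimates how many records a 5-minute join window would have to keep around. The average record size is an assumption (the thread does not state it), and the estimate treats the 6 million figure as the rate of a single stream.

```python
RECORDS_PER_MINUTE = 6_000_000  # figure quoted in the comment
WINDOW_MINUTES = 5              # join window from the original question
ASSUMED_RECORD_BYTES = 1_000    # assumption: not stated anywhere in the thread

records_per_second = RECORDS_PER_MINUTE // 60
records_in_window = RECORDS_PER_MINUTE * WINDOW_MINUTES
buffered_bytes = records_in_window * ASSUMED_RECORD_BYTES

print(records_per_second)    # 100000
print(records_in_window)     # 30000000
print(buffered_bytes / 1e9)  # 30.0 (GB, under the assumed record size)
```

Whichever framework is chosen, join state of roughly this order has to live somewhere, so it is worth measuring the real record size before sizing the cluster.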

Team Lead at Extenda Retail AS · Recommends Kafka Streams

Hi, I would recommend continuing with Kafka tools. Kafka Streams integrates seamlessly with Kafka, and you also get the benefits of automatic schema management if you choose to use Schema Registry. I see that some recommend Flink for high volumes, but Kafka Streams is perfectly capable of handling high volumes and windowed operations as well. I guess the vendor lock-in that is also mentioned probably applies as much to the messaging side as to your processing application.

Further details can be found here: https://www.confluent.io/blog/apache-flink-apache-kafka-streams-comparison-guideline-users/

Good luck with your choice!

4 upvotes·328 views
Needs advice on Apache Flink and Kafka

Please tell me why you still choose Kafka after using both modules. Also, Apache Flink is faster than Kafka, isn't it?

2 upvotes·15.1K views

I have to build a data processing application with an Apache Beam stack and the Apache Flink runner on an Amazon EMR cluster. I saw some instability with the process, and the EMR clusters keep going down. The Apache Beam application gets its input from Kafka and sends the accumulated data streams to another Kafka topic. Any advice on how to make the process more stable?

6 upvotes·2.8M views
Replies (1)
Software engineer, big data architect

So, you are using Apache Beam and Apache Flink to read from an input Kafka topic, apply some transformations to the input, and then write to another output Kafka topic? That looks like a job for the Kafka Streams framework, doesn't it? If the process is not very stable, it is probably because you don't have the right amount of memory for these processes, or you don't have enough dedicated cores for them.

Investigate using the Confluent Platform's Control Center tool: look at the logs, examine the process exceptions, and focus on the "Caused by" lines.
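
As a small illustration of the "focus on caused by" advice, here is a sketch that pulls the `Caused by:` lines out of a Java stack trace. In a chained exception the last such line is usually the root cause. The function name `root_causes` and the sample log are invented for this example; they do not come from a real Beam, Flink, or EMR log.

```python
PREFIX = "Caused by:"


def root_causes(log_text):
    """Return the text of every 'Caused by:' line, in order of appearance."""
    causes = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PREFIX):
            causes.append(stripped[len(PREFIX):].strip())
    return causes


# Made-up sample trace, only to show the shape of the input.
sample_log = """\
java.lang.RuntimeException: Pipeline execution failed
    at com.example.Job.main(Job.java:42)
Caused by: java.util.concurrent.ExecutionException: task failed
    at com.example.Runner.run(Runner.java:17)
Caused by: java.lang.OutOfMemoryError: Java heap space
    ... 12 more
"""

causes = root_causes(sample_log)
print(causes[-1])  # java.lang.OutOfMemoryError: Java heap space
```

In this invented trace the innermost cause is a heap error, which is the kind of finding that would support the memory explanation above.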

Unless you have a strong need for Apache Flink's supposedly better real-time data streaming capabilities, stick with Kafka Streams for this task. Then look into doing the same with Beam and Flink; once you have both, you can measure whether you really get a big performance improvement when reading from and writing to Kafka topics. I honestly doubt it.

4 upvotes·3.5K views
Technical Architect at Pepcus · Needs advice on Apache Flink and Kafka

I need to build an alert and notification framework using a scheduled program. We will analyze the events in a database table, filter the events that fall within a one-day timespan, and send these event messages by email. Currently, we are using Kafka Pub/Sub for messaging. The customer wants us to move to Apache Flink, and I am trying to understand how Apache Flink could be a better fit for us.
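
For reference, the filtering step described here is small when done as a scheduled batch job. A minimal sketch in plain Python, assuming each event is a dict with a `created_at` timestamp (the field names and sample data are hypothetical, not taken from our schema):

```python
from datetime import datetime, timedelta


def events_in_last_day(events, now):
    """Keep the events whose timestamp falls within the 24 hours before `now`."""
    cutoff = now - timedelta(days=1)
    return [e for e in events if cutoff <= e["created_at"] <= now]


now = datetime(2020, 7, 1, 12, 0)
events = [
    {"id": 1, "created_at": now - timedelta(hours=2)},
    {"id": 2, "created_at": now - timedelta(days=2)},   # too old
    {"id": 3, "created_at": now - timedelta(hours=23)},
]

recent = events_in_last_day(events, now)
print([e["id"] for e in recent])  # [1, 3]
```

The open question is whether a stream processor adds anything over a scheduled filter like this one.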

7 upvotes·717.7K views
Replies (1)
Recommends Apache Flink

I recommend Apache Flink because it is the pro tool for everybody who has a serious stream processing use case. Flink is used by huge companies, such as Uber, Alibaba, and Netflix. AWS offers Flink as a hosted service. The reasons these companies choose Flink are manifold: Flink offers great performance, support for very large state, exactly-once processing semantics, different APIs (with SQL growing a lot lately), ... Flink supports many different deployment models, including Kubernetes, Hadoop YARN, and custom deployments.

The drawbacks of Apache Flink are a moderately steep learning curve and a large number of options to choose from (APIs, deployment models, state backends, ...).

These are my personal views, and I have a bias towards Flink, because I've worked a lot on it:

Flink and Kafka (the message bus) work together very well, and that's also the most popular combination (I'm guessing). There's also Kafka Streams, a stream processing library using Kafka (the message bus) as a data transport layer. Some considerations of Kafka Streams vs Flink:

  • KStreams has a hard dependency on Kafka. Flink is independent of the message bus and can easily read from and write to many systems (KStreams requires Kafka Connect for that).
  • Since KStreams does its data exchange via Kafka topics, it puts a lot of load on the Kafka cluster (size it appropriately). Monitoring becomes difficult because processing and data storage are in the same cluster. Do you really want your production data being discarded because your processing is eating up all your IO?
  • Flink is the older project; it has been battle-tested for many years across a lot of different scenarios. There are more libraries, such as a CEP (Complex Event Processing) library, and more and more machine learning integrations.
4 upvotes·1 comment·10.6K views
Nicholas Nezis · July 1st 2020 at 4:14AM

How does Flink compare with Apache Heron?
