#Apache NiFi

Hi, I hope this helps!

After a Controller Service has been configured, it must be enabled in order to run. Do this using the "Enable" button in the far-right column of the Controller Services tab. To modify an existing or running controller service, the DFM (dataflow manager) first needs to stop/disable it, as well as all referencing reporting tasks and controller services. Do this using the "Disable" button. Rather than having to hunt down each component that references the controller service, the DFM can stop/disable them while disabling the controller service in question. When enabling a controller service, the DFM has the option to either start/enable the controller service and all referencing components, or start/enable only the controller service itself.


The first check I intended to run was a comparison of the mentioned customer's data between our source database and Apache Kafka. This turned out not to be possible, because the retention time for our Kafka topics is only two days and the mentioned customer's data is older. So I had to test other customers, and there I did see missing data for a few of them, which confirmed my assumption. Because NiFi reads the data as "read committed" to prevent dirty reads, we should not see missing data. However, the source database commits the creation of the new record (the new customer ID) and the optional data (the customer's birthdate) together, and "reading" or selecting that data is only possible AFTER BOTH are committed. We therefore lose some records where the customer has not yet entered the birthdate, which is a mandatory registration field and is filled in a few seconds after the initial registration.

Even though my team aims to provide real-time data, the solution here was to add a 500-millisecond delay. We tested this delay, and it covers the gap between the customer's registration time and the time the record is updated with the birthdate.
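The post does not say where in the flow the delay was applied, so the following is only a rough sketch of one way such a delay could be expressed: a polling query that skips rows younger than 500 ms. The table `customers`, its columns, and the function `fetch_settled_customers` are hypothetical names chosen for illustration, and SQLite stands in for the real source database.

```python
import sqlite3

# Assumption: rows carry a creation timestamp in epoch milliseconds.
SETTLE_DELAY_MS = 500


def fetch_settled_customers(conn, now_ms, delay_ms=SETTLE_DELAY_MS):
    """Return only rows created at least `delay_ms` before `now_ms`.

    Younger rows are left for the next poll, which gives the source
    system time to finish writing the follow-up data.
    """
    cutoff_ms = now_ms - delay_ms
    query = (
        "SELECT customer_id, birthdate FROM customers "
        "WHERE created_at_ms <= ? ORDER BY customer_id"
    )
    return conn.execute(query, (cutoff_ms,)).fetchall()


def build_demo_db():
    """Create an in-memory table with one settled and one fresh row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "customer_id INTEGER PRIMARY KEY, birthdate TEXT, created_at_ms INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            (1, "1990-01-01", 10_000),  # created 1000 ms before "now"
            (2, None, 10_900),          # created 100 ms before "now"
        ],
    )
    return conn
```

With "now" at 11,000 ms, only customer 1 is old enough to be read; customer 2 would be picked up by a later poll, once the delay has elapsed.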

Missing Data while reading data from database

Our team uses Apache NiFi to offload data from the source database, transform it, and send it to our stakeholders via Apache Kafka. Recently, Swiss regulators, one of our stakeholders, claimed that records were missing for some customers.

To automate this manual task, a Python script was written that uses the NiPyApi package to connect to NiFi via its REST API, get the list of all controller services, and enable them in a few seconds, instead of enabling them manually one by one, which takes hours.
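The original script is not shown in the post, so the following is only a minimal sketch of what such a script might look like. It assumes NiPyApi's `nipyapi.canvas.list_all_controllers` and `nipyapi.canvas.schedule_controller` helpers; the function names `enable_disabled_controllers` and `enable_all_on_nifi` are hypothetical, and authentication setup is left out.

```python
def enable_disabled_controllers(controllers, schedule):
    """Enable each controller service whose state is DISABLED.

    `controllers` is an iterable of controller service entities and
    `schedule` is a callable shaped like
    nipyapi.canvas.schedule_controller(controller, scheduled).
    Returns the names of the services that were asked to enable.
    """
    enabled = []
    for controller in controllers:
        if controller.component.state != "DISABLED":
            continue  # already enabled, or in a transitional state
        schedule(controller, True)
        enabled.append(controller.component.name)
    return enabled


def enable_all_on_nifi(api_url):
    """Connect to NiFi at `api_url` and enable all disabled controller services."""
    import nipyapi  # third-party package: pip install nipyapi

    # Assumption: an unsecured endpoint such as http://localhost:8080/nifi-api
    nipyapi.config.nifi_config.host = api_url
    controllers = nipyapi.canvas.list_all_controllers(pg_id="root", descendants=True)
    return enable_disabled_controllers(controllers, nipyapi.canvas.schedule_controller)
```

Calling `enable_all_on_nifi("http://localhost:8080/nifi-api")` would then walk every process group under the root. Services that depend on other controller services may need those enabled first, so a real script would probably retry failures or order the list by dependency.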

Automate the enabling/disabling of Apache NiFi controller services

In our company, we use Cloudera Manager to manage our CDH (Cloudera Data Hub) cluster and to automate cluster operations. We recently had to upgrade from CDH to CDP (Cloudera Data Platform). Since we use Apache NiFi to control our real-time data, every time we had to restart NiFi during the upgrade, all the controller services were disabled, and after the restart we had to enable them manually, which is very time consuming due to the vast number of controller services.