Welcome to the Apache NiFi course for beginners! In this comprehensive guide, we will take you through all the basics of Apache NiFi – a powerful, open-source data integration tool. This course is designed for people with little or no experience with Apache NiFi, and by the end of it, you will have a solid foundation of the tool’s concepts, architecture, and components. Let’s get started!

If you would like to learn how to use SPARQL, or want insight into how to build a full application that uses Apache NiFi, feel free to read our full tutorial on Apache NiFi, Kafka, and Spark.

Introduction to Apache NiFi

Apache NiFi is an easy-to-use, scalable, and flexible data integration tool that helps you automate the flow of data between systems. With its powerful GUI and numerous built-in processors, Apache NiFi simplifies the process of moving, manipulating, and transforming data. Its core design principles include:

  • Web-based user interface
  • Flow-based programming
  • Data provenance and traceability
  • Scalability and extensibility
  • Fault tolerance and high availability

Apache NiFi can ingest and transform large volumes of data from a wide variety of sources. With its scalable architecture and numerous built-in processors, NiFi simplifies the process of moving, manipulating, and transforming data from sources such as log files, databases, and messaging systems. The data sources can be:

  • SQL: PostgreSQL, Oracle, MySQL
  • NoSQL: MongoDB, Couchbase
  • Search engines: Elasticsearch, Solr
  • Cache servers: Redis, HBase
  • Message queues: Kafka
  • AWS services: S3

Installing Apache NiFi on Mac and Linux

Before we dive into the details of Apache NiFi, let’s first install it on your machine. Follow these simple steps:

  1. Download the latest version of Apache NiFi from the official website.
  2. Extract the downloaded archive to a folder of your choice.
  3. Navigate to the extracted folder and run bin/nifi.sh start from a terminal (this section covers Mac and Linux; the Windows equivalent, nifi.bat, is covered in the next section).
  4. Wait for the NiFi web server to start, and then open your browser and go to http://localhost:8080/nifi.
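
If you prefer to script the download and extraction, here is a minimal Python sketch of steps 1–2. The version number and archive URL below are placeholders; check the official download page for the current release before running.

```python
# Minimal sketch of steps 1-2: download the NiFi binary archive and extract it.
# The version and URL are placeholders -- confirm the current release at
# https://nifi.apache.org/download/ before running.
import io
import urllib.request
import zipfile

NIFI_URL = "https://archive.apache.org/dist/nifi/1.23.2/nifi-1.23.2-bin.zip"  # placeholder version

with urllib.request.urlopen(NIFI_URL) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))

archive.extractall("nifi")  # extract to a folder of your choice
# Then run nifi/nifi-1.23.2/bin/nifi.sh start and browse to http://localhost:8080/nifi
```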

Installing Apache NiFi on Windows 10

Before you start the installation, make sure you meet the following prerequisites:

  1. Install a 64-bit JDK 8:
    1. Apache NiFi requires a 64-bit JDK 8 to be installed on your system. You can download the latest JDK 8 release from the official Oracle website.
    2. During the installation, make sure to select the 64-bit version of the JDK.
  2. Set the JAVA_HOME environment variable:
    1. After installing JDK 8.0, you need to set the JAVA_HOME environment variable to point to the installation directory.
    2. It is recommended to install Java to C:\java instead of C:\Program Files to avoid the permission restrictions on the Program Files directory.
    3. You can set the JAVA_HOME variable by following these steps:
      1. Open the Start menu and search for “Environment Variables”.
      2. Click on “Edit the system environment variables”.
      3. Click on “Environment Variables”.
      4. Under “System Variables”, click on “New”.
      5. Enter “JAVA_HOME” as the variable name.
      6. Enter the path to your JDK 8.0 installation directory as the variable value. For example, C:\java\jdk1.8.0_311.
  3. Ensure that your system meets the minimum memory requirement: Apache NiFi needs at least 4 GB of RAM to run properly on Windows.
  4. Now you can install Apache NiFi itself on Windows 10. Download the NiFi binary archive:
    1. NiFi is distributed as a binary ZIP archive rather than a Windows installer. You can download it from the official Apache NiFi website; no registration is required.
    2. Go to the download page and select the binary (.zip) distribution.
  5. Extract the NiFi files:
    1. After the download completes, extract the files to the location from which you want to run NiFi (right-click the ZIP file and select “Extract All”).
    2. Be sure to choose a location where you have write permissions; C:\nifi or another short, root-level path works well.
  6. Launch Apache NiFi:
    1. Navigate to the extracted folder and run the nifi.bat file in the “bin” directory (for example, C:\nifi\bin\nifi.bat). This will start the NiFi application.
  7. Wait for a few moments until Apache NiFi is up and running: It may take a few minutes for Apache NiFi to start up completely. You can check the status of NiFi by looking at the command prompt window or the log files in the “logs” directory. Once NiFi is up and running, you will see a message indicating that the web UI is available at http://localhost:8080/nifi.
  8. Access the NiFi UI: Open a web browser and navigate to http://localhost:8080/nifi. You should see the Apache NiFi UI, where you can start building data flows.
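
Whichever platform you are on, a small script can tell you when the web UI is ready instead of refreshing the browser by hand. This is a sketch that assumes a default, unsecured install listening on http://localhost:8080/nifi.

```python
# Poll the NiFi UI URL until the web server responds (or we give up).
import time
import urllib.error
import urllib.request

def wait_for_nifi(url="http://localhost:8080/nifi", timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    print("NiFi UI is up at", url)
                    return True
        except (urllib.error.URLError, ConnectionError):
            pass  # server not up yet; NiFi can take a few minutes on first start
        time.sleep(5)
    return False

wait_for_nifi()
```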

Apache NiFi Components

Now that we have Apache NiFi up and running, let’s take a look at its core components.

Processor

The Processor is the fundamental unit of work in Apache NiFi. It is responsible for reading data from an input source, performing some operation on the data, and then passing the data to an output destination. A Processor can be thought of as a small, reusable program that performs a specific data transformation or manipulation. Apache NiFi ships with more than 280 built-in processors.

FlowFile

The FlowFile is a data unit that represents a piece of data flowing through the Apache NiFi system. It contains the actual data payload as well as metadata, such as its origin, size, and timestamp. FlowFiles are created by Processors and can be passed along to other Processors or written to an output destination.

A FlowFile’s content can be in any format, for example XML, plain text, CSV, or SQL.
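
To make the FlowFile model concrete, here is a small illustration in Python. This is not NiFi’s actual implementation (FlowFiles are Java objects managed by the framework), just a sketch of the content-plus-attributes shape described above.

```python
from dataclasses import dataclass, field

@dataclass
class FlowFileSketch:
    """Illustration only: a FlowFile pairs a content payload with metadata attributes."""
    content: bytes                                   # the payload, e.g. CSV or XML text
    attributes: dict = field(default_factory=dict)   # metadata: filename, path, uuid, ...

ff = FlowFileSketch(
    content=b"id,name\n1,Ada\n",
    attributes={"filename": "users.csv", "mime.type": "text/csv"},
)
print(ff.attributes["filename"], "-", len(ff.content), "bytes")
```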

Connection

The Connection represents the link between two Processors in the Apache NiFi data flow. It is used to pass FlowFiles from one Processor to another, and it can also be used to specify data routing and prioritization.

Controller Service

The Controller Service is a shared service that can be used by multiple Processors in the Apache NiFi system. It provides functionality such as data encryption, compression, and database connections. By using Controller Services, you can avoid duplicating code and configuration across multiple Processors.

Template

A Template is a pre-built data flow that can be reused across multiple projects or environments. It can be thought of as a blueprint for a data flow, containing one or more Processors, Connections, and Controller Services.

Building a Simple Data Flow

Now that we have a basic understanding of the core components of Apache NiFi, let’s build a simple data flow.

Step 1: Add a Processor

The first step is to add a Processor to read data from an input source. To do this, drag the Processor icon from the component toolbar at the top of the NiFi UI onto the canvas and select the “GetFile” Processor from the list. This Processor reads data from a file on your local file system.

In Apache NiFi, a Processor is a fundamental building block that performs a specific task, such as transforming data, routing data, or interacting with external systems. In this step, we’ll add a Processor to our NiFi flow.

  1. Open the NiFi user interface by navigating to http://localhost:8080/nifi in your web browser.
  2. Drag the Processor icon (the first icon in the component toolbar at the top of the screen) onto the canvas.
  3. In the “Add Processor” dialog box, you can choose from a variety of pre-built processors. For this example, choose the “GetFile” processor, which reads files from a directory on the local file system.
  4. Click the “Add” button to add the selected processor to the canvas.
  5. Connect the processor to the rest of the flow by hovering over it, then clicking and dragging the connection arrow that appears onto the desired downstream processor or output port.
  6. Configure the processor by double-clicking it and selecting the “Properties” tab, where you can set properties such as the input directory.
  7. Start the processor by right-clicking on it and selecting “Start” from the context menu.

Congratulations, you’ve successfully added a Processor to your NiFi flow! Keep in mind that there are many other types of processors available in NiFi, each with their own unique functionality.
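
Everything the UI does goes through NiFi’s REST API, so the same step can be scripted. The sketch below assumes an unsecured NiFi 1.x instance on localhost:8080; “root” is the API’s alias for the root process group.

```python
# Add a GetFile processor to the root canvas via the REST API (sketch).
import json
import urllib.request

API = "http://localhost:8080/nifi-api"

def create_processor(proc_type, x=0.0, y=0.0):
    body = {
        "revision": {"version": 0},  # new components start at revision 0
        "component": {"type": proc_type, "position": {"x": x, "y": y}},
    }
    req = urllib.request.Request(
        f"{API}/process-groups/root/processors",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

getfile = create_processor("org.apache.nifi.processors.standard.GetFile")
print("created processor", getfile["id"])
```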

Step 2: Configure the Processor

Next, we need to configure the “GetFile” Processor. Double-click the Processor to open its configuration window. Here, you can specify the input file location and other settings, such as how often the Processor should check for new files. Once you’ve configured the Processor, click “Apply” to save the changes.

  1. In the NiFi UI, double-click the “GetFile” processor (or right-click it and select “Configure”) to open its configuration dialog. The configuration dialog is different for each processor type.
  2. Configure the processor properties according to your use case. For “GetFile”, the most important property is “Input Directory”, the folder NiFi will poll for new files; the “Scheduling” tab controls how often it checks.
  3. Save the configuration by clicking the Apply button at the bottom of the configuration dialog.
  4. Once the processor is properly configured, you can connect it to other processors or output ports as necessary using NiFi’s intuitive drag-and-drop interface.

As an example, let’s consider configuring the PutFile processor, which allows you to write data to a file on your local file system. The configuration dialog for the PutFile processor includes several properties, such as the destination directory, as well as options for handling conflicts if a file with the same name already exists (the file name itself comes from the FlowFile’s filename attribute). You would configure these properties according to your specific use case, for example specifying a directory path that matches the data you are processing. Once you have saved the configuration, you can connect the PutFile processor to other processors as necessary to complete your data flow.
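
Processor configuration can also be scripted. This sketch updates the GetFile processor’s “Input Directory” property over the REST API (same unsecured-localhost assumption as before; the processor id is a placeholder you would copy from the UI or from the create response).

```python
# Set GetFile's "Input Directory" property via the REST API (sketch).
import json
import urllib.request

API = "http://localhost:8080/nifi-api"
proc_id = "<getfile-processor-id>"  # placeholder: copy the real id from NiFi

# Read the current entity so we can echo its revision (NiFi's optimistic locking).
with urllib.request.urlopen(f"{API}/processors/{proc_id}") as resp:
    entity = json.load(resp)

body = {
    "revision": entity["revision"],
    "component": {"id": proc_id, "config": {"properties": {"Input Directory": "/data/in"}}},
}
req = urllib.request.Request(
    f"{API}/processors/{proc_id}",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
urllib.request.urlopen(req)
```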

Step 3: Add a Processor to Process the Data

Now that we have a Processor to read data from an input source, we need to add another Processor to process the data. Let’s add a Processor to split the data into individual lines. To do this, drag the Processor icon onto the canvas again and select the “SplitText” Processor.

  1. Drag the Processor icon from the component toolbar onto the canvas.
  2. We want to split the data into individual lines, so choose the “SplitText” processor from the list of available processors.
  3. Double-click the “SplitText” processor to open its configuration dialog.
  4. In the configuration dialog, you can configure the properties for the “SplitText” processor. To emit one line per FlowFile, set the “Line Split Count” property to 1.
  5. Save the configuration by clicking the Apply button at the bottom of the configuration dialog.
  6. Once the “SplitText” processor is properly configured, you can connect it to the “GetFile” processor by dragging the connection arrow from the “GetFile” processor onto the “SplitText” processor.
  7. The “SplitText” processor will now split the data into individual lines and pass them along to the next processor in the data flow.

As an example, let’s say we have a file containing a list of customer orders, with each order on a separate line. We want to process each order individually, so we need to split the data into individual lines. To do this, we would add the “SplitText” processor to our data flow and configure it to emit one line per FlowFile. Once the “SplitText” processor is properly configured, we can connect it to the “GetFile” processor and the data will flow from the input file, through the “SplitText” processor, and on to the next processor in the data flow. To have something to process, you can generate a sample input file as shown below.
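
This small Python snippet writes such a sample orders file into GetFile’s input directory (the path and the order lines are made up for the example).

```python
# Create a sample customer-orders file for GetFile to pick up.
orders = [
    "1001,Ada Lovelace,3 widgets",
    "1002,Grace Hopper,1 gadget",
    "1003,Alan Turing,2 sprockets",
]
with open("/data/in/orders.txt", "w") as f:   # path matches GetFile's Input Directory
    f.write("\n".join(orders) + "\n")
```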

Step 4: Configure the Processor

Next, we need to configure the “SplitText” Processor. Double-click the Processor to open its configuration window. Here, you can control how the Processor splits the incoming data into fragments. Once you’ve configured the Processor, click “Apply” to save the changes.

  1. In the NiFi UI, locate the SplitText processor that you added in Step 3.
  2. Double-click the SplitText processor to open its configuration window.
  3. In the configuration window, you will see several properties that you can modify, such as “Line Split Count”, “Maximum Fragment Size”, “Header Line Count”, and “Remove Trailing Newlines”.
  4. The “Line Split Count” property specifies how many lines go into each output FlowFile. Setting it to 1 splits the input into individual lines, while setting it to, say, 1000 would produce fragments of 1,000 lines each.
  5. SplitText always splits on line breaks (\r\n, \r, or \n), so it behaves the same regardless of the platform that produced the file. If you need to split on an arbitrary delimiter instead, the SplitContent processor accepts a custom byte sequence.
  6. Once you have configured the processor properties, click “Apply” to save the changes.

As an example, let’s say that we have a text file containing a list of names, one per line, and we want to split this file into individual records. To do this, we would configure the SplitText processor as follows:

  • In the configuration window for the SplitText processor, set the “Line Split Count” property to 1 so that each line becomes its own FlowFile.
  • Save the configuration changes by clicking “Apply”.

Once you have configured the SplitText processor, you can connect it to other processors or output streams as necessary to complete your data flow. For example, you might connect the SplitText processor to a ConvertRecord processor to convert the text data into a different format, or to a PutDatabaseRecord processor to insert the data into a database table.
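
If it helps to see the splitting behavior outside NiFi, this plain-Python sketch mirrors what SplitText does with “Line Split Count” set to 1: every line of the input becomes its own fragment.

```python
def split_text(content: str, line_split_count: int = 1):
    """Mimic SplitText: group the input's lines into fragments of N lines each."""
    lines = content.splitlines()
    return [
        "\n".join(lines[i:i + line_split_count])
        for i in range(0, len(lines), line_split_count)
    ]

for fragment in split_text("alice\nbob\ncarol\n"):
    print(repr(fragment))  # 'alice', 'bob', 'carol' -- one fragment per line
```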

Step 5: Connect the Processors

Now that we have two Processors, we need to connect them to create a data flow. To do this, hover over the “GetFile” Processor, then click and drag the connection arrow that appears onto the “SplitText” Processor. This will create a Connection between the two Processors.

  1. In the NiFi UI, locate the two processors that you want to connect. In this case, we have the “GetFile” processor and the “SplitText” processor.
  2. Hover your mouse over the “GetFile” processor until a connection icon (a circle with an arrow) appears in the middle of the processor.
  3. Click and drag that connection arrow onto the “SplitText” processor, then release the mouse button. The “Create Connection” dialog opens, and a line appears between the two processors indicating that they are now connected.
  4. In the “Create Connection” dialog, select the relationship(s) to route down this connection. For the “GetFile” processor, select the “success” relationship so that successfully read files are passed to “SplitText”.
  5. Once you have connected the two processors and chosen the relationship, you can continue building out your data flow by connecting additional processors or output ports as necessary.

As an example, let’s consider a use case where we are using the “GetFile” processor to read a CSV file containing customer data, and then using the “SplitText” processor to split the data into individual records. We would connect the “GetFile” processor to the “SplitText” processor by hovering over the “GetFile” processor, dragging the connection arrow onto the “SplitText” processor, and selecting the “success” relationship in the “Create Connection” dialog. This would create a data flow that reads the CSV file and splits it into individual customer records, which we could then process and write to an output destination using additional processors.
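
Connections can be created over the REST API as well. A sketch, under the same unsecured-localhost assumption; the two processor ids are placeholders, and the root group id is resolved first because a connection must name its parent group.

```python
# Connect GetFile's "success" relationship to SplitText via the REST API (sketch).
import json
import urllib.request

API = "http://localhost:8080/nifi-api"
getfile_id = "<getfile-processor-id>"      # placeholder ids: copy the real ones
splittext_id = "<splittext-processor-id>"  # from the UI or earlier API responses

def call(method, path, body=None):
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(API + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The connection must name its parent group, so resolve the root group id first.
root_id = call("GET", "/flow/process-groups/root")["processGroupFlow"]["id"]

call("POST", f"/process-groups/{root_id}/connections", {
    "revision": {"version": 0},
    "component": {
        "source": {"id": getfile_id, "groupId": root_id, "type": "PROCESSOR"},
        "destination": {"id": splittext_id, "groupId": root_id, "type": "PROCESSOR"},
        "selectedRelationships": ["success"],
    },
})
```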

Step 6: Add a Processor to Write the Data

Finally, we need to add a Processor to write the processed data to an output destination. Let’s add a Processor to write the data to a file. To do this, drag the Processor icon onto the canvas and select the “PutFile” Processor.

In this step, we will add a processor to write the processed data to an output destination. The PutFile processor is an appropriate choice for this purpose, as it writes data to a file on the local file system. Here’s how to add a PutFile processor in NiFi:

  1. Drag the Processor icon from the component toolbar onto the canvas.
  2. In the “Add Processor” dialog, select the “PutFile” processor from the list of available processors.
  3. Click the “Add” button to add the processor to the canvas.
  4. Connect the output of the “SplitText” processor (configured in Steps 4 and 5) to the PutFile processor.
  5. Configure the PutFile processor to specify the destination directory for the output data. You can also specify how conflicts are handled if a file with the same name already exists.
  6. Save the configuration by clicking the “Apply” button.
  7. Once everything is connected, start the data flow (see Step 9).

As an example, let’s say we want to write the processed data to the “C:\output” directory. We would configure the PutFile processor as follows:

  1. In the NiFi UI, double-click the PutFile processor to open its configuration dialog.
  2. In the “Properties” tab, set the “Directory” property to “C:\output”.
  3. PutFile names each output file after the incoming FlowFile’s “filename” attribute; if you need a specific name such as “output.txt”, set that attribute upstream (for example, with an UpdateAttribute processor).
  4. Optionally, set the “Conflict Resolution Strategy” property to specify what should happen if a file with the same name already exists.
  5. Save the configuration by clicking the “Apply” button.

Once the PutFile processor is properly configured, we can connect it to the previous processor in the data flow, which is responsible for processing the data. The output of that processor will be sent to the input of the PutFile processor, which will write the data to the specified file in the specified directory.
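
Once the flow is running (Step 9), a quick way to confirm it works end to end is to watch the output directory for new files. A small sketch, using the example directory from above:

```python
# Watch PutFile's output directory and report newly written files for a minute.
import os
import time

out_dir = r"C:\output"             # matches the PutFile example configuration
seen = set(os.listdir(out_dir))
deadline = time.time() + 60
while time.time() < deadline:
    current = set(os.listdir(out_dir))
    for name in sorted(current - seen):
        print("new output file:", name)
    seen = current
    time.sleep(2)
```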

Step 7: Configure the Processor

Next, we need to configure the “PutFile” Processor. Double-click the Processor to open its configuration window. Here, you can specify the output directory and other settings, such as how to resolve conflicts when a file with the same name already exists. Once you’ve configured the Processor, click “Apply” to save the changes.

  1. Double-click the “PutFile” Processor that you added to the canvas in Step 6 to open its configuration window.
  2. In the configuration window, you will see several properties that you can configure, including the destination directory and options for handling conflicts if the file already exists.
  3. To specify the output location, enter the directory path in the “Directory” property. For example, you might enter “C:\output” to specify a directory on your local file system.
  4. The output file name is taken from the FlowFile’s “filename” attribute rather than from a processor property, so each incoming FlowFile keeps (or is given) its own name.
  5. Use the “Conflict Resolution Strategy” property (“replace”, “ignore”, or “fail”) to control what happens when a file with the same name already exists; note that PutFile does not append to existing files.
  6. Once you’ve configured the Processor properties to your liking, click the “Apply” button to save the changes.
  7. You can now connect the “PutFile” Processor to other Processors as necessary to complete your data flow. For example, you might route its “failure” relationship to a “LogAttribute” Processor to log any FlowFiles that could not be written.
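
Because PutFile takes the file name from the FlowFile’s “filename” attribute, a common pattern is to insert an UpdateAttribute processor before PutFile. The sketch below scripts that via the REST API (unsecured localhost assumed; UpdateAttribute turns its dynamic properties into FlowFile attributes).

```python
# Add an UpdateAttribute processor and give it a dynamic "filename" property (sketch).
import json
import urllib.request

API = "http://localhost:8080/nifi-api"

def call(method, path, body):
    req = urllib.request.Request(API + path, data=json.dumps(body).encode(),
                                 headers={"Content-Type": "application/json"},
                                 method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# 1. Add the processor to the root canvas.
ua = call("POST", "/process-groups/root/processors", {
    "revision": {"version": 0},
    "component": {
        "type": "org.apache.nifi.processors.attributes.UpdateAttribute",
        "position": {"x": 0.0, "y": 400.0},
    },
})

# 2. Dynamic properties become FlowFile attributes, so "filename" sets the output name.
call("PUT", f"/processors/{ua['id']}", {
    "revision": ua["revision"],
    "component": {"id": ua["id"], "config": {"properties": {"filename": "output.txt"}}},
})
```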

Step 8: Connect the Processors

Now that we have a Processor to write the data, we need to connect it to the “SplitText” Processor. To do this, hover over the “SplitText” Processor, then click and drag the connection arrow that appears onto the “PutFile” Processor. This will create a Connection between the two Processors.

  1. In the NiFi UI, hover over the “SplitText” processor until the connection icon appears in the middle of the processor.
  2. Click and drag the connection arrow onto the “PutFile” processor.
  3. Release the mouse button to open the “Create Connection” dialog.
  4. In the dialog, select the “splits” relationship so that the individual line fragments are routed to “PutFile” (the “original” and “failure” relationships can be auto-terminated in the SplitText processor’s Settings tab), then click “Add” to create the connection.

Once the connection is created, data will flow from the “SplitText” processor to the “PutFile” processor according to the flow defined in the data flow diagram. In this case, the “SplitText” processor will split the input text into lines, and the resulting lines will be written to individual files by the “PutFile” processor.

For example, let’s say that we want to split a log file containing multiple log entries into separate files, with each file containing a single log entry. We would first use the “SplitText” processor to split the log file into individual lines, then use the “PutFile” processor to write each line to a separate file. We would connect the output port of the “SplitText” processor to the input port of the “PutFile” processor to create a connection between the two processors. This would allow data to flow from the “SplitText” processor to the “PutFile” processor, with each line being written to a separate file as specified in the processor configuration.
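
For intuition, here is a plain-Python mimic of what the finished GetFile → SplitText → PutFile flow does: read one file, split it into lines, and write each line to its own file (paths are illustrative).

```python
# Local, NiFi-free mimic of the flow: one input file in, one file per line out.
from pathlib import Path

source = Path("/data/in/app.log")
out_dir = Path("/data/out")
out_dir.mkdir(parents=True, exist_ok=True)

for i, line in enumerate(source.read_text().splitlines()):
    (out_dir / f"entry-{i:05d}.log").write_text(line + "\n")
```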

Step 9: Start the Data Flow

Now that we have all the components in place and connected, it’s time to start the flow and watch the data move through it. To start the data flow, select the processors you want to run (or click an empty spot on the canvas to select the entire flow) and click the “Start” button in the Operate palette on the left-hand side of the NiFi UI. Once started, you should see data flowing through the Processors in real time.

For example, if you are processing data from an input source like a CSV file, you should see the data being ingested by the input processor, processed by any additional processors in your flow, and then written to an output destination like a database or another file. You can monitor the progress of the data flow by viewing the status of each processor in the NiFi UI, and you can stop the flow at any time using the corresponding buttons in the UI.
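
Starting processors can also be scripted through the REST API’s run-status endpoint. A sketch with the usual unsecured-localhost assumption and a placeholder processor id:

```python
# Flip one processor to RUNNING via the REST API (sketch).
import json
import urllib.request

API = "http://localhost:8080/nifi-api"
proc_id = "<processor-id>"  # placeholder: copy the real id from NiFi

# Fetch the current revision, then change the run status.
with urllib.request.urlopen(f"{API}/processors/{proc_id}") as resp:
    entity = json.load(resp)

req = urllib.request.Request(
    f"{API}/processors/{proc_id}/run-status",
    data=json.dumps({"revision": entity["revision"], "state": "RUNNING"}).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
urllib.request.urlopen(req)
```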

Congratulations! You have just built a simple data flow in Apache NiFi. While this tutorial only scratches the surface of what you can do with NiFi, it should give you a good starting point for building your own data flows. As you become more familiar with the tool, you can explore additional features and configurations to create more complex data flows tailored to your specific needs.

Templates

You can create, share, import, and export data flow templates. Here is a link where you can find a collection of them: Apache NiFi Templates

Apache NiFi Registry

Apache NiFi Registry provides a central place to store and version your flows. Let’s say you work for a company that has multiple teams working on different projects, and each team has its own set of data flows. With NiFi Registry, you can create a centralized repository for all of these flows, making it easy for teams to share components, collaborate on new flows, and track changes over time.

For example, let’s say that the marketing team has built a data flow for processing customer data, and the sales team needs to use this flow as part of their own data processing pipeline. With NiFi Registry, the marketing team can publish their flow to the registry, and the sales team can then import the flow into their own NiFi instance. This saves the sales team time and effort, as they don’t have to build their own flow from scratch.

NiFi Registry also allows you to manage versions of your flows, so you can keep track of changes over time and roll back to previous versions if necessary. This can be especially useful when you have multiple teams working on the same flow, as it ensures that everyone is using the same version and avoids conflicts or errors that could arise from using outdated versions.

Overall, NiFi Registry is a powerful tool for managing your data flows and collaborating with others. It can help you streamline your data processing pipeline, improve efficiency, and ensure consistency across teams and projects.
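
NiFi Registry exposes its own REST API. As a taste, this sketch lists the buckets (the containers that hold shared, versioned flows), assuming a default unsecured Registry on localhost:18080:

```python
# List NiFi Registry buckets (sketch; default unsecured install assumed).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:18080/nifi-registry-api/buckets") as resp:
    for bucket in json.load(resp):
        print(bucket["identifier"], "-", bucket["name"])
```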

Apache NiFi Services

DistributedMapCacheServer

DistributedMapCacheServer in Apache NiFi is a service that allows users to store key-value pairs in a distributed cache, which can be shared across multiple NiFi nodes. The primary use case for DistributedMapCacheServer is to improve performance and reduce data duplication in complex data processing workflows.

Here are some common scenarios where DistributedMapCacheServer can be useful:

  1. Shared resources: If multiple NiFi processors need access to a common resource, such as a database connection or an API token, storing that resource in DistributedMapCacheServer can help avoid duplicating the resource across multiple processors, which can reduce network overhead and improve performance.
  2. Lookups: If a processor needs to perform a lookup for a value based on a key, such as looking up a user ID based on an email address, storing the lookup table in DistributedMapCacheServer can help speed up the lookup process and reduce processing time.
  3. Caching: If a processor needs to cache results for a certain amount of time, such as caching the results of an API call for a few minutes, storing the cached results in DistributedMapCacheServer can help avoid unnecessary API calls and improve performance.

Overall, DistributedMapCacheServer can be used to improve performance, reduce data duplication, and simplify complex data processing workflows in Apache NiFi. However, it’s important to carefully evaluate the specific requirements of your data processing workflow before deciding to use DistributedMapCacheServer, as it may not be necessary or appropriate for every use case.
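
DistributedMapCacheServer speaks its own binary protocol (processors reach it through a DistributedMapCacheClientService controller service), so the sketch below is not a wire-compatible client. It only illustrates the key-value-with-expiry pattern behind the lookup and caching use cases above.

```python
# Illustration of the TTL'd key-value caching pattern (NOT NiFi's cache protocol).
import time

class TtlCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

    def get(self, key):
        value, expiry = self.store.get(key, (None, 0.0))
        return value if time.time() < expiry else None

cache = TtlCache(ttl_seconds=120)
cache.put("user:ada@example.com", "user-id-1001")  # e.g. an email -> user-id lookup
print(cache.get("user:ada@example.com"))           # hit until the TTL lapses
```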

StandardSSLContextService

StandardSSLContextService in Apache NiFi is a service that provides SSL/TLS security for NiFi components that require secure communication, such as web servers and client processors. It allows users to configure SSL/TLS settings, such as certificates, keys, and truststores, which can be used to encrypt data transmissions and verify the identity of communicating parties.

Here are some scenarios where StandardSSLContextService can be useful:

  1. Secure communication: If a NiFi component, such as a web server or client processor, needs to communicate securely over a network, using StandardSSLContextService can help ensure that data transmissions are encrypted and cannot be intercepted or tampered with by unauthorized parties.
  2. Authentication and authorization: If a NiFi component needs to verify the identity of communicating parties, such as when authenticating users or authorizing access to resources, using StandardSSLContextService can help provide a secure and reliable mechanism for authentication and authorization.
  3. Compliance: If a NiFi component needs to comply with security regulations, such as HIPAA, PCI-DSS, or GDPR, using StandardSSLContextService can help ensure that sensitive data is protected and that compliance requirements are met.

Overall, StandardSSLContextService is an important component of NiFi’s security infrastructure, and it can be used to ensure secure and reliable communication between NiFi components and external systems. However, it’s important to configure and manage StandardSSLContextService carefully to ensure that it meets the specific security requirements of your data processing workflow.
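
As an analogy, Python’s standard library bundles the same ingredients StandardSSLContextService manages: a key/certificate pair for identity and a truststore for verifying peers. The file paths here are placeholders.

```python
# What an SSL context configures, sketched with Python's ssl module.
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_cert_chain(certfile="client.crt", keyfile="client.key")  # our identity (keystore role)
ctx.load_verify_locations(cafile="truststore.pem")                # who we trust (truststore role)
ctx.verify_mode = ssl.CERT_REQUIRED                               # require peer verification
```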

Conclusion

In this course, we have covered the basics of Apache NiFi. We started by introducing you to the core components of NiFi, including Processors, Connections, and FlowFiles, and discussed how they work together to form a data flow. We then walked you through the installation process on Mac, Linux, and Windows, and provided a step-by-step guide to building a simple data flow that reads a file from disk, splits it into individual lines, and writes the results back to disk.

We hope that this course has provided you with a solid foundation for working with Apache NiFi and that you now have a good understanding of its capabilities and how to use it to build data flows. With NiFi’s intuitive drag-and-drop interface and powerful processors, you can easily build data pipelines that integrate with various data sources and systems.

If you are interested in exploring NiFi further, there are many resources available online, including the official documentation, user forums, and online courses. You can also experiment with different processors and connectors to build more complex data flows that meet your specific needs.

If you have any questions or feedback about this course, please don’t hesitate to reach out to us. We value your input and are always looking for ways to improve our courses. Thank you for taking this course, and we wish you the best of luck in your NiFi journey!

Whenever I work, I like having a list of processors in front of me, so here is one you can keep handy: categories of Apache NiFi processors, with examples for each category:

🔌 Input Processors:

  • GetFile: Reads data from files on disk
  • GetFTP: Downloads files from an FTP server
  • GetSFTP: Downloads files from an SFTP server
  • GetHTTP: Retrieves data from an HTTP server
  • GetTwitter: Retrieves tweets from Twitter’s API
  • GetMongo: Retrieves data from a MongoDB database
  • GetKafka: Reads messages from an Apache Kafka topic
  • GetJMSQueue: Reads messages from a JMS queue
  • GetSMTP: Retrieves email from an SMTP server
  • GetSyslog: Reads data from a Syslog server
  • GetSNMP: Retrieves data using SNMP protocol
  • GetAzureEventHub: Retrieves events from Azure Event Hub
  • GetAzureIoTHub: Retrieves telemetry data from Azure IoT Hub
  • GetAzureBlobStorage: Reads data from Azure Blob Storage
  • GetAzureDataLakeStore: Reads data from Azure Data Lake Store
  • GetGoogleCloudStorage: Reads data from Google Cloud Storage
  • GetGoogleCloudPubSub: Reads messages from Google Cloud Pub/Sub
  • GetGoogleCloudBigQuery: Retrieves data from Google Cloud BigQuery
  • GetAmazonS3: Reads data from Amazon S3
  • GetAmazonDynamoDB: Retrieves data from Amazon DynamoDB
  • GetAmazonKinesis: Reads data from Amazon Kinesis Stream
  • GetAmazonSQS: Reads messages from Amazon SQS
  • GetAmazonSES: Retrieves email from Amazon SES
  • GetAzureCosmosDB: Reads data from Azure Cosmos DB

🔌 Output Processors:

  • PutFile: Writes data to files on disk
  • PutFTP: Uploads files to an FTP server
  • PutSFTP: Uploads files to an SFTP server
  • PutHTTP: Sends data to an HTTP server
  • PutTwitter: Sends tweets to Twitter’s API
  • PutMongo: Inserts data into a MongoDB database
  • PutKafka: Writes messages to an Apache Kafka topic
  • PutJMSQueue: Writes messages to a JMS queue
  • PutSMTP: Sends email using an SMTP server
  • PutSyslog: Sends data to a Syslog server
  • PutSNMP: Sends data using SNMP protocol
  • PutAzureEventHub: Sends events to Azure Event Hub
  • PutAzureIoTHub: Sends telemetry data to Azure IoT Hub
  • PutAzureBlobStorage: Writes data to Azure Blob Storage
  • PutAzureDataLakeStore: Writes data to Azure Data Lake Store
  • PutGoogleCloudStorage: Writes data to Google Cloud Storage
  • PutGoogleCloudPubSub: Writes messages to Google Cloud Pub/Sub
  • PutGoogleCloudBigQuery: Inserts data into Google Cloud BigQuery
  • PutAmazonS3: Writes data to Amazon S3
  • PutAmazonDynamoDB: Inserts data into Amazon DynamoDB
  • PutAmazonKinesis: Writes data to Amazon Kinesis Stream
  • PutAmazonSQS: Writes messages to Amazon SQS
  • PutAmazonSES: Sends email using Amazon SES
  • PutAzureCosmosDB: Writes data to Azure Cosmos DB

🔄 Transformation:

  • AttributesToJSON: Converts attributes to JSON format.
  • CompressContent: Compresses the content of a FlowFile.
  • DecryptContent: Decrypts the content of a FlowFile using the specified algorithm and key.
  • EncryptContent: Encrypts the content of a FlowFile using the specified algorithm and key.
  • ExecuteScript: Executes a user-defined script to transform the contents of a FlowFile.
  • ExtractText: Extracts text from the content of a FlowFile.
  • JSONtoAttributes: Extracts fields from a JSON document and sets them as attributes of a FlowFile.
  • JoltTransformJSON: Transforms the content of a FlowFile using a Jolt specification.
  • QueryRecord: Applies SQL-like operations to the contents of a FlowFile.
  • UpdateAttribute: Updates the attributes of a FlowFile.

🔒 Security:

  • EncryptContent: Encrypts the content of a FlowFile using the specified algorithm and key.
  • HashContent: Computes the hash of the content of a FlowFile using the specified algorithm.
  • PGPDecrypt: Decrypts a PGP-encrypted FlowFile using the specified key.
  • PGPEncrypt: Encrypts a FlowFile using PGP encryption.
  • Sign: Digitally signs the content of a FlowFile using the specified key.
  • VerifySignature: Verifies the digital signature of a FlowFile using the specified key.
  • SSLContextService: Configures an SSLContext for secure communication with external systems.
  • SecureHashContent: Computes a secure hash of the content of a FlowFile using the specified algorithm.
  • AuthenticateHTTP: Authenticates the HTTP request using Basic Authentication, Digest Authentication or Kerberos Authentication.

🔗 Integration:

  • ConsumeAMQP: Consumes messages from an AMQP (Advanced Message Queuing Protocol) broker.
  • ConsumeJMS: Consumes messages from a JMS (Java Message Service) provider.
  • ConsumeKafka: Consumes messages from an Apache Kafka cluster.
  • ConsumeMQTT: Consumes messages from an MQTT (Message Queuing Telemetry Transport) broker.
  • ConsumeSMTP: Retrieves emails from a POP3 or IMAP email server.
  • ConsumeSNS: Consumes messages from an AWS SNS (Simple Notification Service) topic.
  • ConsumeSFTP: Retrieves files from an SFTP (Secure File Transfer Protocol) server.
  • ConsumeTwitter: Retrieves tweets from Twitter based on a search term or user.
  • ConsumeWebSocket: Receives messages from a WebSocket server.

📦 System:

  • DistributeLoad: Distributes load across multiple instances of NiFi running on different nodes.
  • EvaluateJsonPath: Extracts values from a JSON document using a JSONPath expression.
  • FetchFile: Reads the contents of a file on disk into a FlowFile.
  • GetFile: Fetches files from a local directory (including network-mounted shares such as NFS).
  • InvokeHTTP: Sends an HTTP request and receives a response.
  • ListFile: Lists the files in a local directory (including network-mounted shares), typically paired with FetchFile.
  • LogAttribute: Logs the attributes of a FlowFile.
  • MergeContent: Merges multiple FlowFiles into a single FlowFile.
  • Notify: Signals a corresponding Wait processor through a distributed cache.
  • PutFile: Writes a FlowFile to a local directory (including network-mounted shares).

๐Ÿ” Query:

  • EvaluateXPath: Extracts values from an XML document using XPath expressions.

๐Ÿ•ต๏ธ Data Profiling and Investigation:

  • AttributesToJSON: Converts attributes to a JSON document.
  • HashAttribute: Computes a hash of the content of an attribute and sets it as a new attribute.
  • RouteOnAttribute: Routes FlowFiles to different relationships based on the values of attributes.

📥 Input Sources:

  • GetFile: Retrieves files from a directory.
  • GetFTP: Retrieves files from an FTP server.
  • GetHTTP: Retrieves data from an HTTP server.

📤 Output Destinations:

  • PutFile: Writes the contents of a FlowFile to disk.
  • PutFTP: Sends a FlowFile to an FTP server.
  • PutS3Object: Uploads a FlowFile to an Amazon S3 bucket.

📜 Data Transformation:

  • ConvertRecord: Converts data between formats using user-specified schemas.
  • ExecuteScript: Executes a user-specified script to modify the content of a FlowFile.
  • SplitText: Splits a text file into multiple FlowFiles based on a specified delimiter.

๐Ÿ” Data Query and Manipulation:

  • QueryRecord: Performs SQL-like queries on a FlowFile and returns the results.
  • LookupRecord: Looks up data in a lookup service and adds the results to the original FlowFile.
  • UpdateRecord: Updates the contents of a FlowFile using a user-specified schema.
  • SPARQL Query Processor: Queries RDF data sources using SPARQL, allowing users to extract, transform, and load RDF data; it supports a variety of output formats, including RDF, JSON, and CSV.

🧹 Data Cleaning:

  • ReplaceText: Replaces text in the content of a FlowFile using a specified regular expression.
  • CleanLog: Parses log data and creates structured records for further processing.
  • DetectDuplicate: Filters out duplicate records based on a specified field.

🔒 Security:

  • EncryptContent: Encrypts the contents of a FlowFile using a specified algorithm.
  • Sign: Signs the contents of a FlowFile using a specified algorithm.
  • ValidateSignatures: Validates the signature of a FlowFile using a specified key.

I hope this full tutorial helps you a lot! Let me know if you have any other questions.

Written by

Albert Oplog

Hi, I'm Albert Oplog. I would humbly like to share my tech journey with people all around the world.