ArkFlow v0.3.0 is now available!
We are thrilled to announce a significant update to ArkFlow! ArkFlow is a high-performance stream processing engine built on Rust, dedicated to providing powerful and easy-to-use data stream processing capabilities. The latest version brings many exciting new features and significant enhancements designed to help you process complex data streams more efficiently.
In this blog post, we will delve into these new features, from broader data source connectivity and more powerful processing capabilities with custom extensions, to landmark distributed computing experimental support, and continuous performance and usability optimizations.
New Upgrades: A Deep Dive into Enhanced Core Capabilities
ArkFlow's core components [input, processing, output, and buffering] have all received significant enhancements and expansions. Let's explore them.
Broader Input Connectivity: Accessing More Diverse Data Streams
To enable ArkFlow to connect to a wider data ecosystem, we have introduced several new input components and strengthened the functionality of existing ones:
-
Websocket Input (New): ArkFlow can now directly subscribe to messages from WebSocket connections. This is an important addition for scenarios requiring the processing of data from web applications or other real-time frontend push data.
-
Nats Input (New): Added support for the Nats messaging system, allowing subscription to messages from Nats topics. Nats is popular in modern cloud-native and microservice architectures for its high performance and simplicity. Integrating Nats allows ArkFlow to better fit into these environments. Its configuration has also been further optimized.
-
Redis Input (New): ArkFlow can now also subscribe to messages from Redis channels or lists. As a widely used cache and message broker, this new input component allows users to easily leverage existing Redis data streams. Its configuration has also been refactored for improved usability.
More Powerful and Efficient Processing Capabilities
The value of data lies in its flow and transformation. The new version also brings significant improvements in data processing:
- JSON Processor Optimization: JSON processing performance has been improved, particularly the json_to_arrow conversion logic. JSON is a ubiquitous data format, and optimizing its processing, especially by converting to the columnar storage format Arrow, can significantly enhance the performance of downstream SQL processing or other analytical tasks.
- SQL Processor Enhancement and Custom UDF Support: The SQL processor's configuration and documentation have been refactored. More excitingly, the SQL processor now supports custom User-Defined Functions (UDFs). This powerful feature allows users to register their own functions, greatly expanding SQL's processing capabilities to meet specific business logic and complex data transformation needs. To ensure stability and ease of use, the UDF registration mechanism has also been improved, such as disallowing the injection of functions with the same name and refactoring the registration function.
- Protobuf Processor Adds Field Filtering: The Protobuf processor now supports field filtering. Protobuf is highly efficient for structured data, and field filtering allows users to select only necessary data, thereby reducing processing overhead and data volume.
- Introduction of VRL (Vector Remap Language) Processor: ArkFlow now has built-in support for VRL processing. VRL is a language specifically designed for transforming observability data (like logs and metrics), but its utility can extend to general data manipulation.
Optimizations in JSON to Arrow conversion, significant enhancements to the SQL processor (especially UDF support), and improvements to the Protobuf processor indicate that ArkFlow is focusing on use cases involving structured and semi-structured data, where both transformation flexibility (SQL, VRL, UDF) and performance (Arrow, Protobuf filtering) are crucial.
More Flexible Output and Error Handling
Processed data needs to be reliably delivered to its destination, and error handling is equally important.
- Nats Output (New!): ArkFlow now supports publishing processed data and error data to Nats topics. This complements the Nats input component, allowing users to build complete Nats-to-Nats data pipelines or integrate with other Nats consumers. Providing consistent Nats support for regular output and error output reflects our consideration for the completeness of new component integration.
- Enhanced Error Output Mechanism: ArkFlow has comprehensively enhanced its error output mechanism. This provides more powerful and flexible management and routing options for data that encounters issues during processing, thereby increasing the resilience of data pipelines and simplifying the debugging process.
This symmetrical addition of Nats input, output, and error output, along with the improvement of overall error handling capabilities, demonstrates ArkFlow's thorough strategy in integrating new components and strengthening core functionalities, ensuring users can build more reliable and maintainable data processing links.
More Granular Buffering and Windowing Capabilities
Buffering mechanisms in stream processing systems are crucial for handling backpressure, temporarily storing messages, and performing time-window aggregations.
Memory Buffer: ArkFlow continues to offer robust memory buffering capabilities, which are critical for high-throughput scenarios and basic window aggregation operations.
New Windowing Buffer Components! To meet more complex time-series analysis and event stream processing needs, ArkFlow has introduced three specialized windowing buffer components:
- Session Window: Used for grouping data based on periods of inactivity, ideal for analyzing user sessions or related event chains.
- Sliding Window: Provides continuously overlapping windows, allowing for smooth analysis of recent data, often used in scenarios like moving average calculations.
- Tumbling Window: Divides data into fixed-size, non-overlapping time segments, suitable for aggregated reporting on fixed intervals.
These new windowing buffer mechanisms greatly enhance the flexibility and power of ArkFlow in handling time-sensitive data, enabling users to define and execute complex data aggregation and analysis logic with greater precision. They complement the existing memory buffer, providing users with a richer toolset to tackle diverse stream processing challenges.
Experimental introduction of distributed computing capabilities
The most notable feature evolution in recent ArkFlow development is undoubtedly the experimental distributed computing support. It signifies that ArkFlow is evolving towards systems capable of handling larger data volumes and more demanding processing loads, allowing tasks to be distributed across multiple nodes or worker units.
Underlying Optimizations: Continuously Improving Performance and Usability
In addition to introducing heavyweight new components, the ArkFlow team also continuously refines existing functionalities to provide better performance and user experience.
We re-emphasize the JSON to Arrow conversion optimization , a concrete example of performance improvement.
To enhance the performance of the final product, the build process has also been optimized, for instance, by enabling Link-Time Optimization (LTO), setting codegen-units to 1 and setting opt-level to 3.
The plugin mechanism has been refactored, paving the way for easier development and integration of custom extensions in the future, further reinforcing ArkFlow's commitment to extensibility. Documentation updates are also an area of continuous investment. To improve the clarity and usability of the documentation, we have split document versions. These ongoing documentation updates demonstrate our understanding that good documentation is crucial for product adoption and ease of use, especially as new features are constantly being added.
The project structure includes arkflow-plugin-examples , indicating our commitment to helping users extend the system. Modular design and extensibility are core features of ArkFlow. Providing examples is a practical way to lower the barrier for users to write custom plugins, which can greatly enhance the platform's capabilities beyond what the core team provides and foster a vibrant ecosystem.
Experience It: NowStart or Upgrade Your ArkFlow Journey
We encourage you to try the latest version of ArkFlow immediately!
Using the standard Rust toolchain (cargo) and common YAML configuration makes ArkFlow relatively easy to get started with for developers familiar with the Rust ecosystem and common configuration practices.
Acknowledgement: Power of community
Thanks to @xiaoziv for bringing us the VRL processor.
Join the ArkFlow Community, Shape the Future Together
We believe that flowing data can generate greater value and are committed to building a vast data processing ecosystem to simplify the threshold for data processing. The development of ArkFlow relies on community support:
- Feel free to Star our project on GitHub!
- Actively participate in GitHub Discussions, which has sections for announcements, Q&A, and more. The project is actively fostering a community through GitHub Discussions, which is crucial for the growth, feedback, and contributions of an open-source project.
- If you are interested, you are also welcome to become a contributor.
Discord: https://discord.gg/CwKhzb8pux
ArkFlow is committed to continuous improvement, with more exciting features in the pipeline. Stay tuned!