SQL
The SQL processor component allows you to process data using SQL queries. It uses DataFusion as the query engine to execute SQL statements on the data.
For the supported SQL syntax, refer to the DataFusion SQL reference.
Configuration
query
The SQL query statement to execute on the data.
type: string
table_name
The table name under which the incoming data is referenced in SQL queries.
type: string
default: flow
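As a minimal sketch of the default, the fragment below omits table_name, so the query refers to the data as flow. The columns id and value are hypothetical and stand in for whatever fields your data carries.

```yaml
# Sketch only: table_name is not set, so the default name "flow" is used.
# The columns "id" and "value" are hypothetical.
- processor:
    type: "sql"
    query: "SELECT id, value FROM flow WHERE value >= 10"
```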
ballista (experimental)
Optional configuration for distributed computing using Ballista. When configured, SQL queries will be executed in a distributed manner.
type: object
required: false
properties:
- remote_url: Ballista server URL (e.g., "df://localhost:50050")
  type: string
  required: true
temporary_list
Optional list of temporary data sources that can be referenced in SQL queries. Each temporary source allows you to access external data during query execution.
type: array
required: false
properties:
- name: Name of the temporary data source to reference
  type: string
  required: true
- table_name: Table name to use for this temporary data in SQL queries
  type: string
  required: true
- key: Key expression or value used to retrieve data from the temporary source
  type: object
  required: true
  properties:
  - expr: Expression string to evaluate for the key
    type: string
    required: false
  - value: Static string value to use as the key
    type: string
    required: false
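The key object accepts either expr or value. As a hedged sketch of the value form, the fragment below looks up a temporary source under a fixed key rather than one computed from the data. The source name app_config, the key config:global, the table names, and the column region are all hypothetical, and the temporary source itself is assumed to be declared elsewhere in the configuration.

```yaml
# Sketch only: every name below is an illustrative assumption.
- processor:
    type: "sql"
    query: "SELECT e.id, c.region FROM events e CROSS JOIN config c"
    table_name: "events"
    temporary_list:
      - name: "app_config"        # hypothetical temporary source, declared elsewhere
        table_name: "config"
        key:
          value: "config:global"  # static key, used as-is instead of an expression
```

Use expr when the lookup key depends on the data being processed, and value when the same entry is wanted for every query.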
Examples
Basic SQL Query
- processor:
    type: "sql"
    query: "SELECT id, name, age FROM users WHERE age > 18"
    table_name: "users"
SQL Query with Temporary Data Sources
- temporary:
    - name: user_profiles
      type: "redis"
      mode:
        type: single
        url: redis://127.0.0.1:6379
      redis_type:
        type: string
  processor:
    type: "sql"
    query: "SELECT u.id, u.name, p.title FROM users u JOIN profiles p ON u.id = p.user_id"
    table_name: "users"
    temporary_list:
      - name: "user_profiles"
        table_name: "profiles"
        key:
          expr: "user_id"
SQL Query with Ballista (Distributed Computing)
- processor:
    type: "sql"
    query: "SELECT COUNT(*) as total FROM large_dataset"
    table_name: "large_dataset"
    ballista:
      remote_url: "df://localhost:50050"