SQL
The SQL input component lets you query data with SQL. It reads local files (Avro, Arrow, JSON, CSV, Parquet) and databases (MySQL, PostgreSQL, DuckDB, SQLite), and can optionally execute queries in a distributed manner through Ballista.
For the supported SQL syntax, see the SQL reference.
Configuration
select_sql
The SQL query to execute against the data source configured in input_type. For file sources, the query refers to the data by its table_name, which defaults to "flow".
type: string
required: true
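For longer queries, a YAML block scalar can keep select_sql readable. This is a minimal sketch, assuming a Parquet source queried under the default table name "flow"; the file path and the columns (region, amount) are hypothetical placeholders:

```yaml
input:
  type: "sql"
  # A literal block scalar (|) keeps the line breaks; the whole query is still one string.
  select_sql: |
    SELECT region, SUM(amount) AS total_amount
    FROM flow
    WHERE amount > 0
    GROUP BY region
  input_type:
    type: "parquet"
    # table_name is omitted, so the query uses the default name "flow"
    path: "/path/to/sales.parquet"
```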
ballista (experimental)
Optional configuration for distributed execution with Ballista. When set, the query is executed in a distributed manner on the Ballista cluster at remote_url instead of locally.
type: object
required: false
properties:
- remote_url: Ballista server URL (e.g., "df://localhost:50050")
  type: string
  required: true
input_type
Specifies the type and configuration of the input source to query from.
type: object
required: true
The type field selects the source (for example, "mysql" or "parquet"), and the remaining fields depend on that type. The available input types are described below.
Input Type Configurations
Avro
- table_name: Optional table name used in SQL queries (defaults to "flow")
  type: string
  required: false
- path: Path to the Avro file
  type: string
  required: true
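A sketch of an Avro source. The type value "avro" is an assumption (lowercase, following the "mysql" and "parquet" values used in the examples), and the path is a placeholder. Because table_name is omitted, the query refers to the default name "flow":

```yaml
input:
  type: "sql"
  select_sql: "SELECT * FROM flow"
  input_type:
    type: "avro"  # assumed value, lowercase like "parquet"
    # table_name omitted: the data is queried as "flow"
    path: "/path/to/data.avro"
```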
Arrow
- table_name: Optional table name used in SQL queries (defaults to "flow")
  type: string
  required: false
- path: Path to the Arrow file
  type: string
  required: true
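A sketch of an Arrow source with a custom table name. The type value "arrow" is an assumed lowercase value, and the table name "events" and the path are hypothetical. Once table_name is set, select_sql must use that name rather than "flow":

```yaml
input:
  type: "sql"
  select_sql: "SELECT * FROM events LIMIT 100"
  input_type:
    type: "arrow"  # assumed value
    table_name: "events"  # the query refers to this name instead of "flow"
    path: "/path/to/events.arrow"
```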
JSON
- table_name: Optional table name used in SQL queries (defaults to "flow")
  type: string
  required: false
- path: Path to the JSON file
  type: string
  required: true
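A sketch of a JSON source filtered in SQL. The type value "json" is assumed, and the table name, columns (user_id, action) and path are hypothetical. Note that the SQL string literal uses single quotes, which sit safely inside the double-quoted YAML string:

```yaml
input:
  type: "sql"
  select_sql: "SELECT user_id, action FROM logs WHERE action = 'login'"
  input_type:
    type: "json"  # assumed value
    table_name: "logs"
    path: "/path/to/logs.json"
```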
CSV
- table_name: Optional table name used in SQL queries (defaults to "flow")
  type: string
  required: false
- path: Path to the CSV file
  type: string
  required: true
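A sketch of a CSV source aggregated with GROUP BY. The type value "csv" is assumed, and the column name (country) and path are hypothetical. The table name is left at its default, "flow":

```yaml
input:
  type: "sql"
  select_sql: "SELECT country, COUNT(*) AS order_count FROM flow GROUP BY country"
  input_type:
    type: "csv"  # assumed value
    # table_name omitted: the data is queried as "flow"
    path: "/path/to/orders.csv"
```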
Parquet
- table_name: Optional table name used in SQL queries (defaults to "flow")
  type: string
  required: false
- path: Path to the Parquet file
  type: string
  required: true
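A sketch of a local (non-distributed) Parquet query that projects a few columns and limits the result. The table name "users", the columns and the path are hypothetical:

```yaml
input:
  type: "sql"
  select_sql: "SELECT id, name FROM users ORDER BY id LIMIT 10"
  input_type:
    type: "parquet"
    table_name: "users"  # must match the table referenced in select_sql
    path: "/path/to/users.parquet"
```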
MySQL
- name: Optional connection name (defaults to "flow")
  type: string
  required: false
- uri: MySQL connection URI
  type: string
  required: true
- ssl: SSL settings for the connection
  - ssl_mode: SSL mode for connection security
    type: string
    required: true
  - root_cert: Optional root certificate path
    type: string
    required: false
DuckDB
- name: Optional connection name (defaults to "flow")
  type: string
  required: false
- path: Path to the DuckDB file
  type: string
  required: true
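A sketch of a DuckDB source. The type value "duckdb" is assumed, and "sales" stands for a hypothetical table inside the database file. The optional name is omitted here; whether a configured name must also appear in the query (for example as a qualifier) is not specified in this section, so verify that against your setup:

```yaml
input:
  type: "sql"
  select_sql: "SELECT * FROM sales"
  input_type:
    type: "duckdb"  # assumed value
    path: "/path/to/analytics.duckdb"  # hypothetical database file
```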
PostgreSQL
- name: Optional connection name (defaults to "flow")
  type: string
  required: false
- uri: PostgreSQL connection URI
  type: string
  required: true
- ssl: SSL settings for the connection
  - ssl_mode: SSL mode for connection security
    type: string
    required: true
  - root_cert: Optional root certificate path
    type: string
    required: false
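A sketch of a PostgreSQL source, modeled on the MySQL example. The type value "postgres" and the ssl_mode value "require" (a standard PostgreSQL sslmode) are assumptions, as are the table "customers" and its columns; the credentials and certificate path are placeholders:

```yaml
input:
  type: "sql"
  select_sql: "SELECT id, email FROM customers"
  input_type:
    type: "postgres"  # assumed value
    uri: "postgres://user:password@localhost:5432/db"
    ssl:
      ssl_mode: "require"  # assumed accepted value
      root_cert: "/path/to/cert.pem"  # optional
```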
SQLite
- name: Optional connection name (defaults to "flow")
  type: string
  required: false
- path: Path to the SQLite file
  type: string
  required: true
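A sketch of a SQLite source. The type value "sqlite" is assumed, and the table "readings", its value column and the file path are hypothetical. As with the other database sources, how an explicit name would surface in SQL is not covered here, so it is left out:

```yaml
input:
  type: "sql"
  select_sql: "SELECT * FROM readings WHERE value > 10"
  input_type:
    type: "sqlite"  # assumed value
    path: "/path/to/sensors.db"  # hypothetical SQLite database file
```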
Examples
Basic MySQL Connection
```yaml
input:
  type: "sql"
  select_sql: "SELECT * FROM flow"
  input_type:
    type: "mysql"
    name: "my_mysql"
    uri: "mysql://user:password@localhost:3306/db"
    ssl:
      ssl_mode: "verify_identity"
      root_cert: "/path/to/cert.pem"
```
Distributed Query with Ballista
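```yaml
input:
  type: "sql"
  # table_name is set below, so the query must use "parquet_table" rather than "flow"
  select_sql: "SELECT * FROM parquet_table WHERE id > 1000"
  ballista:
    remote_url: "df://localhost:50050"
  input_type:
    type: "parquet"
    table_name: "parquet_table"
    path: "/path/to/data.parquet"
```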