Starburst Galaxy destination user guide
Support Level: Community
Latest Version: 0.0.1
Definition Id: 4528e960-6f7b-4412-8555-7e0097e1da17
Overview
The Starburst Galaxy destination syncs data to Starburst Galaxy Great Lakes catalogs in Apache Iceberg table format. Each stream is written to its own Iceberg table.
Features
Feature | Supported | Notes |
---|---|---|
Overwrite Sync | ✅ | Warning: this mode deletes all previously synced data in the destination table. |
Append Sync | ✅ | |
Append + Deduped | ❌ | |
Namespaces | ✅ | |
SSL | ✅ | SSL is enabled. |
Data storage
Starburst Galaxy supports several object storage systems; however, this connector supports only Amazon S3.
Configuration
Category | Parameter | Type | Notes |
---|---|---|---|
Starburst Galaxy | Hostname | string | Required. Located in the Connection info section of the view clusters pane in Starburst Galaxy. |
Starburst Galaxy | Port | string | Optional. Located in the Connection info section of the view clusters pane in Starburst Galaxy. Defaults to 443. |
Starburst Galaxy | User | string | Required. Galaxy user found in the Connection info section of the view clusters pane in Starburst Galaxy. |
Starburst Galaxy | Password | string | Required. Password for the specified Galaxy user. |
Starburst Galaxy | Amazon S3 catalog | string | Required. Name of the Amazon S3 catalog created in the Galaxy domain. |
Starburst Galaxy | Amazon S3 catalog schema | string | Optional. The default Starburst Galaxy Amazon S3 catalog schema where tables are written if the source does not specify a namespace. Each data stream is written to a table in this schema. Defaults to public. |
Staging Object Store - Amazon S3 | Bucket name | string | Required. Name of the bucket where the staging data is stored. |
Staging Object Store - Amazon S3 | Bucket path | string | Required. Sets the subdirectory of the specified S3 bucket used for storing staging data. |
Staging Object Store - Amazon S3 | Bucket region | string | Required. Sets the region of the specified S3 bucket. |
Staging Object Store - Amazon S3 | Access key | string | Required. AWS/Minio credential. |
Staging Object Store - Amazon S3 | Secret key | string | Required. AWS/Minio credential. |
General | Purge staging Iceberg table | boolean | Optional. Indicates whether the staging Iceberg table is purged after a data sync is complete. Enabled by default. Disable it for debugging purposes only. |
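As an illustration of which parameters are required and which fall back to defaults, the table can be modeled as a small Python data class. This is only a sketch: the class name and the field names (`hostname`, `s3_catalog`, `purge_staging_table`, and so on) are hypothetical and are not the connector's actual configuration keys. Only the required/optional split and the default values (443, public, purge enabled) come from the table above.

```python
from dataclasses import dataclass

# Hypothetical field names; only the defaults and the required/optional
# split are taken from the configuration table.
_REQUIRED = (
    "hostname", "user", "password", "s3_catalog",
    "bucket_name", "bucket_path", "bucket_region",
    "access_key", "secret_key",
)


@dataclass
class GalaxyDestinationConfig:
    # Starburst Galaxy (required)
    hostname: str
    user: str
    password: str
    s3_catalog: str
    # Staging Object Store - Amazon S3 (required)
    bucket_name: str
    bucket_path: str
    bucket_region: str
    access_key: str
    secret_key: str
    # Optional parameters with the documented defaults
    port: str = "443"
    s3_catalog_schema: str = "public"
    purge_staging_table: bool = True

    def __post_init__(self) -> None:
        # Reject empty values for any required parameter.
        missing = [name for name in _REQUIRED if not getattr(self, name)]
        if missing:
            raise ValueError("Missing required parameters: " + ", ".join(missing))
```

Constructing the class with only the required values leaves `port`, `s3_catalog_schema`, and `purge_staging_table` at their documented defaults, while an empty required value raises a `ValueError`.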
Staging files
S3
Data streams are written to a temporary Iceberg table and then loaded into the Starburst Galaxy Amazon S3 catalog in Iceberg table format.
The staging table is deleted after a sync is complete if the Purge staging Iceberg table option is enabled.
The following is an example of a full path for a staging file:
s3://<bucket-name>/<bucket-path>/<namespace/schema>/<temp Iceberg table name {_airbyte_tmp_random-three-chars_stream-name}>
For example:
s3://galaxy_bucket/data_output_path/test_schema/_airbyte_tmp_qey_user
- galaxy_bucket: bucket name
- data_output_path: bucket path
- test_schema: source namespace or provided schema name
- _airbyte_tmp_qey_user: temporary Iceberg table holding data
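The path layout above can be sketched as a pair of small Python functions. The function names are hypothetical, and drawing the three random characters from lowercase ASCII letters is an assumption made for illustration; the guide only states that the table name contains three random characters.

```python
import random
import string
from typing import Optional


def staging_table_name(stream_name: str, suffix: Optional[str] = None) -> str:
    """Build a temporary table name of the form _airbyte_tmp_<xyz>_<stream>."""
    if suffix is None:
        # Assumption: the three random characters are lowercase letters.
        suffix = "".join(random.choices(string.ascii_lowercase, k=3))
    return f"_airbyte_tmp_{suffix}_{stream_name}"


def staging_path(
    bucket_name: str,
    bucket_path: str,
    schema: str,
    stream_name: str,
    suffix: Optional[str] = None,
) -> str:
    """Join bucket, path, schema, and staging table name into an s3:// URI."""
    parts = [
        bucket_name,
        bucket_path.strip("/"),  # tolerate leading or trailing slashes
        schema,
        staging_table_name(stream_name, suffix),
    ]
    return "s3://" + "/".join(parts)
```

Passing a fixed suffix reproduces the example path, for instance `staging_path("galaxy_bucket", "data_output_path", "test_schema", "user", suffix="qey")`.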
Target Iceberg SQL table
Streams are synced to the Starburst Galaxy Amazon S3 catalog in Iceberg table format.
Output schema
Each table in the output schema has the following columns:
Column | Type | Description |
---|---|---|
_airbyte_ab_id | varchar | UUID. |
_airbyte_emitted_at | timestamp(6) | Data emission timestamp. |
Data fields from the source stream | various | Each field from the source stream is populated as an individual column in the target table. |
_airbyte_additional_properties | map(varchar, varchar) | Additional properties. |
The Airbyte data stream's JSON schema is converted to an Avro schema. The JSON object is then converted to an Avro record; the Avro record is written to a staging Iceberg table. As the data stream can be generated from any data source, the JSON-to-Avro conversion process has arbitrary rules and limitations. Learn more about how source data is converted to Avro.
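To make the output columns concrete, the following Python sketch shapes one source record into a row with the columns listed in the table. It does not perform the actual JSON-to-Avro conversion. The function name is hypothetical, and the handling of undeclared fields (stringified and collected into `_airbyte_additional_properties`) is an assumption made for illustration; the linked conversion rules are authoritative.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def to_output_row(record: Dict[str, Any], schema_fields: Iterable[str]) -> Dict[str, Any]:
    """Shape one source record into the documented output columns."""
    declared = set(schema_fields)
    row: Dict[str, Any] = {
        "_airbyte_ab_id": str(uuid.uuid4()),                 # varchar UUID
        "_airbyte_emitted_at": datetime.now(timezone.utc),   # emission timestamp
    }
    extras: Dict[str, str] = {}
    for key, value in record.items():
        if key in declared:
            # Declared fields become individual columns.
            row[key] = value
        else:
            # Assumption: undeclared fields are kept as strings in the
            # map(varchar, varchar) column.
            extras[key] = str(value)
    row["_airbyte_additional_properties"] = extras
    return row
```

For example, calling `to_output_row({"id": 1, "name": "a", "extra": 5}, ["id", "name"])` yields `id` and `name` as columns and places `extra` in `_airbyte_additional_properties`.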
Datatype support
Learn more about Starburst Galaxy Iceberg type mapping.
Getting started
Requirements
- A Starburst Galaxy cluster. The required credentials are found in the Connection info section of the view clusters pane.
- A Starburst Galaxy S3 catalog created within the Galaxy domain and attached to a running cluster.
- Credentials for the S3 bucket.
- S3 bucket location privileges granted to the role that the user is assigned to.
Changelog
Version | Date | Pull Request | Subject |
---|---|---|---|
0.0.1 | 2023-03-28 | #24620 | Initial public release. |