Data Ingestion
AWS S3 CSV
AWS S3 is a scalable object storage service that can be used to store and retrieve files.
Overview
Amazon Simple Storage Service (S3) is an object storage service offering industry-leading scalability, availability, and durability. Datazone provides native integration with AWS S3 to read data files directly from your S3 buckets.
Connection Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| Name | Yes | A unique identifier for your AWS S3 source |
| AWS Access Key ID | Yes | The access key ID from your AWS credentials |
| AWS Secret Access Key | Yes | The secret access key from your AWS credentials |
| AWS Region | Yes | The AWS region where your S3 bucket is located (e.g., us-east-1) |
| Bucket Name | Yes | The name of the S3 bucket containing your data files |
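If you want to confirm these values before creating the source in Datazone, a quick check with boto3 (the AWS SDK for Python) exercises the same credentials and bucket. This is only a sketch: the access key, secret key, region, and bucket name below are placeholders for your own values, and boto3 is not required by Datazone itself.

```python
import boto3

# Build an S3 client from the same values you would enter in Datazone.
# All values below are placeholders.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",        # AWS Access Key ID
    aws_secret_access_key="...",        # AWS Secret Access Key
    region_name="us-east-1",            # AWS Region
)

# List a handful of objects to confirm the bucket is reachable
# and the credentials can read it.
response = s3.list_objects_v2(Bucket="my-data-bucket", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```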
Required Permissions
The IAM user whose access keys you provide needs the following permissions on the specified S3 bucket (a sample policy is sketched after this list):
- `s3:GetObject` - for reading files from the bucket
- `s3:ListBucket` - for listing the contents of the bucket
- `s3:GetBucketLocation` - for determining the bucket’s region
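A minimal IAM policy granting exactly these permissions might look like the following. It is sketched here as a Python dictionary that serializes to a JSON policy document; `my-data-bucket` is a placeholder for your bucket name.

```python
import json

# Minimal policy: bucket-level actions on the bucket ARN,
# object-level read on everything inside it.
# "my-data-bucket" is a placeholder.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": "arn:aws:s3:::my-data-bucket",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-data-bucket/*",
        },
    ],
}

print(json.dumps(policy, indent=2))  # paste into the IAM console or CLI
```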
Limitations
Be aware of the following limitations when working with AWS S3 CSV sources:
- Supported file formats are CSV, TXT, Parquet, and JSON
- UTF-8 encoding is recommended
- Individual file size limits apply based on your AWS S3 configuration
- For optimal performance, the S3 bucket and your Datazone instance should be in the same region (a quick region check is sketched after this list)
- Cross-region access may incur additional AWS charges
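One way to confirm where a bucket actually lives is to ask S3 directly. The sketch below uses boto3 with a placeholder bucket name; note that S3 reports `None` as the location constraint for buckets in us-east-1.

```python
import boto3

s3 = boto3.client("s3")

# get_bucket_location returns None as the LocationConstraint for us-east-1,
# so normalize that case. "my-data-bucket" is a placeholder.
location = s3.get_bucket_location(Bucket="my-data-bucket")
bucket_region = location.get("LocationConstraint") or "us-east-1"
print(f"Bucket region: {bucket_region}")
```

Compare the reported region with the region of your Datazone instance to avoid cross-region data transfer charges.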
Next Steps
After configuring your AWS S3 source:
- Create extracts to specify which CSV files to ingest
- Configure scheduling for recurring extracts
- Integrate the source into your data pipelines
For more information about working with extracts and pipelines, refer to their respective documentation sections.