Overview

Amazon Simple Storage Service (S3) is an object storage service offering industry-leading scalability, availability, and durability. Datazone provides native integration with AWS S3 to read data files directly from your S3 buckets.

Connection Parameters

Parameter | Required | Description
Name | Yes | A unique identifier for your AWS S3 source
AWS Access Key ID | Yes | The access key ID from your AWS credentials
AWS Secret Access Key | Yes | The secret access key from your AWS credentials
AWS Region | Yes | The AWS region where your S3 bucket is located (e.g., us-east-1)
Bucket Name | Yes | The name of the S3 bucket containing your data files
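Before saving the source, it can help to sanity-check the values locally. The sketch below is a hypothetical helper, not part of Datazone: the class name `S3SourceConfig`, its `validate` method, and all example values are illustrative. It checks that the required fields are filled in, that the region looks like a region code such as us-east-1, and that the bucket name follows a subset of the AWS general-purpose bucket naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no adjacent dots; not formatted as an IP address). It does not contact AWS, so it cannot confirm that the credentials actually work.

```python
import re
from dataclasses import dataclass, field

# Subset of the AWS general-purpose bucket naming rules (not exhaustive).
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
IP_ADDRESS_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")
# Rough shape of a region code, e.g. us-east-1 or us-gov-west-1.
# This is an assumption, not an authoritative list of AWS regions.
REGION_RE = re.compile(r"[a-z]{2}(-gov)?-[a-z]+-\d+")


@dataclass
class S3SourceConfig:
    """Hypothetical local model of the S3 source connection parameters."""

    name: str
    access_key_id: str
    secret_access_key: str = field(repr=False)  # keep the secret out of repr() and logs
    region: str
    bucket_name: str

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means none were found."""
        problems = []
        required = {
            "Name": self.name,
            "AWS Access Key ID": self.access_key_id,
            "AWS Secret Access Key": self.secret_access_key,
        }
        for label, value in required.items():
            if not value.strip():
                problems.append(f"{label} is required")
        if not REGION_RE.fullmatch(self.region):
            problems.append(f"AWS Region {self.region!r} does not look like a region code")
        bucket = self.bucket_name
        if (
            not BUCKET_NAME_RE.fullmatch(bucket)
            or ".." in bucket
            or IP_ADDRESS_RE.fullmatch(bucket)
        ):
            problems.append(f"Bucket Name {bucket!r} breaks S3 bucket naming rules")
        return problems


config = S3SourceConfig(
    name="sales-s3",
    access_key_id="AKIAEXAMPLE",
    secret_access_key="example-secret",
    region="us-east-1",
    bucket_name="my-data-bucket",
)
print(config.validate())  # []
```

Returning a list of problems, rather than raising on the first one, lets a form report every invalid field at once.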

Required Permissions

The IAM user whose access keys you enter must have the following permissions on the specified S3 bucket and its objects:

  • s3:GetObject - For reading files from the bucket
  • s3:ListBucket - For listing contents of the bucket
  • s3:GetBucketLocation - For determining the bucket’s region
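One way to grant these permissions is an identity-based IAM policy attached to that user. The sketch below builds such a policy as a Python dictionary and prints it as JSON; `my-data-bucket` is a placeholder and `read_only_policy` is a hypothetical helper name. The permissions are split into two statements because `s3:ListBucket` and `s3:GetBucketLocation` act on the bucket itself (`arn:aws:s3:::bucket`), while `s3:GetObject` acts on the objects inside it (`arn:aws:s3:::bucket/*`).

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build a minimal read-only IAM policy document for one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "ListAndLocateBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level reads apply to every object inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


print(json.dumps(read_only_policy("my-data-bucket"), indent=2))
```

If the ARNs are swapped (for example, `s3:GetObject` granted only on the bucket ARN), listing may succeed while reading files fails with an access-denied error.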

Limitations

Be aware of the following limitations when working with AWS S3 sources:

  • Only CSV, TXT, Parquet, and JSON files are supported
  • UTF-8 encoding is recommended for text-based files (CSV, TXT, JSON)
  • Individual file size limits apply based on your AWS S3 configuration
  • For best performance, keep the S3 bucket and your Datazone instance in the same AWS region
  • Cross-region access may incur additional AWS data transfer charges
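The format, encoding, and region limitations above can be checked before you create an extract. The sketch below is a hypothetical pre-flight helper, not a Datazone feature: it assumes the file format can be inferred from the object key's extension, checks that a byte sample (for example, the first few kilobytes of a file) decodes as UTF-8, and compares the bucket's region with the region of your Datazone instance. The region comparison takes the `LocationConstraint` value returned by the S3 `GetBucketLocation` API, which is empty (`None` in boto3) for buckets in us-east-1 and may be the legacy value `EU` for some older buckets in eu-west-1.

```python
import codecs
from pathlib import PurePosixPath
from typing import Optional

# Formats listed as supported; inferring the format from the extension is an assumption.
SUPPORTED_EXTENSIONS = {".csv", ".txt", ".parquet", ".json"}


def is_supported_key(key: str) -> bool:
    """True if the object key ends in a supported extension (case-insensitive)."""
    return PurePosixPath(key).suffix.lower() in SUPPORTED_EXTENSIONS


def looks_like_utf8(sample: bytes) -> bool:
    """True if `sample` is valid UTF-8.

    An incremental decoder with final=False tolerates a multi-byte character
    cut off at the end of a partial sample.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def bucket_region(location_constraint: Optional[str]) -> str:
    """Map a GetBucketLocation LocationConstraint value to a region code."""
    if not location_constraint:
        return "us-east-1"  # S3 reports no constraint for us-east-1
    if location_constraint == "EU":
        return "eu-west-1"  # legacy value used by some older buckets
    return location_constraint


def same_region(location_constraint: Optional[str], datazone_region: str) -> bool:
    """True if the bucket and the Datazone instance are in the same region."""
    return bucket_region(location_constraint) == datazone_region


# Example with boto3 (requires credentials with s3:GetBucketLocation):
#   import boto3
#   s3 = boto3.client("s3")
#   loc = s3.get_bucket_location(Bucket="my-data-bucket")["LocationConstraint"]
#   print(same_region(loc, "us-east-1"))
```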

Next Steps

After configuring your AWS S3 source:

  1. Create extracts to specify which files to ingest
  2. Configure scheduling for recurring extracts
  3. Integrate the source into your data pipelines
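Deciding which files an extract should ingest is easier if you can first preview what is in the bucket. The sketch below lists object keys with the S3 `ListObjectsV2` API (through boto3's `list_objects_v2` paginator, which needs the `s3:ListBucket` permission) and filters them with a shell-style pattern. The function names `select_keys` and `preview_extract_files` and the pattern-based selection are illustrative assumptions; they do not describe how Datazone extracts select files.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_keys(pages: Iterable[dict], pattern: str) -> List[str]:
    """Collect keys from ListObjectsV2 response pages that match `pattern`.

    fnmatchcase is case-sensitive, and its "*" also matches "/", so "*.csv"
    matches CSV files at any depth under the listed prefix.
    """
    keys = []
    for page in pages:
        # Pages with no objects omit the "Contents" entry entirely.
        for obj in page.get("Contents", []):
            if fnmatchcase(obj["Key"], pattern):
                keys.append(obj["Key"])
    return keys


def preview_extract_files(
    s3_client, bucket: str, prefix: str = "", pattern: str = "*"
) -> List[str]:
    """List keys under `prefix` in `bucket` that match `pattern`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    return select_keys(paginator.paginate(Bucket=bucket, Prefix=prefix), pattern)


# Example (requires boto3 and the credentials configured for the source):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-east-1")
#   print(preview_extract_files(s3, "my-data-bucket", prefix="exports/", pattern="*.csv"))
```

Passing the client in, rather than creating it inside the function, keeps the filtering logic testable without network access or AWS credentials.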

For more information about working with extracts and pipelines, refer to their respective documentation sections.