Introduction
Google Cloud Storage (GCS) is a service for storing objects in Google Cloud. While GCS provides native XML, JSON, and gRPC APIs, Google recommends using a higher-level interface such as GCS Fuse or the Cloud Storage client libraries when possible. Higher-level interfaces simplify integration, are tuned for GCS performance, and are actively developed with new features and bug fixes.
Many customers have existing software built for Amazon S3 using the AWS SDK. Because the GCS XML API is S3-compatible, such code often needs only minor adjustments to work with GCS. However, keep in mind that AWS designs its software for its own services and may change defaults without considering other providers.
The Breaking Change
AWS recently added default data integrity protections to its SDKs and CLI. While beneficial for S3 users, this change made the default settings of the AWS SDKs incompatible with most third-party S3-compatible services, including GCS.
After you upgrade the AWS SDK or CLI, some operations (including uploads and downloads) fail against GCS with errors like the following:
An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: Invalid argument.
Expected checksum x4Vs7w== did not match calculated checksum: u4E0XQ==
Though providers may eventually adapt their endpoints to the new AWS defaults, in the meantime it’s possible to revert to the previous behavior.
Impacted Versions
Amazon rolled out this default incrementally. My testing shows the following version history for when the defaults changed and when the ability to override them became available:
- Boto3 (Python SDK):
  - boto3 <= 1.35.99: No checksum config needed.
  - boto3 >= 1.36.0: Requires when_required checksum config.
- AWS CLI:
  - awscli <= 2.22.35: No checksum config needed.
  - awscli 2.23.0 through 2.23.4: Incompatible; checksum config unavailable.
  - awscli >= 2.23.5: Requires when_required checksum config.
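If you want to gate upgrades in CI, the version thresholds above can be encoded in a small helper. This is a minimal sketch: the function name checksum_override_status and its return labels are hypothetical, it only covers boto3 and the AWS CLI, it assumes plain dotted numeric versions, and it reports anything outside the tested ranges as "untested" rather than guessing.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.36.0' into a comparable tuple (1, 36, 0)."""
    return tuple(int(part) for part in version.split("."))


def checksum_override_status(tool: str, version: str) -> str:
    """Classify a boto3 or AWS CLI version using the tested thresholds.

    Hypothetical helper; the returned labels are chosen for this sketch only.
    """
    v = _parse(version)
    if tool == "boto3":
        if v <= (1, 35, 99):
            return "not_needed"
        if v >= (1, 36, 0):
            return "when_required"
    elif tool == "awscli":
        if v <= (2, 22, 35):
            return "not_needed"
        if (2, 23, 0) <= v <= (2, 23, 4):
            return "incompatible"
        if v >= (2, 23, 5):
            return "when_required"
    else:
        raise ValueError(f"unknown tool: {tool!r}")
    # Versions that fall between the tested ranges are not classified.
    return "untested"
```

You could feed it boto3.__version__, or the version number printed by aws --version, and fail the build when the result is "incompatible".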
Other AWS SDKs are also affected. For example, see the GitHub announcement for the Go SDK updates.
The Solution
For the latest versions of the AWS tools to work with GCS, two settings must be applied. These settings restore the previous default checksum behavior, which is compatible with GCS. They can be configured through environment variables (for both the AWS CLI and the Python SDK) or through client configuration (for example, a Boto client config or an AWS CLI profile).
Environment Variables:
Set these before running your application or CLI commands:
AWS_REQUEST_CHECKSUM_CALCULATION='when_required'
AWS_RESPONSE_CHECKSUM_VALIDATION='when_required'
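When a Python program shells out to the AWS CLI, one way to apply these variables to just that call is to pass a modified copy of the environment to subprocess. This is a sketch under assumptions: gcs_compatible_env is a hypothetical helper, and the CLI invocation described below assumes the AWS CLI is installed and configured with GCS HMAC credentials.

```python
import os
from typing import Dict, Mapping, Optional

# Values that restore the earlier, GCS-compatible checksum behavior.
CHECKSUM_ENV = {
    "AWS_REQUEST_CHECKSUM_CALCULATION": "when_required",
    "AWS_RESPONSE_CHECKSUM_VALIDATION": "when_required",
}


def gcs_compatible_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of base (default: os.environ) with the checksum overrides applied.

    The input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(CHECKSUM_ENV)
    return env
```

For example, passing env=gcs_compatible_env() to subprocess.run when running aws s3 ls --endpoint-url https://storage.googleapis.com scopes the override to that one command. For Boto3 in the same process, calling os.environ.update(CHECKSUM_ENV) before the client is created has the same effect.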
Client Configuration (Boto client config or AWS CLI profile):
Alternatively, configure these directly:
request_checksum_calculation='when_required'
response_checksum_validation='when_required'
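For the AWS CLI profile route, the two settings belong in the shared config file (normally ~/.aws/config), where named profiles use a [profile NAME] section header and the default profile uses [default]. The sketch below writes them with Python's configparser; the helper name add_gcs_profile and the profile name gcs are hypothetical, and the path is passed in explicitly so you can try it on a scratch copy first. Be aware that rewriting the file with configparser drops any comments it contains.

```python
import configparser
from pathlib import Path


def add_gcs_profile(config_path: Path, profile: str = "gcs") -> None:
    """Add or update a profile with GCS-compatible checksum settings."""
    # interpolation=None so '%' characters in existing values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)  # a missing file is treated as empty
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        parser.add_section(section)
    parser[section]["request_checksum_calculation"] = "when_required"
    parser[section]["response_checksum_validation"] = "when_required"
    with open(config_path, "w") as fh:
        parser.write(fh)
```

After running it against your real config file, a command such as aws s3 ls --profile gcs --endpoint-url https://storage.googleapis.com would pick up the overrides from that profile.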
Practical Examples:
- Python (Boto3): See the config syntax example and environment variable technique.
- AWS CLI: An example of the environment variable technique with Docker is available.
- Go: A simple test using this Stack Overflow solution worked for me.
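For Boto3, a minimal client setup combining the GCS XML endpoint with the checksum config might look like the following. It assumes you have created GCS HMAC keys and pass them in as the access key and secret; the helper names are hypothetical, and the two checksum options are only accepted by the newer boto3/botocore releases listed above.

```python
from typing import Any, Dict

GCS_XML_ENDPOINT = "https://storage.googleapis.com"

# Checksum settings that keep newer boto3/botocore releases compatible with GCS.
CHECKSUM_CONFIG: Dict[str, str] = {
    "request_checksum_calculation": "when_required",
    "response_checksum_validation": "when_required",
}


def gcs_client_kwargs(hmac_access_id: str, hmac_secret: str) -> Dict[str, Any]:
    """Keyword arguments for boto3.client() that target the GCS XML API."""
    return {
        "service_name": "s3",
        "endpoint_url": GCS_XML_ENDPOINT,
        "aws_access_key_id": hmac_access_id,
        "aws_secret_access_key": hmac_secret,
    }


def make_gcs_s3_client(hmac_access_id: str, hmac_secret: str):
    """Create a boto3 S3 client pointed at GCS with the checksum overrides applied."""
    # Imported here so the helpers above can be loaded without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        config=Config(**CHECKSUM_CONFIG),
        **gcs_client_kwargs(hmac_access_id, hmac_secret),
    )
```

Usage would then be along the lines of s3 = make_gcs_s3_client(access_id, secret) followed by s3.upload_file("local.txt", "my-bucket", "remote.txt"), where the bucket and file names are placeholders.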
For more code examples, visit my GitHub repository.
Conclusion
While using AWS SDKs with GCS can ease and accelerate adoption, be sure to test compatibility before adopting new client versions. For this latest AWS update, applying these two settings should restore the compatibility lost when AWS changed its default data integrity protections.
As always, comments are welcome!