Cost Optimization: Caching Secrets and DynamoDB with Redis/Valkey
In applications that make heavy use of Secrets Manager and DynamoDB, operational costs can skyrocket quickly if no optimization strategy is applied. In this article, I share how we implemented a caching layer with Redis (and later Valkey) that cut the cost of these services from $300 to $45 per month, while also improving the latency of our applications.
The problem: Disproportionate costs from API calls
Our architecture consisted of various components that frequently accessed secrets and configuration data:
Affected components:
- Async workers: Processes that query secrets on every execution
- Web platform: REST API that accesses configurations stored in DynamoDB
- Microservices: Different services that need connection credentials
The main problem? Every time a worker or the web platform needed a secret or a configuration value, it made a direct request to AWS Secrets Manager or DynamoDB. This meant:
- Costs per API call: Thousands of daily requests that add up on the bill
- Additional latency: Each request requires a round-trip to AWS
- External dependency: Availability depended on AWS services
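To make this concrete, the uncached access pattern looked roughly like this (a Python/boto3 sketch; the secret ID, table name, and helpers are illustrative, not our actual code):

```python
import boto3

secrets = boto3.client("secretsmanager")
dynamodb = boto3.resource("dynamodb")

def get_db_credentials() -> str:
    # Every call hits Secrets Manager directly: billed per request,
    # plus a full round-trip to AWS on every worker execution.
    response = secrets.get_secret_value(SecretId="prod/db-credentials")
    return response["SecretString"]

def get_feature_config(feature: str):
    # Same problem for configuration reads: one DynamoDB request per lookup.
    table = dynamodb.Table("app-config")
    item = table.get_item(Key={"feature": feature})
    return item.get("Item")
```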
The solution: Caching layer with Redis
The strategy was simple but effective: interpose a caching layer between the application and AWS services.
Initial architecture: Redis in-cluster
The first implementation used Redis deployed inside the Kubernetes cluster:
Characteristics:
- Deployed as a StatefulSet inside the EKS cluster
- Persistence with EBS volumes
- TTL (Time To Live) configuration for keys
Immediate benefits:
- Drastic reduction in requests: The vast majority of queries were served from the cache
- Latency improvement: Local queries were significantly faster
- Lower cost: Running Redis inside the cluster cost a negligible amount compared to the savings
Implementation: Cache-Aside Pattern
For the implementation, we used the Cache-Aside pattern (also known as Lazy Loading):
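A minimal sketch of the pattern for Secrets Manager, assuming Python with redis-py and boto3 (the key naming and Redis endpoint are illustrative, not our production code):

```python
import json

import boto3
import redis

cache = redis.Redis(host="redis.cache.svc.cluster.local", port=6379)
secrets = boto3.client("secretsmanager")

TTL_SECONDS = 3600  # 1 hour, per our TTL strategy below

def get_secret(secret_id: str) -> dict:
    cache_key = f"secret:{secret_id}"

    # 1. Try the cache first.
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # 2. On a miss, read from Secrets Manager (the source of truth)...
    response = secrets.get_secret_value(SecretId=secret_id)
    value = json.loads(response["SecretString"])

    # 3. ...and populate the cache with a TTL so stale entries expire.
    cache.setex(cache_key, TTL_SECONDS, json.dumps(value))
    return value
```

The same wrapper works for DynamoDB reads; only the miss path (step 2) changes.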
TTL strategy:
- Secrets Manager and DynamoDB: 1 hour (3600 seconds) - Balance between freshness and savings
Migration to Valkey (AWS)
Once we validated that the caching approach worked correctly, we decided to migrate to Valkey managed by AWS (ElastiCache).
Why Valkey?
Valkey is the open source fork that emerged after Redis changed its license. AWS adopted it as the default option for ElastiCache.
Migration advantages:
- Reduced management: No need to manage cache infrastructure
- High availability: Automatic replication and failover
- Scalability: Easy to scale vertically or horizontally
- Monitoring: Native integration with CloudWatch
- Security: Encryption in transit and at rest
Implemented configuration:
- Node type: `cache.t4g.small` (Graviton instances for cost savings)
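Because Valkey remains protocol-compatible with Redis, the application-side change was minimal: point the same client at the ElastiCache endpoint. A sketch, with an illustrative endpoint:

```python
import redis

# Valkey speaks the Redis protocol, so redis-py works unchanged;
# only the endpoint and TLS settings differ from the in-cluster setup.
cache = redis.Redis(
    host="my-valkey.xxxxxx.cache.amazonaws.com",  # illustrative endpoint
    port=6379,
    ssl=True,            # encryption in transit is enabled on the ElastiCache side
    socket_timeout=2,    # fail fast so the fallback path can kick in
)
```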
Quantifiable results
Cost savings
Before (without cache):
- Secrets Manager: $250/month
- DynamoDB: $50/month (mainly API calls)
- Total: $300/month
After (with Valkey):
- Secrets Manager: $25/month (90% reduction)
- DynamoDB: $10/month (80% reduction)
- Valkey (ElastiCache): $10/month
- Total: $45/month
Total savings: $255/month (85% reduction)
Performance improvement
Beyond the economic savings, the performance improvements have been significant:
- Query latency: Significantly lower for secret and configuration reads
- Throughput: Increased capacity to handle more requests per second
- User experience: APIs respond faster thanks to local cache
Operational benefits
- Fewer external dependencies: Most requests never have to reach Secrets Manager or DynamoDB
- Less throttling: A significant reduction in rate limiting from Secrets Manager and DynamoDB
- Scalability: The system can handle more load with the same resources
Lessons learned
1. Not everything needs to be cached
It’s important to analyze which data really benefits from cache:
- High read frequency, low write frequency: ✅ Perfect for cache
- Constantly changing data: ❌ Better to access directly
- Critical security secrets: Consider very short TTLs
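One way to encode these trade-offs is a per-category TTL policy that the cache wrapper consults; a sketch with illustrative values:

```python
# Illustrative TTLs per data category; tune these to your own freshness needs.
TTL_BY_CATEGORY = {
    "static_config": 3600,     # read-heavy, rarely written: cache aggressively
    "db_credentials": 300,     # security-sensitive: keep the exposure window short
    "realtime_counters": 0,    # changes constantly: 0 means skip the cache entirely
}

def ttl_for(category: str) -> int:
    return TTL_BY_CATEGORY.get(category, 60)  # conservative default
```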
2. Size matters
The Valkey instance size should be adjusted based on:
- Volume of data to cache
- Access pattern (read-heavy vs write-heavy)
- Latency requirements
In our case, a `cache.t4g.small` node has been sufficient, but continuous monitoring is necessary.
3. Monitoring is key
We have implemented alerts for:
- Cache hit rate: Should be >90%
- Evictions: If they increase, memory needs to be expanded
- Connection errors: Indicate problems in Valkey
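As an illustration of one of these alerts, here is a sketch that creates a CloudWatch alarm on the ElastiCache `Evictions` metric with boto3 (the cluster ID, threshold, and SNS topic are illustrative):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert as soon as evictions start happening: a sustained nonzero
# eviction count means the node is running out of memory.
cloudwatch.put_metric_alarm(
    AlarmName="valkey-evictions",
    Namespace="AWS/ElastiCache",
    MetricName="Evictions",
    Dimensions=[{"Name": "CacheClusterId", "Value": "my-valkey-cluster"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=3,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:cache-alerts"],
)
```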
4. Planning the fallback
We should always have a plan in case Valkey goes down:
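A minimal sketch of that fallback, building on the cache-aside wrapper above (again assuming redis-py and boto3; names are illustrative):

```python
import json
import logging

import redis

logger = logging.getLogger(__name__)

def get_secret_with_fallback(secret_id: str) -> dict:
    cache_key = f"secret:{secret_id}"
    try:
        cached = cache.get(cache_key)
        if cached is not None:
            return json.loads(cached)
    except redis.exceptions.RedisError:
        # Cache is down: log it and go straight to AWS.
        logger.warning("Valkey unavailable, falling back to Secrets Manager")

    response = secrets.get_secret_value(SecretId=secret_id)
    value = json.loads(response["SecretString"])

    try:
        cache.setex(cache_key, TTL_SECONDS, json.dumps(value))
    except redis.exceptions.RedisError:
        pass  # Best effort: never let a cache failure break the request
    return value
```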
This strategy ensures the system continues to function even if the cache fails.
Conclusion
Implementing a caching layer with Redis/Valkey between our application and AWS services (Secrets Manager and DynamoDB) has been one of the most impactful optimizations we’ve performed.
Final results:
- Economic savings: $255/month (85% reduction)
- Performance improvement: Significant latency reduction in queries
- Better user experience: Much more responsive APIs
- Greater capacity: Higher throughput with the same resources
The key to success has been:
- Analyze the usage pattern before implementing
- Start simple (in-cluster Redis) and then scale (AWS-managed Valkey)
- Implement appropriate TTL and invalidation strategies
- Monitor continuously to optimize
- Plan a fallback to guarantee availability
For any application that makes intensive use of Secrets Manager, DynamoDB, or any AWS service billed per API call, a cache like Valkey is almost certainly one of the best investments you can make. The return is immediate, in both cost and performance.
If your application has a high monthly bill in these services, I encourage you to try this strategy. The results, as we’ve seen, can be surprising.