Retry Pattern: Building Resilient Cloud Applications

Ever found yourself staring at a failed API call, wondering if it's a temporary glitch or a permanent failure? In cloud environments, transient (temporary and self-resolving) failures are inevitable due to network timeouts, service unavailability, or temporary resource constraints. The Retry pattern provides a systematic approach to handling these failures gracefully, keeping your applications resilient and responsive.

This pattern is essential for any cloud-native application that communicates with external services, databases, or APIs. By implementing intelligent retry logic, you can significantly improve your application's reliability and user experience.

Understanding the Retry Pattern

The Retry pattern enables applications to handle transient failures by automatically retrying failed operations. Instead of immediately failing when an operation doesn't succeed, the application waits for a brief period and attempts the operation again.

Key Retry Strategies

  • Fixed Backoff: Constant wait time between retry attempts.

  • Linear Backoff: Wait time increases linearly with each retry attempt.

  • Exponential Backoff: Wait time increases exponentially with each retry attempt.
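To make the difference between the three strategies concrete, here is a minimal sketch of how each one could compute its wait time. The class and method names (`BackoffStrategies`, `fixed`, `linear`, `exponential`) are illustrative assumptions, not part of any library, and attempts are counted from 1.

```java
/** Illustrative wait-time calculators for the three strategies (hypothetical names). */
final class BackoffStrategies {
    private BackoffStrategies() {}

    /** Fixed: the same wait before every retry, regardless of the attempt number. */
    static long fixed(long baseMillis, int attempt) {
        return baseMillis;
    }

    /** Linear: the wait grows by baseMillis per attempt (base, 2*base, 3*base, ...). */
    static long linear(long baseMillis, int attempt) {
        return baseMillis * attempt;
    }

    /** Exponential: the wait is multiplied by `multiplier` after each attempt. */
    static long exponential(long baseMillis, double multiplier, int attempt) {
        return (long) (baseMillis * Math.pow(multiplier, attempt - 1));
    }

    public static void main(String[] args) {
        // Print the schedule for the first four attempts with a 1000 ms base
        for (int attempt = 1; attempt <= 4; attempt++) {
            System.out.printf("attempt %d: fixed=%dms linear=%dms exponential=%dms%n",
                    attempt,
                    fixed(1000, attempt),
                    linear(1000, attempt),
                    exponential(1000, 2.0, attempt));
        }
    }
}
```

With a 1000 ms base and a multiplier of 2.0, the fourth attempt waits 1000 ms under fixed backoff, 4000 ms under linear backoff, and 8000 ms under exponential backoff.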

Core Components

[Figure: Retry Pattern Components]

Implementing the Retry Pattern in Java

Let's build a robust retry mechanism that handles various failure scenarios gracefully.

Implementation Overview

Retry Policy Interface

public interface RetryPolicy {
    boolean shouldRetry(Exception exception, int attemptCount);
    long getWaitTime(int attemptCount);
    int getMaxAttempts();
}

Retry Policy Implementation

public class ExponentialBackoffPolicy implements RetryPolicy {
    private final int maxAttempts;
    private final long initialWaitTime;
    private final double backoffMultiplier;
    
    public ExponentialBackoffPolicy(int maxAttempts, long initialWaitTime, double backoffMultiplier) {
        this.maxAttempts = maxAttempts;
        this.initialWaitTime = initialWaitTime;
        this.backoffMultiplier = backoffMultiplier;
    }
    
    @Override
    public boolean shouldRetry(Exception exception, int attemptCount) {
        // Don't retry on certain exceptions
        if (exception instanceof IllegalArgumentException) {
            return false;
        }
        return attemptCount < maxAttempts;
    }
    
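    @Override
    public long getWaitTime(int attemptCount) {
        // attemptCount is 1-based: the wait after the first failed attempt is
        // initialWaitTime, and each later wait is multiplied by backoffMultiplier
        return (long) (initialWaitTime * Math.pow(backoffMultiplier, attemptCount - 1));
    }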
    
    @Override
    public int getMaxAttempts() {
        return maxAttempts;
    }
}

Retry Handler

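public class RetryHandler {
    /** Like Runnable, but allowed to throw checked exceptions (e.g. from a gateway call). */
    @FunctionalInterface
    public interface RetryableOperation {
        void run() throws Exception;
    }
    
    private final RetryPolicy retryPolicy;
    
    public RetryHandler(RetryPolicy retryPolicy) {
        this.retryPolicy = retryPolicy;
    }
    
    public void executeWithRetry(RetryableOperation operation) throws Exception {
        int attemptCount = 0;
        Exception lastException = null;
        
        while (attemptCount < retryPolicy.getMaxAttempts()) {
            try {
                attemptCount++;
                operation.run();
                return; // Success - exit the retry loop
            } catch (Exception e) {
                lastException = e;
                
                // The policy rejects non-retryable exceptions and, in
                // ExponentialBackoffPolicy, the final attempt; rethrow as-is
                if (!retryPolicy.shouldRetry(e, attemptCount)) {
                    throw e;
                }
                
                if (attemptCount < retryPolicy.getMaxAttempts()) {
                    long waitTime = retryPolicy.getWaitTime(attemptCount);
                    System.out.println("Attempt " + attemptCount + " failed. Retrying in " + waitTime + "ms...");
                    try {
                        Thread.sleep(waitTime);
                    } catch (InterruptedException interrupted) {
                        // Restore the interrupt flag and stop retrying
                        Thread.currentThread().interrupt();
                        throw interrupted;
                    }
                }
            }
        }
        
        // Reached only if the policy keeps approving retries until attempts run out
        throw new RuntimeException("Operation failed after " + attemptCount + " attempts", lastException);
    }
}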

Practical Payment Service Example

// Simple data class for payment requests
public class PaymentRequest {
    private final String transactionId;
    private final double amount;
    
    public PaymentRequest(String transactionId, double amount) {
        this.transactionId = transactionId;
        this.amount = amount;
    }
    
    public String getTransactionId() { return transactionId; }
    public double getAmount() { return amount; }
}

public class PaymentService {
    private final RetryHandler retryHandler;
    
    public PaymentService() {
        RetryPolicy policy = new ExponentialBackoffPolicy(3, 1000, 2.0);
        this.retryHandler = new RetryHandler(policy);
    }
    
    public void processPayment(PaymentRequest request) throws Exception {
        retryHandler.executeWithRetry(() -> {
            // Simulate external payment gateway call
            callPaymentGateway(request);
        });
    }
    
    private void callPaymentGateway(PaymentRequest request) throws Exception {
        // Simulate network failure or service unavailability
        double random = Math.random();
        if (random < 0.7) {
            throw new RuntimeException("Payment gateway temporarily unavailable");
        }
        
        System.out.println("Payment processed successfully for amount: " + request.getAmount());
    }
}

// Usage Example
public class RetryPatternDemo {
    public static void main(String[] args) {
        PaymentService paymentService = new PaymentService();
        
        try {
            PaymentRequest request = new PaymentRequest("12345", 100.0);
            paymentService.processPayment(request);
            System.out.println("Payment completed successfully!");
        } catch (Exception e) {
            System.err.println("Payment failed: " + e.getMessage());
        }
    }
}

Best Practices

Production Considerations

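Retryable vs Non-Retryable Exceptions: Distinguish between transient failures (such as network timeouts), which are worth retrying, and permanent failures (such as invalid input), which will fail again no matter how often they are retried.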

public class RetryableException extends RuntimeException {
    public RetryableException(String message) {
        super(message);
    }
}

public class NonRetryableException extends RuntimeException {
    public NonRetryableException(String message) {
        super(message);
    }
}
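One way these two exception types could drive the retry decision is sketched below. `ClassifyingRetryPolicy` is a hypothetical name introduced for illustration; the `RetryPolicy` interface and the two exception classes are repeated (without `public`) only so the sketch compiles on its own. The assumption here is an opt-in rule: only failures explicitly marked as `RetryableException` are retried.

```java
/** Mirrors the RetryPolicy interface shown earlier so this sketch is self-contained. */
interface RetryPolicy {
    boolean shouldRetry(Exception exception, int attemptCount);
    long getWaitTime(int attemptCount);
    int getMaxAttempts();
}

class RetryableException extends RuntimeException {
    RetryableException(String message) { super(message); }
}

class NonRetryableException extends RuntimeException {
    NonRetryableException(String message) { super(message); }
}

/** Hypothetical policy: retry only failures explicitly marked as transient. */
class ClassifyingRetryPolicy implements RetryPolicy {
    private final int maxAttempts;
    private final long waitMillis;

    ClassifyingRetryPolicy(int maxAttempts, long waitMillis) {
        this.maxAttempts = maxAttempts;
        this.waitMillis = waitMillis;
    }

    @Override
    public boolean shouldRetry(Exception exception, int attemptCount) {
        // Opt-in rule: anything that is not a RetryableException is treated as permanent
        return exception instanceof RetryableException && attemptCount < maxAttempts;
    }

    @Override
    public long getWaitTime(int attemptCount) {
        return waitMillis; // Fixed backoff keeps the sketch short
    }

    @Override
    public int getMaxAttempts() {
        return maxAttempts;
    }
}
```

Treating unknown exceptions as permanent is the conservative choice. A policy could equally retry everything except `NonRetryableException`, at the cost of occasionally retrying failures that can never succeed.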

Idempotency: Ensure operations can be safely retried without duplicating their side effects (for example, charging a customer twice).

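import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentPaymentService {
    // The JDK has no ConcurrentHashSet; ConcurrentHashMap.newKeySet() returns a thread-safe Set
    private final Set<String> processedTransactions = ConcurrentHashMap.newKeySet();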
    private final RetryHandler retryHandler;
    
    public IdempotentPaymentService() {
        RetryPolicy policy = new ExponentialBackoffPolicy(3, 1000, 2.0);
        this.retryHandler = new RetryHandler(policy);
    }
    
    public void processPayment(PaymentRequest request) throws Exception {
        String transactionId = request.getTransactionId();
        
        // Check if already processed
        if (processedTransactions.contains(transactionId)) {
            System.out.println("Transaction " + transactionId + " already processed");
            return;
        }
        
        retryHandler.executeWithRetry(() -> {
            callPaymentGateway(request);
        });
        
        // Mark as processed only after successful execution
        processedTransactions.add(transactionId);
        System.out.println("Transaction " + transactionId + " processed successfully");
    }
    
    private void callPaymentGateway(PaymentRequest request) throws Exception {
        // Simulate external payment gateway call
        double random = Math.random();
        if (random < 0.7) {
            throw new RuntimeException("Payment gateway temporarily unavailable");
        }
        
        System.out.println("Payment processed successfully for amount: " + request.getAmount());
    }
}
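Note that the check (`contains`) and the later `add` in the class above are two separate steps, so two threads handling the same transaction at the same moment could both pass the check. A possible way to close that gap is to claim the transaction ID atomically before calling the gateway and release the claim if the call ultimately fails. The sketch below assumes a hypothetical `TransactionClaims` helper with an in-memory set; a real system would more likely persist the claim or pass an idempotency key to the gateway.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper that runs an action at most once per transaction ID. */
class TransactionClaims {
    /** An action that may throw checked exceptions. */
    interface Action {
        void run() throws Exception;
    }

    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    /**
     * Runs the action unless the ID is already claimed.
     * Returns true if the action ran, false if it was skipped as a duplicate.
     */
    boolean runOnce(String transactionId, Action action) throws Exception {
        // Set.add is atomic here: only one caller per ID gets true
        if (!claimed.add(transactionId)) {
            return false;
        }
        try {
            action.run();
            return true;
        } catch (Exception e) {
            // Release the claim so the transaction can be attempted again later
            claimed.remove(transactionId);
            throw e;
        }
    }
}
```

In `processPayment`, the retried gateway call would be passed as the action, for example `claims.runOnce(transactionId, () -> retryHandler.executeWithRetry(() -> callPaymentGateway(request)))`.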

Performance and Scalability Considerations

  • Resource Management: Retry attempts consume threads, connections, and time, so implement timeouts and cap the number of attempts.

  • Cascading Failures: Avoid retry storms by implementing circuit breakers.

  • Monitoring: Track retry success rates, latency impact, and resource consumption.
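As a rough illustration of the circuit breaker idea mentioned above, the sketch below stops calls after a number of consecutive failures and lets them through again after a cool-down period. `SimpleCircuitBreaker` and its parameters are assumptions made for this example, not a library API, and the half-open state is simplified: once the cool-down has passed, every caller is allowed through until the next failure.

```java
import java.util.function.LongSupplier;

/** Minimal count-based circuit breaker (hypothetical sketch). */
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // Injectable time source, e.g. System::currentTimeMillis

    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Returns true if a call may proceed. */
    synchronized boolean allowRequest() {
        if (!open) {
            return true;
        }
        // After the cool-down, allow trial calls through (simplified half-open state)
        return clock.getAsLong() - openedAt >= openMillis;
    }

    /** A successful call closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        open = false;
    }

    /** A failed call opens (or re-opens) the breaker once the threshold is reached. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }
}
```

A caller would check `allowRequest()` before each retry attempt and report the outcome with `recordSuccess()` or `recordFailure()`, so that many clients retrying at once stop hitting a service that is already struggling.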

The Retry pattern is fundamental to building resilient cloud applications. By implementing intelligent retry logic with proper backoff strategies, you can significantly improve your application's reliability and provide a better user experience during transient failures.
