Back to projects

Language-Bridge-Automation

Building an AI Translation System: A Serverless Architecture Learning Project

Exploring event-driven design and AWS services through practical implementation

Introduction

Over the past few weeks, I was accepted into the Azubi Talent Mobility Program (Cloud Engineering Track). As part of our core curriculum, I was tasked with building an AI automation system that would accept files (JSON format) or text inputs, upload them to Amazon S3, trigger a Lambda function automatically, and use Amazon Translate to process the content.

This project was designed purely for learning purposes - to provide hands-on experience with serverless architecture, event-driven systems, and AWS service integration. The primary goal was educational rather than commercial deployment.

Architecture Overview

After analyzing the requirements, I designed this serverless architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client        │────▢│  API Gateway    │────▢│ Lambda (Node.js)│────▢│   S3 Bucket     β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚ β€’ curl requests β”‚     β”‚ β€’ REST endpointsβ”‚     β”‚ β€’ Request proc. β”‚     β”‚ β€’ Store requestsβ”‚
β”‚ β€’ JSON payloads β”‚     β”‚ β€’ Route to Lambdaβ”‚     β”‚ β€’ Validation    β”‚     β”‚ β€’ Trigger eventsβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚ β€’ File handling β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β–²                                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β”‚
         β”‚                                                                       β”‚
         β”‚                                                                       β”‚ S3 Event
         β”‚                                                                       β”‚ ObjectCreated
         β”‚                                                                       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Response      │◀────│   S3 Bucket     │◀────│Lambda (Python)  │◀────│   S3 Trigger    β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚ β€’ JSON response β”‚     β”‚ β€’ Store results β”‚     β”‚ β€’ AWS Translate β”‚     β”‚ β€’ Event handler β”‚
β”‚ β€’ Status codes  β”‚     β”‚ β€’ File storage  β”‚     β”‚ β€’ Process text  β”‚     β”‚ β€’ Auto invoke   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚ β€’ Save results  β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚
                                                         β–Ό
                                                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                                β”‚  AWS Translate  β”‚
                                                β”‚                 β”‚
                                                β”‚ β€’ 75+ languages β”‚
                                                β”‚ β€’ AI translationβ”‚
                                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

System Components

API Gateway: Provides RESTful endpoints for translation requests Lambda (Node.js): Handles request processing, validation, and file management S3 Storage: Stores translation requests and triggers downstream processing Lambda (Python): Processes translation using AWS Translate service Event-Driven Flow: S3 object creation automatically triggers translation processing

This architecture embraces the serverless paradigm - no infrastructure management, automatic scaling, and pay-per-use pricing.

Technical Implementation

Node.js API Handler

The primary challenge was building an API that could handle both text and file translations while working within Lambda's execution model:

// api/src/main.ts
export const translateTextHandler = async (req: Request, res: Response) => {
    try {
        const { text, sourceLanguage = 'auto', targetLanguage = 'en', waitForResult = true } = req.body;
        
        if (!text) {
            return res.status(400).json({ error: 'Text is required' });
        }

        const requestId = uuidv4();
        const translationRequest = {
            request_id: requestId,
            text: text,
            source_language: sourceLanguage,
            target_language: targetLanguage,
            timestamp: new Date().toISOString(),
            type: 'text_translation'
        };

        await uploadTranslationRequest(translationRequest, requestId);
        
        if (!waitForResult) {
            return res.status(202).json({
                message: 'Translation request submitted successfully',
                requestId: requestId,
                status: 'processing'
            });
        }

        const result = await waitForTranslationResult(requestId, 30);
        
        if (result) {
            return res.status(200).json({
                message: 'Translation completed successfully',
                requestId: requestId,
                status: 'completed',
                result: result
            });
        }
    } catch (error) {
        console.error('Translation error:', error);
        return res.status(500).json({ error: 'Failed to process translation' });
    }
};

Lambda Adapter Pattern

One significant challenge was adapting Express-style code for AWS Lambda. Lambda functions receive events in a specific format, not HTTP request objects. I created an adapter layer:

// infrastructure/index.js
class MockRequest {
    constructor(event) {
        this.body = event.body ? JSON.parse(event.body) : {};
        this.params = event.pathParameters || {};
        this.query = event.queryStringParameters || {};
        this.headers = event.headers || {};
        this.method = event.httpMethod;
        this.path = event.path;
    }
}

class MockResponse {
    constructor() {
        this.statusCode = 200;
        this.responseBody = {};
        this.headers = {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'
        };
    }

    status(code) {
        this.statusCode = code;
        return this;
    }

    json(body) {
        this.responseBody = body;
        return this;
    }

    toAPIGatewayResponse() {
        return {
            statusCode: this.statusCode,
            headers: this.headers,
            body: JSON.stringify(this.responseBody)
        };
    }
}

This pattern allowed me to maintain familiar Express-style handler logic while ensuring compatibility with Lambda's event model.

Infrastructure as Code with Terraform

The project emphasized reproducible infrastructure, so I learned Terraform for managing AWS resources:

# infrastructure/main.tf
resource "aws_lambda_function" "api_handler" {
  filename         = "nodejs-api-lambda.zip"
  function_name    = "ai-automation-nodejs-api-handler"
  role            = aws_iam_role.api_lambda_role.arn
  handler         = "index.handler"
  runtime         = "nodejs18.x"
  timeout         = 30
  memory_size     = 512

  environment {
    variables = {
      AWS_S3_BUCKET_NAME = data.aws_s3_bucket.existing_bucket.id
      NODE_ENV          = "production"
    }
  }
}

resource "aws_api_gateway_rest_api" "translation_api" {
  name        = "ai-automation-api"
  description = "AI Translation Learning Project"
  
  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

Using Terraform taught me the value of declarative infrastructure management. Changes could be version-controlled, reviewed, and applied consistently.

Event-Driven Architecture

The most elegant aspect of the solution was the event-driven translation pipeline:

Translation Processing Flow

  1. Request Submission: API receives translation request via HTTP POST
  2. S3 Upload: Request data stored in S3 under requests/ prefix
  3. Automatic Trigger: S3 ObjectCreated event automatically invokes Python Lambda
  4. Translation Processing: AWS Translate processes the content
  5. Result Storage: Translated content stored in S3 under responses/ prefix
  6. Status Retrieval: Users can poll for results via status endpoint

Python Translation Worker

# lambda/translation_handler.py
import json
import boto3
from datetime import datetime

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    translate = boto3.client('translate')
    
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        
        response = s3.get_object(Bucket=bucket, Key=key)
        request_data = json.loads(response['Body'].read())
        
        result = translate.translate_text(
            Text=request_data['text'],
            SourceLanguageCode=request_data['source_language'],
            TargetLanguageCode=request_data['target_language']
        )
        
        response_data = {
            'requestId': request_data['request_id'],
            'originalText': request_data['text'],
            'translatedText': result['TranslatedText'],
            'sourceLanguage': result['SourceLanguageCode'],
            'targetLanguage': result['TargetLanguageCode'],
            'timestamp': datetime.utcnow().isoformat(),
            'status': 'completed'
        }
        
        s3.put_object(
            Bucket=bucket,
            Key=f"responses/{request_data['request_id']}.json",
            Body=json.dumps(response_data),
            ContentType='application/json'
        )
    
    return {'statusCode': 200}

This event-driven approach eliminated the need for polling mechanisms or complex orchestration. The system automatically processes requests as they arrive.

Testing and Validation

API Testing

I developed comprehensive tests to validate system functionality:

# Text translation test
curl -X POST 'https://api-endpoint/translate/text' \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Hello, how are you today?",
    "sourceLanguage": "en",
    "targetLanguage": "es",
    "waitForResult": false
  }'

Response Validation

{
  "message": "Translation request submitted successfully",
  "requestId": "9c45f245-39f5-4835-a2f6-9526ad268f65",
  "status": "processing"
}

Status Checking

curl -s 'https://api-endpoint/translate/status/9c45f245-39f5-4835-a2f6-9526ad268f65'
{
  "message": "Translation completed",
  "requestId": "9c45f245-39f5-4835-a2f6-9526ad268f65",
  "status": "completed",
  "result": {
    "originalText": "Hello, how are you today?",
    "translatedText": "Hola, ΒΏcΓ³mo estΓ‘s hoy?",
    "sourceLanguage": "en",
    "targetLanguage": "es"
  }
}

Deployment Process

Lambda Package Building

Creating deployable Lambda packages required learning about dependency management and build processes:

#!/bin/bash
BUILD_DIR="/tmp/nodejs-lambda-build-$(date +%s)"
mkdir -p "$BUILD_DIR"

cp -r api/src "$BUILD_DIR/"
cp api/package.json "$BUILD_DIR/"
cp api/tsconfig.json "$BUILD_DIR/"
cp infrastructure/index.js "$BUILD_DIR/"

cd "$BUILD_DIR"
npm install --production
npm install -g typescript
tsc

zip -r nodejs-api-lambda.zip . -x "*.ts" "tsconfig.json"

Infrastructure Deployment

cd infrastructure
terraform init
terraform plan
terraform apply -auto-approve

Learning Outcomes

Technical Skills Developed

Serverless Architecture: Understanding Lambda execution models, cold starts, and optimization strategies Event-Driven Design: Implementing loosely coupled systems using S3 events and async processing Infrastructure as Code: Managing cloud resources declaratively with Terraform API Design: Creating RESTful interfaces with proper HTTP semantics AWS Service Integration: Working with Lambda, S3, API Gateway, and Translate services

System Design Insights

Asynchronous Processing: Learning when to use sync vs async patterns in distributed systems Error Handling: Implementing robust error handling across service boundaries Resource Management: Understanding Lambda memory, timeout, and cost optimization Security: Implementing least-privilege IAM roles and secure service communication

Challenges and Solutions

Lambda Cold Starts

Problem: Initial Lambda invocations had noticeable latency Solution: Optimized package size and memory allocation

Asynchronous Processing Complexity

Problem: Managing request lifecycle across multiple services Solution: Implemented comprehensive request tracking with unique IDs and status endpoints

IAM Permission Management

Problem: Understanding least-privilege access patterns Solution: Studied AWS security best practices and implemented role-based access control

Technical Specifications

System Capabilities

  • Processing Time: 2-5 seconds for typical translation requests
  • Supported Languages: 75+ language pairs via AWS Translate
  • File Format Support: JSON structure translation
  • Scalability: Automatic scaling based on demand
  • Error Handling: Comprehensive validation and error responses

API Reference

| Method | Endpoint | Purpose | |--------|----------|---------| | POST | /translate/text | Submit text for translation | | POST | /translate/file | Submit JSON file for translation | | GET | /translate/status/{id} | Check translation status |

Reflection

Building this AI translation system provided valuable hands-on experience with modern cloud development practices. The project successfully demonstrated serverless architecture principles, event-driven design patterns, and AWS service integration.

While created for learning purposes, the system is fully functional and demonstrates practical implementation of cloud-native patterns. The experience reinforced the importance of understanding serverless design and the power of managed services in reducing complexity while increasing scalability.

The event-driven architecture proved particularly elegant, automatically processing requests without complex orchestration while maintaining clean separation of concerns between components.


Project Repository: LanguageBridge on GitHub