SageMaker ML Pipeline
A data science team builds an ML pipeline for iris flower classification. Container images are built through CI/CD pipelines - not locally. Source code lands in CodeCommit, CodePipeline triggers CodeBuild to build Docker images with privileged_mode = true, and the resulting images are pushed to ECR. SageMaker then trains models using those images, deploys inference endpoints backed by running Docker containers, and serves real predictions via SageMaker Runtime. This scenario validates the full SageMaker lifecycle integrated with the CI/CD toolchain.
Services
| Service | Role |
|---|---|
| SageMaker | Model, EndpointConfig, Endpoint, TrainingJob, ProcessingJob, TransformJob |
| SageMaker Runtime | InvokeEndpoint with request proxying to Docker-hosted inference container |
| ECR | Training image repository, inference image repository |
| CodeCommit | Training source repo, inference source repo |
| CodeBuild | Docker image builds with privileged_mode = true for both training and inference |
| CodePipeline | Source-to-Build pipelines for training and inference images |
| IAM | SageMaker execution role, CodeBuild role, CodePipeline role |
| S3 | Training data bucket (iris CSV), model artifacts bucket, pipeline artifact bucket |
Architecture
    CodeCommit (training src)    CodeCommit (inference src)
                |                             |
          CodePipeline                  CodePipeline
                |                             |
            CodeBuild                     CodeBuild
         (docker build)                (docker build)
                |                             |
      ECR (training image)          ECR (inference image)
                    \                     /
                       \               /
                    SageMaker Training Job
           (ECR training image -> Docker container)
              (reads CSV from S3 /opt/ml/input/)
           (writes model.json to S3 /opt/ml/model/)
                               |
                        SageMaker Model
                     (ECR inference image)
                               |
                   SageMaker EndpointConfig
                (production variant: "primary")
                               |
                SageMaker Endpoint (InService)
             (Docker inference container running)
                               |
               SageMaker Runtime InvokeEndpoint
          (POST /invocations -> prediction response)
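The last hop in the diagram, a client calling InvokeEndpoint, might look roughly like the sketch below. It assumes a boto3 `sagemaker-runtime` client is passed in; the endpoint name `iris-endpoint` in the usage comment is hypothetical, and the payload shape follows the inference contract described under Application.

```python
import json


def predict(runtime_client, endpoint_name, features):
    """Send one feature vector to a SageMaker endpoint and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"features": features}),
    )
    # boto3 hands the payload back as a streaming body; read() yields bytes.
    return json.loads(response["Body"].read())


# Usage (requires boto3 and credentials; "iris-endpoint" is a hypothetical name):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   predict(client, "iris-endpoint", [5.1, 3.5, 1.4, 0.2])
```

Taking the client as a parameter keeps the helper testable with a stub, which is convenient when the endpoint itself is not available.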
What This Validates
- CI/CD-driven container image builds landing in ECR before SageMaker resource creation
- SageMaker training jobs backed by real Docker containers reading from S3 and writing artifacts
- Endpoint deployment transitioning through Creating to InService with a live Docker container
- Real-time inference via SageMaker Runtime proxying requests to the Docker-hosted inference server
- Mid-flight training job stop reaching Stopped status
- Endpoint update swapping the model config and returning to InService
- Processing jobs and batch transform jobs completing with correct state transitions
- Duplicate model name rejection (ResourceInUse) and missing resource errors
- Full observability: training duration metrics, endpoint modification timestamps, tag round-trips
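Several of the checks above come down to waiting for a resource to reach a target status. A minimal polling sketch follows; the helper name `wait_for_status`, its parameters, and the default timeout are assumptions for illustration, not something taken from the test scripts.

```python
import time


def wait_for_status(describe, target, failure=("Failed",),
                    timeout_s=600.0, poll_s=5.0, sleep=time.sleep):
    """Poll describe() until it returns target; raise on a failure status or timeout.

    Returns the distinct statuses observed, in order, so a caller can check
    the transition path (for example Creating followed by InService).
    """
    deadline = time.monotonic() + timeout_s
    seen = []
    while True:
        status = describe()
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == target:
            return seen
        if status in failure:
            raise RuntimeError(f"reached {status} while waiting for {target}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_s}s")
        sleep(poll_s)


# Usage with a boto3 SageMaker client `sm` (the resource name is hypothetical):
#   wait_for_status(
#       lambda: sm.describe_endpoint(EndpointName="iris-endpoint")["EndpointStatus"],
#       "InService",
#   )
```

The same helper covers the training-stop case by polling `describe_training_job(...)["TrainingJobStatus"]` with `Stopped` as the target.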
Application
Training container
A Python script implementing the SageMaker training contract. It reads a CSV dataset from /opt/ml/input/data/train/, computes per-class feature centroids as a nearest-centroid model, writes model.json to /opt/ml/model/, and exits with code 0 on success.
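A training script along these lines could look like the sketch below. The CSV layout (no header row, four numeric columns followed by the species label) and the `{"centroids": ...}` schema of model.json are assumptions; the source does not specify either.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def compute_centroids(rows):
    """Return {label: [mean of each feature]} for rows shaped (features..., label)."""
    sums = {}
    counts = defaultdict(int)
    for row in rows:
        *features, label = row
        values = [float(v) for v in features]
        if label not in sums:
            sums[label] = [0.0] * len(values)
        sums[label] = [s + v for s, v in zip(sums[label], values)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def train(input_dir="/opt/ml/input/data/train", model_dir="/opt/ml/model"):
    """Read every CSV in input_dir, fit centroids, and write model.json to model_dir."""
    rows = []
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        with open(csv_path, newline="") as handle:
            rows.extend(r for r in csv.reader(handle) if r)  # skip blank lines
    if not rows:
        # SystemExit with a message exits non-zero, signalling failure.
        raise SystemExit("no training rows found")
    centroids = compute_centroids(rows)
    Path(model_dir).mkdir(parents=True, exist_ok=True)
    (Path(model_dir) / "model.json").write_text(json.dumps({"centroids": centroids}))
    return centroids


# The container entrypoint would call train(); returning normally exits with code 0.
```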
Inference container
A Python HTTP server implementing the SageMaker inference contract:
| Method | Path | Description |
|---|---|---|
| GET | /ping | Health check - returns 200 OK |
| POST | /invocations | Accepts {"features": [sepal_length, sepal_width, petal_length, petal_width]}, returns {"prediction": "species", "features": [...]} |
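One way to satisfy this contract with only the standard library is sketched below. It assumes the model file sits at /opt/ml/model/model.json with a hypothetical `{"centroids": {label: [...]}}` schema, that the server listens on port 8080, and that a small JSON body on /ping is acceptable; the error responses are likewise illustrative.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def nearest_centroid(features, centroids):
    """Return the label whose centroid is closest (Euclidean) to features."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


class InferenceHandler(BaseHTTPRequestHandler):
    centroids = {}  # filled in by serve() before the server starts

    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})  # body content is an assumption
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            features = [float(v) for v in payload["features"]]
            label = nearest_centroid(features, self.centroids)
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing key, or wrong number of features.
            self._reply(400, {"error": "bad request"})
            return
        self._reply(200, {"prediction": label, "features": features})


def serve(model_path="/opt/ml/model/model.json", port=8080):
    """Load centroids from model_path, then serve until the container stops."""
    InferenceHandler.centroids = json.loads(Path(model_path).read_text())["centroids"]
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Prediction is a plain nearest-centroid lookup, matching the model the training container produces.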
Test Coverage
Tests run across four scripts:
- Smoke checks: model ARN, EndpointConfig variant count, endpoint InService status, training data in S3, images in ECR
- Integration tests: training lifecycle, training stop, real-time inference, endpoint update, list operations, processing job, batch transform, error handling
- CI/CD tests: pipeline execution status, CodeCommit repository existence, ECR image presence
- Observability tests: tag round-trips, training duration after completion, endpoint LastModifiedTime change after update
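The source does not show the test scripts themselves. As an illustration of what a tag round-trip check could look like, here is a sketch in Python that assumes a boto3 SageMaker client; the helper name `assert_tag_round_trip` is hypothetical, and pagination of `list_tags` is ignored for brevity.

```python
def assert_tag_round_trip(sagemaker_client, resource_arn, tags):
    """Attach tags, read them back, and check every key/value pair survived."""
    sagemaker_client.add_tags(
        ResourceArn=resource_arn,
        Tags=[{"Key": key, "Value": value} for key, value in tags.items()],
    )
    listed = sagemaker_client.list_tags(ResourceArn=resource_arn)["Tags"]
    found = {tag["Key"]: tag["Value"] for tag in listed}
    # Collect every expected tag that is absent or came back with another value.
    missing = {k: v for k, v in tags.items() if found.get(k) != v}
    assert not missing, f"tags did not round-trip: {missing}"
    return found
```

Returning the tags that were read back lets a caller make further assertions, for example that tags applied earlier were not dropped.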