SageMaker ML Pipeline

A data science team builds an ML pipeline for iris flower classification. Container images are built through CI/CD pipelines rather than locally. Source code lands in CodeCommit, CodePipeline triggers CodeBuild to build Docker images with `privileged_mode = true` (which CodeBuild requires for running Docker builds), and the resulting images are pushed to ECR. SageMaker then trains models using those images, deploys inference endpoints backed by running Docker containers, and serves real predictions through SageMaker Runtime. This scenario validates the full SageMaker lifecycle integrated with the CI/CD toolchain.

Services

Service	Role
SageMaker	Model, EndpointConfig, Endpoint, TrainingJob, ProcessingJob, TransformJob
SageMaker Runtime	InvokeEndpoint, proxying requests to the Docker-hosted inference container
ECR	Training image repository, inference image repository
CodeCommit	Training source repo, inference source repo
CodeBuild	Docker image builds with `privileged_mode = true` for both training and inference
CodePipeline	Source-to-Build pipelines for training and inference images
IAM	SageMaker execution role, CodeBuild role, CodePipeline role
S3	Training data bucket (iris CSV), model artifacts bucket, pipeline artifact bucket

Architecture

CodeCommit (training src)     CodeCommit (inference src)
        |                               |
   CodePipeline                    CodePipeline
        |                               |
   CodeBuild                       CodeBuild
   (docker build)                  (docker build)
        |                               |
   ECR (training image)           ECR (inference image)
        |                               |
  SageMaker Training Job                |
  (training image in Docker)            |
  (S3 -> /opt/ml/input/data/train/)     |
  (/opt/ml/model/model.json -> S3)      |
        |                               |
         \                             /
          \                           /
                     SageMaker Model
         (ECR inference image + model artifact)
                          |
                  SageMaker EndpointConfig
                  (production variant: "primary")
                          |
                  SageMaker Endpoint (InService)
                  (Docker inference container running)
                          |
             SageMaker Runtime InvokeEndpoint
             (POST /invocations -> prediction response)
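The final InvokeEndpoint hop can be exercised from Python roughly as follows. This is a minimal sketch, assuming a client that behaves like boto3's `sagemaker-runtime` client (`invoke_endpoint` taking `EndpointName`, `ContentType`, `Accept`, and `Body`, and returning a streaming `Body`). The helper name `invoke_iris_endpoint` is hypothetical, and the JSON shape follows the inference contract described under Application.

```python
import json
from typing import Any, Sequence


def invoke_iris_endpoint(runtime_client: Any, endpoint_name: str,
                         features: Sequence[float]) -> dict:
    """Send one feature vector to POST /invocations through SageMaker Runtime.

    runtime_client is assumed to behave like boto3.client("sagemaker-runtime").
    """
    if len(features) != 4:
        raise ValueError("expected 4 features: sepal/petal length and width")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"features": list(features)}).encode("utf-8"),
    )
    # The response Body is a stream; read it and decode the JSON payload.
    payload = json.loads(response["Body"].read())
    if "prediction" not in payload:
        raise KeyError("response is missing 'prediction'")
    return payload
```

With real credentials, `invoke_iris_endpoint(boto3.client("sagemaker-runtime"), "iris-endpoint", [5.1, 3.5, 1.4, 0.2])` would return a dict shaped like `{"prediction": ..., "features": [...]}`; the endpoint name `iris-endpoint` is a placeholder. Passing the client in as a parameter keeps the helper testable against a fake.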

What This Validates

CI/CD-driven container image builds landing in ECR before SageMaker resource creation
SageMaker training jobs backed by real Docker containers reading from S3 and writing artifacts
Endpoint deployment transitioning from Creating to InService with a live Docker container
Real-time inference via SageMaker Runtime proxying requests to the Docker-hosted inference server
Stopping a training job mid-flight, with the job reaching Stopped status
Endpoint update swapping in a new EndpointConfig and returning to InService
Processing jobs and batch transform jobs completing with correct state transitions
Rejection of duplicate model names (ResourceInUse) and errors for missing resources
Observability: training duration after completion, endpoint LastModifiedTime changes after updates, and tag round-trips
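Several of these checks (Creating to InService, a stopped job reaching Stopped, an update returning to InService) reduce to polling a status field and recording the transitions. A minimal sketch of such a poller, assuming a zero-argument callable that returns the current status string; `wait_for_status` and its default timeouts are hypothetical, not part of any SDK.

```python
import time
from typing import Callable, Iterable, List


def wait_for_status(get_status: Callable[[], str], target: str,
                    failure: Iterable[str] = ("Failed",),
                    timeout_s: float = 900.0, poll_s: float = 15.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll get_status() until it returns target; return the distinct states seen."""
    failure = set(failure)
    seen: List[str] = []
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        # Record each transition once, e.g. ["Creating", "InService"].
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == target:
            return seen
        if status in failure:
            raise RuntimeError(f"reached failure status {status!r}; saw {seen}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s; saw {seen}")
        sleep(poll_s)
```

With boto3, the callable could be `lambda: sm.describe_endpoint(EndpointName=name)["EndpointStatus"]` or `lambda: sm.describe_training_job(TrainingJobName=name)["TrainingJobStatus"]`, where `sm = boto3.client("sagemaker")`. boto3 also ships waiters such as `endpoint_in_service`; a hand-rolled loop is shown because it keeps the intermediate states, which is what a transition test asserts on.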

Application

Training container

A Python script implementing the SageMaker training contract. It reads the CSV dataset that SageMaker copies from S3 into /opt/ml/input/data/train/, computes per-class feature centroids as a nearest-centroid model, writes model.json to /opt/ml/model/ (which SageMaker packages and uploads to the model artifacts bucket), and exits with code 0 on success.
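A minimal standard-library sketch of such a training script. It assumes the CSV has four numeric feature columns followed by a species label, optionally with a header row, and that the image's ENTRYPOINT runs this file directly so the `train` argument SageMaker passes to training containers arrives as `sys.argv[1]`. The function names and the `{"centroids": ...}` layout of model.json are illustrative assumptions.

```python
import csv
import json
import os
import sys
from collections import defaultdict

# SageMaker training-contract paths: the "train" channel and the model dir.
INPUT_DIR = "/opt/ml/input/data/train"
MODEL_DIR = "/opt/ml/model"
N_FEATURES = 4  # sepal_length, sepal_width, petal_length, petal_width


def load_rows(input_dir):
    """Yield (features, label) pairs from every CSV file in the channel dir."""
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(".csv"):
            continue
        with open(os.path.join(input_dir, name), newline="") as f:
            for row in csv.reader(f):
                if len(row) <= N_FEATURES:
                    continue  # blank or truncated line
                try:
                    features = [float(v) for v in row[:N_FEATURES]]
                except ValueError:
                    continue  # header row such as "sepal_length,..."
                yield features, row[N_FEATURES].strip()


def train(input_dir=INPUT_DIR, model_dir=MODEL_DIR):
    """Fit a nearest-centroid model and write it to model_dir/model.json."""
    sums = defaultdict(lambda: [0.0] * N_FEATURES)
    counts = defaultdict(int)
    for features, label in load_rows(input_dir):
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    if not counts:
        raise ValueError(f"no training rows found in {input_dir}")
    # Each class centroid is the per-feature mean of that class's rows.
    centroids = {label: [s / counts[label] for s in sums[label]] for label in sums}
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.json"), "w") as f:
        json.dump({"centroids": centroids}, f)
    return centroids


if __name__ == "__main__" and sys.argv[1:2] == ["train"]:
    try:
        train()
    except Exception as exc:  # any failure should fail the training job
        print(f"training failed: {exc}", file=sys.stderr)
        sys.exit(1)  # a non-zero exit marks the job Failed
    sys.exit(0)
```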

Inference container

A Python HTTP server implementing the SageMaker inference contract:

Method	Path	Description
`GET`	`/ping`	Health check; returns 200 OK
`POST`	`/invocations`	Accepts `{"features": [sepal_length, sepal_width, petal_length, petal_width]}`, returns `{"prediction": "species", "features": [...]}`
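A minimal standard-library sketch of that server. It assumes model.json holds `{"centroids": {label: [four floats]}}` (a hypothetical layout), uses Euclidean distance for the nearest-centroid rule, and assumes an ENTRYPOINT that runs this file so the `serve` argument SageMaker passes to hosting containers arrives as `sys.argv[1]`. Port 8080 and /opt/ml/model are the locations the SageMaker hosting contract uses; the function names are illustrative.

```python
import json
import math
import sys
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MODEL_PATH = "/opt/ml/model/model.json"  # hosting extracts artifacts under /opt/ml/model
N_FEATURES = 4


def predict(centroids, features):
    """Nearest centroid: the label whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def make_handler(centroids):
    """Build a request handler class bound to a loaded set of centroids."""

    class Handler(BaseHTTPRequestHandler):
        def _send_json(self, status, payload):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            # Health check SageMaker calls to decide whether the container is up.
            if self.path == "/ping":
                self._send_json(200, {"status": "ok"})
            else:
                self._send_json(404, {"error": "not found"})

        def do_POST(self):
            if self.path != "/invocations":
                self._send_json(404, {"error": "not found"})
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                raw = json.loads(self.rfile.read(length))["features"]
                features = [float(v) for v in raw]
                if len(features) != N_FEATURES:
                    raise ValueError(f"expected {N_FEATURES} features")
            except (ValueError, KeyError, TypeError) as exc:
                # Malformed JSON, missing key, or wrong shape: client error.
                self._send_json(400, {"error": str(exc)})
                return
            self._send_json(200, {"prediction": predict(centroids, features),
                                  "features": features})

    return Handler


def serve(port=8080, model_path=MODEL_PATH):
    with open(model_path) as f:
        centroids = json.load(f)["centroids"]
    ThreadingHTTPServer(("", port), make_handler(centroids)).serve_forever()


if __name__ == "__main__" and sys.argv[1:2] == ["serve"]:
    serve()
```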

Test Coverage

Tests run across four scripts:

Smoke checks: model ARN, EndpointConfig variant count, endpoint InService status, training data in S3, images in ECR
Integration tests: training lifecycle, training stop, real-time inference, endpoint update, list operations, processing job, batch transform, error handling
CI/CD tests: pipeline execution status, CodeCommit repository existence, ECR image presence
Observability tests: tag round-trips, training duration after completion, endpoint LastModifiedTime change after update
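The error-handling tests come down to asserting on an AWS error code. A minimal sketch, assuming botocore-style exceptions that carry the parsed error in `exc.response["Error"]["Code"]` (as `botocore.exceptions.ClientError` does); `expect_client_error` is a hypothetical helper name.

```python
from typing import Callable


def expect_client_error(call: Callable[[], object], expected_code: str) -> str:
    """Run call() and require it to fail with the given AWS error code.

    Exceptions without a botocore-style .response dict propagate unchanged.
    """
    try:
        call()
    except Exception as exc:
        response = getattr(exc, "response", None)
        if not isinstance(response, dict):
            raise
        code = response.get("Error", {}).get("Code")
        if code != expected_code:
            raise AssertionError(f"expected {expected_code}, got {code}") from exc
        return code
    # Reaching here means the call succeeded, which is itself a test failure.
    raise AssertionError(f"call succeeded; expected {expected_code}")
```

For the duplicate-name case, the wrapped call would be a second `sm.create_model(...)` reusing an existing `ModelName`, checked against `"ResourceInUse"` as listed under What This Validates; for missing resources, pass whichever code the target returns for that operation.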