Trade Surveillance Lakehouse

A capital-markets trade surveillance and regulatory reporting platform that ingests market events through Kinesis, detects suspicious trading patterns (wash trades, spoofing) via Kinesis Analytics V2 SQL, delivers raw data to S3 via Firehose, enriches events with Lambda, catalogs data with Glue crawlers, and runs compliance queries with Athena. The scenario exercises Simfra's full streaming analytics stack from real-time ingestion through pattern detection to interactive SQL queries.

Services

Service Role
Kinesis Data Streams 2-shard market event ingestion stream + 1-shard alerts output stream, both KMS encrypted
Kinesis Analytics V2 SQL-based real-time pattern detection for wash trades and spoofing bursts
Kinesis Data Firehose Durable delivery from Kinesis to S3 raw layer with JSONL formatting
S3 Data lake with raw, curated, alerts, reports, and athena-results partitions - all SSE-KMS
Glue Data catalog with 3 crawlers (raw, curated, reports) for schema inference
Athena Workgroup and 5 named surveillance queries for compliance analytics
Lambda 4 Python 3.12 functions: producer, enricher, alert publisher, report generator
EventBridge Scheduled rules for producer (1-min) and report generator (5-min)
SNS Surveillance alert topic with KMS encryption
CloudWatch Logs Pipeline observability for all Lambda functions
KMS Single customer-managed key for all data at rest
IAM 10 least-privilege roles scoped per function
CodeCommit Lambda source repository
CodeBuild Lambda function packager
CodePipeline Source to build deployment pipeline

Architecture

EventBridge (1-min schedule) --> Producer Lambda
                                    |
                                    v
                              Kinesis Stream (2 shards, KMS)
                             /          |           \
                            v           v            v
                      Firehose    Lambda ESM     KAV2 SQL
                         |       (enricher)    (pattern detector)
                         v           |              |
                   S3 raw/           v              v
                  (JSONL)      S3 curated/    Kinesis alerts stream
                                (Hive-        /           \
                              partitioned)   v             v
                                        Lambda ESM    (consumers)
                                     (alert_publisher)
                                        /        \
                                       v          v
                                    SNS topic   S3 alerts/

EventBridge (5-min schedule) --> Report Generator Lambda --> Athena --> S3 reports/

Glue Crawlers --> Glue Data Catalog (raw, curated, reports tables)
CodeCommit --> CodeBuild --> CodePipeline

The scenario uses a triple-consumer architecture on the market events stream: Firehose handles raw archival with time-based S3 prefixes, Lambda enriches events with risk scoring and writes Hive-partitioned output, and Kinesis Analytics V2 performs real-time SQL pattern detection. Detected patterns flow to a separate alerts stream consumed by another Lambda that publishes to SNS and archives to S3.

What This Validates

  • Kinesis stream ingestion with KMS encryption and multi-shard parallelism
  • Kinesis Analytics V2 SQL application consuming from and producing to Kinesis streams
  • Real-time pattern detection with tumbling window SQL (wash trades, spoofing bursts)
  • Firehose delivery from Kinesis source to S3 with JSONL formatting
  • Triple consumption from the same Kinesis stream (Firehose + Lambda ESM + KAV2)
  • Lambda event source mapping with Kinesis, processing batches of records
  • Lambda enrichment with derived fields (risk score, venue category, regulatory flag)
  • Hive-partitioned S3 data layout recognized by Glue and queryable by Athena
  • Glue crawler schema inference and partition detection on S3 data
  • Athena SQL queries with aggregations, filtering, and surveillance-specific analytics
  • EventBridge scheduled Lambda invocation on recurring intervals
  • SNS topic publishing with KMS encryption
  • End-to-end CI/CD pipeline with CodeCommit, CodeBuild, and CodePipeline

Test Coverage

Tests include smoke checks for all 18 resources (both Kinesis streams, Firehose, S3 bucket, KAV2 application, Glue database, 3 crawlers, Athena workgroup, 4 Lambda functions, SNS topic, CodePipeline, CloudWatch log group, KMS key), integration tests for the full pipeline (producer invocation, Firehose flush and S3 delivery, enricher output with risk scoring, KAV2 application status, Glue crawler runs with schema and partition verification, Athena queries including wash-trade detection), security tests for KMS encryption across all services and per-function IAM role scoping with negative permission checks, and performance tests with burst ingestion (5 parallel producer invocations) and concurrent Athena queries.