Data Engineer — Pipeline Reliability & Operations
I build data pipelines with a focus on failure recovery, observability, and operational correctness in production environments.
Auto-recovering ingestion pipeline for real-time and batch cryptocurrency market data. [View Repository]
- The Engineering: Operated long-running WebSocket consumers for trade and ticker streams, with candle data processed through separate batch pipelines.
- The Fix: Diagnosed consumer downtime caused by manually launched processes; migrated them to systemd-managed services with automatic restart on failure.
- The Output: Built Silver/Gold aggregation layers in Snowflake for downstream analytics.
Stack: Python, Kafka, Airflow, dbt, Snowflake, AWS EC2, GCP Compute Engine.
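The systemd migration described above can be sketched as a minimal unit file; the service name, paths, and user here are illustrative assumptions, not taken from the repository:

```ini
# /etc/systemd/system/crypto-ws-consumer.service  (hypothetical name and paths)
[Unit]
Description=WebSocket consumer for crypto trade/ticker streams
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/pipeline/ws_consumer.py
# Restart=always is what replaces the manual relaunch step:
# systemd respawns the consumer after any exit or crash.
Restart=always
RestartSec=5
User=pipeline

[Install]
WantedBy=multi-user.target
```

Enabled with `systemctl enable --now crypto-ws-consumer`, the consumer survives crashes and host reboots without operator intervention.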
Team-based pipeline integrating macroeconomic datasets. [View Repository]
- The Engineering: Ingested NASDAQ, S&P 500, and crypto data via Airflow.
- The Quality: Implemented dbt tests for data validation and designed Slack-based alerting for data anomalies.
Stack: Python, Airflow, dbt, Snowflake, Slack API.
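The Slack-based anomaly alerting can be sketched as below; the webhook usage follows Slack's Incoming Webhooks API, but the metric name, threshold, and payload wording are illustrative assumptions, not the team's actual configuration:

```python
import json
from urllib import request


def build_alert(metric: str, value: float, threshold: float) -> dict:
    """Build a Slack Incoming Webhook payload for an out-of-range metric."""
    return {
        "text": (
            f":rotating_light: Data anomaly: `{metric}` = {value} "
            f"(expected >= {threshold})"
        )
    }


def post_alert(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack Incoming Webhook (network call)."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)


# Example: flag a row-count drop below an assumed threshold.
payload = build_alert("daily_row_count", 1200, 10000)
```

In practice a check like this runs as a downstream Airflow task after the dbt tests, so a validation failure and a volume anomaly surface through the same channel.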
Serverless batch processing pipeline. [View Repository]
- The Architecture: S3 Raw Staging → Snowflake Loading → Preset Visualization.
- The Goal: Reduced cost by using S3 as a cheap staging layer, deferring warehouse compute until load time.
Stack: Python, AWS S3, Snowflake, Preset.
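The S3-to-Snowflake load step in this architecture typically looks like the following Snowflake SQL; the stage, bucket, and table names are illustrative, not taken from the repository:

```sql
-- Hypothetical external stage over the raw S3 bucket
-- (auth via a STORAGE INTEGRATION omitted for brevity).
CREATE STAGE IF NOT EXISTS raw_stage
  URL = 's3://example-raw-bucket/trades/'
  FILE_FORMAT = (TYPE = PARQUET);

-- Bulk-load staged files into the raw table; Snowflake tracks
-- already-loaded files, so reruns are idempotent.
COPY INTO raw.trades
  FROM @raw_stage
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

Keeping raw files in S3 means warehouse credits are spent only on the `COPY INTO` and downstream transforms, not on storing cold data.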
I maintain a daily log of technical challenges, focused on pipeline failures and infrastructure debugging.
- LinkedIn: linkedin.com/in/yeoreumsong
- Email: yeoreum.mail@gmail.com