
Hi, I’m Yeoreum Song 👋

Data Engineer — Pipeline Reliability & Operations

I build data pipelines with a focus on failure recovery, observability, and operational correctness in real-world execution environments.


🛠 Technical Stack

Python, SQL, Airflow, dbt, Kafka, Snowflake, AWS, GCP, Docker


📂 Engineering Projects

1. Upbit Real-Time & Batch Data Pipeline

Auto-recovering ingestion pipeline for real-time and batch cryptocurrency market data. [View Repository]

  • The Engineering: Operated long-running WebSocket consumers for the trade and ticker streams, while candle data was processed by batch pipelines.
  • The Fix: Traced consumer downtime to processes that were started manually; migrated the consumers to systemd-managed services so they restart automatically.
  • The Output: Built Silver/Gold aggregation layers in Snowflake for downstream analytics.

Stack: Python, Kafka, Airflow, dbt, Snowflake, AWS EC2, GCP Compute Engine.
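As one illustration of what a Gold-layer aggregation over a trade stream can look like, here is a minimal Python sketch that rolls raw trades up into one-minute OHLCV candles. It is not code from the repository: the `Trade`/`Candle` record shapes (epoch-millisecond timestamps, float price and volume) and the function name `to_minute_candles` are hypothetical, and inside Snowflake the same roll-up would more naturally be a SQL `GROUP BY` on a minute-truncated timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Trade:
    # Hypothetical trade record: epoch milliseconds, execution price, traded volume.
    ts_ms: int
    price: float
    volume: float


@dataclass
class Candle:
    minute_ms: int  # start of the one-minute bucket (epoch ms)
    open: float
    high: float
    low: float
    close: float
    volume: float


def to_minute_candles(trades: List[Trade]) -> List[Candle]:
    """Aggregate trades into one-minute OHLCV candles, ordered by minute."""
    buckets: Dict[int, Candle] = {}
    # Sort by timestamp so open/close reflect the first/last trade in each minute,
    # even if the input arrived out of order.
    for t in sorted(trades, key=lambda t: t.ts_ms):
        minute = t.ts_ms - t.ts_ms % 60_000  # truncate to the start of the minute
        c = buckets.get(minute)
        if c is None:
            buckets[minute] = Candle(minute, t.price, t.price, t.price, t.price, t.volume)
        else:
            c.high = max(c.high, t.price)
            c.low = min(c.low, t.price)
            c.close = t.price
            c.volume += t.volume
    return [buckets[m] for m in sorted(buckets)]


trades = [
    Trade(0, 100.0, 1.0),
    Trade(30_000, 105.0, 2.0),
    Trade(59_999, 99.0, 0.5),
    Trade(60_000, 101.0, 1.0),
]
for candle in to_minute_candles(trades):
    print(candle)
```

Sorting by timestamp first means out-of-order input still yields the same candles, which can matter when a restarted consumer delivers events late.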


2. Economic Data Warehouse

Team-based pipeline integrating macro-economic datasets. [View Repository]

  • The Engineering: Ingested NASDAQ, S&P 500, and crypto data via Airflow.
  • The Quality: Implemented dbt tests for data validation and designed Slack-based alerting for data anomalies.

Stack: Python, Airflow, dbt, Snowflake, Slack API.
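Slack alerting for data anomalies could be wired up roughly as follows. This sketch assumes a simple rule (flag any value that moves more than a hypothetical 5% from the previous observation) and a Slack incoming-webhook URL supplied by the caller; `find_jumps`, `build_slack_payload`, and `post_to_slack` are illustrative names, not the project's actual code or alerting rules.

```python
import json
import urllib.request
from typing import Dict, List, Tuple


def find_jumps(series: Dict[str, List[float]], threshold: float = 0.05) -> List[Tuple[str, int, float]]:
    """Return (symbol, index, pct_change) wherever a value moves more than `threshold` vs. the previous one."""
    hits: List[Tuple[str, int, float]] = []
    for symbol, values in series.items():
        for i in range(1, len(values)):
            prev, cur = values[i - 1], values[i]
            if prev == 0:
                continue  # skip to avoid division by zero
            change = (cur - prev) / prev
            if abs(change) > threshold:
                hits.append((symbol, i, change))
    return hits


def build_slack_payload(hits: List[Tuple[str, int, float]]) -> Dict[str, str]:
    """Format hits as a Slack incoming-webhook body (a JSON object with a `text` field)."""
    lines = [f"{symbol}: {change:+.1%} at row {i}" for symbol, i, change in hits]
    return {"text": "Data anomaly check\n" + "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: Dict[str, str]) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


hits = find_jumps({"NASDAQ": [100.0, 102.0, 110.0], "BTC": [50.0, 50.5]})
print(build_slack_payload(hits)["text"])
```

Keeping detection pure and the HTTP call separate makes the rule easy to unit-test without touching Slack; an Airflow task would call `post_to_slack` only when `find_jumps` returns something.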


3. KMA Weather Data Lake

Serverless batch processing pipeline. [View Repository]

  • The Architecture: S3 Raw Staging → Snowflake Loading → Preset Visualization.
  • The Goal: Optimized for cost by staging raw data in S3 before any warehouse compute is spent.

Stack: Python, AWS S3, Snowflake, Preset.
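The S3-first staging pattern can be sketched as two small helpers: one that lays out raw files under a date-partitioned S3 prefix, and one that generates the Snowflake `COPY INTO` statement loading a single day's partition from an external stage. The key layout (`raw/<source>/dt=<date>/`), the stage name, the table name, and the source name are assumptions for illustration; only the general `COPY INTO <table> FROM @<stage>/<path> FILE_FORMAT = (...)` form is standard Snowflake syntax.

```python
from datetime import date


def raw_key(source: str, run_date: date, filename: str) -> str:
    """Date-partitioned S3 key for one raw file (hypothetical layout)."""
    return f"raw/{source}/dt={run_date.isoformat()}/{filename}"


def copy_into_sql(table: str, stage: str, source: str, run_date: date) -> str:
    """COPY INTO statement that bulk-loads one day's partition from an external stage."""
    prefix = f"raw/{source}/dt={run_date.isoformat()}/"
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage}/{prefix}\n"
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
    )


# Hypothetical names: stage `kma_raw_stage`, table `raw.weather_obs`, source `daily_obs`.
print(raw_key("daily_obs", date(2024, 1, 2), "obs.csv"))
print(copy_into_sql("raw.weather_obs", "kma_raw_stage", "daily_obs", date(2024, 1, 2)))
```

Because Snowflake records which files a table has already loaded, re-running `COPY INTO` over the same prefix skips those files by default (unless `FORCE = TRUE`), so a retried batch does not duplicate rows and warehouse compute is spent only on the bulk load itself.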


📚 Continuous Learning (TIL)

I keep a daily log of technical challenges, with a focus on pipeline failures and infrastructure debugging.

👉 Visit my TIL Repository


📬 Contact

Pinned

  1. DE7-team6-final/upbit-data-pipeline

    Real-time & batch crypto market data pipeline with streaming alerts and analytics dashboards

    Python 3

  2. binance-realtime-anomaly

    Real-time BTC anomaly detection using Isolation Forest on Binance WebSocket streams. Behavior-based market regime detection validated with statistical significance testing across 15,000+ windows.

    Python

  3. DE7-Team8-8bit/economy-etl

    Economic Indicators ETL Pipeline: Collect, transform, and load data (NASDAQ, S&P 500, Crypto, FX, Interest Rates, Gold, Oil, Dollar Index) with Airflow, Snowflake, EC2, and GitHub Actions

    Python

  4. life-in-weeks

    A deterministic model of time that visualizes life as a finite timeline based on user-defined assumptions

    CSS

  5. DE7-2nd/KMA-Data-Viz

    A visualization dashboard project built on KMA (Korea Meteorological Administration) weather data

    Jupyter Notebook

  6. TIL

    Today I learned

    Python