This document summarizes Henning Jacobs' talk on running Kubernetes in production and the many ways clusters can crash. It describes several incidents that caused outages in Zalando's Kubernetes clusters: API server issues that caused ingress problems, an etcd deletion that caused cluster downtime, EC2 networking issues, image pulling failures, and credential processing bottlenecks that prevented deployments. Together, the incidents highlighted lessons about disaster recovery planning, automated testing of upgrades, monitoring cloud infrastructure, and avoiding resource starvation.