<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Prasad Paravatha on Medium]]></title>
        <description><![CDATA[Stories by Prasad Paravatha on Medium]]></description>
        <link>https://medium.com/@paravatha?source=rss-498285cfc25d------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*YNb93HwA5KS6BVZyGjFwfg.jpeg</url>
            <title>Stories by Prasad Paravatha on Medium</title>
            <link>https://medium.com/@paravatha?source=rss-498285cfc25d------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 14 Apr 2026 06:45:10 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@paravatha/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[My Favorite Tech Podcasts - 2025 edition]]></title>
            <link>https://paravatha.medium.com/my-favorite-tech-podcasts-2025-edition-ae3ae8a4cfef?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/ae3ae8a4cfef</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[platform-engineering]]></category>
            <category><![CDATA[tech-podcasts]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 17 Mar 2025 18:26:30 GMT</pubDate>
            <atom:updated>2025-03-18T00:11:48.168Z</atom:updated>
<content:encoded><![CDATA[<p>It’s been a while since I published my <a href="https://paravatha.medium.com/my-favorite-tech-podcasts-7c2f9fde9aad">favorite tech podcasts</a>, so it’s time to revisit the topic now that the tech world has gone through an AI/ML frenzy! 😎</p><ol><li><a href="https://www.nytimes.com/column/hard-fork"><strong>Hard Fork from NYTimes</strong></a>, tech journalists <strong>Kevin Roose</strong> and <strong>Casey Newton</strong> are funny, funny, funny! I call them the <em>Dave Chappelle</em> and <em>Chris Rock</em> of tech-journalists-turned-podcasters. <br>They cover a wide range of tech topics, but these days, like everything else, it’s AI heavy.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/0*HpNiaDoC5irGgkgj" /></figure><p><strong>2. </strong><a href="https://www.kubernetesbytes.com"><strong>Kubernetes Bytes</strong></a>, hosts <strong>Ryan Wallner</strong> and <strong>Bhavin Shah</strong> have a deep understanding of the K8s ecosystem, and their conversation style is more of a witty banter.<br>They also have a good rundown of the latest news and developments about all things K8s.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*71K8PqoHQcxhCZLH66xjgA.jpeg" /></figure><p><strong>3. </strong><a href="https://practicalai.fm/"><strong>Practical AI</strong></a>, hosts <strong>Chris Benson</strong> and <strong>Daniel Whitenack</strong> are real-world AI/ML practitioners. As the title suggests, they have very practical discussions and tips about AI/ML tooling, frameworks and new developments, not the AI influencer fluff from most other AI podcasts.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*qGHSPsaEhJLrKe4lDy-sGA.png" /></figure><p><strong>4. </strong><a href="https://kube.fm/episodes"><strong>KubeFM</strong></a><strong>, </strong>another great K8s podcast by host <strong>Bart Farrell</strong>. 
This podcast brings in leading K8s experts covering a wide range of complex topics like service mesh, multi-tenancy, GitOps, platform engineering etc.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/430/0*65zfBE3KxEyngdde.png" /></figure><p>5. <a href="https://www.latent.space/podcast"><strong>Latent Space</strong></a><strong>, </strong>another great AI podcast by hosts <strong>swyx</strong> and <strong>Alessio</strong>. They are very prolific, churning out interesting discussions every week. <br>They cover both product and technical topics. If you want to keep on top of the latest AI news, this is a must-listen!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*by8Cn2wwr5MRzEhi.png" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ae3ae8a4cfef" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[My MLOps Journey]]></title>
            <link>https://paravatha.medium.com/my-mlops-journey-d058ad81b6f3?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/d058ad81b6f3</guid>
            <category><![CDATA[mlops]]></category>
            <category><![CDATA[ml-platform]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[mlops-platform]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 23 Jan 2023 03:26:20 GMT</pubDate>
            <atom:updated>2023-01-24T03:56:02.630Z</atom:updated>
<content:encoded><![CDATA[<p>My totally unplanned and fortunate journey into the fascinating world of MLOps</p><p><strong><em>Background: </em></strong>ICYMI, here is a good blurb about <a href="https://en.wikipedia.org/wiki/MLOps">MLOps on Wikipedia</a>.<br>MLOps is still in its early stages. Different companies use MLOps, ML Platform or ML Engineering interchangeably to describe this nascent field. <br>I have a slightly different take, purely based on my personal experience. <br>(Yes, every MLOps post needs the mandatory Venn diagram, as below.)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*ckZp7AoJ5NpQ0gDeTj3VKg.jpeg" /></figure><p>MLOps is a highly multi-disciplinary field that requires expertise in at least 2 of the 3 sub-domains. It’s very rare to come across engineers who are good in all 3 sub-domains (I call them <a href="https://www.linkedin.com/posts/paravatha_machinelearning-kudos-activity-6844620987221340160-qcTO/">MLE unicorns</a>).</p><p><strong>My 3-year journey into MLOps</strong></p><p><strong>Year 1</strong>: <br>It all started in January 2018 at Discover. I was hired to work in a Data Engineering team. Due to an organizational change, a new team was formed to evaluate different ML solutions/platforms. I was fortunate to be moved into that team, and we started the exciting journey of building our own <a href="https://aws.amazon.com/solutions/case-studies/discover-financial-services-case-study/">Cloud-Native Data Science Platform</a>. <br>As I already had application development experience building web applications, microservices and CI/CD pipelines, I started working on building those components of the platform. 
In parallel, I started shadowing other engineers on our team with deep expertise in Docker, Kubernetes, Snowflake and AWS.</p><p><strong>Year 2:</strong> <br>2019 was all about developing my own knowledge of Docker, Kubernetes, Snowflake and AWS while continuing to work on application development.<br>As a team, we encountered lots of interesting challenges with respect to scalability, security and integration of the different components of the platform (Kubernetes/OpenShift, AWS and Snowflake). <br>After several re-designs, we built a scalable <a href="https://en.wikipedia.org/wiki/Platform_as_a_service">platform as a service</a> called <a href="https://cloud.redhat.com/hubfs/Discover-OpenShift-Commons-Oct28-Final.pdf">AIR9</a> with the goal of supporting over 1000 users (Data Scientists, Data Analysts &amp; Data Engineers).</p><p><strong>Year 3: <br></strong>By the start of 2020, I had become fairly comfortable with Docker, Kubernetes/OpenShift, AWS and Snowflake. <br>I started getting involved in the Data science and Machine learning side of the platform. I was primarily working on Spark on Kubernetes, Jupyter/Python, RStudio/R, MLflow and tuning/troubleshooting Data science and Machine learning workloads.</p><p><strong>Summary:<br></strong>We are in the middle of <a href="https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2022-gartner-hype-cycle">some of the most interesting advances in AI/ML</a>. 
There is a lot of hype, but the potential for great breakthroughs in AI/ML is undeniable.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zpSWcHThMtwMKf-slPxLOg.png" /><figcaption>Credit: Gartner</figcaption></figure><p>Here is my advice to anyone starting or planning to start their MLOps/ML Platform journey.</p><ul><li>Get into this field if you are passionate about it and enjoy complexity.</li><li>Make a plan to develop basic competency across multiple disciplines.</li><li>Start with your area of expertise and learn from your teammates with expertise in the other sub-domains.</li><li>There is no substitute for real-world experience. So, seek opportunities in the field or in adjacent fields like Data engineering or Platform engineering.</li><li>Develop a good work ethic and be willing to be highly collaborative.</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d058ad81b6f3" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[My Favorite Tech Podcasts!]]></title>
            <link>https://paravatha.medium.com/my-favorite-tech-podcasts-7c2f9fde9aad?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/7c2f9fde9aad</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[podcast]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Sun, 10 Apr 2022 00:33:59 GMT</pubDate>
            <atom:updated>2022-04-10T03:33:33.598Z</atom:updated>
<content:encoded><![CDATA[<p>Listening to podcasts is my idea of unwinding 😎</p><ol><li><strong>The Kubernetes Podcast from Google</strong> by <a href="https://twitter.com/craigbox"><strong>Craig Box</strong></a></li></ol><figure><img alt="Image showing kubernetes podcast" src="https://cdn-images-1.medium.com/max/316/1*5t61dy2em_cTKaI1z9iZGQ.jpeg" /><figcaption>Source : <a href="https://kubernetespodcast.com/">https://kubernetespodcast.com/</a></figcaption></figure><p><a href="https://kubernetespodcast.com/"><strong>https://kubernetespodcast.com/</strong></a> : This is by far my favorite tech podcast. <br>Craig makes it fun and peppy. This podcast covers a wide range of topics, from Kubernetes to cloud native to the latest industry news. If you work with Kubernetes, make sure to listen to this!</p><p><strong>2. Gradient Dissent</strong> by <a href="https://twitter.com/l2k">Lukas Biewald</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*tp7F1I1gVkMpMlX7CAlJSg.jpeg" /><figcaption>Source : <a href="https://wandb.ai/fully-connected/podcast">https://wandb.ai/fully-connected/podcast</a></figcaption></figure><p><a href="https://wandb.ai/fully-connected/podcast"><strong>https://wandb.ai/fully-connected/podcast</strong></a><strong> : </strong>This one is more to do with ML industry leaders and CEOs; it tends to be more philosophical, and the discussions are usually about AI/ML industry trends.<br>Lukas has a very humble demeanor. His interview style is that of a student asking their professor questions!</p><p>3. 
<strong>MLOps Community</strong> by <a href="https://twitter.com/Dpbrinkm"><strong>Demetrios Brinkmann</strong></a><strong> </strong>and <a href="https://www.linkedin.com/in/vrachakonda/"><strong>Vishnu Rachakonda</strong></a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*6-n7NMV-jbUncEjXz0Boaw.jpeg" /><figcaption>Source: <a href="https://mlops.community/">https://mlops.community/</a></figcaption></figure><p><a href="https://mlops.community/watch/"><strong>https://mlops.community/watch/</strong></a><strong> : </strong>This is more than a podcast. They do meetups, live discussions and technical deep dives within the MLOps domain.<br>My favorite part is that Demetrios and Vishnu sometimes let their guests completely take over, which takes the discussions in more interesting directions (like <a href="https://podcasts.apple.com/us/podcast/the-future-of-ml-and-data-platforms-michael-del/id1505372978?i=1000537247078">this episode with Michael Del Balso and Erik Bernhardsson</a>)</p><p>4. <strong>TWIML AI Podcast </strong>by <a href="https://twitter.com/samcharrington"><strong>Sam Charrington</strong></a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/328/1*GX12TzKBmqEOFgon1VoFRg.png" /><figcaption>Source: <a href="https://twimlai.com/shows/">https://twimlai.com/shows/</a></figcaption></figure><p><a href="https://twimlai.com/shows/"><strong>https://twimlai.com/shows/</strong></a><strong> :</strong> This podcast covers a wide range of topics in AI/ML, with guests from academia, research organizations, industry leaders and technology executives. You will get to learn about some very interesting new applications of AI/ML.</p><p>5. 
<strong>The Data Exchange</strong> by <strong>Ben Lorica</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*p6i2KOmnQzxeJcQt4c9Sjg.jpeg" /><figcaption>Source: <a href="https://gradientflow.com/podcast/">https://gradientflow.com/podcast/</a></figcaption></figure><p><a href="https://gradientflow.com/podcast/"><strong>https://gradientflow.com/podcast/</strong></a><strong> : </strong>This podcast covers AI/ML and Data engineering. Most guests tend to be startup founders and big tech executives in the AI/ML and Data engineering domains, and the discussions are usually very high-level.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7c2f9fde9aad" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GitHub Copilot: Your friendly neighborhood AI pair programmer!]]></title>
            <link>https://paravatha.medium.com/github-copilot-your-friendly-neighborhood-ai-pair-programmer-158daf60ff54?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/158daf60ff54</guid>
            <category><![CDATA[github-copilot]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[developer]]></category>
            <category><![CDATA[github]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Fri, 29 Oct 2021 13:10:43 GMT</pubDate>
            <atom:updated>2021-10-29T13:10:43.990Z</atom:updated>
<content:encoded><![CDATA[<p>First impressions: GitHub Copilot, using OpenAI Codex, is a game changer.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KiWliV0t7Kl3FgaamNX_tw.png" /><figcaption>Source: GitHub</figcaption></figure><p><strong><em>Background</em></strong> : <em>From Wikipedia - </em><a href="https://copilot.github.com/"><strong><em>GitHub Copilot</em></strong></a><em> is an artificial intelligence tool developed by GitHub and OpenAI to assist users of Visual Studio Code by autocompleting code. GitHub Copilot uses OpenAI Codex, a modified version of GPT-3 (a language model designed to produce human-like text) that is designed to produce valid computer code. Copilot is trained on public GitHub repositories of any license.</em></p><p><strong><em>You write comments, it pops out code. That’s one upside of this: engineers finally writing comments in their code!</em></strong></p><h3><strong>Copilot in VS Code: Writing Python code</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/988/1*Dd3oGU0PZ7xoY_2VVe7vfg.gif" /></figure><h3>Copilot in VS Code: Writing Go code</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lnApVAIEqcyEej9QRLoP6g.gif" /></figure><h3><strong>Copilot in IntelliJ: Writing Java code</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YQrhuNSzko6_bZYLbJArLw.gif" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=158daf60ff54" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Amazon SageMaker: Promising, yet long way to go!]]></title>
            <link>https://paravatha.medium.com/amazon-sagemaker-promising-yet-long-way-to-go-127f313f55a?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/127f313f55a</guid>
            <category><![CDATA[platform-as-a-service]]></category>
            <category><![CDATA[aws-sagemaker]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[aws]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Tue, 05 Oct 2021 23:30:34 GMT</pubDate>
            <atom:updated>2021-10-06T00:03:02.006Z</atom:updated>
<content:encoded><![CDATA[<h4><strong>Great for ML Engineers, too technical for Data Scientists/Analysts</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*wh-k4TNQAZAvUo5xPy-DJg.jpeg" /><figcaption>Source : Amazon SageMaker</figcaption></figure><blockquote><strong>Background: </strong>I spent 2 days playing around with different <a href="https://aws.amazon.com/sagemaker/features/"><em>SageMaker features</em></a>.<br>There are way too many features and examples to try. TMI AWS, TMI!</blockquote><blockquote><strong><em>My first take-away</em></strong><em>: This is really well designed </em><strong><em>for Engineers by Engineers. </em></strong><em>Not just any engineers, but ML Engineers who are well versed in AWS services, with a good understanding of Docker/DevOps. You basically have to be an </em><a href="https://www.linkedin.com/posts/paravatha_machinelearning-kudos-activity-6844620987221340160-qcTO">MLE Unicorn</a>.<br><strong><em>My second take-away</em></strong><em>:</em> <em>This is a</em><strong><em> very expensive tool; </em></strong><em>fewer than 10–20 users could potentially burn through the entire IT budget of an org within 1 month. <br></em><strong><em>My third take-away</em></strong><em>: Deep vendor lock-in. AWS is THE best cloud provider when it comes to infra,</em><strong><em> so not a big deal for now. </em></strong><em>It could become an issue in the </em><a href="https://www.statista.com/statistics/967365/worldwide-cloud-infrastructure-services-market-share-vendor/"><em>near future as Azure is catching up</em></a></blockquote><blockquote><strong>Design:<br></strong>1<strong>. Product design</strong> is not great, as the entire setup seems too technical for Data Scientists and/or Data Analysts. Good strategy to get more people to try AWS certifications ;-)<br>2. 
<strong>UX/UI design </strong>is done really well; the dashboards and different pages/menus are well organized.</blockquote><h3><strong>1. Model Training on CPU</strong></h3><p><strong>1.1: Model training works well and is well organized</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cas3wdvOIM6SFh7lang3vw.jpeg" /></figure><p><strong>1.2: Nice visualizations to monitor resource utilization</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SmaXtg1t7gD0UjOSuOp_Kw.png" /></figure><p><strong>1.3: You can launch experiments from Jupyter notebooks: <br>* </strong><em>My favorite feature: you can switch instance type and kernel type in JupyterLab</em><strong><br></strong><em>* Not so favorite: for some reason they are using the old JupyterLab version 1.2.x</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3ltMNhLZQ1LTJ0Ag0E2M7A.jpeg" /></figure><p><strong>1.4: Hyperparameter tuning works well<br>* </strong><em>I couldn’t figure out how to compare experiments</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8TVoQMk74vbco5kbSdBaFw.jpeg" /></figure><p><strong>1.5: Feature engineering example worked<br></strong>* You definitely need a decent understanding of Spark and Kafka</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KNOTDGGaOntzBLgwOksMsA.jpeg" /></figure><p><strong>1.6: Training on GPU is too technical and worked only for some use-cases. <br></strong><em>* None of the </em><strong><em>Ray</em></strong><em> examples worked<br>* Hugging Face PyTorch examples worked well</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YEy6W6PbqExhHHDrzN8L4A.jpeg" /></figure><h3>2. Model Deployment = Meh!</h3><p><strong>2.1 This is where things got a bit confusing. 
<br></strong><em>* It seemed like you’d use SageMaker endpoints for some ad-hoc testing <br>* I couldn’t find any kind of workflow orchestration for model deployments</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZHOJLQAd5j4rcsIOc9j2jw.jpeg" /></figure><p><strong>2.2 Ad-hoc deployment: Failed and aborted</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4rUpNDYVy6CTGY3k5zg1kA.jpeg" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=127f313f55a" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Kubeflow 1.3 : Shiny, Secure and Scalable]]></title>
            <link>https://paravatha.medium.com/kubeflow-1-3-shiny-secure-and-scalable-4ae90e1336dd?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/4ae90e1336dd</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[kubeflow]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[cloud-services]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Wed, 28 Apr 2021 23:55:59 GMT</pubDate>
            <atom:updated>2021-04-29T02:17:01.054Z</atom:updated>
<content:encoded><![CDATA[<h3>Kubeflow 1.3 : Shiny, Secure and Scalable</h3><h4>Improved UX, UI, Security and support for hybrid-cloud</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/606/1*vzD5fMDVbg8tpc0vtkGKyw.jpeg" /></figure><p><strong><em>Background: </em></strong><em>I tried Kubeflow 1.0 in May 2020, with a narrow focus on </em><a href="https://paravatha.medium.com/cloud-native-ml-pipelines-using-kubeflow-51f3c95e75fe"><em>Cloud native ML pipelines</em></a><em>. <br>With the latest Kubeflow 1.3 release, they have streamlined the setup process and improved security and the user experience. Even with these updates, there is still a learning curve for non-technical/non-engineering users. Another improvement is the ability to pick and choose the components you want to install.<br></em><strong><em>IMO, the ideal use-case is a cross-functional Data Science team with a mix of Platform Engineers, ML Engineers and Data Scientists.</em></strong></p><ol><li><strong>Setup and Improved UX/UI: </strong><br>Relatively easy to set up compared to version 1.0, with an easy-to-understand UX and a responsive UI. <br>I tried this on a GKE cluster with 6 nodes (total cluster resources = 12 cores, 60GB).<br>Oh, and they now have ready-to-use JupyterLab, open-source VS Code (Code-Server) and RStudio.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nQyB80uj3Vg7g5i7mAO7eA.png" /></figure><p><strong>Code-Server (VS Code in the browser)</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JZVEGHZykAEO6Bk2KycsLg.jpeg" /></figure><p><strong>Compared to version 1.0, the ready-to-use Pipelines and Experiments work seamlessly. 
</strong>There is a lot of customization you can do (which I did not explore).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*D1M2UfPCukekHL3IDkrhaw.jpeg" /></figure><p><strong>Other updates and enhancements</strong></p><ol><li>Katib is now AutoML (for hyperparameter tuning)</li><li>Volumes to create and attach Object storage or PV/PVC</li><li>Tensorboards for visualizations (I am not sure how widely this will be used)</li><li>KFP comes in 2 flavors<br>* KFP with Argo Workflows as the default version.<br>* KFP with Tekton (an additional option with contributions from IBM and Red Hat)</li><li>Runs <br>* Ad-hoc manual runs<br>* Scheduled runs (using Argo Workflows)</li><li>The Metrics and Metadata UIs have improved as well</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6U_zqg0h40Jz51jecodKtA.jpeg" /></figure><p><strong>Summary</strong>: <br><em>There is renewed interest in Kubeflow from almost every major cloud services company (AWS, GCP, Azure, IBM, RedHat/OpenShift). In addition to these cloud providers, you can now run Kubeflow as an Operator or on Microk8s, MiniKF, Kind etc.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4ae90e1336dd" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[In Defense of Agile Methodologies]]></title>
            <link>https://paravatha.medium.com/in-defense-of-agile-methodologies-5aed57a60458?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/5aed57a60458</guid>
            <category><![CDATA[iterative-development]]></category>
            <category><![CDATA[scrum]]></category>
            <category><![CDATA[agile-methodology]]></category>
            <category><![CDATA[incremental-delivery]]></category>
            <category><![CDATA[agile]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 12 Apr 2021 01:47:40 GMT</pubDate>
            <atom:updated>2021-04-12T04:09:29.157Z</atom:updated>
<content:encoded><![CDATA[<p>How Agile transformed the tech industry and supercharged software delivery</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*12oxaIqJSw7vdRAl26xLuw.png" /><figcaption>Agile practices in the wild</figcaption></figure><p><strong>Background :</strong> I first came across Agile methodologies during my graduate school program (Distributed and Multimedia Information Systems at Heriot-Watt University in Edinburgh, UK). <em>My first thought after learning about concepts such as </em><strong><em>Agile Unified Process (AUP), Extreme Programming (XP), Test-Driven Development (TDD) and Pair Programming was: why would companies/teams use anything other than these new software development practices?</em>. <br></strong>Later, <strong><em>when I worked at PayPal, Yodlee and CareFirst BCBS, I could clearly see the benefits of using Agile methodologies in the real world</em></strong>. These companies had very mature Agile practices and used them very effectively to speed up development, get rid of inefficiencies and deliver features quickly to end users.</p><p><strong>Side note : </strong>If you are a developer/engineer, I highly recommend reading one of the best books about writing code in an Agile environment: <a href="https://www.goodreads.com/book/show/3735293-clean-code">Clean Code: A Handbook of Agile Software Craftsmanship</a> by Robert C. Martin (a proponent of <a href="https://en.wikipedia.org/wiki/Agile_software_development#The_Agile_Manifesto">agile software development</a>)</p><h3><strong>A case for Agile:</strong></h3><p><strong><em>Success stories</em></strong></p><ol><li><strong>PayPal</strong> transformed from a relatively small start-up into one of the most innovative FinTech companies by <a href="https://www.paypalobjects.com/webstatic/en_US/mktg/pages/stories/pdf/paypal_transformation_whitepaper_sept_18_2015.pdf">embracing Agile at the Enterprise level (White paper)</a>. 
Not into reading white papers? You can watch these videos instead: <a href="https://www.youtube.com/watch?v=FaEWoyrK7Jo">short video</a> and <a href="https://www.youtube.com/watch?v=FtzJjbEATKk">long video</a>.</li><li><strong>Spotify</strong> even created their own version of Agile : <a href="https://www.atlassian.com/agile/agile-at-scale/spotify">The Spotify model</a></li><li>Not just Tech companies, but Media companies like <a href="https://www.agilealliance.org/resources/experience-reports/the-trials-and-tribulations-of-finding-the-right-agile-framework/"><strong>NPR</strong></a> and <a href="https://open.nytimes.com/scrum-swarm-sprint-how-to-take-the-agile-process-and-make-it-your-own-b6416793ff7e"><strong>NYTimes</strong></a> use Agile/Scrum at the enterprise level.</li><li><strong>CareFirst BCBS</strong> went through a similar <a href="https://www.cio.com/cio100/detail/2932">transformation using Agile/SCRUM</a> practices.</li></ol><p><strong><em>Agile is as much a mind-set as it is a software development methodology</em></strong></p><ol><li>In my experience, people stuck in their old ways (often lazy thinking) tend to resist adopting Agile practices.</li><li>The beauty of Agile is that it’s flexible and comes in different flavors such as Scrum, Kanban, Scrumban, SAFe etc. 
to suit your needs.</li></ol><p><strong><em>Agile and high-performing teams</em></strong> <br>These days, engineering teams and project management teams use the below offshoots of Agile almost every day.</p><ol><li>Rapid prototyping and proof of concepts (RAD: Rapid Application Development)</li><li>Code reviews (Pair Programming)</li><li>Automated testing (TDD: Test-Driven Development and BDD: Behavior-Driven Development)</li><li>Modern DevOps (Continuous Integration, Continuous Delivery, Continuous Deployment)</li><li>Collaboration (cross-functional teams)</li><li>Speed to market (time boxing, iterative development and incremental delivery)</li></ol><p><strong>Summary<br></strong>IMO, in the rapidly evolving technology sector, <a href="https://en.wikipedia.org/wiki/Agile_software_development#The_Manifesto_for_Agile_Software_Development">The Manifesto for Agile Software Development</a> is more important than ever. For Agile practices to take root and produce results, the top leadership of the company needs to push for them. Once you decide and define which Agile framework works for your company/teams (this could take several years), it changes the work culture, empowers individual teams to make certain decisions and speeds up software delivery.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5aed57a60458" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Feast: Setup your own ML Feature store on Kubernetes]]></title>
            <link>https://paravatha.medium.com/feast-setup-your-own-ml-feature-store-on-kubernetes-5b3193c2b62c?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/5b3193c2b62c</guid>
            <category><![CDATA[feature-engineering]]></category>
            <category><![CDATA[feature-store]]></category>
            <category><![CDATA[automl]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Tue, 30 Mar 2021 02:11:49 GMT</pubDate>
            <atom:updated>2021-03-30T02:36:23.470Z</atom:updated>
<content:encoded><![CDATA[<h3>A promising cloud-based open-source ML Feature store solution!</h3><p><strong><em>History</em></strong><em>: Feast has been through several revisions in the past year. With the current version (0.9), it’s possible to set it up end-to-end on a barebones k8s cluster.<br></em><strong><em>The Feast team is currently working on version 0.10, to be released in April 2021 (which is expected to further simplify the architecture and the setup).</em></strong><em> There are companies around the world that are already using Feast or are in the process of integrating it.</em></p><p><strong>Background (</strong><a href="https://docs.feast.dev/"><strong>From Feast website</strong></a><strong>): </strong>Feast (<strong>Fea</strong>ture <strong>St</strong>ore) is an operational data system for managing and serving machine learning features to models in production.</p><p>As you can see below, even with the <a href="https://github.com/feast-dev/feast/blob/master/examples/minimal/minimal_ride_hailing.ipynb">basic use-case</a>, there is a lot happening within <strong>Feast</strong> in terms of the technology stack.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XL192LAE78ZImX9IVOemVg.png" /></figure><p>The <a href="https://docs.feast.dev/concepts/architecture"><strong>architecture diagram</strong></a> below shows additional technologies you could plug in based on your requirements.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/913/1*UjTTX3C9S5K7sH_oYnHScg.png" /></figure><p><strong>Setup:<br></strong>I used a 6-node GKE cluster (version 1.18.16-gke.302) with a GS bucket and the Spark k8s operator.
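</p><p>Conceptually, what the stack above implements is a split between an <em>offline</em> store (full feature history for training, with point-in-time lookups) and an <em>online</em> store (latest feature values for low-latency serving). A toy in-memory sketch of that split — illustrative only, not the Feast API, and the entity/feature names are made up; Feast backs these stores with a data lake and Redis and adds schemas, registries and point-in-time correctness:</p>

```python
from datetime import datetime

# Toy in-memory sketch of the offline/online feature-store split.
# Illustrative only -- not the Feast API; names are hypothetical.
class ToyFeatureStore:
    def __init__(self):
        self.offline = []   # full history: (entity_id, timestamp, features)
        self.online = {}    # latest features per entity, for low-latency serving

    def ingest(self, entity_id, ts, features):
        self.offline.append((entity_id, ts, features))
        current = self.online.get(entity_id)
        if current is None or ts >= current[0]:
            self.online[entity_id] = (ts, features)

    def get_online_features(self, entity_id):
        entry = self.online.get(entity_id)
        return entry[1] if entry else None

    def get_historical_features(self, entity_id, as_of):
        # latest features at or before `as_of` (point-in-time lookup)
        rows = [(ts, f) for e, ts, f in self.offline
                if e == entity_id and ts <= as_of]
        return max(rows, key=lambda r: r[0])[1] if rows else None

store = ToyFeatureStore()
store.ingest("driver_1", datetime(2021, 3, 1), {"avg_trips": 8.0})
store.ingest("driver_1", datetime(2021, 3, 15), {"avg_trips": 9.5})
print(store.get_online_features("driver_1"))    # latest values, for serving
print(store.get_historical_features("driver_1", datetime(2021, 3, 7)))  # as-of lookup
```

<p>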
<br>I also tried using a barebones k8s cluster with Spark in standalone mode, using local tmp staging/work folders attached to a k8s node.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dHApPvAllpkLLFAHCcV4cA.png" /><figcaption>Jupyter+Kafka+Postgres+Redis+Spark+Grafana+Prometheus</figcaption></figure><p>After you create features, save them in the data lake and fetch them using a Jupyter notebook, you can check the Prometheus metrics and Grafana dashboards!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LnEXX_HsBheWEYvbSiOSbQ.jpeg" /></figure><p>As an optional setup, Feast integrates with Prometheus and Grafana to provide visualizations.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*10qKS9LkeFYkBViKCznqdA.png" /><figcaption>Grafana dashboard with Feast integration</figcaption></figure><p><strong>Summary and next steps:<br></strong>Feast has come a long way from its initial versions in 2019. Until version 0.9 came along, it was closely tied to a particular cloud provider (GCP) and/or set of technologies.<br>The Feast team has been working hard to simplify a very complex use-case, de-couple the technologies and make Feast cloud-provider agnostic. Their <a href="https://docs.google.com/document/d/1CjR3Ph3l65hF5bRuchR9u9WSoirnIuEb7ILY9Ioh1Sk/edit#heading=h.1vi5kan9y6j">RFC for version 0.10</a> looks very promising.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5b3193c2b62c" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Blind spots in ML : Data Engineers & Data Scientists]]></title>
            <link>https://paravatha.medium.com/blind-spots-in-ml-data-engineers-data-scientists-4ed6f397b39e?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/4ed6f397b39e</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[data-scientist]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[data-engineer]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Thu, 04 Mar 2021 14:31:39 GMT</pubDate>
            <atom:updated>2021-03-04T14:31:39.956Z</atom:updated>
<content:encoded><![CDATA[<h3>Blind spots in ML : Data Engineers &amp; Data Scientists</h3><p>Basic misunderstandings of each other’s work</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Vty9d0z9P41_2DHHMmrn0g.png" /></figure><p><strong>Background</strong>: For the past 3 years, I have been working at the intersection of Cloud (AWS, OpenShift/Kubernetes, Docker, Snowflake), Software Engineering (UI and Microservices), and Data Science (Python, Spark, H2O, R, SAS). It’s been a great learning experience working with talented engineers in building the <a href="https://aws.amazon.com/solutions/case-studies/discover-financial-services-case-study"><strong>AIR9 Data Science Platform</strong></a>.</p><p>This post is about my observations from working with <strong>Data Engineers and Data Scientists/Analysts</strong>, and their blind spots when it comes to machine learning projects. <br><strong>Side note #1: </strong>There is an interesting <a href="https://www.nextgov.com/cio-briefing/2018/10/origins-job-title-data-scientist/152286/"><strong>back story</strong></a> to the term “data scientist”.<br><strong>Side note #2: </strong>IMO, Data Engineer is a simplistic term; Data Platform Engineer is more accurate.</p><ol><li><strong>Resource requirements: CPU &amp; RAM<br>Engineers:</strong> Why do you need instances for more than 1 hour?<br><strong>Scientists:</strong> Data science is an iterative process. It’s not a straight line; things go in a circular and/or zigzag pattern. <br><strong>I’d side with the Data Scientists on this, as ML workloads are bursty and resource requirements fluctuate.</strong></li><li><strong>Size of datasets: Number of columns and records<br>Scientists: </strong>My code worked last week on a similar instance with a very similar dataset of 1 million records, why is it not working now? Why is it taking longer to complete?<br><strong>Engineers: </strong>Is the number of columns the same as in the dataset from last week?<br>Are the column types similar? Is your dataset in a compressed or uncompressed format?<br><strong>I’d side with the Data Engineers on this, as Data Scientists tend to underestimate the resource requirements w.r.t. datasets (</strong>every tool has different memory requirements: H2O needs RAM 3–4 times the size of the dataset, while SAS Studio doesn’t need high RAM-to-dataset ratios).</li><li><strong>Dependency management:</strong> <strong>Packages and Extensions</strong><br><strong>Scientists: </strong>Why am I not allowed to download packages from the internet? Why are my new packages breaking my code?<br><strong>Engineers: </strong>Some packages and extensions have security vulnerabilities and backward-compatibility issues. Circular/cyclic dependencies are also a big factor in Python (and R) w.r.t. security and backward compatibility (Java/Scala handle this a bit better than other languages).<br><strong>I’d side with the Data Engineers on this, but Data Scientists are rapidly gaining knowledge of how to use packages and work around this.</strong></li><li><strong>Coding best practices:<br>Engineers:</strong> Why did you hardcode this configuration? Why did you write this segment of code this way?<br><strong>Scientists: </strong>I had to do it this way for the data preparation, feature engineering and model tuning steps.<br><strong>I’d side with the Data Scientists on this; Data Engineers are not knowledgeable about the typical ML lifecycle.</strong></li></ol><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4ed6f397b39e" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[RayML using Kubernetes operator]]></title>
            <link>https://paravatha.medium.com/rayml-using-kubernetes-operator-7881c7328f4d?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/7881c7328f4d</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[distributed-systems]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 15 Feb 2021 00:48:25 GMT</pubDate>
            <atom:updated>2021-02-15T03:30:22.653Z</atom:updated>
<content:encoded><![CDATA[<p>RayML + Kubernetes = Finally, a truly scalable Distributed ML solution</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hrrH3qwtaOe0wLdp8bu8-w.png" /></figure><p><strong>Background (From Ray website)</strong>: <a href="https://ray.io/">Ray</a> is an open-source distributed execution framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries. Ray provides a simple, universal API for building distributed applications (supports Python and Java APIs).</p><blockquote>I first came across RayML on the <a href="https://softwareengineeringdaily.com/2020/07/24/ray-applications-with-richard-liaw/">Software Engineering Daily podcast</a> in July 2020. <br>In September 2020, I attended <a href="https://events.linuxfoundation.org/ray-summit/">Ray Summit 2020</a>, organized by <a href="https://www.anyscale.com/about">Anyscale</a> (a startup founded by the creators of<a href="http://ray.io/"> Ray</a> from the UC Berkeley<a href="https://rise.cs.berkeley.edu/"> RISELab</a>, the successor to the<a href="https://amplab.cs.berkeley.edu/about/"> AMPLab</a>, which created<a href="https://spark.apache.org/"> Apache Spark</a> and<a href="https://databricks.com/"> Databricks</a>).<br>In one of the Ray Summit sessions about running Ray on k8s, several attendees asked the Anyscale/Ray team to revive the <strong>k8s operator</strong>. They did so in the short span of 3 months, and it is now available in a <strong>beta-ish</strong> phase.<br><strong>There is still engineering work being done for different use-cases. The Anyscale/Ray team allocated time and resources to make enhancements that enable k8s as a cluster manager.</strong></blockquote><p><strong>Review: </strong>This post is primarily about my experience setting up and running ML workloads using the Ray k8s operator<strong>.</strong></p><p><strong>Benefits of using the Ray k8s operator as a cluster manager:<br></strong>1. 
Provides the ability to run <strong>production-grade ML workloads</strong>.<br>2. The autoscaler will <strong>horizontally scale pods</strong>.<br>3. The dashboard provides <strong>CPU/Memory</strong> resource usage and logs.<br>4. <strong>Network and security policies</strong> can be set at the namespace and/or cluster level.<br>5. Relatively easier <strong>dependency management</strong>: you can pick and choose versions and the underlying package dependencies.<br>6. More <strong>efficient allocation and usage of resources</strong>.<br>7. Single-tenant <strong>dedicated clusters</strong>.</p><p><strong>Setup: I used a GKE cluster with 5 nodes (each with 2 cores &amp; 8 GB) and Helm to set up the Jupyter pod.</strong></p><p><strong>Ray Dashboard: </strong>The Ray dashboard dynamically refreshes the summary (CPU, Memory, Disk) for the head and worker pods. You can also see the status of each pod, the jobs and very detailed logs in one place. Below, I have 1 head pod and 2 worker pods. (You can pick the min and max workers when creating your single-tenant cluster.)</p><figure><img alt="Ray dashboard dynamically refreshes node/pod summary, status and logs" src="https://cdn-images-1.medium.com/max/1024/1*l4ZACXUtEYkKT4zfHqKeNw.png" /></figure><p><strong>Ray Workload: </strong>I tried relatively simple hyper-parameter tuning code with num_evaluations=20. The autoscaler kicked in and added a new worker pod dynamically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d7nOxfgAUxWzsRPkUlkf2w.png" /></figure><p><strong>Ray Dashboard with ML run: </strong>Below, I have <strong>1 head pod &amp; 3 worker pods </strong>while the hyper-parameter tuning code is running.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*y3YePlievp3fpdgOvY2LMQ.png" /></figure><p>After the run is completed, the autoscaler reclaims the idle workers. 
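</p><p>The elastic pattern at work here (fan trials out to a pool of workers, then release the workers when the sweep ends) can be sketched with the Python standard library. This is a toy stand-in, not Ray’s API — in Ray, remote tasks (or Ray Tune) replace the executor, and the k8s operator’s autoscaler grows or shrinks the worker pods to match the load; the objective function below is made up:</p>

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of a parallel hyper-parameter sweep: trials fan out across a
# worker pool and the best result is kept. Not Ray's API -- in Ray, remote
# tasks / Ray Tune replace the executor, and the k8s operator scales the
# worker pods up and down to match the load.

def objective(lr):
    # hypothetical objective: loss is smallest near lr = 0.1
    return (lr - 0.1) ** 2

def run_sweep(num_evaluations=20, max_workers=3, seed=42):
    rng = random.Random(seed)
    trials = [rng.uniform(0.001, 1.0) for _ in range(num_evaluations)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(objective, trials))   # fan out across workers
    return min(zip(trials, losses), key=lambda t: t[1])  # (best_lr, best_loss)

best_lr, best_loss = run_sweep()
print(f"best lr={best_lr:.4f}, loss={best_loss:.6f}")
```

<p>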
The cluster automatically scaled down to<strong> 2 worker pods</strong>.</p><p><strong>Summary: </strong><em>Ray is the </em><strong><em>buzziest </em></strong><em>new tech (</em><a href="https://github.com/ray-project/ray"><strong>14K stars on GitHub</strong></a><em>) in recent times, with a lot of potential, as it’s built from the ground up for cloud computing and ML technologies. As Ray provides Java and Python APIs, it could be used beyond ML for use-cases such as ETL and other analytics workloads.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7881c7328f4d" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>