<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Prasad Paravatha on Medium]]></title>
        <description><![CDATA[Stories by Prasad Paravatha on Medium]]></description>
        <link>https://medium.com/@paravatha?source=rss-498285cfc25d------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*YNb93HwA5KS6BVZyGjFwfg.jpeg</url>
            <title>Stories by Prasad Paravatha on Medium</title>
            <link>https://medium.com/@paravatha?source=rss-498285cfc25d------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 14 Apr 2026 06:45:10 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@paravatha/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[My Favorite Tech Podcasts - 2025 edition]]></title>
            <link>https://paravatha.medium.com/my-favorite-tech-podcasts-2025-edition-ae3ae8a4cfef?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/ae3ae8a4cfef</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[platform-engineering]]></category>
            <category><![CDATA[tech-podcasts]]></category>
            <category><![CDATA[ai]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 17 Mar 2025 18:26:30 GMT</pubDate>
            <atom:updated>2025-03-18T00:11:48.168Z</atom:updated>
<content:encoded><![CDATA[<p>It’s been a while since I published my <a href="https://paravatha.medium.com/my-favorite-tech-podcasts-7c2f9fde9aad">favorite tech podcasts</a>, so it’s time to revisit the topic now that the tech world has gone through an AI/ML frenzy! 😎</p><ol><li><a href="https://www.nytimes.com/column/hard-fork"><strong>Hard Fork from NYTimes</strong></a>, tech journalists <strong>Kevin Roose</strong> and <strong>Casey Newton</strong> are funny, funny, funny! I call them the <em>Dave Chappelle</em> and <em>Chris Rock</em> of tech-journalists-turned-podcasters. <br>They cover a wide range of tech topics, but these days, like everything else, it’s AI heavy.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/640/0*HpNiaDoC5irGgkgj" /></figure><p><strong>2. </strong><a href="https://www.kubernetesbytes.com"><strong>Kubernetes Bytes</strong></a>, hosts <strong>Ryan Wallner</strong> and <strong>Bhavin Shah</strong> have a deep understanding of the K8s ecosystem, and their conversation style is more of a witty banter.<br>They also have a good rundown of the latest news and developments about all things K8s.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*71K8PqoHQcxhCZLH66xjgA.jpeg" /></figure><p><strong>3. </strong><a href="https://practicalai.fm/"><strong>Practical AI</strong></a>, hosts <strong>Chris Benson</strong> and <strong>Daniel Whitenack</strong> are real-world AI/ML practitioners. As the title suggests, they have very practical discussions and tips about AI/ML tooling, frameworks and new developments, not the AI influencer fluff from most other AI podcasts.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*qGHSPsaEhJLrKe4lDy-sGA.png" /></figure><p><strong>4. </strong><a href="https://kube.fm/episodes"><strong>KubeFM</strong></a><strong>, </strong>another great K8s podcast by host <strong>Bart Farrell</strong>. 
This podcast brings in leading K8s experts covering a wide range of complex topics like service mesh, multi-tenancy, GitOps, platform engineering etc.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/430/0*65zfBE3KxEyngdde.png" /></figure><p>5. <a href="https://www.latent.space/podcast"><strong>Latent Space</strong></a><strong>, </strong>another great AI podcast by hosts <strong>swyx</strong> and <strong>Alessio</strong>. They are very prolific, churning out interesting discussions every week. <br>They cover both product and technical topics. If you want to keep on top of the latest AI news, this is a must-listen!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*by8Cn2wwr5MRzEhi.png" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ae3ae8a4cfef" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[My MLOps Journey]]></title>
            <link>https://paravatha.medium.com/my-mlops-journey-d058ad81b6f3?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/d058ad81b6f3</guid>
            <category><![CDATA[mlops]]></category>
            <category><![CDATA[ml-platform]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[mlops-platform]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 23 Jan 2023 03:26:20 GMT</pubDate>
            <atom:updated>2023-01-24T03:56:02.630Z</atom:updated>
<content:encoded><![CDATA[<p>My totally unplanned and fortunate journey into the fascinating world of MLOps</p><p><strong><em>Background: </em></strong>ICYMI, here is a good blurb about <a href="https://en.wikipedia.org/wiki/MLOps">MLOps on Wikipedia</a>.<br>MLOps is still in its early stages. Different companies use MLOps, ML Platform or ML Engineering interchangeably to describe this nascent field. <br>I have a slightly different take, purely based on my personal experience. <br>(Yes, every MLOps post needs the mandatory Venn diagram, as below.)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*ckZp7AoJ5NpQ0gDeTj3VKg.jpeg" /></figure><p>MLOps is a highly multi-disciplinary field that requires expertise in at least 2 of the 3 sub-domains. It’s very rare to come across engineers who are good in all 3 sub-domains (I call them <a href="https://www.linkedin.com/posts/paravatha_machinelearning-kudos-activity-6844620987221340160-qcTO/">MLE unicorns</a>).</p><p><strong>My 3-year journey into MLOps</strong></p><p><strong>Year 1</strong>: <br>It all started in January 2018 at Discover. I was hired to work in a Data Engineering team. Due to an organizational change, a new team was formed to evaluate different ML solutions/platforms. I was fortunate to be moved into that team, and we started the exciting journey of building our own <a href="https://aws.amazon.com/solutions/case-studies/discover-financial-services-case-study/">Cloud-Native Data Science Platform</a>. <br>As I already had application development experience building web applications, microservices and CI/CD pipelines, I started working on building those components of the platform. 
In parallel, I started shadowing other engineers on our team with deep expertise in Docker, Kubernetes, Snowflake and AWS.</p><p><strong>Year 2:</strong> <br>2019 was all about developing my own knowledge of Docker, Kubernetes, Snowflake and AWS while continuing to work on application development.<br>As a team, we encountered lots of interesting challenges with respect to scalability, security and integration of the different components of the platform (Kubernetes/OpenShift, AWS and Snowflake). <br>After several re-designs, we built a scalable <a href="https://en.wikipedia.org/wiki/Platform_as_a_service">platform as a service</a> called <a href="https://cloud.redhat.com/hubfs/Discover-OpenShift-Commons-Oct28-Final.pdf">AIR9</a> with the goal of supporting over 1000 users (Data Scientists, Data Analysts &amp; Data Engineers).</p><p><strong>Year 3: <br></strong>By the start of 2020, I had become fairly comfortable with Docker, Kubernetes/OpenShift, AWS and Snowflake. <br>I started getting involved in the Data science and Machine learning side of the platform. I was primarily working on Spark on Kubernetes, Jupyter/Python, RStudio/R, MLflow and tuning/troubleshooting Data science and Machine learning workloads.</p><p><strong>Summary:<br></strong>We are in the middle of <a href="https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2022-gartner-hype-cycle">some of the most interesting advances in AI/ML</a>. 
There is a lot of hype, but the potential for great breakthroughs in AI/ML is undeniable.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zpSWcHThMtwMKf-slPxLOg.png" /><figcaption>Credit: Gartner</figcaption></figure><p>Here is my advice to anyone starting or planning to start their MLOps/ML Platform journey.</p><ul><li>Get into this field if you are passionate about it and enjoy complexity.</li><li>Make a plan to develop basic competency across multiple disciplines.</li><li>Start with your area of expertise and learn from your teammates with expertise in the other sub-domains.</li><li>There is no substitute for real-world experience. So, seek opportunities in the field or in adjacent fields like Data engineering or Platform engineering.</li><li>Develop a good work ethic and be willing to be highly collaborative.</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=d058ad81b6f3" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[My Favorite Tech Podcasts!]]></title>
            <link>https://paravatha.medium.com/my-favorite-tech-podcasts-7c2f9fde9aad?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/7c2f9fde9aad</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[technology]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[podcast]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Sun, 10 Apr 2022 00:33:59 GMT</pubDate>
            <atom:updated>2022-04-10T03:33:33.598Z</atom:updated>
<content:encoded><![CDATA[<p>Listening to podcasts is my idea of unwinding 😎</p><ol><li><strong>The Kubernetes Podcast from Google</strong> by <a href="https://twitter.com/craigbox"><strong>Craig Box</strong></a></li></ol><figure><img alt="Image showing kubernetes podcast" src="https://cdn-images-1.medium.com/max/316/1*5t61dy2em_cTKaI1z9iZGQ.jpeg" /><figcaption>Source : <a href="https://kubernetespodcast.com/">https://kubernetespodcast.com/</a></figcaption></figure><p><a href="https://kubernetespodcast.com/"><strong>https://kubernetespodcast.com/</strong></a> : This is by far my favorite tech podcast. <br>Craig makes it fun and peppy. This podcast covers a wide range of topics, from Kubernetes to cloud native to the latest industry news. If you work with Kubernetes, make sure to listen to this!</p><p><strong>2. Gradient Dissent</strong> by <a href="https://twitter.com/l2k">Lukas Biewald</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*tp7F1I1gVkMpMlX7CAlJSg.jpeg" /><figcaption>Source : <a href="https://wandb.ai/fully-connected/podcast">https://wandb.ai/fully-connected/podcast</a></figcaption></figure><p><a href="https://wandb.ai/fully-connected/podcast"><strong>https://wandb.ai/fully-connected/podcast</strong></a><strong> : </strong>This one is more to do with ML industry leaders and CEOs; it tends to be more philosophical, and the discussions are usually about AI/ML industry trends.<br>Lukas has a very humble demeanor. His interview style is that of a student asking their professor questions!</p><p>3. 
<strong>MLOps Community</strong> by <a href="https://twitter.com/Dpbrinkm"><strong>Demetrios Brinkmann</strong></a><strong> </strong>and <a href="https://www.linkedin.com/in/vrachakonda/"><strong>Vishnu Rachakonda</strong></a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*6-n7NMV-jbUncEjXz0Boaw.jpeg" /><figcaption>Source: <a href="https://mlops.community/">https://mlops.community/</a></figcaption></figure><p><a href="https://mlops.community/watch/"><strong>https://mlops.community/watch/</strong></a><strong> : </strong>This is more than a podcast. They do meetups, live discussions and technical deep dives within the MLOps domain.<br>My favorite part is that Demetrios and Vishnu sometimes let their guests completely take over, which takes the discussions in more interesting directions (like <a href="https://podcasts.apple.com/us/podcast/the-future-of-ml-and-data-platforms-michael-del/id1505372978?i=1000537247078">this episode with Michael Del Balso and Erik Bernhardsson</a>)</p><p>4. <strong>TWIML AI Podcast </strong>by <a href="https://twitter.com/samcharrington"><strong>Sam Charrington</strong></a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/328/1*GX12TzKBmqEOFgon1VoFRg.png" /><figcaption>Source: <a href="https://twimlai.com/shows/">https://twimlai.com/shows/</a></figcaption></figure><p><a href="https://twimlai.com/shows/"><strong>https://twimlai.com/shows/</strong></a><strong> :</strong> This podcast covers a wide range of topics in AI/ML, with guests from academia, research organizations, industry leaders and technology executives. You will get to learn about some very interesting new applications of AI/ML.</p><p>5. 
<strong>The Data Exchange</strong> by <strong>Ben Lorica</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*p6i2KOmnQzxeJcQt4c9Sjg.jpeg" /><figcaption>Source: <a href="https://gradientflow.com/podcast/">https://gradientflow.com/podcast/</a></figcaption></figure><p><a href="https://gradientflow.com/podcast/"><strong>https://gradientflow.com/podcast/</strong></a><strong> : </strong>This podcast covers AI/ML and Data engineering. Most guests tend to be startup founders and big tech executives in the AI/ML and Data engineering domains, and the discussions are usually very high-level.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7c2f9fde9aad" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[GitHub Copilot: Your friendly neighborhood AI pair programmer!]]></title>
            <link>https://paravatha.medium.com/github-copilot-your-friendly-neighborhood-ai-pair-programmer-158daf60ff54?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/158daf60ff54</guid>
            <category><![CDATA[github-copilot]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[developer]]></category>
            <category><![CDATA[github]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Fri, 29 Oct 2021 13:10:43 GMT</pubDate>
            <atom:updated>2021-10-29T13:10:43.990Z</atom:updated>
<content:encoded><![CDATA[<p>First impressions: GitHub Copilot, using OpenAI Codex, is a game changer.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KiWliV0t7Kl3FgaamNX_tw.png" /><figcaption>Source: GitHub</figcaption></figure><p><strong><em>Background</em></strong> : <em>From Wikipedia - </em><a href="https://copilot.github.com/"><strong><em>GitHub Copilot</em></strong></a><em> is an artificial intelligence tool developed by GitHub and OpenAI to assist users of Visual Studio Code by autocompleting code. GitHub Copilot uses OpenAI Codex, a modified version of GPT-3 (a language model designed to produce human-like text) that is designed to produce valid computer code. Copilot is trained on public GitHub repositories of any license.</em></p><p><strong><em>You write comments, it pops out code. That’s one upside of this: engineers finally writing comments in their code!</em></strong></p><h3><strong>Copilot in VS Code: Writing Python code</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/988/1*Dd3oGU0PZ7xoY_2VVe7vfg.gif" /></figure><h3>Copilot in VS Code: Writing Go code</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lnApVAIEqcyEej9QRLoP6g.gif" /></figure><h3><strong>Copilot in IntelliJ: Writing Java code</strong></h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YQrhuNSzko6_bZYLbJArLw.gif" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=158daf60ff54" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Amazon SageMaker: Promising, yet long way to go!]]></title>
            <link>https://paravatha.medium.com/amazon-sagemaker-promising-yet-long-way-to-go-127f313f55a?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/127f313f55a</guid>
            <category><![CDATA[platform-as-a-service]]></category>
            <category><![CDATA[aws-sagemaker]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[aws]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Tue, 05 Oct 2021 23:30:34 GMT</pubDate>
            <atom:updated>2021-10-06T00:03:02.006Z</atom:updated>
<content:encoded><![CDATA[<h4><strong>Great for ML Engineers, too technical for Data Scientists/Analysts</strong></h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*wh-k4TNQAZAvUo5xPy-DJg.jpeg" /><figcaption>Source : Amazon SageMaker</figcaption></figure><blockquote><strong>Background: </strong>I spent 2 days playing around with different <a href="https://aws.amazon.com/sagemaker/features/"><em>SageMaker features</em></a>.<br>There are way too many features and examples to try. TMI AWS, TMI!</blockquote><blockquote><strong><em>My first take-away</em></strong><em>: This is really well designed </em><strong><em>for Engineers by Engineers. </em></strong><em>Not just any engineers, but ML Engineers who are well versed in AWS services, with a good understanding of Docker/DevOps. You basically have to be an </em><a href="https://www.linkedin.com/posts/paravatha_machinelearning-kudos-activity-6844620987221340160-qcTO">MLE Unicorn</a>.<br><strong><em>My second take-away</em></strong><em>:</em> <em>This is a</em><strong><em> very expensive tool; </em></strong><em>fewer than 10–20 users could potentially burn through the entire IT budget of an org within 1 month. <br></em><strong><em>My third take-away</em></strong><em>: Deep vendor lock-in. AWS is THE best cloud provider when it comes to infra,</em><strong><em> so not a big deal for now. </em></strong><em>It could become an issue in the </em><a href="https://www.statista.com/statistics/967365/worldwide-cloud-infrastructure-services-market-share-vendor/"><em>near future as Azure is catching up</em></a></blockquote><blockquote><strong>Design:<br></strong>1<strong>. Product design</strong> is not great, as the entire setup seems too technical for Data Scientists and/or Data Analysts. Good strategy to get more people to try AWS certifications ;-)<br>2. 
<strong>UX/UI design </strong>is done really well; the dashboards and different pages/menus are well organized.</blockquote><h3><strong>1. Model Training on CPU</strong></h3><p><strong>1.1: Model training works well and is well organized</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cas3wdvOIM6SFh7lang3vw.jpeg" /></figure><p><strong>1.2: Nice visualizations to monitor resource utilization</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SmaXtg1t7gD0UjOSuOp_Kw.png" /></figure><p><strong>1.3: You can launch experiments from Jupyter notebooks: <br>* </strong><em>My favorite feature: you can switch instance type and kernel type in JupyterLab</em><strong><br></strong><em>* Not so favorite: for some reason they are using the old JupyterLab version 1.2.x</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*3ltMNhLZQ1LTJ0Ag0E2M7A.jpeg" /></figure><p><strong>1.4: Hyperparameter tuning works well<br>* </strong><em>I couldn’t figure out how to compare experiments</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*8TVoQMk74vbco5kbSdBaFw.jpeg" /></figure><p><strong>1.5: Feature engineering example worked<br></strong>* You definitely need a decent understanding of Spark and Kafka</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*KNOTDGGaOntzBLgwOksMsA.jpeg" /></figure><p><strong>1.6: Training on GPU is too technical and worked only for some use-cases. <br></strong><em>* None of the </em><strong><em>Ray</em></strong><em> examples worked<br>* Hugging Face PyTorch examples worked well</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YEy6W6PbqExhHHDrzN8L4A.jpeg" /></figure><h3>2. Model Deployment = Meh!</h3><p><strong>2.1 This is where things got a bit confusing. 
<br></strong><em>* It seemed like you’d use SageMaker endpoints for some ad-hoc testing <br>* I couldn’t find any kind of workflow orchestration for model deployments</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZHOJLQAd5j4rcsIOc9j2jw.jpeg" /></figure><p><strong>2.2 Ad-hoc deployment: Failed and aborted</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*4rUpNDYVy6CTGY3k5zg1kA.jpeg" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=127f313f55a" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Kubeflow 1.3 : Shiny, Secure and Scalable]]></title>
            <link>https://paravatha.medium.com/kubeflow-1-3-shiny-secure-and-scalable-4ae90e1336dd?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/4ae90e1336dd</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[kubeflow]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[cloud-services]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Wed, 28 Apr 2021 23:55:59 GMT</pubDate>
            <atom:updated>2021-04-29T02:17:01.054Z</atom:updated>
<content:encoded><![CDATA[<h3>Kubeflow 1.3 : Shiny, Secure and Scalable</h3><h4>Improved UX, UI, Security and support for hybrid-cloud</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/606/1*vzD5fMDVbg8tpc0vtkGKyw.jpeg" /></figure><p><strong><em>Background: </em></strong><em>I tried Kubeflow 1.0 in May 2020, with a narrow focus on </em><a href="https://paravatha.medium.com/cloud-native-ml-pipelines-using-kubeflow-51f3c95e75fe"><em>Cloud native ML pipelines</em></a><em>. <br>With the latest Kubeflow 1.3 release, they have streamlined the setup process and improved security and the user experience. Even with these updates, there is still a learning curve for non-technical/non-engineering users. Another improvement is the ability to pick and choose the components you want to install.<br></em><strong><em>IMO, the ideal use-case is a cross-functional Data Science team with a mix of Platform Engineers, ML Engineers and Data Scientists.</em></strong></p><ol><li><strong>Setup and Improved UX/UI: </strong><br>Relatively easy to set up compared to version 1.0, with an easy-to-understand UX and a responsive UI. <br>I tried this on a GKE cluster with 6 nodes (total cluster resources = 12 cores, 60GB).<br>Oh, and they now have ready-to-use JupyterLab, open-source VS Code (Code-Server) and RStudio.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nQyB80uj3Vg7g5i7mAO7eA.png" /></figure><p><strong>Code-Server (VS Code in the browser)</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JZVEGHZykAEO6Bk2KycsLg.jpeg" /></figure><p><strong>Compared to version 1.0, the ready-to-use Pipelines and Experiments work seamlessly. 
</strong>There is a lot of customization you can do (which I did not explore).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*D1M2UfPCukekHL3IDkrhaw.jpeg" /></figure><p><strong>Other updates and enhancements</strong></p><ol><li>Katib is now AutoML (for hyperparameter tuning)</li><li>Volumes to create and attach Object storage or PV/PVC</li><li>Tensorboards for visualizations (I am not sure how widely this will be used)</li><li>KFP comes in 2 flavors<br>* KFP with Argo Workflows as the default version.<br>* KFP with Tekton (an additional option with contributions from IBM and Red Hat)</li><li>Runs <br>* Ad-hoc manual runs<br>* Scheduled runs (using Argo Workflows)</li><li>The Metrics and Metadata UIs have improved as well</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6U_zqg0h40Jz51jecodKtA.jpeg" /></figure><p><strong>Summary</strong>: <br><em>There is renewed interest in Kubeflow from almost every major cloud services company (AWS, GCP, Azure, IBM, RedHat/OpenShift). In addition to these cloud providers, you can now run Kubeflow as an Operator or on Microk8s, MiniKF, Kind etc.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4ae90e1336dd" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[In Defense of Agile Methodologies]]></title>
            <link>https://paravatha.medium.com/in-defense-of-agile-methodologies-5aed57a60458?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/5aed57a60458</guid>
            <category><![CDATA[iterative-development]]></category>
            <category><![CDATA[scrum]]></category>
            <category><![CDATA[agile-methodology]]></category>
            <category><![CDATA[incremental-delivery]]></category>
            <category><![CDATA[agile]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 12 Apr 2021 01:47:40 GMT</pubDate>
            <atom:updated>2021-04-12T04:09:29.157Z</atom:updated>
<content:encoded><![CDATA[<p>How Agile transformed the tech industry and supercharged software delivery</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*12oxaIqJSw7vdRAl26xLuw.png" /><figcaption>Agile practices in the wild</figcaption></figure><p><strong>Background :</strong> I first came across Agile methodologies during my graduate school program (Distributed and Multimedia Information Systems at Heriot-Watt University in Edinburgh, UK). <em>My first thought after learning about concepts such as </em><strong><em>Agile Unified Process (AUP), Extreme Programming (XP), Test-Driven Development (TDD) and Pair Programming was: why would companies/teams use anything other than these new software development practices?</em>. <br></strong>Later, <strong><em>when I worked at PayPal, Yodlee and CareFirst BCBS, I could clearly see the benefits of using Agile methodologies in the real world</em></strong>. These companies had very mature Agile practices and used them very effectively to speed up development, get rid of inefficiencies and deliver features quickly to end users.</p><p><strong>Side note : </strong>If you are a developer/engineer, I highly recommend reading one of the best books about writing code in an Agile environment: <a href="https://www.goodreads.com/book/show/3735293-clean-code">Clean Code: A Handbook of Agile Software Craftsmanship</a> by Robert C. Martin (a proponent of <a href="https://en.wikipedia.org/wiki/Agile_software_development#The_Agile_Manifesto">agile software development</a>)</p><h3><strong>A case for Agile:</strong></h3><p><strong><em>Success stories</em></strong></p><ol><li><strong>PayPal</strong> transformed from a relatively small start-up into one of the most innovative FinTech companies by <a href="https://www.paypalobjects.com/webstatic/en_US/mktg/pages/stories/pdf/paypal_transformation_whitepaper_sept_18_2015.pdf">embracing Agile at the Enterprise level (White paper)</a>. 
Not into reading white papers? You can watch these videos instead: <a href="https://www.youtube.com/watch?v=FaEWoyrK7Jo">short video</a> and <a href="https://www.youtube.com/watch?v=FtzJjbEATKk">long video</a>.</li><li><strong>Spotify</strong> even created their own version of Agile : <a href="https://www.atlassian.com/agile/agile-at-scale/spotify">The Spotify model</a></li><li>Not just Tech companies, but Media companies like <a href="https://www.agilealliance.org/resources/experience-reports/the-trials-and-tribulations-of-finding-the-right-agile-framework/"><strong>NPR</strong></a> and <a href="https://open.nytimes.com/scrum-swarm-sprint-how-to-take-the-agile-process-and-make-it-your-own-b6416793ff7e"><strong>NYTimes</strong></a> use Agile/Scrum at the enterprise level.</li><li><strong>CareFirst BCBS</strong> went through a similar <a href="https://www.cio.com/cio100/detail/2932">transformation using Agile/SCRUM</a> practices.</li></ol><p><strong><em>Agile is as much a mind-set as it is a software development methodology</em></strong></p><ol><li>In my experience, people stuck in their old ways (often lazy thinking) tend to resist adopting Agile practices.</li><li>The beauty of Agile is that it’s flexible and comes in different flavors such as Scrum, Kanban, Scrumban, SAFe etc. 
to suit your needs.</li></ol><p><strong><em>Agile and high-performing teams</em></strong> <br>These days, engineering teams and project management teams use the below offshoots of Agile almost every day.</p><ol><li>Rapid prototyping and proof of concepts (RAD: Rapid Application Development)</li><li>Code reviews (Pair Programming)</li><li>Automated testing (TDD: Test-Driven Development and BDD: Behavior-Driven Development)</li><li>Modern DevOps (Continuous Integration, Continuous Delivery, Continuous Deployment)</li><li>Collaboration (cross-functional teams)</li><li>Speed to market (time boxing, iterative development and incremental delivery)</li></ol><p><strong>Summary<br></strong>IMO, in the rapidly evolving technology sector, <a href="https://en.wikipedia.org/wiki/Agile_software_development#The_Manifesto_for_Agile_Software_Development">The Manifesto for Agile Software Development</a> is more important than ever. For Agile practices to take root and produce results, the top leadership of the company needs to push for them. Once you decide and define which Agile framework works for your company/teams (this could take several years), it changes the work culture, empowers individual teams to make certain decisions and speeds up software delivery.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5aed57a60458" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Feast: Setup your own ML Feature store on Kubernetes]]></title>
            <link>https://paravatha.medium.com/feast-setup-your-own-ml-feature-store-on-kubernetes-5b3193c2b62c?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/5b3193c2b62c</guid>
            <category><![CDATA[feature-engineering]]></category>
            <category><![CDATA[feature-store]]></category>
            <category><![CDATA[automl]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[kubernetes]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Tue, 30 Mar 2021 02:11:49 GMT</pubDate>
            <atom:updated>2021-03-30T02:36:23.470Z</atom:updated>
<content:encoded><![CDATA[<h3>A promising cloud-based open-source ML Feature store solution!</h3><p><strong><em>History</em></strong><em>: Feast has been through several revisions in the past year. With the current version (0.9), it’s possible to set it up end-to-end on a barebones k8s cluster.<br></em><strong><em>The Feast team is currently working on version 0.10, to be released in April 2021 (which is expected to further simplify the architecture and the setup).</em></strong><em> There are companies around the world that are already using Feast or are in the process of integrating it.</em></p><p><strong>Background (</strong><a href="https://docs.feast.dev/"><strong>From Feast website</strong></a><strong>): </strong>Feast (<strong>Fea</strong>ture <strong>St</strong>ore) is an operational data system for managing and serving machine learning features to models in production.</p><p>As you can see below, even with the <a href="https://github.com/feast-dev/feast/blob/master/examples/minimal/minimal_ride_hailing.ipynb">basic use-case</a>, there is a lot happening within <strong>Feast</strong> in terms of the technology stack.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XL192LAE78ZImX9IVOemVg.png" /></figure><p>The <a href="https://docs.feast.dev/concepts/architecture"><strong>architecture diagram</strong></a> below shows additional technologies you could plug in based on your requirements.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/913/1*UjTTX3C9S5K7sH_oYnHScg.png" /></figure><p><strong>Setup:<br></strong>I used a 6-node GKE cluster (version 1.18.16-gke.302) with a GS bucket and the Spark k8s operator.
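</p><p>Conceptually, what the stack above implements is a split between an <em>offline</em> store (full feature history for training, with point-in-time lookups) and an <em>online</em> store (latest feature values for low-latency serving). A toy in-memory sketch of that split — illustrative only, not the Feast API, and the entity/feature names are made up; Feast backs these stores with a data lake and Redis and adds schemas, registries and point-in-time correctness:</p>

```python
from datetime import datetime

# Toy in-memory sketch of the offline/online feature-store split.
# Illustrative only -- not the Feast API; names are hypothetical.
class ToyFeatureStore:
    def __init__(self):
        self.offline = []   # full history: (entity_id, timestamp, features)
        self.online = {}    # latest features per entity, for low-latency serving

    def ingest(self, entity_id, ts, features):
        self.offline.append((entity_id, ts, features))
        current = self.online.get(entity_id)
        if current is None or ts >= current[0]:
            self.online[entity_id] = (ts, features)

    def get_online_features(self, entity_id):
        entry = self.online.get(entity_id)
        return entry[1] if entry else None

    def get_historical_features(self, entity_id, as_of):
        # latest features at or before `as_of` (point-in-time lookup)
        rows = [(ts, f) for e, ts, f in self.offline
                if e == entity_id and ts <= as_of]
        return max(rows, key=lambda r: r[0])[1] if rows else None

store = ToyFeatureStore()
store.ingest("driver_1", datetime(2021, 3, 1), {"avg_trips": 8.0})
store.ingest("driver_1", datetime(2021, 3, 15), {"avg_trips": 9.5})
print(store.get_online_features("driver_1"))    # latest values, for serving
print(store.get_historical_features("driver_1", datetime(2021, 3, 7)))  # as-of lookup
```

<p>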
<br>I also tried using a barebones k8s cluster with Spark in standalone mode, using local tmp staging/work folders attached to a k8s node.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dHApPvAllpkLLFAHCcV4cA.png" /><figcaption>Jupyter+Kafka+Postgres+Redis+Spark+Grafana+Prometheus</figcaption></figure><p>After you create features, save them in the data lake and fetch them using a Jupyter notebook, you can check the Prometheus metrics and Grafana dashboards!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LnEXX_HsBheWEYvbSiOSbQ.jpeg" /></figure><p>As an optional setup, Feast integrates with Prometheus and Grafana to provide visualizations.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*10qKS9LkeFYkBViKCznqdA.png" /><figcaption>Grafana dashboard with Feast integration</figcaption></figure><p><strong>Summary and next steps:<br></strong>Feast has come a long way from its initial versions in 2019. Until version 0.9 came along, it was closely tied to a particular cloud provider (GCP) and/or set of technologies.<br>The Feast team has been working hard to simplify a very complex use-case, de-couple the technologies and make Feast cloud-provider agnostic. Their <a href="https://docs.google.com/document/d/1CjR3Ph3l65hF5bRuchR9u9WSoirnIuEb7ILY9Ioh1Sk/edit#heading=h.1vi5kan9y6j">RFC for version 0.10</a> looks very promising.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=5b3193c2b62c" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Blind spots in ML : Data Engineers & Data Scientists]]></title>
            <link>https://paravatha.medium.com/blind-spots-in-ml-data-engineers-data-scientists-4ed6f397b39e?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/4ed6f397b39e</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[data-scientist]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <category><![CDATA[data-engineer]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Thu, 04 Mar 2021 14:31:39 GMT</pubDate>
            <atom:updated>2021-03-04T14:31:39.956Z</atom:updated>
<content:encoded><![CDATA[<h3>Blind spots in ML : Data Engineers &amp; Data Scientists</h3><p>Basic misunderstandings of each other’s work</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Vty9d0z9P41_2DHHMmrn0g.png" /></figure><p><strong>Background</strong>: For the past 3 years, I have been working at the intersection of Cloud (AWS, OpenShift/Kubernetes, Docker, Snowflake), Software Engineering (UI and Microservices), and Data Science (Python, Spark, H2O, R, SAS). It’s been a great learning experience working with talented engineers in building the <a href="https://aws.amazon.com/solutions/case-studies/discover-financial-services-case-study"><strong>AIR9 Data Science Platform</strong></a>.</p><p>This post is about my observations from working with <strong>Data Engineers and Data Scientists/Analysts</strong>, and their blind spots when it comes to machine learning projects. <br><strong>Side note #1: </strong>There is an interesting <a href="https://www.nextgov.com/cio-briefing/2018/10/origins-job-title-data-scientist/152286/"><strong>back story</strong></a> to the term “data scientist”.<br><strong>Side note #2: </strong>IMO, Data Engineer is a simplistic term; Data Platform Engineer is more accurate.</p><ol><li><strong>Resource requirements: CPU &amp; RAM<br>Engineers:</strong> Why do you need instances for more than 1 hour?<br><strong>Scientists:</strong> Data science is an iterative process. It’s not a straight line; things go in a circular and/or zigzag pattern. <br><strong>I’d side with the Data Scientists on this, as ML workloads are bursty and resource requirements fluctuate.</strong></li><li><strong>Size of datasets: Number of columns and records<br>Scientists: </strong>My code worked last week on a similar instance with a very similar dataset of 1 million records, why is it not working now? Why is it taking longer to complete?<br><strong>Engineers: </strong>Is the number of columns the same as in the dataset from last week?<br>Are the column types similar? Is your dataset in a compressed or uncompressed format?<br><strong>I’d side with the Data Engineers on this, as Data Scientists tend to underestimate the resource requirements w.r.t. datasets (</strong>every tool has different memory requirements: H2O needs RAM 3–4 times the size of the dataset, while SAS Studio doesn’t need high RAM-to-dataset ratios).</li><li><strong>Dependency management:</strong> <strong>Packages and Extensions</strong><br><strong>Scientists: </strong>Why am I not allowed to download packages from the internet? Why are my new packages breaking my code?<br><strong>Engineers: </strong>Some packages and extensions have security vulnerabilities and backward-compatibility issues. Circular/cyclic dependencies are also a big factor in Python (and R) w.r.t. security and backward compatibility (Java/Scala handle this a bit better than other languages).<br><strong>I’d side with the Data Engineers on this, but Data Scientists are rapidly gaining knowledge of how to use packages and work around this.</strong></li><li><strong>Coding best practices:<br>Engineers:</strong> Why did you hardcode this configuration? Why did you write this segment of code this way?<br><strong>Scientists: </strong>I had to do it this way for the data preparation, feature engineering and model tuning steps.<br><strong>I’d side with the Data Scientists on this; Data Engineers are not knowledgeable about the typical ML lifecycle.</strong></li></ol><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4ed6f397b39e" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[RayML using Kubernetes operator]]></title>
            <link>https://paravatha.medium.com/rayml-using-kubernetes-operator-7881c7328f4d?source=rss-498285cfc25d------2</link>
            <guid isPermaLink="false">https://medium.com/p/7881c7328f4d</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[distributed-systems]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[cloud-computing]]></category>
            <dc:creator><![CDATA[Prasad Paravatha]]></dc:creator>
            <pubDate>Mon, 15 Feb 2021 00:48:25 GMT</pubDate>
            <atom:updated>2021-02-15T03:30:22.653Z</atom:updated>
<content:encoded><![CDATA[<p>RayML + Kubernetes = Finally, a truly scalable Distributed ML solution</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hrrH3qwtaOe0wLdp8bu8-w.png" /></figure><p><strong>Background (From Ray website)</strong>: <a href="https://ray.io/">Ray</a> is an open-source distributed execution framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries. Ray provides a simple, universal API for building distributed applications (supports Python and Java APIs).</p><blockquote>I first came across RayML on the <a href="https://softwareengineeringdaily.com/2020/07/24/ray-applications-with-richard-liaw/">Software Engineering Daily podcast</a> in July 2020. <br>In September 2020, I attended <a href="https://events.linuxfoundation.org/ray-summit/">Ray Summit 2020</a>, organized by <a href="https://www.anyscale.com/about">Anyscale</a> (a startup founded by the creators of<a href="http://ray.io/"> Ray</a> from the UC Berkeley<a href="https://rise.cs.berkeley.edu/"> RISELab</a>, the successor to the<a href="https://amplab.cs.berkeley.edu/about/"> AMPLab</a>, which created<a href="https://spark.apache.org/"> Apache Spark</a> and<a href="https://databricks.com/"> Databricks</a>).<br>In one of the Ray Summit sessions about running Ray on k8s, several attendees asked the Anyscale/Ray team to revive the <strong>k8s operator</strong>. They did so in the short span of 3 months, and it is now available in a <strong>beta-ish</strong> phase.<br><strong>There is still engineering work being done for different use-cases. The Anyscale/Ray team allocated time and resources to make enhancements that enable k8s as a cluster manager.</strong></blockquote><p><strong>Review: </strong>This post is primarily about my experience setting up and running ML workloads using the Ray k8s operator<strong>.</strong></p><p><strong>Benefits of using the Ray k8s operator as a cluster manager:<br></strong>1. 
Provides the ability to run <strong>production-grade ML workloads</strong>.<br>2. The autoscaler will <strong>horizontally scale pods</strong>.<br>3. The dashboard provides <strong>CPU/Memory</strong> resource usage and logs.<br>4. <strong>Network and security policies</strong> can be set at the namespace and/or cluster level.<br>5. Relatively easier <strong>dependency management</strong>: you can pick and choose versions and the underlying package dependencies.<br>6. More <strong>efficient allocation and usage of resources</strong>.<br>7. Single-tenant <strong>dedicated clusters</strong>.</p><p><strong>Setup: I used a GKE cluster with 5 nodes (each with 2 cores &amp; 8 GB) and Helm to set up the Jupyter pod.</strong></p><p><strong>Ray Dashboard: </strong>The Ray dashboard dynamically refreshes the summary (CPU, Memory, Disk) for the head and worker pods. You can also see the status of each pod, the jobs and very detailed logs in one place. Below, I have 1 head pod and 2 worker pods. (You can pick the min and max workers when creating your single-tenant cluster.)</p><figure><img alt="Ray dashboard dynamically refreshes node/pod summary, status and logs" src="https://cdn-images-1.medium.com/max/1024/1*l4ZACXUtEYkKT4zfHqKeNw.png" /></figure><p><strong>Ray Workload: </strong>I tried relatively simple hyper-parameter tuning code with num_evaluations=20. The autoscaler kicked in and added a new worker pod dynamically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*d7nOxfgAUxWzsRPkUlkf2w.png" /></figure><p><strong>Ray Dashboard with ML run: </strong>Below, I have <strong>1 head pod &amp; 3 worker pods </strong>while the hyper-parameter tuning code is running.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*y3YePlievp3fpdgOvY2LMQ.png" /></figure><p>After the run is completed, the autoscaler reclaims the idle workers. 
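</p><p>The elastic pattern at work here (fan trials out to a pool of workers, then release the workers when the sweep ends) can be sketched with the Python standard library. This is a toy stand-in, not Ray’s API — in Ray, remote tasks (or Ray Tune) replace the executor, and the k8s operator’s autoscaler grows or shrinks the worker pods to match the load; the objective function below is made up:</p>

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of a parallel hyper-parameter sweep: trials fan out across a
# worker pool and the best result is kept. Not Ray's API -- in Ray, remote
# tasks / Ray Tune replace the executor, and the k8s operator scales the
# worker pods up and down to match the load.

def objective(lr):
    # hypothetical objective: loss is smallest near lr = 0.1
    return (lr - 0.1) ** 2

def run_sweep(num_evaluations=20, max_workers=3, seed=42):
    rng = random.Random(seed)
    trials = [rng.uniform(0.001, 1.0) for _ in range(num_evaluations)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(objective, trials))   # fan out across workers
    return min(zip(trials, losses), key=lambda t: t[1])  # (best_lr, best_loss)

best_lr, best_loss = run_sweep()
print(f"best lr={best_lr:.4f}, loss={best_loss:.6f}")
```

<p>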
The cluster automatically scaled down to<strong> 2 worker pods</strong>.</p><p><strong>Summary: </strong><em>Ray is the </em><strong><em>buzziest </em></strong><em>new tech (</em><a href="https://github.com/ray-project/ray"><strong>14K stars on GitHub</strong></a><em>) in recent times, with a lot of potential, as it’s built from the ground up for cloud computing and ML technologies. As Ray provides Java and Python APIs, it could be used beyond ML for use-cases such as ETL and other analytics workloads.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7881c7328f4d" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>