MLOps for AI Development

Explore top LinkedIn content from expert professionals.

  • Rahul Agarwal

    Staff ML Engineer | Meta, Roku, Walmart | 1:1 @ topmate.io/MLwhiz

    45,153 followers

A Few Lessons from Deploying and Using LLMs in Production

Deploying LLMs can feel like hiring a hyperactive genius intern: they dazzle users while potentially draining your API budget. Here are some insights I've gathered:

1. "Cheap" Is a Lie You Tell Yourself: Cloud costs per call may seem low, but the overall expense of an LLM-based system can skyrocket. Fixes (a minimal sketch follows this post):
- Cache repetitive queries: users ask the same thing at least 100x/day.
- Gatekeep: use cheap classifiers (e.g., BERT) to filter "easy" requests. Let LLMs handle only the complex 10% and your current systems handle the remaining 90%.
- Quantize your models: shrink LLMs to run on cheaper hardware without massive accuracy drops.
- Build your caches asynchronously: pre-generate common responses before they're requested, or fail gracefully the first time a query arrives and cache the result for next time.

2. Guard Against Model Hallucinations: Models sometimes express answers with such confidence that distinguishing fact from fiction becomes challenging, even for human reviewers. Fixes:
- Use RAG: just a fancy way of saying you give the model the knowledge it needs in the prompt itself, by querying a database for semantic matches with the query.
- Guardrails: validate outputs using regex or cross-encoders to establish a clear decision boundary between the query and the LLM's response.

3. The Best LLM Is Often a Discriminative Model: You don't always need a full LLM. Consider knowledge distillation: use a large LLM to label your data, then train a smaller discriminative model that performs similarly at a much lower cost.

4. It's Not About the Model, It's About the Data It Was Trained On: A smaller LLM might struggle with specialized domain data; that's normal. Fine-tune the model on your specific dataset, starting with parameter-efficient methods (like LoRA or adapters) and using synthetic data generation to bootstrap training.

5. Prompts Are the New Features: Treat prompts like features in your system: version them, run A/B tests, and continuously refine them through online experiments. Consider bandit algorithms to automatically promote the best-performing variants.

What do you think? Have I missed anything? I'd love to hear your "I survived LLM prod" stories in the comments!
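A minimal sketch of the caching and gatekeeping fixes from point 1, assuming hypothetical `cheap_classifier`, `handle_easy`, and `call_llm` stand-ins for your own intent model, existing systems, and LLM client:

```python
# Cache repeated queries and gatekeep with a cheap classifier so only
# hard requests reach the expensive LLM. All three helpers below are
# hypothetical placeholders.
import hashlib

cache: dict[str, str] = {}

def cache_key(query: str) -> str:
    # Normalize so trivially different phrasings hit the same entry.
    return hashlib.sha256(query.strip().lower().encode()).hexdigest()

def cheap_classifier(query: str) -> str:
    # Stand-in for a small fine-tuned model (e.g., BERT) that labels
    # requests as "easy" (existing systems) or "complex" (LLM).
    return "easy" if len(query.split()) < 5 else "complex"

def handle_easy(query: str) -> str:
    return f"[rules/search answer for: {query}]"

def call_llm(query: str) -> str:
    return f"[LLM answer for: {query}]"    # the expensive call

def answer(query: str) -> str:
    key = cache_key(query)
    if key in cache:                       # 1. cache hit: free
        return cache[key]
    if cheap_classifier(query) == "easy":  # 2. gatekeep: cheap path
        return handle_easy(query)
    response = call_llm(query)             # 3. only complex queries pay
    cache[key] = response                  # miss once, cache for next time
    return response
```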

  • Greg Coquillo

    Product Leader at Microsoft Azure AI & HPC | Former AWS/Amazon | Startup Investor | Linkedin Top Voice for AI, DS, Tech, and Innovation | Building futuristic AI data centers using the world’s most powerful supercomputers

    228,418 followers

Stop building AI agents in random steps; scalable agents need a structured path. A reliable AI agent is not built with prompts alone. It is built with logic, memory, tools, testing, and real-world infrastructure. Here's a breakdown of the full journey:

1️⃣ Pick an LLM: Choose a reasoning-strong model with good tool support so your agent can operate reliably in real environments.
2️⃣ Write System Instructions: Define the rules, tone, and boundaries. Clear instructions make the agent consistent across every workflow.
3️⃣ Connect Tools & APIs: Link your agent to the outside world (search, databases, email, CRMs, internal systems) to make it actually useful.
4️⃣ Build Multi-Agent Systems: Split work across focused agents and let them collaborate. This boosts accuracy, reliability, and speed.
5️⃣ Test, Version & Optimize: Version your prompts, A/B test, keep backups, and keep improving. This is how production agents stay stable.
6️⃣ Define Agent Logic: Outline how the agent thinks, plans, and decides step by step. Good logic prevents unpredictable behavior.
7️⃣ Add Memory (Short + Long Term): Enable your agent to remember past conversations and user preferences so it gets smarter with every interaction.
8️⃣ Assign a Specific Job: Give the agent a narrow, outcome-driven task. Clear scope = better results.
9️⃣ Add Monitoring & Feedback: Track errors, latency, failures, and real-world performance. User feedback is the fuel of improvement.
🔟 Deploy & Scale: Move from prototype to production with proper infra: containers, serverless, microservices.

AI agents don't scale because of prompts, they scale because of architecture. If you get logic, memory, tools, and infra right, your agents become reliable, predictable, and production-ready (see the loop sketch after this post). #AI
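A minimal sketch of how steps 3, 6, and 7 fit together: a bounded tool-calling loop with explicit decision logic. The tool registry and `plan_next_action` heuristic are hypothetical placeholders, not any specific framework's API:

```python
# A bounded agent loop: plan, act through registered tools, remember.
from typing import Callable

# Step 3: tools connect the agent to the outside world (stubs here).
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"[search results for {q!r}]",
    "crm_lookup": lambda q: f"[CRM record for {q!r}]",
}

def plan_next_action(goal: str, history: list[str]) -> tuple[str, str]:
    # Step 6: decide the next action before acting. A real agent would
    # ask the LLM to choose; this hard-coded rule is a stand-in.
    if not history:
        return ("search", goal)
    return ("finish", history[-1])

def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[str] = []                 # step 7: short-term memory
    for _ in range(max_steps):              # bound the loop: no runaway agents
        action, arg = plan_next_action(goal, history)
        if action == "finish":
            return arg
        history.append(TOOLS[action](arg))  # act, then remember the result
    return "gave up after max_steps"        # predictable failure mode

print(run_agent("find renewal date for ACME Corp"))
```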

  • Brij kishore Pandey

    AI Architect & Engineer | AI Strategist

    719,174 followers

Training a Large Language Model (LLM) involves more than just scaling up data and compute. It requires a disciplined approach across multiple layers of the ML lifecycle to ensure performance, efficiency, safety, and adaptability. This visual framework outlines eight critical pillars for successful LLM training, each with a defined workflow to guide implementation:

𝟭. 𝗛𝗶𝗴𝗵-𝗤𝘂𝗮𝗹𝗶𝘁𝘆 𝗗𝗮𝘁𝗮 𝗖𝘂𝗿𝗮𝘁𝗶𝗼𝗻: Use diverse, clean, and domain-relevant datasets. Deduplicate, normalize, filter low-quality samples, and tokenize effectively before formatting for training.

𝟮. 𝗦𝗰𝗮𝗹𝗮𝗯𝗹𝗲 𝗗𝗮𝘁𝗮 𝗣𝗿𝗲𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴: Design efficient preprocessing pipelines: tokenization consistency, padding, caching, and batch streaming to GPU must all be optimized for scale.

𝟯. 𝗠𝗼𝗱𝗲𝗹 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗗𝗲𝘀𝗶𝗴𝗻: Select architectures based on task requirements. Configure embeddings, attention heads, and regularization, then conduct mock tests to validate the architectural choices.

𝟰. 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗦𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆 & 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Ensure convergence using techniques such as FP16 mixed precision, gradient clipping, batch size tuning, and adaptive learning rate scheduling. Loss monitoring and checkpointing are crucial for long-running jobs (a sketch follows this post).

𝟱. 𝗖𝗼𝗺𝗽𝘂𝘁𝗲 & 𝗠𝗲𝗺𝗼𝗿𝘆 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Leverage distributed training, efficient attention mechanisms, and pipeline parallelism. Profile usage, compress checkpoints, and enable auto-resume for robustness.

𝟲. 𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 & 𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻: Regularly evaluate using defined metrics and baseline comparisons. Test with few-shot prompts, review model outputs, and track performance metrics to prevent drift and overfitting.

𝟳. 𝗘𝘁𝗵𝗶𝗰𝗮𝗹 𝗮𝗻𝗱 𝗦𝗮𝗳𝗲𝘁𝘆 𝗖𝗵𝗲𝗰𝗸𝘀: Mitigate model risks through adversarial testing, output filtering, decoding constraints, and user feedback. Audit results to ensure responsible outputs.

𝟴. 𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗶𝗻𝗴 & 𝗗𝗼𝗺𝗮𝗶𝗻 𝗔𝗱𝗮𝗽𝘁𝗮𝘁𝗶𝗼𝗻: Adapt models for specific domains using techniques like LoRA/PEFT and controlled learning rates. Monitor overfitting, evaluate continuously, and deploy with confidence.

These principles form a unified blueprint for building robust, efficient, and production-ready LLMs, whether training from scratch or adapting pre-trained models.
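A minimal sketch of pillar 4 in PyTorch: FP16 mixed precision, gradient clipping, learning rate scheduling, loss monitoring, and periodic checkpointing. The model, data, and hyperparameters are toy placeholders, and a CUDA device is assumed:

```python
import torch
from torch import nn

model = nn.Linear(128, 128).cuda()                 # stand-in for a real LLM
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=1000)
scaler = torch.cuda.amp.GradScaler()               # FP16 without underflow

for step in range(1000):
    x = torch.randn(32, 128, device="cuda")        # stand-in batch
    opt.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                # FP16 forward/backward
        loss = nn.functional.mse_loss(model(x), x)
    scaler.scale(loss).backward()
    scaler.unscale_(opt)                           # clip true FP32 gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(opt)
    scaler.update()
    sched.step()                                   # adaptive LR schedule
    if step % 100 == 0:                            # monitor loss, checkpoint
        print(f"step {step}: loss {loss.item():.4f}")
        torch.save({"step": step, "model": model.state_dict(),
                    "opt": opt.state_dict()}, f"ckpt_{step}.pt")
```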

  • Aishwarya Srinivasan
    625,594 followers

Most ML systems don't fail because of poor models. They fail at the systems level!

You can have a world-class model architecture, but if you can't reproduce your training runs, automate deployments, or monitor model drift, you don't have a reliable system. You have a science project. That's where MLOps comes in.

🔹 𝗠𝗟𝗢𝗽𝘀 𝗟𝗲𝘃𝗲𝗹 𝟬 - 𝗠𝗮𝗻𝘂𝗮𝗹 & 𝗙𝗿𝗮𝗴𝗶𝗹𝗲
This is where many teams operate today.
→ Training runs are triggered manually (notebooks, scripts)
→ No CI/CD, no tracking of datasets or parameters
→ Model artifacts are not versioned
→ Deployments are inconsistent, sometimes even manual copy-paste to production

There's no real observability, no rollback strategy, no trust in reproducibility. To move forward:
→ Start versioning datasets, models, and training scripts
→ Introduce structured experiment tracking (e.g., MLflow, Weights & Biases); a minimal sketch follows this post
→ Add automated tests for data schema and training logic

This is the foundation. Without it, everything downstream is unstable.

🔹 𝗠𝗟𝗢𝗽𝘀 𝗟𝗲𝘃𝗲𝗹 𝟭 - 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗲𝗱 & 𝗥𝗲𝗽𝗲𝗮𝘁𝗮𝗯𝗹𝗲
Here, you start treating ML like software engineering.
→ Training pipelines are orchestrated (Kubeflow, Vertex AI Pipelines, Airflow)
→ Every commit triggers CI: code linting, schema checks, smoke training runs
→ Artifacts are logged and versioned; models are registered before deployment
→ Deployments are reproducible and traceable

This isn't about chasing tools; it's about building trust in your system. You know exactly which dataset and code version produced a given model. You can roll back. You can iterate safely. To get here:
→ Automate your training pipeline
→ Use registries to track models and metadata
→ Add monitoring for drift, latency, and performance degradation in production

My 2 cents 🫰
→ Most ML projects don't die because the model didn't work.
→ They die because no one could explain what changed between the last good version and the one that broke.
→ MLOps isn't overhead. It's the only path to stable, scalable ML systems.
→ Start small, build systematically, treat your pipeline as a product.

If you're building for reliability, not just performance, you're already ahead.

Workflow inspired by: Google Cloud

----
If you found this post insightful, share it with your network ♻️
Follow me (Aishwarya Srinivasan) for more deep-dive AI/ML insights!
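A minimal sketch of the first step out of Level 0: structured experiment tracking with MLflow, logging which data, parameters, and code produced each model. The experiment name, parameter values, and metric are placeholders:

```python
import mlflow

mlflow.set_experiment("churn-model")

with mlflow.start_run():
    # Record exactly what produced this model, so you can explain what
    # changed between the last good version and the one that broke.
    mlflow.log_param("dataset_version", "s3://data/churn/v14")  # hypothetical
    mlflow.log_param("git_sha", "abc1234")
    mlflow.log_param("learning_rate", 0.05)

    val_auc = 0.87  # stand-in for a real evaluation result
    mlflow.log_metric("val_auc", val_auc)

    # Registering the model makes deployments traceable and reversible:
    # mlflow.sklearn.log_model(model, "model", registered_model_name="churn")
```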

  • Pooja Jain

Open to collaboration | Storyteller | Lead Data Engineer @ Wavicle | LinkedIn Top Voice 2025, 2024 | LinkedIn Learning Instructor | 2x GCP & AWS Certified | LICAP'2022

    194,182 followers

80% of ML success depends on the pipeline, not just the algorithm. Yet when things go right, the model gets the glory. When they go wrong, the engineer gets the call.

A retail team built a churn model. Six weeks of work. It went live. Stakeholders loved the weekly report.

Three weeks later, a vendor feed silently changed column order. The pipeline ran green. The model predicted on garbage. Eleven days of wrong decisions. Zero alerts fired.

A data contract at ingestion would have killed it in 30 seconds. This is the map they didn't have.

𝟭. 𝗜𝗻𝗴𝗲𝘀𝘁𝗶𝗼𝗻: 𝗧𝗵𝗲 𝗜𝗺𝗺𝘂𝘁𝗮𝗯𝗹𝗲 𝗙𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻
↳ Raw is Sacred: Land data as-is from APIs, DBs, and streams into the Bronze layer.
↳ Idempotency First: Design pipelines that can safely rerun without duplicating data.
↳ The Choice: Batch vs. streaming is a business latency decision, not just a tech one.

𝟮. 𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻: 𝗧𝗵𝗲 𝗗𝗮𝘁𝗮 𝗖𝗼𝗻𝘁𝗿𝗮𝗰𝘁
↳ Profiling: Run null and distribution scans before a single transformation.
↳ Contracts: Use written schemas to define shape, type, and ownership (a sketch follows this post).
↳ Gatekeeping: Kill bugs at the source before they poison your Silver layer.

𝟯. 𝗦𝘁𝗼𝗿𝗮𝗴𝗲: 𝗧𝗵𝗲 𝗠𝗲𝗱𝗮𝗹𝗹𝗶𝗼𝗻 𝗝𝗼𝘂𝗿𝗻𝗲𝘆
↳ Bronze: Raw, immutable landing zone for auditability.
↳ Silver: Cleaned, deduplicated, and standardized (the "Refining" stage).
↳ Gold: Aggregated, business-ready views optimized for BI and ML.

𝟰. 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻: 𝗟𝗼𝗴𝗶𝗰 𝗼𝘃𝗲𝗿 𝗧𝗼𝗼𝗹𝗶𝗻𝗴
↳ ELT Wins: Load cheap, then transform in-warehouse using modular models (like dbt).
↳ DAGs over Crons: Encode dependencies so you never guess what runs first.
↳ Clarity: Perfectly wrong dashboards come from bad logic, not bad tools.

𝟱. 𝗚𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲: 𝗧𝗵𝗲 𝗧𝗿𝘂𝘀𝘁 𝗟𝗮𝘆𝗲𝗿
↳ Lineage: Trace every number back to its source at the column level.
↳ Drift Monitoring: Watch for volume and freshness drops, not just "failures."

𝟲. 𝗦𝗲𝗿𝘃𝗶𝗻𝗴: 𝗧𝗵𝗲 𝗟𝗮𝘀𝘁 𝗠𝗶𝗹𝗲
↳ Consistency: Define metrics once so they are the same in BI and ML.
↳ Accessibility: Provide optimized access via APIs or semantic layers.

These are 5 non-negotiable rules worth printing:
→ Raw zone is sacred. Never mutate it.
→ Contract before code: schema, SLA, and owner agreed first.
→ Idempotency is day-one design, not an afterthought.
→ Monitor drift, not just failures; silent wrong data is worse.
→ Build with consumers, not for them.

What layer of the pipeline has caused you the most pain lately? Drop a comment below! 👇

♻️ Reshare if this resonates with you.
🔖 Save this map for your next architecture review!

~ Pooja Jain

#data #engineering #intelligence #ai #business
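A minimal sketch of a data contract check at ingestion, of the kind that would have caught the silently reordered vendor feed in seconds. It uses pandas, and the schema below is a hypothetical example:

```python
import pandas as pd

CONTRACT = {  # agreed schema: column name, position, and dtype
    "customer_id": "int64",
    "last_order_date": "datetime64[ns]",
    "monthly_spend": "float64",
}

def validate_feed(df: pd.DataFrame) -> pd.DataFrame:
    expected = list(CONTRACT)
    actual = list(df.columns)
    if actual != expected:  # order matters: catches silent reordering
        raise ValueError(f"Schema drift: expected {expected}, got {actual}")
    for col, dtype in CONTRACT.items():
        if str(df[col].dtype) != dtype:
            raise TypeError(f"{col}: expected {dtype}, got {df[col].dtype}")
    return df  # safe to promote from Bronze to Silver

# The pipeline "runs green" only if the contract holds, e.g.:
# validate_feed(pd.read_parquet("bronze/vendor_feed.parquet"))
```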

  • Venkata Naga Sai Kumar Bysani

    Data Scientist | 300K+ Data Community | 3+ years in Predictive Analytics, Experimentation & Business Impact | Featured on Times Square, Fox, NBC

    239,718 followers

90% of ML projects never make it to production. Here's the 8-step framework that works.

𝐒𝐭𝐞𝐩 𝟏: 𝐃𝐞𝐟𝐢𝐧𝐞 𝐭𝐡𝐞 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐏𝐫𝐨𝐛𝐥𝐞𝐦
↳ Start with WHY, not HOW
↳ Is ML even the right solution?
↳ Define success criteria upfront

𝐒𝐭𝐞𝐩 𝟐: 𝐃𝐚𝐭𝐚 𝐂𝐨𝐥𝐥𝐞𝐜𝐭𝐢𝐨𝐧 & 𝐄𝐱𝐩𝐥𝐨𝐫𝐚𝐭𝐢𝐨𝐧
↳ Check data quality: missing values, duplicates, outliers
↳ EDA: distributions, correlations, patterns
↳ Document your data sources and limitations

𝐒𝐭𝐞𝐩 𝟑: 𝐅𝐞𝐚𝐭𝐮𝐫𝐞 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠
↳ Handle missing values (imputation, dropping)
↳ Encode categorical variables
↳ Create new features from domain knowledge
↳ This alone can improve performance by 20-30%

𝐒𝐭𝐞𝐩 𝟒: 𝐓𝐫𝐚𝐢𝐧-𝐓𝐞𝐬𝐭 𝐒𝐩𝐥𝐢𝐭 & 𝐕𝐚𝐥𝐢𝐝𝐚𝐭𝐢𝐨𝐧
↳ Split: 70% train, 15% validation, 15% test (sketched in code after this post)
↳ Use a stratified split for imbalanced data
↳ Never touch test data until final evaluation

𝐒𝐭𝐞𝐩 𝟓: 𝐌𝐨𝐝𝐞𝐥 𝐒𝐞𝐥𝐞𝐜𝐭𝐢𝐨𝐧 & 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠
↳ Start simple (logistic regression, decision tree)
↳ Try XGBoost, LightGBM, Random Forest
↳ Track experiments with MLflow or W&B

𝐒𝐭𝐞𝐩 𝟔: 𝐌𝐨𝐝𝐞𝐥 𝐄𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧
↳ Use appropriate metrics (F1, ROC-AUC, RMSE)
↳ Analyze errors: confusion matrix, feature importance
↳ Does 85% accuracy actually solve the business problem?

𝐒𝐭𝐞𝐩 𝟕: 𝐃𝐞𝐩𝐥𝐨𝐲𝐦𝐞𝐧𝐭
↳ Build an API endpoint (FastAPI, Flask)
↳ Containerize with Docker
↳ Deploy to the cloud (AWS, GCP, Azure)

𝐒𝐭𝐞𝐩 𝟖: 𝐌𝐨𝐧𝐢𝐭𝐨𝐫𝐢𝐧𝐠 & 𝐌𝐚𝐢𝐧𝐭𝐞𝐧𝐚𝐧𝐜𝐞
↳ Track prediction accuracy over time
↳ Monitor for data drift and concept drift
↳ Retrain periodically with fresh data

𝐂𝐨𝐦𝐦𝐨𝐧 𝐏𝐢𝐭𝐟𝐚𝐥𝐥𝐬 𝐭𝐨 𝐀𝐯𝐨𝐢𝐝:
❌ Data leakage (using future info to predict the past)
❌ Ignoring class imbalance
❌ Deploying without monitoring
❌ Optimizing metrics without business context

𝐏𝐫𝐨 𝐭𝐢𝐩: Your first end-to-end project will be messy; that's normal. Focus on completing the full cycle, then iterate.

𝐖𝐚𝐧𝐭 𝐭𝐨 𝐬𝐭𝐚𝐫𝐭 𝐥𝐞𝐚𝐫𝐧𝐢𝐧𝐠 𝐌𝐋? Here are 5 resources I recommend:
1. Machine Learning by Andrew Ng - https://lnkd.in/diqSeD-k
2. Codebasics ML Playlist - https://lnkd.in/dBiYAeN7
3. Krish Naik ML Playlist - https://lnkd.in/dcpAS5gA
4. StatQuest with Joshua Starmer - https://lnkd.in/dhZ3aVhf
5. Sentdex ML Tutorials - https://lnkd.in/dCFPtDv8

Which step do you find most challenging? 👇

♻️ Repost to help someone starting their ML journey
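A minimal sketch of step 4: a 70/15/15 stratified split with scikit-learn, so class balance is preserved and the test set stays untouched until the end. The toy dataset stands in for your own features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced toy data (90/10) standing in for a real dataset.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# First carve off 30% for validation + test, stratified by label.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
# Then split that 30% in half: 15% validation, 15% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)
# Tune on (X_val, y_val); never touch (X_test, y_test) until final evaluation.
```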

  • Dylan Anderson

    Bridging the gap between data and strategy ✦ The Data Ecosystem Author ✦ Data & AI Leader ✦ Speaker ✦ R Programmer ✦ Policy Nerd

    52,549 followers

How do you get from an idea to a machine learning product?

While many view machine learning as simply training models with Python code, the reality is far more complex and structured. The ML development process is a systematic journey from business problem to deployed solution, requiring careful consideration at each stage to ensure technical delivery leads to business value. Here's the lifecycle broken down:

𝟭. 🔎 𝗠𝗼𝗱𝗲𝗹 𝗦𝗰𝗼𝗽𝗶𝗻𝗴 & 𝗗𝗮𝘁𝗮 𝗙𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀
Set the foundation for success by defining clear objectives and ensuring data readiness.
Problem Definition – Define clear business problems and figure out the use case for ML
Data Sourcing & Considerations – Consider data accessibility, regulatory requirements, and permissions
Data Ingestion – Establish reliable data pipelines that feed your model
Data Preparation – Transform raw data into clean, analysis-ready formats through pipelines
Exploratory Data Analysis – Conduct exploratory analysis to understand patterns before modelling

𝟮. 🧠 𝗠𝗼𝗱𝗲𝗹 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗺𝗲𝗻𝘁
Build a functioning machine learning model from your prepared data while factoring in reproducibility and performance.
Feature Engineering – Convert raw data into meaningful features your model can actually use
Model Selection – Test multiple algorithmic approaches against your constraints
Baseline Model Development – Develop simple baseline models before investing in complexity
Version Control – Implement version control for code, data, AND experiments
Model Training – Train models through constant iteration and cross-validation

𝟯. 🚀 𝗠𝗼𝗱𝗲𝗹 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁
Bring the model to production so it can deliver value throughout the organisation.
Model Evaluation & Validation – Validate performance through comprehensive testing frameworks
Model Serialization & Packaging – Serialize and package models with all dependencies (a sketch follows this post)
Resource Planning – Plan computational resources and scaling strategies
Deployment Architecture Planning – Design deployment architecture with reproducibility in mind
Business Integration – Integrate with business systems through well-designed APIs
Model Registry – Maintain a registry of all model versions and metadata

𝟰. 🔄 𝗠𝗮𝗶𝗻𝘁𝗲𝗻𝗮𝗻𝗰𝗲
Ensure your deployed model continues to perform effectively over time and learns from new data.
Feedback Loops & Continuous Learning – Establish feedback loops to capture user interactions, helping build future model iterations
Performance Tracking – Track business impact alongside operational costs to identify value creation
Model Monitoring & Observability – Monitor for data drift and model degradation

Check out my latest article on productionising a machine learning model (link in the comments) and let me know what you think!
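A minimal sketch of "Model Serialization & Packaging" plus a lightweight registry entry: persist the model alongside the metadata needed to trace and reproduce it. The paths, version numbers, and lineage fields are illustrative placeholders:

```python
import json
from datetime import datetime, timezone

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy baseline model standing in for the real thing.
X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "churn_model_v3.joblib")            # serialized artifact

registry_entry = {                                      # registry record
    "name": "churn_model",
    "version": 3,
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "train_accuracy": model.score(X, y),
    "artifact": "churn_model_v3.joblib",
    "data_snapshot": "s3://data/churn/2024-06-01",      # hypothetical lineage
}
with open("model_registry.jsonl", "a") as f:            # append-only registry
    f.write(json.dumps(registry_entry) + "\n")
```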

  • Anurag(Anu) Karuparti

    Agentic AI Strategist @Microsoft (30k+) | Author - Generative AI for Cloud Solutions | LinkedIn Learning Instructor | Responsible AI Advisor | Ex-PwC, EY | Marathon Runner

    30,932 followers

𝐈 𝐡𝐚𝐯𝐞 𝐬𝐩𝐞𝐧𝐭 𝐭𝐡𝐞 𝐥𝐚𝐬𝐭 𝐲𝐞𝐚𝐫 𝐡𝐞𝐥𝐩𝐢𝐧𝐠 𝐄𝐧𝐭𝐞𝐫𝐩𝐫𝐢𝐬𝐞𝐬 𝐦𝐨𝐯𝐞 𝐟𝐫𝐨𝐦 "𝐈𝐌𝐏𝐑𝐄𝐒𝐒𝐈𝐕𝐄 𝐃𝐄𝐌𝐎𝐒" 𝐭𝐨 "𝐑𝐄𝐋𝐈𝐀𝐁𝐋𝐄 𝐀𝐈 𝐀𝐆𝐄𝐍𝐓𝐒".

The pattern is always the same: teams nail the LLM integration and think the hard part is done, then realize they have built 20% of what production actually requires.

𝐇𝐞𝐫𝐞 𝐢𝐬 𝐰𝐡𝐲 𝐞𝐚𝐜𝐡 𝐛𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐛𝐥𝐨𝐜𝐤 𝐦𝐚𝐭𝐭𝐞𝐫𝐬:

Reasoning Engine (LLM): Just the Beginning
• Interprets intent and generates responses
• Without surrounding infrastructure, it is just expensive autocomplete
• Real engineering starts when you ask: "How does this agent make decisions it can defend?"

Context Assembly: Your Competitive Moat
• Where RAG, memory stores, and knowledge retrieval converge
• Identical LLMs produce vastly different results based purely on context quality
• Prompt engineering does not matter if you are feeding the model irrelevant information

Planning Layer: What to Do Next
• Breaks goals into steps and decides actions before acting
• Separates thinking from doing
• Poor planning = agents that thrash or make circular progress

Guardrails & Policy Engine: Non-Negotiable
• Defines which APIs the agent can call and what data it can access (a sketch follows this post)
• Determines which decisions require human approval
• One misconfigured tool call can cascade into serious business impact

Memory Store: Enables Continuity
• Short-term state + long-term memory across interactions
• Without it, every conversation starts from zero
• The context window isn't memory; it's just a scratchpad

Validation & Feedback Loop: How Agents Improve
• Logging isn't learning
• Capture user corrections, edge cases, and quality signals
• The best teams treat every interaction as potential training data

Observability: Makes the Invisible Visible
• When your agent fails, can you trace exactly why?
• Which context was retrieved? What reasoning path? What was the token cost?
• If you cannot answer in under 60 seconds, debugging will kill velocity

Cost & Performance Controls: POC vs. Product
• Intelligent model routing, caching, and token optimization are not premature; they are survival
• Monthly bills can drop 70% with zero accuracy loss through smarter routing

What most teams miss: they build top-down (UI → LLM → tools) when they should build bottom-up (infrastructure → observability → guardrails → reasoning).

These building blocks are not theoretical. They are what every production agent eventually requires, either through intentional design or painful iteration.

𝐖𝐡𝐢𝐜𝐡 𝐛𝐥𝐨𝐜𝐤 𝐚𝐫𝐞 𝐲𝐨𝐮 𝐜𝐮𝐫𝐫𝐞𝐧𝐭𝐥𝐲 𝐮𝐧𝐝𝐞𝐫𝐢𝐧𝐯𝐞𝐬𝐭𝐢𝐧𝐠 𝐢𝐧?

♻️ Repost this to help your network get started
➕ Follow Anurag(Anu) Karuparti for more

PS: If you found this valuable, join my weekly newsletter where I document the real-world journey of AI transformation.
✉️ Free subscription: https://lnkd.in/exc4upeq

#GenAI #AIAgents
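A minimal sketch of the Guardrails & Policy Engine block: an allowlist of callable tools plus an approval gate for high-risk actions. The tool names and policy rules are hypothetical examples:

```python
# Every tool call passes through the policy engine before it executes.
ALLOWED_TOOLS = {"search_kb", "read_crm"}               # what the agent may call
NEEDS_HUMAN_APPROVAL = {"issue_refund", "delete_record"}

class PolicyViolation(Exception):
    """Raised when the agent attempts an action outside its policy."""

def authorize(tool: str, approved_by_human: bool = False) -> None:
    if tool in NEEDS_HUMAN_APPROVAL:
        if not approved_by_human:
            raise PolicyViolation(f"{tool} requires human approval")
        return
    if tool not in ALLOWED_TOOLS:
        raise PolicyViolation(f"{tool} is not on the allowlist")

authorize("search_kb")                                  # OK: allowlisted
# authorize("issue_refund")                             # raises PolicyViolation
authorize("issue_refund", approved_by_human=True)       # OK after sign-off
```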

  • Pan Wu

    Senior Data Science Manager at Meta

    51,325 followers

Machine learning models aren't a "build once and done" solution; they require ongoing management and quality improvements to thrive within a larger system. In this tech blog, Uber's engineering team shares how they developed a framework to address the challenges of maintaining and improving machine learning systems.

The business need centers on the fact that Uber has numerous machine learning use cases. While teams typically focus on performance metrics like AUC or RMSE, other crucial factors, such as the timeliness of training data, model reproducibility, and automated retraining, are often overlooked. To address these challenges at scale, a comprehensive platform approach is essential.

Uber's solution is the Model Excellence Scores framework, designed to measure, monitor, and enforce quality at every stage of the ML lifecycle. The framework is built around three core concepts derived from Service Level Objectives (SLOs): indicators, objectives, and agreements. Indicators are quantitative measures that reflect specific aspects of an ML system's quality. Objectives define target ranges for these indicators, while agreements consolidate the indicators at the ML use-case level, determining an overall PASS/FAIL status based on the indicator results. The framework integrates with other ML systems at Uber to provide insights, enable actions, and ensure accountability for the success of machine learning models.

It's one thing to achieve a one-time success with machine learning; sustaining that success is a far greater challenge. This tech blog is an excellent reference for anyone building scalable and reliable ML platforms. Enjoy the read!

#machinelearning #datascience #monitoring #health #quality #SLO #SnacksWeeklyonDataScience

– – –

Check out the "Snacks Weekly on Data Science" podcast and subscribe, where I explain in more detail the concepts discussed in this and future posts:
-- Spotify: https://lnkd.in/gKgaMvbh
-- Apple Podcast: https://lnkd.in/gj6aPBBY
-- Youtube: https://lnkd.in/gcwPeBmR

https://lnkd.in/g6DJm9pb
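A minimal sketch of the SLO-style structure the post describes: indicators measure quality, objectives set target ranges, and an agreement rolls everything up to PASS/FAIL at the use-case level. The indicator names and thresholds are illustrative, not Uber's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Indicator:          # quantitative measure of one quality aspect
    name: str
    value: float

@dataclass
class Objective:          # target range for an indicator (lower bound here)
    indicator: str
    minimum: float

def evaluate_agreement(indicators: list[Indicator],
                       objectives: list[Objective]) -> str:
    # The agreement consolidates indicator results for one ML use case.
    measured = {i.name: i.value for i in indicators}
    failed = [o.indicator for o in objectives
              if measured.get(o.indicator, float("-inf")) < o.minimum]
    return "PASS" if not failed else "FAIL: " + ", ".join(failed)

print(evaluate_agreement(
    [Indicator("auc", 0.82), Indicator("retrain_success_rate", 0.91)],
    [Objective("auc", 0.80), Objective("retrain_success_rate", 0.95)],
))  # -> FAIL: retrain_success_rate
```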

  • Shrey Shah

    I teach AI assisted coding and agents | Applied AI | Cursor Ambassador | V0 Ambassador

    16,742 followers

I've been building AI agents for the last 2.5 years, and these 8 skills are all that matter for building production-grade agents. These eight pillars separate hobby projects from production LLMs.

☑ Prompt engineering
Write prompts like code. Use patterns, few-shot examples, chain of thought. Keep them repeatable. Test variations fast.

☑ Context engineering
Pull the right data at the right time. Blend database rows, memory chunks, and tool results into the prompt. Trim noise and stay inside token limits.

☑ Fine-tuning
When prompts aren't enough, adapt the model. Use LoRA or QLoRA with a clean data pipeline. Watch for overfitting and keep the compute budget low.

☑ Retrieval-augmented generation
Add a vector store. Chunk documents, index them, retrieve the top hits. Feed the results through a stable template (a sketch follows this post).

☑ Agents
Move past single-turn Q&A. Build loops that call APIs, manage state, and recover from failures. Design fallbacks for missing data.

☑ Deployment
Wrap the model in a scalable API. Monitor latency, handle concurrency, and isolate crashes with containers.

☑ Optimization
Apply quantization, pruning, or distillation. Benchmark speed versus accuracy. Fit the model to the hardware you have.

☑ Observability
Log prompts, responses, token counts, and latency. Spot drift early. Feed the metrics back into the next iteration.

I'm Shrey Shah & I share daily guides on AI. If this helped, hit the ♻️ reshare button so someone else can level up their LLM game.
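A minimal sketch of the retrieval-augmented generation skill: chunk and index documents, retrieve the top hits by cosine similarity, and feed them through a stable template. The bag-of-characters "embedding" is a toy stand-in for a real embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy embedding: normalized bag of characters. Swap in a real model.
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

docs = [
    "Refunds are processed within 5 business days.",
    "Premium plans include priority support.",
    "Passwords can be reset from the account page.",
]
index = np.stack([embed(d) for d in docs])        # chunk + index

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)                 # cosine similarity
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# Stable template: retrieved context first, then the question.
PROMPT = "Answer using only this context:\n{context}\n\nQuestion: {q}"
q = "How long do refunds take?"
print(PROMPT.format(context="\n".join(retrieve(q)), q=q))
```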
