Another milestone unlocked. Cloudera has earned Amazon Web Services (AWS) AI Competency, recognizing our ability to deliver secure, scalable AI across hybrid environments where data lives. From agentic workflows to real-world outcomes, AI is happening now. Read the press release: https://bit.ly/4w4JdPN
About us
Cloudera is the only data and AI platform company that brings AI to data anywhere: in clouds, data centers, and at the edge. Cloudera delivers 100% of data in all forms, whether it is in Cloudera or anywhere in the entire data estate. The world's largest organizations rely on Cloudera to fuel insights that boost bottom lines, safeguard against threats, and save lives. Learn more at Cloudera.com.
Recruitment Fraud Alert
It has come to our attention that job seekers have been contacted about fake job opportunities with Cloudera by individuals fraudulently posing as Cloudera employees. These recruiting fraud schemes often include requests for personal information and payments. Be aware that Cloudera will never request a payment as part of its recruitment process. Additionally, Cloudera will never make a job offer without conducting an interview process. Any information related to a job application should be submitted to Cloudera only through our official career portal (https://www.cloudera.com/careers.html). Email communications from Cloudera will come from an email address ending in @cloudera.com. If you are the target of a recruiting scam, consider filing a report with law enforcement authorities. Cloudera is not responsible for fraudulent job offers and/or any claims, damages, expenses, or other inconvenience connected to recruiting scams.
- Website
- https://www.cloudera.com
- Industry
- Software Development
- Company size
- 1,001-5,000 employees
- Headquarters
- Santa Clara, California
- Type
- Privately Held
- Specialties
- Big Data, Cloud Computing, Machine Learning, Cloud, Analytics, Artificial Intelligence, Databases, Open Source, Data Science, Data Warehouse, Data Engineering, IoT, Data, Operational Database, Streaming, Edge to AI, AI, ML, Enterprise Data Cloud, Apache, CDP, Hybrid Cloud, Generative AI, and Kubernetes
Updates
-
Heading to the Data Innovation Summit Nordics? Join Cloudera in Stockholm, May 7–8, 2026, and discover how to turn data into real AI impact—from experimentation to production. See you there: https://bit.ly/4dzPwE2
-
Hey, did you know? 62% of government executives say data privacy and security concerns are blocking AI adoption. In Cyber Defense Magazine's 2026 RSA edition, Cloudera's Hilary Billingslea explains why private AI is becoming essential for defense agencies. Find out how to overcome these barriers: https://bit.ly/3QoLfd0
-
Everyone says they’re “AI-ready.” But readiness isn’t about ambition—it’s about having full visibility and control over your data. And for most organizations, that foundation still isn’t there. Get the full picture in the Data Readiness Index: https://bit.ly/4eyD8ED
-
Cloudera reposted this
Many customers have told me that the model is no longer the bottleneck for enterprise AI inference; many of those practitioners, however, say: "The document understanding layer is!"
- Context: Every inference call starts with context. In the enterprise, that context is overwhelmingly unstructured: PDFs, filings, claims schedules, contracts. If the parse is wrong, the embedding is wrong, the retrieval is wrong, and the model generates a confidently wrong answer from a confidently wrong input. Pulse, a "Cloudera Enterprise AI Ecosystem" partner, open-sourced PulseBench-Tab, a frontier benchmark that grades whether a model actually understood a table's structure (rowspans, colspans, nested merges), not just whether the extracted text reads okay:
• 1,820 human-annotated tables across 9 languages and 4 scripts
• Real 10-Ks, government reports, corporate disclosures
• Structures of up to 1,000+ cells with deep merged-cell nesting
• T-LAG: unified scoring for structural accuracy + OCR quality
• 9 providers evaluated independently, in the open
- Why care if you are deploying in private environments? The rest of the enterprise inference stack runs in-tenant; document parsing has been the forced exception. It has been SaaS-gated, making regulated customers choose between accuracy and sovereignty. Open-source, VPC-deployable parsing at frontier quality closes that gap. Parse → embed → retrieve → generate now lives entirely inside the customer's perimeter. No egress of regulated data.
- So what does this mean for customers and their use cases?
1. #FinancialServices: 10-Ks, fund admin, bordereaux, claims schedules, actuarial reports. A misread merged cell quietly becomes a reserving error that the underwriting agent then reasons from.
2. #Healthcare: clinical trial tables, lab panels, remittance advice, EOBs. Manual abstraction is one of the largest line items in health data ops; VPC-deployable parsing means PHI never leaves the tenant.
3. #Telecom: vendor interconnect billing, SLA tables, contract exhibits. Industry revenue leakage runs 1–3% of revenue, much of it buried in rate-table complexity.
As #inference commoditizes, durable moats move one layer down, into the data-to-inference pipeline. End-to-end inference, from raw document to generated answer, can now run entirely inside the customer's VPC at frontier quality. No egress. No SaaS dependency. No structural errors silently corrupting the context window. That's the sovereign AI stack regulated enterprises have been asking for.
Strong work by the Pulse team (Sid Ritvik): rigorous, open, independently evaluated. Credit to Dushyanth and Moody at S&P Global for their methodology contributions!
#EnterpriseAI #Cloudera #RAG #AgenticAI #SovereignAI #DocumentIntelligence
Introducing PulseBench-Tab: an open-source, frontier benchmark for table parsing.
Table extraction benchmarks today mostly evaluate cell-level text matching or sequence alignment, which means they miss the structural relationships (rowspan, colspan, adjacency) that determine whether a table was actually understood. We built PulseBench-Tab to close that gap.
The dataset contains 1,820 human-annotated tables across 9 languages and 4 scripts, drawn from real-world financial filings, government reports, and corporate disclosures. Tables range from simple grids to complex structures with over 1,000 cells and deep merged-cell nesting. Alongside the dataset, the Pulse research team developed T-LAG, a new scoring metric that evaluates structural accuracy and OCR quality in a single unified score.
We're incredibly proud of our research and engineering team for building the industry's most accurate document extraction model, and for doing the work to prove it rigorously in the open.
Pulse runs in some of the most demanding environments in the world, including Fortune 50 technology companies, top-ten global private equity firms, the largest global insurance firms, and many of the fastest-growing AI startups. We've built Pulse for the highest-quality document understanding, not traditional OCR. Much of the novel research work has centered on incorporating vision-language models into production document parsing, and the platform has evolved into a horizontal document intelligence layer used across finance, insurance, healthcare, legal, energy, and technology.
Thank you to Dushyanth Sekhar and Moody Hadi of S&P Global's Enterprise Data Organization for their academic contributions to the benchmark methodology.
We evaluated 9 providers independently on the full dataset. Dataset, evaluation code, research paper, and granular per-language and per-complexity results are all in the comments below.
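To see why cell-level text matching can miss structural errors, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not T-LAG and not PulseBench-Tab's evaluation code: the Cell type, the two scoring functions, and the sample table are all hypothetical. A parser that drops a header's colspan still reproduces every cell's text, so a text-only score stays perfect, while a score that also checks position and spans goes down.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One table cell: top-left grid position, span sizes, and text (hypothetical schema)."""
    row: int
    col: int
    rowspan: int
    colspan: int
    text: str


def text_only_score(pred: list[Cell], gold: list[Cell]) -> float:
    """Share of gold cell texts that appear anywhere in the prediction, ignoring layout."""
    overlap = Counter(c.text for c in pred) & Counter(c.text for c in gold)
    return sum(overlap.values()) / len(gold)


def structure_score(pred: list[Cell], gold: list[Cell]) -> float:
    """Share of gold cells reproduced with the same position, spans, and text."""
    predicted = set(pred)
    return sum(1 for c in gold if c in predicted) / len(gold)


# Gold table (hypothetical): a "Revenue" header merged across two year columns.
gold = [
    Cell(0, 0, 1, 2, "Revenue"),
    Cell(1, 0, 1, 1, "2024"),
    Cell(1, 1, 1, 1, "2025"),
    Cell(2, 0, 1, 1, "10"),
    Cell(2, 1, 1, 1, "12"),
]

# Prediction: identical text, but the parser lost the header's colspan,
# so "Revenue" now appears to label only the 2024 column.
pred = [Cell(0, 0, 1, 1, "Revenue")] + gold[1:]

print(text_only_score(pred, gold))  # 1.0
print(structure_score(pred, gold))  # 0.8
```

A real structure-aware metric would also need to tolerate small OCR differences and partial span mismatches; this sketch uses exact matching only to keep the contrast visible.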
-
AI is changing the threat landscape faster than organizations can keep up. In this episode of #TheAIForecast, Theresa Payton, former White House CIO, explains how AI is accelerating cyber threats and why security must be an enterprise-wide priority. Listen here: https://bit.ly/4cKjao9
-
What does it take to break barriers in STEM and inspire the next generation? Join Dr. Jeanette Epps and Mary Wells on April 30, 11am ET as they share insights on leadership, innovation, and the power of representation. Save your spot: https://bit.ly/4uSt4vT
-
IT leaders are confident, but 79% still lack the data access needed to move AI forward. Our 2026 Data Readiness Index shows why ROI is lagging: silos, poor visibility, and inconsistent governance. The differentiator? Trusted, accessible data. More in Forbes: https://bit.ly/48B5NVS
-
Secure, scalable streaming is key to enterprise AI. Cloudera and Conduktor help organizations simplify Kafka, strengthen governance, and deliver trusted real-time data for AI. See how this partnership turns streaming data into a competitive advantage ⬇️ #ClouderaPartners