Greater London, England, United Kingdom
6K followers
500+ connections
Articles by Jan
-
How to make a success story of your data science team.
Data science resounds throughout every industry and has reached the mainstream media. I no longer have to explain what…
39
2 Comments
-
Organising the Zoopla Hackathon (May 21, 2019)
Zoopla just ran a successful 2-day hackathon and it all started with 3 people and a dream. Wouldn’t it be great if we…
60
3 Comments
-
How to build a Recommendation Engine quick and simple (Apr 8, 2019)
Part 1: an introduction, how to get to production in a week and where to go after that. This article is meant to be a…
57
5 Comments
-
Location Location Location (Apr 1, 2019)
How to create geographic area embeddings using Machine Learning and a little black magic wizardry. The Zoopla…
31
3 Comments
-
Rendezvous Architecture for Data Science in Production (Mar 2, 2019)
Part 1: The real challenge in data science. It is impossible to miss how the data field gained some new buzzwords. It…
26
3 Comments
Activity
-
Jan Teichmann reposted this: Future Anthem unveils Build Mode: the gambling industry’s first #AI Platform-as-a-Service. Build Mode is a PaaS licence for customers and partners, opening Anthem’s industry-scale infrastructure to data teams - removing the need for expensive multi-year in-house builds. It offers an end-to-end capability across sports betting, casino, lottery and bingo, delivered on real-time infrastructure already operating at scale, leveraging Amazon Web Services (AWS) and Databricks technologies. Read the full press release here: https://hubs.la/Q03-WGSC0
-
Jan Teichmann shared this: Heading to #ICE in a few days and looking forward to talking about the real-world application of AI and data in gaming. Reach out to arrange a meeting, and remember that places are available for #FutureAnthem's AI Masterclass sessions. Sign up here: https://lnkd.in/dky2PedF
-
Jan Teichmann shared this: Any product designers looking for a new opportunity? I can certainly vouch for Kristof as an amazing person to work with! Original post: Hello everyone 👋 I’m looking for an awesome Senior Product Designer to join my team at Ometria ATM 🕵️♂️ I thought I’d make a quick video to say hello and share a bit more info about what I’m after. You can apply here: https://lnkd.in/gAedgiSe Please like/ share with your networks or contacts to get the word out! TY! ❤️ #productdesign #hiring #hiringdesigners #ux #ui
-
Jan Teichmann shared this: If you are attending the AI Summit in London, why not come to my presentation: Delivering the Last Mile of Inference: Getting the Right Models to the Right Places Thursday, September 23, 2021 2:55 PM to 3:20 PM Scaling Intelligence - Headline Stage https://lnkd.in/dj6MQawF #datascience #ai #machinelearning
-
Jan Teichmann shared this: My team is looking for a Senior Data Engineer to join us and grow the team. In this role you are working with Trainline’s real-time data product platform and team to develop exciting data products. Apply via the link below: https://lnkd.in/eHixYcR #data #hiring #jobs #dataengineering #kafkastreams #scala
-
Jan Teichmann shared this: Bridging the divide between Data Engineering and Data Science: how to combine Python and Scala in Spark Pipelines. https://lnkd.in/ggXvDwK
-
Jan Teichmann shared this: We are having a reunion - one year after Data Science Pioneers was released. And what a year it turned out to be! Join us on April 29th in some reflection and discussion of the past, present and future of Data Science. https://lnkd.in/eyJ4YKu
-
Jan Teichmann shared this: We are excited to share the schedule for the Applied AI & DevOps meetup, in partnership with QuantumBlack, happening on March 31st. Please take a look below at our meetup page, to see the full itinerary for the evening, and to RSVP. https://lnkd.in/erShSxS We look forward to hearing from Lim Hoang, Alex Spanos and Jan Teichmann. We do hope you can join us! #chisquareanalytics #chisquaregroup #ai #devops #machinelearning #meetup #virtualmeetings #events #onlineevents
-
Jan Teichmann liked this: Huge thanks to Databricks. We had a blast sponsoring the end of year bash tonight. What a fantastic FY26. Looking forward to working together this year. Great to see you all. Good to see you Michael Green & Richard Shaw
-
Jan Teichmann liked this: Had a wonderful three days at ICE, including hosting one of Future Anthem's AI masterclasses. A big thanks to all our clients & contacts for taking the time to connect.
-
Jan Teichmann reacted to this: WOW! What more can I say. I'm heading back home after our latest away day in Manchester. The team is about 100 strong now and they are incredible. I leave feeling proud of what we have achieved together and also pretty tired! This week we planned and started to execute the 2026 strategy. We kicked the year off with our CKO, followed by the CKO den, where the team competed to win a prize of £1000 for the best AI-augmented solution that pushed our business forward. I was BLOWN AWAY!! So much innovation. Finally we celebrated 2025 with our annual awards, this time with a champagne reception under a T. rex at Manchester Museum. Then we ate and danced the night away. Thanks to everyone at Advancing Analytics. As always, we are hiring.
-
Jan Teichmann liked this: Just got back from a quick but brilliant stop at ICE Barcelona 2026! Big thanks to Christopher Conroy and the team at Future Anthem for having me on their AI Masterclass panel yesterday. We discussed the launch of their new "Build Mode" platform – which is powering personalisation across the global gaming industry – and how they're using the Databricks Lakehouse to deliver real-time insights to customers and partners. 💡 One thing really struck me during the conversation: we're at an inflection point... ⚡ The days of needing 50 engineers just to build your data platform? They're behind us. The plumbing problem is solved. 🏅 Future Anthem has built something genuinely world-class on top of Databricks, and it's proof that your competitive advantage isn't about managing data anymore – it's about how quickly you can turn it into experiences that keep players coming back! Cc. Finn Cormican, Ash Akhan
-
Jan Teichmann liked this: ICE is coming up quickly, and I’m looking forward to a busy few days of conversations, and Future Anthem's AI Masterclass sessions. If you’re attending, feel free to get in touch to book a meeting, and secure your spot at our AI Masterclasses, which will cover a range of essential topics: https://lnkd.in/eqZYJ-E5
-
Jan Teichmann liked this: Future Anthem is focused on helping our customers win in the Moments that Matter. If you want to win the race to real-time personalisation with Netflix-style experiences in a World Cup year, please DM me to set something up at ICE. To help our customers win the race in a changing, complex world, we're hosting a series of practical AI Masterclass sessions around personalisation - spaces are limited, so register soon! https://lnkd.in/db4Grf-8
Experience & Education
-
City University London
Publications
-
A Reward-driven Model of Darwinian Fitness
In Proceedings of the 7th International Joint Conference on Computational Intelligence
In this paper we present a model that, based on the principle of total energy balance (similar to energy conservation in Physics), bridges the gap between Darwinian fitness theories and reward-driven theories of behaviour. Results show that it is possible to accommodate the reward maximization principle underlying modern approaches in behavioural reinforcement learning and traditional fitness approaches. Our framework, presented within a prey-predator model, may have important consequences in the study of behaviour.
Other authors
-
Models of aposematism and the role of aversive learning
City University London
The thesis will identify open questions of interest around aposematism. In the second chapter the thesis will focus on the perspective of the prey. The introduction of a game theoretical model of co-evolution of defence and signal will be followed by an adaptation of the model for finite populations. In finite populations, investigating the co-evolution of defence and signalling requires an understanding of natural selection as well as an assessment of the effects of drift as an additional force acting on stability. In the third chapter the thesis will adopt the perspective of the predator. It will introduce reinforcement learning as a normative framework of rational decision making in a changing environment. An analysis of the consequences of aposematism in combination with aversive learning on the predator’s diet and energy intake will be followed by a lifetime model of optimal foraging behaviour in the presence of aposematic prey in the fourth chapter. In the last chapter I will conclude that the predator’s aversive learning process plays a crucial role in the form and stability of aposematism. The introduction of temporal difference learning allows for a better understanding of the specific details of the predator’s role in aposematism and presents a way to take the discipline forward.
-
The Evolutionary Dynamics of Aposematism: a Numerical Analysis of Co-Evolution in Finite Populations
Cambridge University Press
The majority of species are under predatory risk in their natural habitat and targeted by predators as part of the food web. During the evolution of ecosystems, manifold mechanisms have emerged to avoid predation. So called secondary defences, which are used after a predator has initiated prey-catching behaviour, commonly involve the expression of toxins or deterrent substances which are not observable by the predator. Hence, the possession of such secondary defence in many prey species comes with a specific signal of that defence (aposematism). This paper builds on the ideas of existing models of such signalling behaviour, using a model of co-evolution and generalisation of aversive information and introduces a new methodology of numerical analysis for finite populations. This new methodology significantly improves the accessibility of previous models.
Other authors
-
The Application of Temporal Difference Learning in Optimal Diet Models
Journal of Theoretical Biology
An experience-based aversive learning model of foraging behaviour in uncertain environments is presented. We use Q-learning as a model-free implementation of Temporal Difference learning motivated by growing evidence for neural correlates in natural reinforcement settings. The predator has the choice of including an aposematic prey in its diet or to forage on alternative food sources. We show how the predator's foraging behaviour and energy intake depends on toxicity of the defended prey and the presence of Batesian mimics. We introduce the precondition of exploration of the action space for successful aversion formation and show how it predicts foraging behaviour in the presence of conflicting rewards which is conditionally suboptimal in a fixed environment but allows better adaptation in changing environments.
Other authors
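For readers unfamiliar with the technique, here is a toy sketch of the Q-learning setup the abstract describes: a predator repeatedly chooses between attacking an aposematic prey item and foraging on an alternative, and learns action values from stochastic rewards. The reward values and learning parameters below are illustrative assumptions, not taken from the paper.

```python
import random

# Illustrative single-state Q-learning sketch of the diet-choice model.
ACTIONS = ["attack_aposematic", "forage_alternative"]
Q = {a: 0.0 for a in ACTIONS}
alpha, epsilon = 0.1, 0.1  # learning rate, exploration rate (assumed values)

def reward(action: str) -> float:
    if action == "attack_aposematic":
        # Energetic gain minus a stochastic toxicity cost (illustrative numbers).
        return 1.0 - (5.0 if random.random() < 0.5 else 0.0)
    return 0.3  # safe but modest alternative food source

for _ in range(10_000):
    # Epsilon-greedy choice: exploration of the action space is the
    # precondition for aversion formation described in the abstract.
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(Q, key=Q.get)
    # Temporal-difference update (no successor state in this toy setting).
    Q[a] += alpha * (reward(a) - Q[a])

print(Q)  # a negative value for the defended prey indicates learned aversion
```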
Recommendations received
-
LinkedIn User
“I was fortunate enough to work with Jan for over a year at Trainline. Jan is highly intelligent, practical, hardworking and cares deeply about producing high quality work. Anyone working with Jan will likely learn new ideas and concepts (in and out of data science) through interacting with him.”
8 people have recommended Jan
Explore more posts
-
Juan Felipe Amaya Ramirez
Amazon Web Services (AWS) • 1K followers
I might be a bit late to the party, but Spark Declarative Pipelines look seriously impressive. What really caught my attention is how similar this feels to the problem DBT is trying to solve, but pushed further into the Spark ecosystem and, ultimately, lakehouse architectures. The idea of defining data workflows at a higher level of abstraction while letting Spark handle the execution details feels like a big step forward for data engineering productivity. Definitely planning to run a couple of POCs to explore its potential in real-world scenarios. Curious to hear if anyone here has already tried it! What’s your experience so far? https://lnkd.in/efvmGyQq
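For anyone who hasn't seen the declarative style the post refers to, here is a minimal sketch in Databricks Delta Live Tables syntax, the lineage Spark Declarative Pipelines grew out of. The dataset names and source path are illustrative assumptions, not from the post.

```python
import dlt  # provided inside a Databricks Delta Live Tables pipeline
from pyspark.sql import functions as F

# You declare *what* each dataset is; the engine works out execution order,
# dependency tracking and incremental maintenance.

@dlt.table(comment="Raw events loaded from cloud storage (illustrative path)")
def raw_events():
    # `spark` is supplied by the pipeline runtime.
    return spark.read.format("json").load("/mnt/raw/events/")

@dlt.table(comment="Cleaned events; the engine infers the dependency on raw_events")
def clean_events():
    return (
        dlt.read("raw_events")
        .where(F.col("event_id").isNotNull())
        .withColumn("processed_at", F.current_timestamp())
    )
```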
11
-
Sam McCormick
Senior data scientist with 5+… • 576 followers
Very proud to have worked on developing the first open source modelling suite for Marketing Mix Models (MMM), which we at Mutinex hope will lay the groundwork for a more democratic and transparent MMM landscape. The new validation suite enables practitioners to rigorously test and compare MMM models in a consistent, open framework. It's a step toward greater accountability, better standards, and shared progress in marketing science. We’re excited to share it with the community - contributions and feedback are more than welcome! https://lnkd.in/dmWgKUd2 https://lnkd.in/dWJPtr2m
10
2 Comments
-
Corey Imparato
RecruitSeq • 10K followers
"My manager told me, 'Look for exciting roles. If something hits, we're sad to see you go, but we totally get it.'" Heard this from a very talented ML scientist facing funding cuts. It's an uncomfortable truth about job security in science right now. Many roles, especially in biotech, have unclear runways. Funding dictates everything. Your employer knows it. If they're a good manager like this one, they expect you to explore options. It's not disloyal, it's realistic. Prioritize your career stability. Always keep an eye out for what's next. Don't wait until the last minute. Who else has noticed this shift? #Hiring #JobSearch #Biotech #ScienceCareers #CareerGrowth
12
-
Yacine Jernite
Hugging Face • 2K followers
The Hugging Face research team has released some of the most extensive and accessible resources on training language models - and they do it with the Least Nonsense Ever, which I really truly appreciate 🤗 Their latest guidebook ties much of it together, and I really recommend at least the intro to anyone regardless of technical background to get a better sense of many of the important questions underlying AI development AND governance: https://lnkd.in/ewnX9YCX
96
-
Muhammad Arham
AgileOne • 3K followers
Rethinking S3 Partitioning — A Mistake I Had to Unlearn. Early in my current role, I followed what seemed like a good practice: partitioning S3 data by year/month/day. Our structure looked neat and tidy: s3://bucket/data/year=2024/month=01/day=01/data.csv It made perfect sense at the time. The folders were easy to browse. Everyone was happy - until they weren’t. Then came a simple request: "Can we pull all data from Jan 1, 2024, to Feb 10, 2024?" The query engine had to generate dozens of OR clauses, failed to prune partitions efficiently, and even hit query length limits in Athena. Performance? Not great. Debugging? Even worse. We were burning time trying to make the engine play nice with a structure it was never built for. The better approach? Partition by full date: dt=2024-01-01 This format: - Keeps partitions in natural, lexical order (ISO 8601) - Supports simple range queries - Prunes faster and scales cleaner Sure, filtering an entire year now looks like: WHERE dt >= '2024-01-01' AND dt <= '2024-12-31' instead of WHERE year = '2024' But that’s a pretty small tradeoff. Most queries span ranges and not fixed months or years. Optimizing for a rare shortcut isn’t worth slowing down daily usage. This one change significantly improved performance, simplified our queries, and made our data lake genuinely easier to work with. If you're still doing year=2024/month=01/day=01, take this as your sign to rethink it. #DataEngineering #CloudData #S3 #AWS #Partitioning
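A minimal PySpark sketch of the layout change the post recommends, writing a single ISO-8601 dt partition column instead of year/month/day; the bucket, paths and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dt-partitioning").getOrCreate()

events = spark.read.parquet("s3://my-bucket/raw/events/")  # illustrative path

# One ISO-8601 partition column instead of year=/month=/day=:
# lexical order equals chronological order, so range filters prune cleanly.
(
    events
    .withColumn("dt", F.date_format(F.col("event_ts"), "yyyy-MM-dd"))
    .write
    .partitionBy("dt")
    .mode("overwrite")
    .parquet("s3://my-bucket/curated/events/")
)

# A range query now prunes on a single predicate:
# SELECT ... WHERE dt BETWEEN '2024-01-01' AND '2024-02-10'
```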
20
1 Comment
-
Hasan Geren
ProcurePro • 2K followers
I’ve been using dbt and dlt a lot lately, especially spending a fair amount of time optimising my pipelines to make better use of historical data. That led us to write a post on dbt Snapshots and Slowly Changing Dimensions, with an end-to-end implementation using: - dlt (from dltHub) for incremental ingestion from the 11-Seven Fuel Price API - DuckDB for storage and interactive analytics - dbt-core (from dbt Labs) for transformations and SCD logic 🔗 Full post: https://lnkd.in/gdttmyvX 📂 Github repository: https://lnkd.in/g56Gm8PW This is the 4th post of our dbt in Action series and coming up next: - dbt Semantic Layer and MetricFlow You may like this series if you're learning dbt or looking to strengthen your skills with hands-on examples! #dbt #dlt #duckDB #dataengineering #PipelineToInsights
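As a flavour of the ingestion side of that stack, here is a minimal dlt sketch loading an API incrementally into DuckDB. The endpoint URL, field names and primary key are illustrative assumptions, not details from the post or its repository.

```python
import dlt
import requests

@dlt.resource(primary_key="station_id", write_disposition="merge")
def fuel_prices(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z")
):
    # Hypothetical endpoint standing in for the fuel-price API in the post.
    resp = requests.get(
        "https://api.example.com/fuel-prices",
        params={"updated_since": updated_at.last_value},
        timeout=30,
    )
    resp.raise_for_status()
    # Only records newer than the stored incremental cursor are merged.
    yield from resp.json()["results"]

pipeline = dlt.pipeline(
    pipeline_name="fuel_prices",
    destination="duckdb",   # lands in a local DuckDB file for dbt to transform
    dataset_name="raw",
)
print(pipeline.run(fuel_prices()))
```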
90
11 Comments
-
Rajesh M.
Paycom • 565 followers
For a data engineer, working on streaming projects gives great experience; the depth of the technical challenges is a bit bitter, but it makes you really strong, which is something I came across last week. In simple words: last year I built a streaming framework (I will provide more insights on this in my next post) and now I had to upgrade Spark from 3.3.0 to 3.5.2, in which the query processing mode Trigger.Once is deprecated, so Trigger.AvailableNow is needed instead. Diff b/w Trigger.Once & Trigger.AvailableNow? Trigger.Once processes all available data at the time the query is started in a single micro-batch. After processing this initial batch, the stream terminates. AvailableNow also processes all data available at the time of query startup. However, unlike Trigger.Once, it processes this data in multiple micro-batches instead of one, respecting rate limits and available resources. After processing all the initially available data, the query stops. Great, so what's the issue here? Well, I had been running all kinds of tests on processing the query based on the checkpoint mechanism with different offset scenarios. In one case, when no new offsets were available compared to the previous run, the graceful shutdown process did not happen (I implemented graceful shutdown to handle network failures and the like) in onQueryProgress, where shutdown is initiated once the query has processed successfully. What made this happen in the AvailableNow scenario? Spark checks the source at the start of the query, and if it determines that the last committed offset (from the checkpoint) is already the latest offset available in the source, it concludes that there is no new data to process within its defined scope. In this scenario, Spark will directly initiate query termination because its designated task is already fulfilled. Therefore, onQueryTerminated will be initiated without onQueryProgress. But in the Trigger.Once scenario, even if there are no new records, Trigger.Once will still perform a batch execution (even if it's an empty batch). This allows the query to report its progress via onQueryProgress before it signals termination. After reporting the progress for the batch, the query terminates, calling onQueryTerminated afterwards. In summary: Trigger.Once will always perform a batch execution (even an empty one), which means it will always produce an onQueryProgress event (with potentially 0 input rows), whereas Trigger.AvailableNow might skip directly to termination if no data needs processing within its scope. Therefore, I made changes to handle the graceful shutdown process in onQueryTerminated as well. Finally, what I am trying to say is that the difference between Trigger.Once and Trigger.AvailableNow is not only how the data is processed (a single batch versus micro-batches) but also how the streaming query executes internally.
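A minimal PySpark sketch of the pattern the post lands on: putting shutdown logic in onQueryTerminated as well, because with availableNow a query that finds no new offsets can terminate without ever reporting progress. The source, paths and the shutdown hook are illustrative stand-ins for the author's framework.

```python
from pyspark.sql import SparkSession
from pyspark.sql.streaming import StreamingQueryListener

spark = SparkSession.builder.appName("available-now-demo").getOrCreate()

def graceful_shutdown(reason: str) -> None:
    # Placeholder for the post's graceful-shutdown routine.
    print(f"shutting down: {reason}")

class ShutdownListener(StreamingQueryListener):
    def onQueryStarted(self, event):
        pass

    def onQueryProgress(self, event):
        # Fires once per micro-batch. With availableNow this may never fire
        # if the checkpoint already matches the latest source offset.
        pass

    def onQueryIdle(self, event):
        pass

    def onQueryTerminated(self, event):
        # Always fires, so shutdown handling belongs here too.
        graceful_shutdown(f"query {event.id} terminated")

spark.streams.addListener(ShutdownListener())

df = spark.readStream.format("rate").load()  # stand-in streaming source
query = (
    df.writeStream.format("console")
    .option("checkpointLocation", "/tmp/ckpt-demo")  # illustrative path
    .trigger(availableNow=True)                      # replaces once=True
    .start()
)
query.awaitTermination()
```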
22
-
Delphine Rabiller
FreeAgent • 1K followers
We've just published a deep dive into how we're re-architecting our data platform automations at FreeAgent as we migrate our data pipelines to Dagster! We detail the three main strategies we explored—Schedules, Asset Sensors, and Declarative Automation—and explain why we ultimately made Declarative Automation our default, asset-centric approach. It's simplified our architecture and given us the control we need for complete, condition-based asset materialization. Read the full post to learn about our decision-making process and how you can apply these techniques to your own pipelines. https://lnkd.in/ex3PsX5r
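For anyone curious what the asset-centric approach looks like in code, here is a minimal sketch of Dagster's declarative automation, with the condition declared on the asset itself rather than wired up in a schedule or sensor. The asset names and condition choice are illustrative assumptions, not FreeAgent's actual models.

```python
import dagster as dg

@dg.asset
def raw_invoices():
    # Upstream ingestion step (illustrative).
    ...

@dg.asset(
    deps=[raw_invoices],
    # Materialize automatically whenever upstream changes; no separate
    # schedule or asset sensor is needed.
    automation_condition=dg.AutomationCondition.eager(),
)
def enriched_invoices():
    ...

defs = dg.Definitions(assets=[raw_invoices, enriched_invoices])
```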
19
-
Stefano Barzaghi
880 followers
It was great to get more clarity on dbt Fusion last night at the dbt Global Circuit event in London, hosted by The Information Lab. Hearing directly from Richard Persaud of dbt Labs really helped connect the dots, especially how Fusion builds on the SDF Labs acquisition and introduces a completely reimagined Rust-based engine, while keeping the same authoring layer we know from dbt Core. 🧠 What stood out: – Compilation that actually understands SQL, not just preprocesses it – Real-time syntax validation and error surfacing in VS Code – Smarter orchestration that knows what truly needs to run – A faster, smoother developer experience The concept of SQL comprehension is especially promising, moving dbt beyond templating toward deeper awareness of what the code actually does: ✔️ Catch subtle errors before runtime ✔️ Understand column types and query logic statically ✔️ Unlock stronger testing, lineage, and debugging They’ve confirmed that dbt Core will remain supported, but it’s clear that Fusion is where future development is headed. The success of Fusion will probably depend on how dbt Labs stewards the ecosystem from here. Really appreciated the open conversations and transparency, and excited to see how this evolves in the coming months. Big thanks to Edward Hayter, James Charnley, and Nathan Purvis for putting together such a great event! #dbtFusion #SQLComprehension #AnalyticsEngineering #DataEngineering #ModernDataStack #dbt #LondonData
20
1 Comment
-
Nathan Chang
Faculty of Computing & Data… • 1K followers
Excited to share our recent work on LIBRAG, an existing RAG-based search system built for the Boston Public Library’s Digital Commonwealth archive. When we took over the project, the core RAG pipeline was functional, but retrieval relied primarily on semantic similarity, meaning structured constraints like time period or material type were only implicitly captured in embeddings rather than explicitly enforced at retrieval time. This sometimes led to semantically relevant but contextually incorrect results. Our team addressed this by adding LLM-driven metadata extraction to translate natural-language queries into structured filters applied directly during retrieval, which doubles context recall (236% increase) while keeping context precision nearly unchanged, significantly improving result reliability. Huge thanks to my teammates Penny (Pei-Chen) Lin, Saksham Goel, and Elinor H., and to our PM Aditya Kocherlakota, TPM Ananya Singh, and our amazing professor Ali Nahvi, PhD for the great guidance. Special thanks to our client, Eben English for the domain expertise, feedback, and collaboration throughout the project. Check it out! https://lnkd.in/ekNYNyRN
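A generic sketch of the pattern described, with an LLM turning a natural-language query into structured filters that are enforced at retrieval time rather than left implicit in the embeddings. The extract_filters helper and vector_store.search call are hypothetical stand-ins, not LIBRAG's actual API.

```python
import json

def extract_filters(llm, query: str) -> dict:
    """Ask the LLM for structured constraints as JSON (hypothetical helper)."""
    prompt = (
        "Extract search filters from this query as JSON with optional keys "
        '"date_range" and "material_type". Query: ' + query
    )
    # e.g. {"date_range": [1850, 1900], "material_type": "photograph"}
    return json.loads(llm.complete(prompt))

def search(llm, vector_store, query: str, k: int = 10):
    filters = extract_filters(llm, query)
    # Semantic similarity finds candidates; the structured filters are
    # applied *during* retrieval, so results are contextually correct.
    return vector_store.search(query=query, filters=filters, top_k=k)
```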
45
4 Comments -
Luiz Felipe Mendes
iWish Gifts • 5K followers
NotebookLM is an awesome tool for research and study. I think it is turning into almost a new interface for Search; we have been using Google as it is for years, but sometimes our questions and what we are looking for are more specific or more complex than a web search. Examples of usage: - Opening a coffee shop in Brazil -> It will give you many sources and information that you can filter or deep dive into. - Planning a trip - Learning a new skill This new way of adding sources expands our ability to use it as a "new Search" system. https://lnkd.in/g95dWSVe
15
1 Comment
-
Steven Knopf
John Lewis Partnership • 2K followers
Scaling data pipelines in a major retail environment like the John Lewis Partnership requires a smart architectural approach. Our latest blog post from Nishad Sohoni reveals how the JLP data engineering teams use a 'building block' methodology to create reusable, maintainable, and robust data solutions. Read the full post here: https://lnkd.in/eyGmZABE #DataEngineering #SoftwareArchitecture #DataPipelines #TechBlog #JohnLewisPartnership
8
-
Kanishka Thakur
Titanbay • 1K followers
💡 Will MCP servers change the data landscape forever? I had to find out what they were capable of, by testing out the dbt Labs MCP server! I used it to analyse my dbt models built on the Northwind dataset. My prompt was simple: "Can you look at my dbt models and give me some ideas for important metrics that I could look into and suggest some models for reporting purposes?" The output (see a snippet in the screenshot!) provided a solid data model analysis, pinpointed key entities, and suggested crucial business metrics and even SQL for a mart_customer_ltv model (along with many others!). This really enhances the ability to explore and understand data landscapes, I'm very excited to see what the future brings! Have you tried out any MCP servers with LLMs? What has your experience been like? More information in the comments below 👇 #MCP #AI #dbt #DataModeling
17
3 Comments
-
Godfrey Leung, PhD
Collinson • 1K followers
"Everything should be made as simple as possible (, but not simpler)" One common theme/advice across the talks at the #DSFGameOn event today, is to "start simple", including productionisation of LLM/GenAI or agentic AI applications As a former theoretical physicist turned data scientist, it's always something I strive for in data science/ML projects In the end, if you can solve your problems using simple solutions, why bother "wasting" your time and effort to build something complex that is not just more expensive, but also difficult to manage, maintain or understand? ;) @DataScienceFestival
21
-
Ryan H.
Capital One • 10K followers
https://lnkd.in/efVu3FTa The pipeline I mainly work with is a combination of Step Functions and Glue jobs. Earlier this week one of the jobs was having issues. I spoke with the people testing it; it was taking a long time but also had a boto error (which I forget now, but it was basically a connection error for S3). Anyway, this job was taking 10+ hours and after some modifications it should only take 20 minutes. The job was basically 1) read an S3 path for gzipped files 2) loop over the files and unzip 3) multipart upload to another place. If you have a loop inside a Glue job where each step is independent of the others, just doing pure file I/O, ETL or whatever, you can modify the Step Function definition to use a Map state and then it'll run these in parallel. You could also do this a few other ways, but they'd require using asyncio or something, and this is probably simpler with fewer code changes. Also, changing the number of workers for a Glue job like this won't impact the job. Most of the workers would just be waiting.
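A sketch of the Map-state idea in Amazon States Language, expressed here as a Python dict. The state names, items path and Glue job name are illustrative assumptions, not the author's actual definition.

```python
import json

# Each file in $.files gets its own parallel iteration instead of a
# sequential loop inside one Glue job.
map_state = {
    "UnzipFiles": {
        "Type": "Map",
        "ItemsPath": "$.files",
        "MaxConcurrency": 10,  # cap on parallel iterations
        "ItemProcessor": {
            "StartAt": "UnzipOne",
            "States": {
                "UnzipOne": {
                    "Type": "Task",
                    # Built-in Step Functions / Glue integration, run-to-completion.
                    "Resource": "arn:aws:states:::glue:startJobRun.sync",
                    "Parameters": {
                        "JobName": "unzip-and-upload",  # illustrative job name
                        "Arguments": {"--input_key.$": "$"},
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }
}
print(json.dumps(map_state, indent=2))
```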
5
-
Mikołaj Sędek
Limango Polska (A member of… • 4K followers
So far still one of the best PySpark programming guides ever published; I recommend it to anyone working with the framework. I especially like the ideas of separating complex chains into smaller objects and making filters readable with simple object definitions. What I would personally add: - write all your data processing functions as pure functions that are easy to chain and test - separate all IO operations into their own functions or (if you prefer) OOP objects, to isolate difficult-to-test side-effects from your pure functions - create utils packages and reuse them across projects - it’s DRY and useful (something you can’t achieve with SQL) - the functional programming approach wins over OOP in at least several major aspects - testing is easier and functional composition is easier to maintain, plus more open to changes than inheritance patterns - keep your documentation close to code - use typing and docstrings - it’s much easier then to document whole projects using GitHub Copilot or similar AI tools if all your functions are documented with human-readable docstrings - if you keep your processing functions simple and add docstrings, writing unit tests with pytest, Copilot, and AI models like Claude 3.7 works like a charm - pro tip: use well-structured prompts #PySpark #CodingGuide #ETL
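A small illustration of the pure-function style the post advocates, with IO kept at the edges and transformations chained via DataFrame.transform. Column names and the source path are illustrative assumptions.

```python
from pyspark.sql import DataFrame, SparkSession, functions as F

def with_net_price(df: DataFrame) -> DataFrame:
    """Pure transformation: no IO, trivially unit-testable with a tiny DataFrame."""
    return df.withColumn("net_price", F.col("price") * (1 - F.col("discount")))

def only_active(df: DataFrame) -> DataFrame:
    """Readable filter kept as its own named, reusable function."""
    return df.where(F.col("status") == "active")

def load_orders(spark: SparkSession) -> DataFrame:
    # IO isolated in its own function, away from the pure logic above.
    return spark.read.parquet("s3://my-bucket/orders/")  # illustrative path

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    # Functional composition via .transform keeps the chain flat and testable.
    result = load_orders(spark).transform(only_active).transform(with_net_price)
    result.show()
```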
142
7 Comments
-
Mikkel Dengsøe
SYNQ • 9K followers
We’ve been experimenting with SQLMesh for a while here at SYNQ, and I’m really impressed with the thought that the team at Tobiko has put into the DataOps workflows. Here are a few of my favourites. 🏗️ SQLMesh’s virtual data environments are a novel approach to DataOps. Instead of duplicating massive datasets for each environment, SQLMesh creates views that reference materialized tables. This makes dev, staging, and test environments low-cost, fast to spin up, and easy to manage. When you start developing, SQLMesh creates a dev schema that mirrors production but keeps references to stored data. This avoids re-materializing upstream tables, as only new tables are materialized. Deploying a change simply updates the production schema’s views to point to the latest version, while old versions remain available for rollbacks until cleanup. 🔄 Incremental models are also much easier to reason about. SQLMesh handles things like lookback windows and missing intervals out of the box, and stores model state persistently. That means fewer full refreshes and less custom logic to manage edge cases. At SYNQ, we’re seeing more teams adopt SQLMesh for this flexibility and then come to us for the observability layer, adding anomaly monitoring to catch ‘unknown unknowns’, getting a single pane view of glass into the data health, and managing it all from SQLMesh model properties. Some interesting details have gone into the integration we built, including how we map virtual data environments to actual tables as their state changes. We’ve written more about that integration here: ➡️ SQLMesh overview: https://lnkd.in/drXNP86j ➡️ How we map SQLMesh virtual data environments to tables: https://lnkd.in/dm6eT5gW
50
6 Comments
-
Oleg Efimov
Bumble • 2K followers
2025 was my first full year in the UK — less about major professional leaps, more about settling in and laying foundations. But looking back, a few things stood out. The highlight was publishing a four-part series on MySQL data types. These articles tackle the subtle pitfalls I still get questions about from colleagues — and honestly, quirks that trip me up too after 15+ years of working with MySQL: • Date & Time: https://lnkd.in/dmEM7tcn • Numeric types: https://lnkd.in/dQFZW2aF • SET & ENUM: https://lnkd.in/dsmfRrJ4 • String types: https://lnkd.in/dV3NM-vk I hope you find them useful! Feel free to pass them on to anyone who might benefit from a refresher. What MySQL quirk has surprised you most in production? I'd love to hear — drop a comment below. Over the year I completed three AWS badges (Cloud Practitioner, Networking Core, Compute) — focused upskilling that complements platform work in hybrid environments. This is a good addition to my on-premises and GCP experience and I plan to strengthen it by getting certifications for both GCP and AWS in 2026. On the community side, PHP turned 30 in June. 🐘 For many of us, it was the first language we used to build something real — and it's still evolving. Thanks to everyone who read, commented, or connected this year. Your feedback shaped these pieces. #PlatformEngineering #MySQL #DatabaseDesign #AWS #GCP #PHP #TechCommunity #2025Rewind -- Get your 2025 LinkedIn Rewind! Go to rewind.coauthor.studio
40
-
Sina Salek
Geodesic Labs • 1K followers
"AI is finally showing up in the productivity numbers." I wrote recently about the UK productivity narrative and felt unconvinced. On the ground, most non-tech organisations I meet are still doing painfully manual work, or catching up on the digital basics. Then I read a new Financial Times piece by Erik Brynjolfsson arguing the AI productivity take-off is now visible in US data [1]. The core idea is simple: output stayed strong while measured labour input got revised down, which mechanically looks like productivity growth. Here’s the part that actually reconciles my scepticism with the hype. In the last month or so, there really has been a step change in what AI can do when the work has cheap, automated verification. Software is the obvious example. If the output can be tested, compiled, linted, benchmarked, or checked against a spec, you can move from "prompting" to delegation with guardrails. That is not just faster typing. It is compressing whole work loops. And you do not need everyone doing it for the macro stats to twitch. If a small cohort of "power users" start automating end-to-end workstreams, it can plausibly show up in the aggregates while the median firm still feels stuck in spreadsheet land. So I think the question is less "is AI boosting productivity" and more: Are we seeing broad adoption, or concentrated adoption? If it is concentrated, the next bottleneck is not the models. It is process redesign, data plumbing, and getting the average employee to the point where AI is a daily tool, not a toy. [1] https://lnkd.in/ehRBFyWb #Productivity #AI #GenAI #DigitalTransformation #UK
6
1 Comment
-
Zoë Turner
The Strategy Unit • 722 followers
I've been trying out AI to help with coding and whilst it's fast and efficient, it's too keen to be helpful and it's irritating. I want interactions that acknowledge the limits of my questions. Something that says "that doesn't exist", "you shouldn't do that", but it looks like AI can't do that. 🤖 versus 👩💻 In Quarto you can use code to bring content into a page from another page https://lnkd.in/e5G6KpUc and I've been looking for a way to bring in a paragraph instead of whole pages. I used Copilot which nicely quotes these Quarto documentation pages and sometimes Stackoverflow, but when I asked it about this idea of parts of a document it gave me {{< embed file#subheader >}} type code. Looking for anything on "embed" returned nothing. 👩💻I think it's made up so I asked if it was made up. 🤖 It cheerfully acknowledged it had and repeated the previous code from Quarto. It made me realise you can get really far with AI in both code and text but if we want to push the limits of our knowledge and understanding, for the moment at least, we still need human intelligence.
28
18 Comments