Building or running data infrastructure is an important part of the job for 55% of the 372 data engineers surveyed in the “2020 Kaggle Machine Learning & Data Science Survey.” These data engineers support data science applications as well as other use cases. They are actually slightly more likely (58%) to say that analyzing and understanding data in order to influence decisions is part of their job.
Data scientists focus on analysis, which is less important for machine learning (ML) engineers. Still, the 2,421 data scientists and 937 ML engineers in the Kaggle survey have much in common: about the same percentage of each group improves ML models, and about the same percentage builds or runs an ML service to improve a product or service.
At 18%, data engineers are more than twice as likely as data scientists to use cloud-based software and APIs as their primary tool to analyze data. They were also more likely to analyze data in the cloud. Local development environments like Jupyter Notebooks are the most likely choice for every job role we reviewed. Basic statistical software, defined here as spreadsheets, is very popular among software engineers. This is a reminder that knowing Python doesn’t mean developers will use data tools for data science.



Feature image via Pixabay.
