In spring 2018, JetBrains polled over 1,600 people involved in Data Science and based in the US, Europe, Japan, and China, in order to gain insight into how this industry sector is evolving. Here's what we learned.
We distributed the survey via targeted ads on Facebook, Twitter, and LinkedIn. We screened respondents by excluding those who replied "I am not involved in data analysis." We collected 400 complete and valid responses from each of the US, Japan, and China. To represent Europe, we used quotas for select European countries to collect a set of responses that also totaled 400.
Some bias is likely present as JetBrains users may have been more willing on average to complete the survey.
The raw survey data are available for your perusal.
Number of respondents: 1282
According to the very first table, a relatively small percentage of the respondents work on model deployment in production mode. This is consistent with multiple market research studies reporting that the majority of enterprises are just starting to explore machine learning and deep learning.
They have small teams working on PoCs*, and model deployment in production still needs to be addressed. But I expect this type of activity to become more visible within the next few years as more businesses move from PoCs to production deployments.
*PoCs: proofs of concept
Number of answers: 1522
15% of data scientists plan to adopt or migrate to C++ in the next 12 months, probably because of performance concerns.
Number of answers: 1522
Most respondents believe that Python will remain on top for the next 5 years.
Overall, people tend to choose the language they use. Of those who don’t use a language they think will dominate, most want to start using it. Half of those who believe Kotlin will dominate are planning to adopt it in the near future.
No surprises with the programming languages. Traditional data scientists are among the most likely to still use R, since there are plenty of statistics libraries for R. The new generation of data scientists is choosing Python.
When it comes to high-performance data analytics, I’d expect to see C/C++ in the picture. Currently, we are observing that many HPC techniques and tools are being adopted and re-used for high-performance data analytics and deep learning.
Kotlin adopters
7% of data analysts using a programming language want to adopt Kotlin for Data Science in the near future.
Number of answers: 60
Number of answers: 112
Kotlin Learning
Vitaly Khudobakhshov
Product Manager, JetBrains
Kotlin is a general-purpose language running on the Java Virtual Machine. It is concise and integrates easily with popular data processing frameworks such as Hadoop and Spark.
Kotlin is statically typed, which increases its reliability, and it uses type inference. Together, these features make Kotlin a handy instrument for data engineering and data science.
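To make the points about conciseness, static typing, and type inference concrete, here is a minimal sketch. The `Response` class, the helper functions, and the sample values are hypothetical and exist only for illustration; they are not taken from the survey data.

```kotlin
// Hypothetical record type for illustration; not part of the survey data set.
data class Response(val language: String, val yearsOfExperience: Int)

// Counts how many responses mention each language.
fun countByLanguage(responses: List<Response>): Map<String, Int> =
    responses.groupingBy { it.language }.eachCount()

// Computes the mean years of experience across all responses.
fun averageExperience(responses: List<Response>): Double =
    responses.map { it.yearsOfExperience }.average()

fun main() {
    // Made-up sample values. The compiler infers List<Response>
    // without an explicit type annotation.
    val responses = listOf(
        Response("Python", 3),
        Response("R", 7),
        Response("Python", 5)
    )

    val counts = countByLanguage(responses)  // inferred as Map<String, Int>
    val mean = averageExperience(responses)  // inferred as Double

    println(counts)
    println(mean)

    // Static typing rejects mismatches at compile time.
    // The following line would not compile, because a Double is not an Int:
    // val wrong: Int = averageExperience(responses)
}
```

The local variables carry no type annotations, yet each one has a fixed type that the compiler checks, which is the combination the paragraph above describes.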
Thomas Nield has assembled a helpful collection of Kotlin resources for data science on his GitHub.
If you are new to Kotlin and are considering it for your next language, start by learning the basic syntax.
If you are already familiar with Java, you may want to try Kotlin Koans.
Number of answers: 1477
One third of those who say they work with big data don’t use any big data tools. Conversely, a third of those who say they do not work with big data do use some big data tools. Still, this self-identification does correlate with formal factors.
Number of answers: 1522
Number of answers: 1666
78% of data science specialists perform computations on local machines.
Number of answers: 527
We received 77 responses from people who don’t use any programming languages and aren’t about to adopt any (5% of all data analysts who responded).
Number of answers: 77
Number of answers: 43
These respondents use spreadsheet editors more often than average, and most of them work in non-IT industries. They also tend to use data analysis tools less often.
Number of answers: 924
Number of answers: 918
Answer scale: 1 = not at all, 5 = a great deal
Number of answers: 924
Answer scale: 1 = not at all, 5 = a great deal
Number of answers: 1666
Number of answers: 733
Number of answers: 933
Demographics
Number of answers: 1666
Number of answers: 924
This question was directed to professionals, that is, people professionally involved in data science or data analysis and working full-time or part-time.
Work environment and employment
Number of answers: 1014
Number of answers: 1086
Number of answers: 1666
Number of answers: 917
PyCharm Professional Edition is a Python IDE that enables data scientists and web developers to become far more productive.
It offers in-depth Python code analysis and integrates with various libraries, frameworks, and tools. PyCharm's scientific tools are designed specifically with professional data analysts in mind and include a scientific development mode, integration with conda, code cells, Jupyter Notebook support, and much more. There is first-class support available for SQL databases as well.
Datalore is an intelligent web application for data analysis and visualization for Python, with built-in tools and libraries for machine learning all included.
The smart Python code editor helps users write better code with suggestions, autocompletion, and syntax highlighting. Incremental recalculation tracks dependencies between computations, so users don’t have to work out which parts of the code were affected by recent edits. There is also access to extended data storage and high-performance computational resources (including GPU instances) for an enhanced exploration experience.
If you have any questions or suggestions, please contact us at surveys@jetbrains.com.