Data Collection and Use Policy

This document describes how JetBrains handles data related to the usage of the JetBrains AI Service.

The JetBrains AI Service can collect two types of data related to the usage of AI features: behavioral and detailed data. Both of these types of data collection are fully controlled by the user.

Data from the JetBrains AI Service is sent to third-party language model providers (such as OpenAI), so it is also processed on those providers’ servers in accordance with their policies. Neither the user nor JetBrains has control over this third-party data processing. JetBrains does not work with large language model providers that use customer data for training models, but providers can store data for other purposes, such as abuse or misuse monitoring. Please check the list of engaged third-party language model providers and the documents describing how they handle the data here.

Behavioral Data Collection

Behavioral data collection covers data such as:

Types of AI features used.
Rates of acceptance for suggestions from different AI features.
Performance data (such as the amount of time it took to generate AI suggestions).
User feedback on the quality of results produced by different AI features.

This type of data does not include any personally identifiable data, or any source code files or fragments from the user’s project.

This data is used by various teams at JetBrains for analyzing product usage, improving product features, and training machine learning (ML) models that control the behavior of different product features (for example, controlling the automatic activation of ML features). It is not used for training ML models that generate code, text, or any other type of data from which outputs could be extracted.

Collection of this type of data is controlled by the standard data sharing settings (see the product documentation for details). It is enabled by default in EAP builds and disabled by default in release builds.

Detailed Data Collection

Detailed data collection includes full data about the interactions with large language models. This means the full text of the inputs sent by the IDE to the large language model and of the model's responses, including source code snippets.

Access to the collected data will be restricted to the teams at JetBrains that work specifically on large language model development and integration. This data will be analyzed to understand product usage and identify opportunities for improvement. It will not be used for training any ML models that generate code or text, and it will not be revealed in any form to any other users.

We will also implement a retention policy for this data: it will be stored for a limited time, not exceeding 30 days.

Collection of this type of data is enabled only with the user's explicit approval and is controlled in the product settings.

If the user does not opt in to detailed data collection, the inputs will be sent directly to the LLM provider and processed according to that provider's data collection and use policy, and the outputs will be sent directly to the user's IDE. The inputs and outputs will not be persistently stored on JetBrains servers.

Last modified: 16 January 2025