Industry: Software development

JetBrains products used: TeamCity

Organization Size: 1000+

Country: International

How the IntelliJ Infrastructure Team Cuts Build Execution Time by 30% With TeamCity

JetBrains relies on TeamCity to power CI/CD for its world-class IDEs, enabling 700+ developers to run thousands of daily builds efficiently. We spoke with the IntelliJ Infrastructure team about how they streamline workflows and ensure fast, reliable releases.

Introduction

JetBrains develops some of the world’s most popular IDEs, including IntelliJ IDEA, PyCharm, and WebStorm. Behind the scenes, a dedicated infrastructure team ensures that hundreds of developers can efficiently build, test, and release these products.

At the heart of this process is TeamCity, JetBrains’ own CI/CD solution, which enables build and test automation and scalable infrastructure management.

We spoke with the IntelliJ Infrastructure team to learn how they use TeamCity to manage CI/CD pipelines for 700–800 developers, orchestrate thousands of daily builds, and optimize their workflows to ensure fast and reliable releases.

The challenge: Scaling CI/CD for hundreds of developers

With a large team of developers constantly pushing changes, maintaining fast, reliable, and scalable CI/CD pipelines is not a simple task. The IntelliJ Infrastructure team needs a solution that can handle massive scale, with the ability to run thousands of builds per day without overwhelming available resources. They also require smart automation to minimize manual intervention while ensuring the delivery of high-quality code.

Additionally, support for diverse infrastructure environments, including on-premises, AWS, macOS, Linux, and Windows agents, is crucial. To maintain high code quality, the team also has to manage hundreds of thousands of automated tests, all while minimizing flaky test failures.

TeamCity, a powerful CI/CD solution developed by JetBrains, has proven well suited to tackling these challenges head-on.

“We build IDEs and everything related to them in TeamCity, like internal services, services for statistics, etc. I’m very used to TeamCity and feel like it’s a hammer in my hand: you can do anything with it.”

— Dmitrii Panov, IntelliJ Infrastructure Technical Lead

How TeamCity helps the IntelliJ Infrastructure team

Safe Push

The greater IntelliJ team’s process currently involves each developer pushing code as soon as it’s ready, without accumulating all of the changes in one branch and waiting for it to be tested together.

To support this workflow, the IntelliJ Infrastructure team implemented the Safe Push mechanism, which is essentially a set of composite build configurations in TeamCity. For users, the process is pretty straightforward: they just push the changes via their IDEs and that’s it.

Behind the scenes, a dedicated internal service analyzes the Safe Push change set and uses the TeamCity REST API to trigger the builds required to test the changes. It also restarts failed builds to retry flaky tests.
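As a rough illustration of the REST call such a service might make, here is a minimal Python sketch that prepares a request to queue one build through TeamCity’s `buildQueue` endpoint. The server URL, build configuration ID, branch name, and token are all placeholders invented for this example, and the sketch says nothing about how the internal Safe Push service is actually implemented.

```python
import json
import urllib.request


def build_trigger_request(server_url, build_type_id, branch, token):
    """Prepare (but do not send) a request that queues one TeamCity build."""
    payload = {
        "buildType": {"id": build_type_id},  # build configuration to run
        "branchName": branch,                # branch that carries the pushed change
    }
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/app/rest/buildQueue",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + token,  # TeamCity access token
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# All four arguments are hypothetical placeholders.
request = build_trigger_request(
    "https://teamcity.example.com", "SafePush_Tests", "safepush/example", "<token>"
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

A service along these lines would presumably issue one such request per build configuration affected by the change set, and queue the same configuration again when a retry is needed.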

By making it possible to reuse successful builds, TeamCity helps significantly speed up Safe Pushes and reduce their costs, as the Safe Push Tests build can contain up to 700,000 tests. The build reuse feature reuses artifacts, dependencies, and test results instead of building everything from scratch. This improves build efficiency, speeds up CI/CD pipelines, and reduces resource consumption.

Saving resources by reusing builds

Build reuse is one of the TeamCity features the team finds particularly useful. Here’s what a typical TeamCity build chain looks like.

Example of a build chain in TeamCity

When a test chain doesn’t pass on the first attempt, the Safe Push mechanism retries it twice more to make sure there is actually a problem.

Let’s consider a typical push to see how build reuse optimizes retries. On the first attempt, all 329 builds in the chain are executed: 319 pass and 10 fail, so the chain is rerun. This time, only the 10 failed builds are run, and the other 319 are reused. Of those 10, 6 pass and 4 still fail. On the third attempt, only the remaining 4 need to be run, while 325 are reused.
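The arithmetic of this example can be checked with a short Python sketch. It counts build executions only, assuming every passed build is reused on the next attempt. It does not model wall-clock time, which also depends on build durations and on how the chain is parallelized. The function name is made up for this illustration.

```python
def executed_builds(chain_size, failures_per_attempt):
    """Return how many builds each attempt runs when passed builds are reused.

    failures_per_attempt[i] is the number of builds that failed on attempt i + 1.
    """
    executed = [chain_size]  # the first attempt runs the whole chain
    for failed in failures_per_attempt:
        executed.append(failed)  # a retry reruns only what failed last time
    return executed


chain_size = 329
runs = executed_builds(chain_size, [10, 4])  # 10 fail on attempt 1, 4 on attempt 2
with_reuse = sum(runs)                       # 329 + 10 + 4
without_reuse = chain_size * len(runs)       # three full runs of the chain
reused_on_last_attempt = chain_size - runs[-1]

print(runs, with_reuse, without_reuse, reused_on_last_attempt)
# prints: [329, 10, 4] 343 987 325
```

In this example, three attempts execute 343 builds in total rather than the 987 that three full runs of the chain would require.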

This drastically reduces the load on build agents and significantly accelerates retries. Compared with running three full attempts, each taking roughly 3 hours, build reuse cuts execution time by about 30%.

Support for massive scale with seamless performance

The IntelliJ team runs 158,000+ builds for over 180 unique users per day, making efficiency critical. Thanks to TeamCity’s build reuse capabilities:

  • 75% of builds don’t need a full re-run when a test fails.
  • Retry times are cut down by an average of 30%, significantly reducing bottlenecks.
  • Developers can continuously merge without blocking each other, as over 90 Safe Push tests can run at the same time.

TeamCity Grafana dashboard

Kotlin DSL

The IntelliJ Infrastructure team manages TeamCity via the Kotlin DSL. With the Kotlin DSL, you get all of the benefits of a complete programming language coupled with the strengths of a DSL designed to build pipelines as code.

Currently, the TeamCity instance contains over 2,000 projects, 15,000 build configurations, and almost 300 VCS roots. The team cannot imagine having to manage such a huge instance using anything other than the Kotlin DSL.

“We also have almost 350 unit tests just for the Kotlin settings themselves. I wouldn’t want to have that experience in YAML.”

— Dmitrii Panov, IntelliJ Infrastructure Technical Lead

Agent management

TeamCity’s powerful agent management capabilities make scaling CI/CD pipelines easy and efficient. Unlike YAML-based CI/CD systems that require complex workarounds – such as anchors, go-to statements, and manual retry logic – TeamCity simplifies this process with built-in agent orchestration.

For teams using AWS, TeamCity seamlessly integrates with cloud infrastructure, allowing agents to be automatically launched and managed based on demand. For example, you can configure TeamCity to spin up the exact number of agents needed in any given situation and shut them down when idle, ensuring optimal resource utilization.

This level of precise agent control is far more intuitive compared to other solutions, where external plugins may introduce compatibility issues and require additional troubleshooting.

At scale, TeamCity handles massive workloads, supporting queues of 10,000+ builds and managing up to 5,000 build agents dynamically. One standout feature is TeamCity’s ability to work with preemptible AWS instances. If an instance is reclaimed by AWS, TeamCity can automatically restart the build on another agent, preventing disruptions and ensuring smooth execution.

Metrics that the team tracks

When managing a CI/CD platform for hundreds of developers with tens of thousands of builds per day, it’s important to keep track of the metrics that matter the most. TeamCity provides a number of diagnostic tools and indicators out of the box. Since TeamCity can also export metrics in the Prometheus format, you can build custom dashboards for more advanced use cases.
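To show what consuming such an export might look like, here is a minimal Python sketch that parses Prometheus text-format output into a dictionary, which a custom dashboard or alerting script could then use. TeamCity’s documentation describes an `app/metrics` endpoint for this export. The metric names in the sample below are invented for illustration, and the parser assumes simple lines with no trailing timestamps.

```python
def parse_metrics(text):
    """Parse simple Prometheus text-format lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")  # value is the last field
        samples[series] = float(value)
    return samples


# Hypothetical sample; real metric names come from the server's own output.
sample = """\
# HELP example_builds_queued Builds waiting in the queue
# TYPE example_builds_queued gauge
example_builds_queued 42
example_agents_connected{pool="aws"} 4800
"""

metrics = parse_metrics(sample)
print(metrics["example_builds_queued"])
# prints: 42.0
```

In practice, a Prometheus server would usually scrape the endpoint directly and Grafana would query Prometheus. A hand-rolled parser like this is mainly useful for quick checks or one-off scripts.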

“TeamCity provides us with some basic building blocks, say, statistics for builds or tests. We can then build something more sophisticated on those basic structures.”

— Aleksei Smirnov, Software Developer on the .NET Infrastructure team

Here are the main metrics that the IntelliJ Infrastructure team keeps a close eye on to optimize their CI/CD process with TeamCity.

Number of builds per day

The team monitors around 158,000 builds daily to understand workload and performance. While the number of builds itself is not something that the team pays very close attention to, it’s still an important metric for predicting the workload on build agents.

Concurrency of Safe Pushes

At any given time, 80–90 Safe Push checks are running simultaneously. It’s important for the team to track how long each Safe Push check takes – the faster each check executes, the sooner the pushed changes are verified.

Build reuse efficiency

The team analyzes how many builds are reused during retries, reducing the need for redundant execution and saving resources.

Test execution

There are almost 700,000 tests within one composite build in the IntelliJ TeamCity project.

Successful test runs within one composite build in TeamCity

The team tracks the roughly 700,000 tests in each Safe Push chain to identify any failures. The team also measures test flakiness, automatically retrying failed tests and muting unstable ones for further investigation.

One of the metrics that the team is trying to improve is the number of attempts required for a successful run. This will help them save resources and time for each run.
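One way to express this metric is sketched below in Python: given the number of attempts each successful run needed, it reports the average, the share of runs that passed on the first try, and the full distribution. The function name and the input data are made up for illustration and do not reflect the team’s actual numbers or tooling.

```python
from collections import Counter


def attempts_summary(attempts_per_push):
    """Summarize how many attempts each successful push needed."""
    distribution = Counter(attempts_per_push)
    average = sum(attempts_per_push) / len(attempts_per_push)
    first_try_share = distribution[1] / len(attempts_per_push)
    return average, first_try_share, dict(sorted(distribution.items()))


# Made-up data: attempts needed by eight successful pushes.
average, first_try_share, distribution = attempts_summary([1, 1, 2, 1, 3, 1, 2, 1])
print(average, first_try_share, distribution)
# prints: 1.5 0.625 {1: 5, 2: 2, 3: 1}
```

Tracking the first-try share alongside the average helps separate a few badly flaky chains from a general drift toward more retries.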

Agent utilization

The team monitors over 5,000 AWS and physical build agents, optimizing resource allocation with features like build reuse, which helps reduce build time by about 30%.

Things that could be improved in TeamCity

While the IntelliJ Infrastructure team uses TeamCity daily for a large number of tasks, and notes that it helps them solve many problems, there are a few things they would like to see added to it.

Team-specific functionality

While TeamCity allows access to all build chains and projects within an organization, developers often need a more focused view of their relevant builds. For instance, a developer working on PyCharm should ideally see only PyCharm-specific builds, tests, and configurations without having to navigate through unrelated projects.

A streamlined dashboard, displaying only relevant build configurations, tests, and muted tests, would significantly improve usability. The team has attempted to address this issue by creating a Grafana-based solution, but a more integrated and user-friendly approach within TeamCity would be ideal.

Muted test management

The team feels that TeamCity would greatly benefit from a mute management system. Given the high volume of automated tests, some tests inevitably need to be muted. While TeamCity provides functionality for muting tests, it’s still very easy to mute a test and forget about it, which complicates the work of colleagues who rely on that test.

A dedicated mute management system, where teams can easily view, manage, and track muted tests relevant to their projects, would help ensure better ownership and transparency.

Conclusion

The IntelliJ Infrastructure team has mastered CI/CD efficiency at scale with TeamCity, streamlining builds, optimizing resource usage, and significantly reducing execution times. By leveraging Safe Push, build reuse, and automated test retries, they’ve cut build execution times by 30%, ensuring that 700–800 developers can push changes seamlessly.

The IntelliJ Infrastructure team’s story proves that TeamCity is not just a CI/CD tool – it’s the backbone of large-scale software development.