How often do we change our sbt builds

I spend a lot of time thinking about build tools and popular setups, and what changes are more likely to impact developers positively.

In one of these ramblings, one question popped up: "How often do we change our builds?"

It's not easy to answer this question; data on the topic is scarce. So I challenged myself to come up with some data to satisfy my curiosity.

Next I show some of the data I collected from Scala projects using sbt, the most popular Scala build tool. I would love to see how these numbers change across build tools --- I'm sure they do.

Methodology

  1. Clone the repo and follow installation instructions if required.
  2. Run a script that detects changes to all *.sbt files across all commits.
  3. Get three data points: total commits, commits that modified the build and percentage of commits that changed the build.

The source code of the script is available in the following Gist. Feel free to run it in your project.
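For reference, here is a minimal sketch of the counting logic, assuming it runs from the root of a git clone. It only approximates the real script in the Gist, but the idea is the same:

// Minimal sketch of the counting logic; run from the root of a git clone.
// git pathspec wildcards match across directories, so "*.sbt" also catches
// build files in subprojects.
import scala.sys.process._

object BuildChangeStats extends App {
  val totalCommits =
    Seq("git", "rev-list", "--count", "HEAD").!!.trim.toInt
  val buildCommits =
    Seq("git", "log", "--format=%H", "--", "*.sbt").!!
      .split("\n").count(_.nonEmpty)
  val percentage = 100.0 * buildCommits / totalCommits
  println(f"$buildCommits of $totalCommits commits changed the build ($percentage%.2f%%)")
}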

Disclaimer

There are two limitations with this approach:

Results

I collected the data in Airtable. It's sorted in descending order. The results vary between ~37% and 0.43%.


Conclusions

To my surprise, the projects seemed to be clustered into two groups of equal size:

  1. A group of 16 projects covering the range of 15-37% of modifications to the build.
  2. A second group of 16 projects covering the range 0.43-8.7% modifications to the build.

It's surprising that the division is so perfect... After all, I picked the projects randomly[^p]. It seems that there's no clear-cut percentage of build modifications in a project.

The data points analyzed are not enough to draw meaningful conclusions. However, we can safely assert that in half of these popular open-source Scala projects, developers change the build in at least 15% of their commits.

If your project belongs to the second group, changes to the build files are scarce and therefore unlikely to affect your developer workflow. If you are in the first group, specifically if you are working on a codebase like circe, your developer workflow is slowed down by reloading the build after every change.

The builds of these projects are all complex and, therefore, that 15% of build changes is bottlenecked by sbt, which does not provide fast reload times (let alone fast compilation times). Modifications to these builds slow down developers significantly if every change to the build takes at least 15s [^h].

I don't have any particular advice for maintainers of such projects except for identifying why so many build changes are required and trying to outsource them to external tools. Most of the build-related problems can be solved outside of sbt.

[^h]: Note that the data only shows committed changes to sbt build files. I don't know how many changes were required before the commit got merged. Maybe a good approximation would be two or three local changes per one committed change, on average?

[^p]: The analyzed projects have medium-to-large size and they have been sampled from the local Scala community build in my computer (they contain both libraries and applications). It's likely there's still a bias.

Integrate Bloop with Bazel/Pants

What is the future of build tools such as Pants and Bazel in the Scala community? Can we accelerate the adoption rate by integrating these tools with the existing tooling ecosystem?

Tools 101

Bloop is a Scala build server that compiles, tests and runs Scala fast.

Pants and Bazel are scalable language-agnostic build systems. They support Scala and also need to compile, test and run Scala fast.

Why would we want Bazel and Pants to integrate with Bloop? How could such an integration work given the seemingly competing goals of both tools?

This article answers these questions and summarizes my excellent discussions with Natan (contributor to the Bazel Scala rules @ Wix) and Stu, Danny and Win (core maintainers of Pants @ Twitter) during Scala Days 2019.

Motivation

There are three main arguments to motivate the integration.

#1: Straightforward integration with editors

Adoption of build tools is limited by how well they integrate with existing developer tools. For example, how well you can use them from an editor.

Currently, Pants and Bazel only support IntelliJ IDEA via their custom IDEA plugins. These integrations are difficult to build, test and maintain.

Bloop provides Bazel and Pants a quick way to integrate with the vast majority of editors used in the Scala community: IntelliJ IDEA via the built-in BSP support in intellij-scala and VS Code, Vim, Emacs, Sublime Text 2, and Atom via Metals.

The integration is easy to build, test and maintain; it relieves build tool maintainers from implementing editor-specific support and allows sharing improvements in editor support across build tools.

#2: Faster local developer workflows

Bazel and Pants promise reproducible builds. Reproducibility is a key property of builds. It gives developers confidence to deploy their code and reason about errors and bugs.

To make compilation reproducible, incremental compilation is disabled and build requests trigger a full compilation of a target foo every time:

  1. One of the build inputs of foo is modified (such as a source)
  2. Users ask for build tool diagnostics or a compilation of foo

A best practice in Bazel and Pants is to create fine-grained build targets of a handful of sources. Fine-grained build targets help reduce the overhead of full compiles: they compile faster, increase parallel compilations and enable incremental compilation at the build target level.

However, even under ideal compilation times of 1 or 2 seconds per compiled build target, there are scenarios where instant feedback cannot be achieved:

  1. Language servers such as Metals that forward diagnostics from Bazel and Pants will take 1 or 2 seconds at best to act on diagnostics, making the slowdown noticeable to users.
    • Metals also needs class files/semanticdb files to provide a rich editing experience (go to definition, find all references).
  2. Common scenarios such as changing a binary API can trigger many compilations downstream that take a long time to finish, slowing down even more the build diagnostics in the editor.

An integration with Bloop speeds up local developer workflows by allowing local build clients (such as editors) to trigger incremental compiles while isolating these compiles completely from Bazel or Pants.

In practice, this means build clients such as Metals can use Bloop to receive build diagnostics fast (in the order of 50-100ms) and collect class files in around 400-500ms, meaning developers feel instant feedback from the build tool.

And Bloop guarantees compilation requests from Bazel and Pants will:

  • Trigger a full compile per build target (same output for same input)
  • Never conflict with other client actions
  • Be reusable by clients that want fast, incremental compiles

(These guarantees are unlocked by the latest Bloop v1.3.2 release.)

Integrating with Bloop brings Bazel/Pants users the best of both "worlds":

  1. Bazel and Pants can still offer reproducible builds to users with no cache pollution. The cache engine in Bazel and Pants only gets to "see" class files produced by full compilations.
  2. Developers sensitive to slow feedback in the editor can opt-in for incremental compiles from their editor in a local machine. In case of rare incremental errors, they can trigger a compilation from the build tool manually to restore a clean state.
  3. Developers that don't want to compromise build reproducibility to get faster workflows can enable a Bloop configuration setting to keep using full compiles from their editor, while still getting faster compiles than they would if they used the compilation engine from Pants or Bazel.

#3: State-of-the-art compilation engine

Currently, the Scala rules in Bazel and Pants implement their own compilation engine, interface directly with internal Scala compiler APIs and have a high memory and resource usage footprint because they spawn a JVM server that cannot be reused by external build clients.

The advantages of using Bloop to compile Scala code are the following:

  1. Speed. Bloop implements a compilation engine that:
    1. is the fastest to date
    2. has been tweaked to have the best performance defaults
    3. uses build pipelining to speed up full build graph compilations
    4. is benchmarked on 20 of the biggest Scala open source projects
    5. is continuously improved and maintained by compiler engineers
  2. Supports pure compilation. Bloop can recompile build targets from scratch if it's told to do so by the build tools.
  3. Minimal use of resources. Bloop can be reused by any local build client, including those from other build tools and workspaces.
  4. Lack of maintenance. The compilation engine doesn't need to be maintained by either the Bazel or the Pants team.
  5. Simple integration. The integration is done via the Build Server Protocol, which requires only a few hundred lines of code and is decoupled from any change in the compiler binary APIs.

How to integrate with Bloop

There are several ways to integrate Bloop and Pants/Bazel with varying degrees of functionality.

Which integration is best ultimately depends on what clients are (and aren't) willing to give up and on the key motivations behind the integration. The move from one integration to another can be done gradually.

Barebone integration: only generating .bloop/

Bloop loads a build by reading Bloop configuration files from a .bloop/ directory placed in the root workspace directory.

A configuration file is a JSON file that aggregates all of the build inputs Bloop needs to compile, test and run. It is written to a directory in the file system to simplify access and caching when the build tool is not running but other clients are. Every time a configuration file in this directory changes, the Bloop server automatically reloads its build state.
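To make this concrete, here is roughly what a generated file for a build target foo could look like. This is an abridged example written from memory, not the authoritative schema; the Bloop documentation describes the full configuration format:

{
  "version": "1.2.0",
  "project": {
    "name": "foo",
    "directory": "/workspace/foo",
    "sources": ["/workspace/foo/src/main/scala"],
    "dependencies": ["bar"],
    "classpath": ["/workspace/.bloop/bar/classes", "/cache/scala-library-2.12.8.jar"],
    "out": "/workspace/.bloop/foo",
    "classesDir": "/workspace/.bloop/foo/classes",
    "scala": {
      "organization": "org.scala-lang",
      "name": "scala-compiler",
      "version": "2.12.8",
      "options": ["-deprecation"],
      "jars": ["/cache/scala-compiler-2.12.8.jar", "/cache/scala-library-2.12.8.jar"]
    }
  }
}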

A barebone integration is the simplest Bloop integration: Pants or Bazel generate Bloop configuration files to a .bloop directory. Whenever there is a change in a build target, Bazel or Pants regenerate its configuration file.

Here's a diagram illustrating the barebone integration.

Note that:

  1. There are several clients talking to Bloop manned by developers
  2. The build tool and Bloop use different compilers/state
  3. Bazel/Pants write configuration files, Bloop only reads them
  4. .bloop is the directory in the workspace where configuration files are persisted

Pros

  1. Easy to prototype (Danny and I implemented it in Pants in 4 hours)
  2. Out-of-the-box integration with Metals and CLI (motivation #1)

Cons

  1. Requires writing all configurations to a .bloop/ directory in the workspace.
  2. The Bloop compiles are not integrated with those of the build tool. This implies that this solution doesn't satisfy users that want:
    • A faster developer workflow (motivation #2)
    • A state-of-the-art compilation engine (motivation #3)

    because the build tool and Bloop use their own compilers.

BSP integration: generating .bloop/ and talking BSP

To enable a solution that not only provides the possibility of using Bazel/Pants from any editor but also has a faster developer workflow than the status quo, we need to look at ways we can enable Bloop to do the heavy-lifting of compilation.

In a way, Bazel and Pants become build clients to the BSP build server in Bloop:

  1. A compile in Bazel or Pants maps to a compile request to Bloop
  2. Bazel and Pants receive compilation logs and class files from Bloop

The following diagram illustrates what the architecture looks like:

We can see that Bazel / Pants no longer own compilers and that they instead communicate with the Bloop server via BSP. To implement that, the build tools can use bsp4j, a tiny Java library that implements the protocol and allows the client to listen to all results/notifications from the build server.
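To make the integration more tangible, here is a rough sketch of what a bsp4j-based client that compiles a single target foo could look like. The class and method names follow the BSP specification as exposed by bsp4j, but treat the exact constructors and signatures as assumptions rather than a definitive implementation:

import java.util.Collections
import ch.epfl.scala.bsp4j._
import org.eclipse.lsp4j.jsonrpc.Launcher

// A minimal client that only listens to the notifications Bloop sends back
class BuildToolClient extends BuildClient {
  def onBuildShowMessage(params: ShowMessageParams): Unit = println(params.getMessage)
  def onBuildLogMessage(params: LogMessageParams): Unit = println(params.getMessage)
  def onBuildPublishDiagnostics(params: PublishDiagnosticsParams): Unit = ()
  def onBuildTaskStart(params: TaskStartParams): Unit = ()
  def onBuildTaskProgress(params: TaskProgressParams): Unit = ()
  def onBuildTaskFinish(params: TaskFinishParams): Unit = ()
  def onBuildTargetDidChange(params: DidChangeBuildTarget): Unit = ()
}

object OffloadToBloop {
  // `in`/`out` are the streams of an already-established connection to the Bloop server
  def compileFoo(in: java.io.InputStream, out: java.io.OutputStream): Unit = {
    val client = new BuildToolClient
    val launcher = new Launcher.Builder[BuildServer]()
      .setRemoteInterface(classOf[BuildServer])
      .setLocalService(client)
      .setInput(in)
      .setOutput(out)
      .create()
    launcher.startListening()
    val server = launcher.getRemoteProxy

    // BSP handshake: initialize the connection before sending requests
    val capabilities = new BuildClientCapabilities(Collections.singletonList("scala"))
    val initParams = new InitializeBuildParams(
      "my-build-tool", "1.0.0", "2.0.0", "file:///workspace/", capabilities)
    server.buildInitialize(initParams).get()
    server.onBuildInitialized()

    // Ask Bloop to compile the build target `foo` described by .bloop/foo.json
    val foo = new BuildTargetIdentifier("file:///workspace/?id=foo")
    val result = server.buildTargetCompile(new CompileParams(Collections.singletonList(foo))).get()
    println(s"Compile finished with status ${result.getStatusCode}")
  }
}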

There are, however, different ways Bazel or Pants can offload compilation to Bloop. Let's illustrate both of them with a simple build.

The straightforward mechanism to offload compilation is to let the Bazel/Pants build tool drive the compilation itself.

Upon the first compilation of a target C, the build tool would:

  1. Make sure there is an open BSP connection with the Bloop server.
  2. Visit C, find dependency B is not compiled.
  3. Visit B, find dependency A is not compiled.
  4. Visit A, no more dependencies, then:
    1. Generate configuration file for A
    2. Send Bloop compile request for A to write class files
  5. Come back to B, no more uncompiled dependencies, then:
    1. Generate configuration file for B
    2. Send Bloop compile request for B to write class files
  6. Come back to C, no more uncompiled dependencies, then:
    1. Generate configuration file for C
    2. Send Bloop compile request for C to write class files

(The build tool can safely visit a build graph in parallel.)
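As a rough sketch, the build-tool-driven walkthrough above boils down to a post-order traversal of the build graph. The two helpers here are stand-ins for the real work: writing .bloop/<name>.json and sending a single-target BSP compile request like the one sketched earlier.

case class Target(name: String, dependencies: List[Target])

object DrivenCompilation {
  // Stand-ins for the real operations
  def generateBloopConfig(t: Target): Unit = println(s"write .bloop/${t.name}.json")
  def bloopCompile(t: Target): Unit = println(s"BSP compile request for ${t.name}")

  def compile(target: Target, done: collection.mutable.Set[String]): Unit = {
    if (!done(target.name)) {
      // Dependencies first, then the target itself
      target.dependencies.foreach(dep => compile(dep, done))
      generateBloopConfig(target)
      bloopCompile(target) // blocks until class files for `target` are written
      done += target.name
    }
  }
}

// The example build: C depends on B, which depends on A
val a = Target("A", Nil)
val b = Target("B", List(a))
val c = Target("C", List(b))
DrivenCompilation.compile(c, collection.mutable.Set.empty)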

This mechanism works if one wants the build tool to own and control the way compilations are run, but it's slower than letting Bloop compile a subset of the build graph on its own, where Bloop can (among other actions):

  • Start the compilation of a project before its dependencies have finished (e.g. start compiling B right after A is typechecked). This is known as build pipelining.
  • Compile faster by populating symbols from in-memory stores instead of reading class files from the file system.
  • Amortize the cost of starting a compilation by compiling a list of build targets at the same time.

The build tool could benefit from all of these actions by just changing how it maps compilation requests to the Bloop BSP server:

  1. Make sure there is an open BSP connection with the Bloop server.
  2. Visit C, find dependency B has no config.
  3. Visit B, find dependency A has no config.
  4. Visit A, no more dependencies, then generate config for A
  5. Come back to B, no more dependencies, generate config for B
  6. Come back to C, no more dependencies, generate config for C
  7. Send a Bloop compile request from C.
    • Bloop will start compiling the build graph in the background.
    • After building a target, Bloop sends a notification to the client.
  8. Visit B, find dependency A is not compiled.
  9. Visit A, wait for Bloop's end notification for A.
  10. Come back to B, wait for Bloop's end notification for B.
  11. Come back to C, wait for Bloop's end notification for C.

Right after receiving the notifications from the server, the build tool will find all the compilation products written in the classes directory specified in the configuration file, meaning it can immediately start evaluating tasks that depend on compilation products for that project.
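Reusing the Target case class and the config helper from the previous sketch, this second strategy could look roughly like this. awaitCompileFinished is a stub for code that blocks on Bloop's end-of-compilation notification for a target (e.g. something hooked into onBuildTaskFinish in bsp4j):

object PipelinedCompilation {
  def generateBloopConfig(t: Target): Unit = println(s"write .bloop/${t.name}.json")
  // Stubs: one compile request for the whole graph, then per-target notifications
  def bloopCompileGraph(root: Target): Unit = println(s"BSP compile request from ${root.name}")
  def awaitCompileFinished(t: Target): Unit = println(s"end-of-compilation notification for ${t.name}")

  def compile(root: Target): Unit = {
    def generateConfigs(t: Target): Unit = {
      t.dependencies.foreach(generateConfigs)
      generateBloopConfig(t)
    }
    generateConfigs(root)
    bloopCompileGraph(root) // Bloop compiles A, B and C in the background, pipelined
    def await(t: Target): Unit = {
      t.dependencies.foreach(await)
      awaitCompileFinished(t) // class files for `t` are now in its classes directory
    }
    await(root)
  }
}

PipelinedCompilation.compile(c)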

(sbt will offload compilation to Bloop by following this strategy in the next Bloop release.)

Pros

  1. Out-of-the-box integration with Metals and CLI (motivation #1)
  2. A faster local developer workflow (motivation #2)
  3. A state-of-the-art compilation engine that compiles the build graph as fast as possible for the build tool, with a simple protocol that decouples the build tools from compiler internals

Cons

  1. Not as straightforward to implement as the first shallow integration, but still doable and abstracted away from compiler internals.
  2. Requires writing all configurations to a .bloop/ directory in the workspace.

Manual binary dependency

It is possible (but discouraged) to use Bloop's compilation engine via a library dependency and interface directly with Bloop's internal compiler APIs. However, most of the performance advantages of using Bloop would be lost, as they come from how Bloop schedules the compilation of build targets.

Pros

  1. Can yield some compile speedups if the internals are used correctly

Cons

  1. No out-of-the-box integration with Metals and other clients (motivation #1)
  2. Same local developer workflow as now (motivation #2)
  3. Difficult to implement and maintain (similar situation as the status quo)
    • Bloop compiler APIs change frequently
    • Bloop compiler APIs do not promise binary compatibility

CI compatibility

The CI doesn't pose any integration problems for Bloop. When Bazel runs compilation in the build farm, the Bloop Launcher will open a connection with a Bloop server and start compiling, in a similar way to how the current Scala rules in Bazel or Pants work.

Conclusion

This document motivates an integration with Bloop, explains why build tools such as Pants and Bazel would want to integrate with it, and spells out the consequences for their users.

This document intentionally goes into not only ideas but also implementation details to show how a full end-to-end integration from Bazel or Pants to Bloop is possible and can be implemented. Despite a few minor improvements missing in the latest Bloop release, build tool engineers could implement an integration that works tomorrow while solving fundamental problems present today.

Overload methods with extra parameter lists

Have you ever wondered if you can enrich a method that you need to implement in a class to get more information from the call site?

For example, let's say you have a method debug in a logger interface AbstractLogger. Can we implement the logger interface and at the same time overload debug with another version that takes more parameters every time the users of our API call debug?

In fact, can we do this without breaking binary compatibility in the interface that defines debug and ensuring that the users of our API can only call the enriched method?

I asked myself this question three days ago and I came up with a solution that I think is worth a short explanation in this blog post. My use case was triggered by a feature I wanted to add to bloop (a fast compilation server for Scala).

What is the Problem?

It has always amazed me how verbose the debug log level can be under tools such as sbt. This verbosity typically stands in the way of finding the cause of a resolution or compilation misbehavior. Running debug on the sbt shell would dump more than 20,000 log lines on my screen, enough to overflow my terminal buffer and lose potentially important debug logs on the way.

I've found myself often in this scenario. It feels like you're trying to find a needle in a haystack. It can be better if you're lucky enough to know the shape of the debug messages you're after (you can grep), but this is rarely the case.

I wanted bloop users to have a better time debugging compilation or testing issues by narrowing down the scope of the debug logs with filters. bloop test my-app --debug test would only dump debug logs related to the test task, instead of all the other debug messages in unrelated tasks.

The logging infrastructure in bloop implements several third-party Logger interfaces and aggregates them in an abstract class BloopLogger (for simplicity we'll extend only one: AbstractLogger).

// A third-party logger interface (in our classpath)
abstract class AbstractLogger {
  def debug(msg: String): Unit
  def info(msg: String): Unit
  def error(msg: String): Unit
}

// The logger interface that we use in all the bloop APIs
abstract class BloopLogger extends AbstractLogger

// One simple implementation of a bloop logger
class SimpleLogger extends BloopLogger {
  override def debug(msg: String): Unit = println(s"Debug: $msg")
  override def info(msg: String): Unit = println(s"Info: $msg")
  override def error(msg: String): Unit = println(s"Error: $msg")
}

We'd like to add an enriched version of debug that looks like debug(msg: String)(implicit ctx: DebugContext), where DebugContext identifies the context where debug is called. (We decide to make the parameter implicit, but there's no reason why you shouldn't be able to make it explicit.)

Third-party logging APIs are frozen and cannot be modified, so we cannot change the original debug method signature in AbstractLogger.

Besides, we don't want to add a special method debugWithFilter that we would need to teach to all Bloop project contributors. We would spend a lot of time telling contributors that they must not use the normal debug but debugWithFilter instead.

What we really want is to shadow the original debug method with the "enriched" debug method so that only the latter can be used by default.

So, wrapping up, we don't only want to overload a method, but also shadow it, and we want to do that without changing the public API of the interfaces we implement. It looks hard but let's persevere.

How do we go about implementing this feature?

A First Approach

We know beforehand the compiler will tell us there's some kind of ambiguity between the two debug methods, but bear with me and let's write the simplest possible solution: let's add the new debug method in SimpleLogger and then use it in a small main method.

// The debug context that we want to pass implicitly
sealed trait DebugContext
object DebugContext {
  case object Ctx1 extends DebugContext
  case object Ctx2 extends DebugContext
}

// The logger interface that we use in all the bloop APIs
abstract class BloopLogger extends AbstractLogger {
  def debug(msg: String)(implicit ctx: DebugContext): Unit
}

// One simple implementation of a bloop logger
final class SimpleLogger extends BloopLogger {
  override def debug(msg: String): Unit = println(s"Debug: $msg")
  override def info(msg: String): Unit = println(s"Info: $msg")
  override def error(msg: String): Unit = println(s"Error: $msg")

  override def debug(msg: String)(implicit ctx: DebugContext): Unit =
    println(s"Debug: $msg")
}

// The application that is the use site of our logger API
object MyApp {
  def main(args: Array[String]): Unit = {
    val logger = new SimpleLogger
    logger.debug("This is a debug message")
  }
}

When we compile the above code, the compiler emits the following error:

ambiguous reference to overloaded definition,
  both method debug in class SimpleLogger of type (msg: String)(implicit ctx: DebugContext)Unit
  and  method debug in class SimpleLogger of type (msg: String)Unit
  match argument types (String)

(Scastie link to runnable code here.)

Can we work around this ambiguous reference? Let's consider all our possibilities.

If we try to change the call-site to select the most specific debug method (logger.debug("This is a debug message")(DebugContext.Ctx1)), the error persists.

If we try to move the new debug definition to the implementation class, then it won't be usable by APIs using BloopLogger, which the rest of our codebase does because we have several implementations.

It looks like everything is lost. But this is the moment when knowing or intuiting that the ambiguity checker inside the compiler relies on the linearization order saves your day.

A Solution that Relies on Class Linearization

First off, what's the linearization order? There are a few good resources on the Internet, such as this one, that explain it well. But let me oversimplify and say that you can think of the linearization order as the order in which Scala will initialize an instance of a given class and all its super classes (or traits).

When Scala looks up the definition of a member, it relies on the linearization order to pick the first unambiguous candidate. A quick example:

trait A
trait B extends A { def foo: String = "B" }
trait C
class D extends C with B

// The linearization of `D` is `D -> B -> A -> C`; `foo` is resolved by
// walking that order and stopping at the first match, which is `B`
(new D).foo

The same procedure happens when Scala checks for ambiguous references and emits errors such as the ones we got before. As this example illustrates, the compiler does not exhaustively look for definitions of foo in all transitive super classes; it stops at the first hit.

This insight means that we can modify our previous example such that our enriched debug method is always found first. This way, we dodge the ambiguous reference to overloaded debugs.

// We make `DebugLogger` private at the logging package level to avoid undesired users
private[logging] abstract class DebugLogger extends AbstractLogger {
  protected def printDebug(msg: String): Unit
  override def debug(msg: String): Unit = printDebug(msg)
}

// The logger interface that we use in all the bloop APIs
abstract class BloopLogger extends DebugLogger {
  def debug(msg: String)(implicit ctx: DebugContext): Unit
}

// One simple implementation of a bloop logger
final class SimpleLogger extends BloopLogger {
  override def info(msg: String): Unit = println(s"Info: $msg")
  override def error(msg: String): Unit = println(s"Error: $msg")
  override protected def printDebug(msg: String): Unit =
    println(s"Debug: $msg")
  override def debug(msg: String)(implicit ctx: DebugContext): Unit =
    printDebug(s"$msg ($ctx)")
}

The trick that makes the previous code work is defining the implementation of the simple debug method in DebugLogger and making BloopLogger extend DebugLogger. We have introduced printDebug to avoid interdependencies between the two debug methods, as those would cause other reference errors.

Once we have defined the method we want to shadow in a super class of the class we want to support in our API (in this case BloopLogger), the logger implementations only need to define the enriched debug method.

Users of this API will not be able to call the simple debug unless they do an upcast to the third-party logger AbstractLogger. This is intended -- the goal is to have a good default, not to make it completely impossible to call the simple debug, so make sure that it still has a sensible implementation.
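A tiny usage example, building only on the definitions above: the enriched debug is what you get by default, and only an explicit upcast reaches the simple one.

val logger = new SimpleLogger
implicit val ctx: DebugContext = DebugContext.Ctx1
logger.debug("enriched debug, picks up the implicit ctx")
// Upcasting to the third-party interface is the only way to call the simple debug
(logger: AbstractLogger).debug("plain debug, no DebugContext required")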

In Bloop's case, the third-party logger API is never exposed, and I recommend doing the same in your application or library if you can.

With the above code, compiling our simple MyApp fails with a could not find implicit value for parameter ctx: DebugContext error, which confirms that Scala is successfully selecting the right method.

We can fix it by passing the context either implicitly or explicitly.

object MyApp {
  def main(args: Array[String]): Unit = {
    println(
      "Running demo application for https://jorge.vican.me/post/overload-methods-with-more-parameter-lists/")
    implicit val ctx: DebugContext = DebugContext.Ctx1
    val logger = new SimpleLogger
    logger.debug("This is a debug message")
  }
}

Complete Scastie Example

Conclusion

Overloading a method inherited from a third-party class is possible in Scala. It requires a little bit of gymnastics, but once we're familiar with the technique we can apply it in many other scenarios.

The same technique works with new explicit parameters (instead of implicit parameters). The key point is that we can overload methods by adding extra parameter lists to their definition, playing with the linearization order and defining the methods in the right place.

Profiling and reducing compile times of typeclass derivation

I have recently published a blog post on the official Scala website about my work on scalac-profiling.

scalac-profiling is a new compiler plugin to complement my recent work on the compiler statistics/sampling infrastructure merged in Scala 2.12.5 and available from then on.

In the blog post I talk about compilation performance, typeclass derivation, the expensive price of derivation via implicits, and how to use scalac-profiling to speed up the compile times of a Bloop module by 8x.

Read more in scala-lang.
