Test data generators in Kotlin

Not too long ago I came across a pattern that helped me with a longstanding problem I have had when writing tests: generating valid test data without spending half of your day instantiating domain models. Now, I’ve had this idea before, but I’ve never actually ended up testing it. That changes today, because from now on this is easily my go-to method of creating valid test data.

5 min read

By Bendik Solheim

December 8, 2023

Just looking for the actual pattern? Click here to open an interactive Kotlin playground!

Now, there are of course many ways of generating test data, and I believe I’ve tried most of them in search of the perfect one. They all have their problems though, and while I won’t deny that this method comes with some caveats as well, so far these seem to be easily outweighed by its advantages. The biggest advantages, as I see it, are:

My tests are as free from setup code as possible. The focus is on testing actual behavior.
I don’t have to spend time coming up with valid values in my tests. This is solved elsewhere, and my head can focus on validating behavior.

Before we start, we need a couple of classes to test. I know it’s a cliche, but let’s model a person with contact details:

data class Person(
  val name: String,
  val age: Int,
  val contactDetails: ContactDetails
) {
  companion object // needed for generators later
}

data class ContactDetails(
  val email: String,
  val phoneNumber: String
) {
  companion object // needed for generators later
}

Not too complex, but still: instantiating a new Person for each and every test gets tiresome:

val person = Person(
  name = "Random Name",
  age = 37,
  contactDetails = ContactDetails(
    email = "a@random.email",
    phoneNumber = "+4799999999"
  )
)

The problem with this becomes apparent when you have to write ten tests in a row, and each of them requires one or more Person instances. How do you come up with plausible, but different, values each time? Or do you copy your setup code around, so you always test with the same values? Both are valid, but bothersome and/or problematic approaches.

Let’s instead go completely crazy and use randomness! With the help of two Kotlin features we can end up with a nice, readable API. Let’s write a random `Person` generator!

fun ContactDetails.Companion.generate(random: Random): Sequence<ContactDetails> = sequence {
  while (true) {
    val localPart = String.generate(random, minLength = 1, maxLength = 20)
    val domainName = String.generate(random, minLength = 1, maxLength = 20)
    val tld = String.generate(random, minLength = 1, maxLength = 5)
    val email = "$localPart@$domainName.$tld"
    val phoneNumber = "+47" + (1..8).map { random.nextInt(1, 10) }.joinToString("")
    yield(ContactDetails(
      email = email,
      phoneNumber = phoneNumber
    ))
  }
}

fun Person.Companion.generate(random: Random): Sequence<Person> = sequence {
  while (true) {
    val firstName = String.generate(random, minLength = 3, maxLength = 10)
    val lastName = String.generate(random, minLength = 3, maxLength = 10)
    yield(Person(
      name = "$firstName $lastName",
      age = random.nextInt(1, 100),
      contactDetails = ContactDetails.generate(random).first()
    ))
  }
}

String.generate is left out for the sake of simplicity. There are many ways to implement it, and if you need some inspiration you can check here.
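If you just want something to start from, here is one minimal sketch of what `String.generate` could look like. It is an assumption on my part rather than the one true implementation: it only produces lowercase ASCII letters, and it treats both `minLength` and `maxLength` as inclusive. The `alphabet` value is a hypothetical helper.

```kotlin
import kotlin.random.Random

// Hypothetical alphabet: lowercase letters only, which keeps names and emails simple
private val alphabet = ('a'..'z').toList()

// Assumed semantics: the length is picked uniformly from minLength..maxLength (inclusive)
fun String.Companion.generate(random: Random, minLength: Int, maxLength: Int): String {
    require(minLength in 0..maxLength) { "minLength must be between 0 and maxLength" }
    // nextInt's upper bound is exclusive, so add 1 to make maxLength reachable
    val length = random.nextInt(minLength, maxLength + 1)
    return (1..length).map { alphabet[random.nextInt(alphabet.size)] }.joinToString("")
}
```

If your domain needs digits, spaces or non-ASCII characters, swapping out the alphabet is the only change required.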

Now, there are some things going on here, so let’s go through and explain it. Both functions follow the same pattern where you extend the Companion object with a function called `generate`, which accepts an instance of a Kotlin Random. You could let the function itself create an instance of Random, but injecting it gives you the control: instead of injecting a Random.Default, you can seed it with Random(mySeed) and get predictable values.
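To make the point about seeding concrete, here is a small sketch. The `ages` function is a hypothetical stand-in for a real generator; the only thing it is meant to show is that an injected, seeded `Random` makes the output repeatable.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for a real generator: yields random ages (1..99) forever
fun ages(random: Random): Sequence<Int> = sequence {
    while (true) {
        yield(random.nextInt(1, 100))
    }
}

// Two Random instances created from the same seed produce the same values,
// so both calls to ages(...) return identical lists
fun sameSeedSameValues(seed: Long): Boolean {
    val firstRun = ages(Random(seed)).take(5).toList()
    val secondRun = ages(Random(seed)).take(5).toList()
    return firstRun == secondRun
}
```

Calling `sameSeedSameValues` with any seed returns `true`, while passing `Random.Default` to `ages` gives you different values on every run.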

A generator returns a Sequence. Sequences in Kotlin are lazy, so even though you can spot a while (true) in there this is not an endless loop: if you ask for one instance, you generate one, and if you ask for ten, you generate ten.
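The laziness is easy to verify with a counter. The sketch below uses a hypothetical `CountingSource` class that records how many elements were actually computed.

```kotlin
// Hypothetical helper that counts how many elements the sequence has computed so far
class CountingSource {
    var produced = 0
        private set

    val numbers: Sequence<Int> = sequence {
        while (true) {
            produced++       // only runs when a consumer asks for the next element
            yield(produced)
        }
    }
}
```

Right after construction `produced` is 0, because nothing has been requested yet. After `numbers.take(3).toList()` it is 3: the `while (true)` loop is suspended at each `yield` and only resumes when another element is asked for.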

Now, generating the actual values is where things get really interesting. There is not really a right or wrong here, it all depends on your domain. Can an age be between 0 and 100? Does your system only handle ages from 18 and above? Is a negative age OK? Do you care about the content of a name at all, or is «@17'!-_y» perfectly fine? The values you generate should be valid in your domain, and what’s valid is different from system to system.

With these generators in place, you can generate test data this way:

val tenPeople = Person.generate(random).take(10).toList()

OK cool, but I sometimes want specific values and you only give me random ones >:|

Fear not, I’ve got something for you here as well. There are multiple ways of achieving this. Let’s say we want a random person, but we need the age to be exactly 40. If you have a data class, you could of course use the copy function:

val fortyYearOldPerson = Person.generate(random).first().copy(age = 40)

But you don’t always have data classes. Also, it’s not very elegant or cool. Instead, you could add an optional parameter to your generator:

fun Person.Companion.generate(random: Random, age: Int? = null): Sequence<Person> = sequence {
  while (true) {
    val firstName = String.generate(random, minLength = 3, maxLength = 10)
    val lastName = String.generate(random, minLength = 3, maxLength = 10)
    yield(Person(
      name = "$firstName $lastName",
      age = age ?: random.nextInt(1, 100),
      contactDetails = ContactDetails.generate(random).first()
    ))
  }
}

Both of these work just fine in one-off situations, and if you only need to override a small number of properties. But it quickly gets tiresome if you need specific values for multiple properties. To fix this, we can use another Kotlin concept: function literals with receiver. Let’s rewrite our generators based on this pattern!

class ContactDetailsSpec(random: Random) {
  private val localPart = String.generate(random, minLength = 1, maxLength = 20)
  private val domainName = String.generate(random, minLength = 1, maxLength = 20)
  private val tld = String.generate(random, minLength = 1, maxLength = 5)
  var email = "$localPart@$domainName.$tld"
  var phoneNumber = "+47" + (1..8).map { random.nextInt(1, 10) }.joinToString("")
}

fun ContactDetails.Companion.generate(random: Random, init: ContactDetailsSpec.() -> Unit = {}): Sequence<ContactDetails> = sequence {
  while (true) {
    // Create a fresh spec per element so every yielded value gets new random data
    val spec = ContactDetailsSpec(random)
    spec.init()

    yield(ContactDetails(
      email = spec.email,
      phoneNumber = spec.phoneNumber
    ))
  }
}

class PersonSpec(random: Random) {
  private val firstName = String.generate(random, minLength = 3, maxLength = 10)
  private val lastName = String.generate(random, minLength = 3, maxLength = 10)
  var name = "$firstName $lastName"
  var age = random.nextInt(1, 100)
  var contactDetails = ContactDetails.generate(random)
}

fun Person.Companion.generate(random: Random, init: PersonSpec.() -> Unit = {}): Sequence<Person> = sequence {
  while (true) {
    // Create a fresh spec per element so every yielded value gets new random data
    val spec = PersonSpec(random)
    spec.init()

    yield(Person(
      name = spec.name,
      age = spec.age,
      contactDetails = spec.contactDetails.first()
    ))
  }
}

Each generator gets its own "spec" (or specification, if you want. Abbreviations ftw.) helper class which holds the randomly generated values, and the generators are extended with a... weird (?) looking parameter. If you want to understand this weird syntax, the Kotlin documentation is an excellent guide. If not, just think of it as a lambda parameter that gets called on an instance of a specific type. With this setup, we can generate test data with either completely random values, or the specific ones we need.

val completelyRandomPerson = Person.generate(random).first()

val fortyYearOldPerson = Person.generate(random) {
  age = 40
}.first()

val fortyYearOldPersonWithPhoneNumber = Person.generate(random) {
  age = 40
  contactDetails = ContactDetails.generate(random) {
    phoneNumber = "+4711111111"
  }
}.first()
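If the `init: PersonSpec.() -> Unit` parameter still looks odd, it may help to see a function literal with receiver in isolation. The sketch below has nothing to do with test data; `GreetingSpec` and `greet` are hypothetical names made up for the illustration.

```kotlin
// Hypothetical spec class holding default values that a caller may override
class GreetingSpec {
    var greeting = "Hello"
    var name = "World"
}

// `init` is a function literal with receiver: inside the lambda, `this` is a GreetingSpec
fun greet(init: GreetingSpec.() -> Unit = {}): String {
    val spec = GreetingSpec()
    spec.init() // runs the caller's block against the spec, overriding whatever it assigns
    return "${spec.greeting}, ${spec.name}!"
}
```

Here `greet()` returns "Hello, World!" and `greet { name = "Kotlin" }` returns "Hello, Kotlin!". The generators use the same mechanism, only with randomly generated defaults.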

And there you have it!

A closing note about tests and randomness

Before I started writing this blog post, I had a talk with a colleague of mine about the concept. His immediate response was somewhere along the lines of «wait what, you use random data in your tests??».

And his reaction was probably justified. Unit tests and randomness aren’t exactly the two most likely friends. You want predictable tests, and randomness isn’t exactly known to be predictable.

There are, however, a few reasons why I believe this won’t be an actual problem.

First of all, it’s important that you take care when you implement your generators. The values should probably not be _completely_ random. If you have strict requirements and validation in place to enforce certain values at the boundaries of your application, you need to account for this when you generate data for the _insides_ of your boundaries. _You_ control the amount of randomness, and can even model dependencies between properties in your generators if you need to. If you take care and write good quality generators, the randomness should be an assurance, and not a source of problems.
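As an illustration of a dependency between properties, here is a sketch with a hypothetical `Employee` class. The rule that nobody is hired before they turn 18 is an assumption invented for the example; the point is only that a later value can be drawn from a range that depends on an earlier one.

```kotlin
import kotlin.random.Random

// Hypothetical domain class, not part of the Person example
data class Employee(val birthYear: Int, val hireYear: Int)

fun employees(random: Random): Sequence<Employee> = sequence {
    while (true) {
        val birthYear = random.nextInt(1950, 2001) // 1950..2000, upper bound is exclusive
        // Assumed rule: hired at 18 or older, and no later than 2023
        val hireYear = random.nextInt(birthYear + 18, 2024)
        yield(Employee(birthYear, hireYear))
    }
}
```

Every generated `Employee` satisfies the rule by construction, so tests never have to filter out impossible combinations.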

Second, there are ways of making those rare build failures actually useful. In our current code base, we have created a JUnit extension which can be added to a test file. This extension generates a new Kotlin `Random`, and stores the seed in a file. This way, if a test fails, you have the actual seed that made it fail and can consistently replicate the error locally. BAM. Your random build errors just turned useful.
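The extension itself is not shown here, but the core idea can be sketched with the standard library alone. Everything in this block is an assumption: `randomWithRecordedSeed` is a hypothetical helper, and a real JUnit extension would hook it into the test lifecycle rather than being called by hand.

```kotlin
import java.io.File
import kotlin.random.Random

// Hypothetical helper: records the seed in a file so a failing run can be replayed.
// Pass fixedSeed to reproduce an earlier run; otherwise a fresh seed is picked.
fun randomWithRecordedSeed(seedFile: File, fixedSeed: Long? = null): Random {
    val seed = fixedSeed ?: Random.nextLong()
    seedFile.writeText(seed.toString())
    return Random(seed)
}
```

When a test fails, read the seed back from the file and pass it as `fixedSeed`: the generators then produce the same values as in the failing run.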