Gather, Aggregate, Decide, Act

By Simen Endsjø

December 22, 2022

Most software processes can be distilled down to a few discrete steps:

  1. Gather data
  2. Aggregate to something manageable
  3. Examine the data and decide what actions must be taken
  4. Execute the operations

These steps, or phases, are true for both the smallest and largest processes. A process here is defined as “a series of actions or steps taken in order to achieve a particular end,” and this description holds true for both small pure functions and large business workflows.

More often than not, people don’t actually reflect on these steps and just interleave them all. There are, however, good reasons why you might want to extract them into distinct phases, or at least identify them and make use of their properties.

Before we go into details for the steps, we’ll look at how a typical function might be constructed and identify the boundaries between the steps.

  void sendVotingReminder(int userId, string whenCannotVote) {
    // userId and whenCannotVote are part of the gather phase, but we also need some more data
    var user = getUser(userId);

    // We might also need to calculate some data
    var name = user.firstName + " " + user.lastName;

    // Then we need to decide what to do
    string msg;
    if (user.age < 18) {
      // We use null as a "don't send anything" message
      msg = null;
    } else if (!user.canVote) {
      msg = "Hi " + name + " " + whenCannotVote;
    } else {
      msg = "Hi " + name + ". Remember to vote!";
    }

    // Now decisions have been made, we need to take action
    if (msg == null) {
      return;
    } else {
      sendMail(msg);
    }
  }

I wrote the function that way to make each step easily distinguishable, although people tend to shorten functions to avoid redundancy and write something like the following instead.

  void sendVotingReminder(int userId, string whenCannotVote) {
    var user = getUser(userId);
    if (user.age >= 18) {
      sendMail("Hi " + user.firstName + " " + user.lastName +
              (user.canVote ? ". Remember to vote!" : " " + whenCannotVote));
    }
  }

While it’s a lot terser, the different steps are interleaved, making it more difficult to extract just a portion of the process, identify the business rules, and so on. As the process becomes more complex, this often results in code that is difficult to understand.

Now that we’ve looked at a concrete example, it’s time to describe the steps in more detail.

Gather/collect data: The process of retrieving all the data you need in order to decide what to do. These values can come from function parameters, or they can be fetched from external sources such as databases and remote APIs. For many processes, it can even be a combination of data from several sources.

Aggregate/calculate data: When we have all the data we need, we massage it to make it more manageable by combining datasets, filtering, converting, cleaning, calculating and so on. We do this to make the next step easier to read, write and reason about.

Decide: Once data is in a format that’s easy to process, we can look at it and decide what to do. These are the business rules, e.g. “if this then that”. This is the actual important part, while the other steps are just necessary cruft to make the decision possible and to put it into effect.

Act/execute: Given that we have decided what to do, we need to actually perform the operation. This typically involves writing to databases, sending emails, calling external APIs and so on.

Describing the above steps should come as no surprise, and most experienced developers will probably go “well, duh!”. Many will probably also state that it’s not that simple in the real world, as we need to do error handling, performance optimization and so on, and I totally agree. The point of this post is to reflect on the distinct phases of a process and their properties, to help us develop, refactor and test such processes.

We could describe any process with the following function:

  let makeProcess (gather : 'a -> 'b) (aggregate : 'b -> 'c) (decide : 'c -> 'd) (act : 'd -> 'e) : 'a -> 'e =
    gather >> aggregate >> decide >> act

  // val makeProcess:
  //   gather: ('a -> 'b) ->
  //     aggregate: ('b -> 'c) ->
  //     decide: ('c -> 'd) -> act: ('d -> 'e) -> ('a -> 'e)

While most type systems don’t allow us to describe many properties of these functions, we can discuss them in prose.

gather: Often a mix of pure and impure; things that are already fetched (like parameters) are pure, while things we need to fetch from other sources are impure. In general, we’ll say that this step is impure. On the other hand, it will never mutate any data, only read data. You should be able to call this function with all possible parameters, ignoring all results, and the world still looks exactly the same.

aggregate: Combines values to a more actionable format. Given the same data, it will always return the same result. The step is pure, and can be memoized for instance. This is why I like to think of gather and aggregate as distinct phases. Pure functions are easy to reason about and test, so the more you’re able to encode as pure functions, the better.
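Because aggregate is pure, caching its results is safe. The following is a minimal sketch of that idea, assuming a hypothetical memoize helper and a stand-in aggregate that just joins two names; neither is part of the code in this post.

```fsharp
open System.Collections.Generic

// Hypothetical helper: caches the results of a pure function by argument.
let memoize (f : 'a -> 'b) : 'a -> 'b =
    let cache = Dictionary<'a, 'b>()
    fun x ->
        match cache.TryGetValue x with
        | true, value -> value
        | false, _ ->
            let value = f x
            cache.[x] <- value
            value

// Demo-only counter so we can observe how often the real function runs.
// A real aggregate step would not touch any mutable state.
let mutable calls = 0

// Stand-in for a pure aggregate step: combines two names into one.
let aggregate (firstName : string, lastName : string) : string =
    calls <- calls + 1
    firstName + " " + lastName

let cachedAggregate = memoize aggregate

let first = cachedAggregate ("Ada", "Lovelace")
let second = cachedAggregate ("Ada", "Lovelace")
// The second call is served from the cache, so `aggregate` itself ran only once.
```

This only works because the result depends on nothing but the arguments. Memoizing gather the same way would risk returning stale data.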

decide: Only looks at the aggregated data and never writes any data, and is thus a pure function. This is also where most domain logic/business rules reside. Reducing this to a single pure step makes it trivially testable. Nothing has to be mocked, and the core of the domain becomes very understandable. As this is the main part that is of interest to the outside, keeping it pure, separate, tested and documented is great for communicating with users of the system.

act: Performs the decided operations and is definitely not pure. This is the only part of the process which mutates data. It will only use data which is added by the prior decision step, and it will execute the effect.

To summarize:

  - gather: queries the world, no effects on the world
  - aggregate: pure, no contact with the world
  - decide: pure, no contact with the world
  - act: doesn’t query the world, only executes the decided effects, no other side effects

Testing gather requires us to mock the sources it fetches data from. But it might be easier to test gather >> aggregate rather than gather alone, and that’s fine, as testing aggregate alone doesn’t always give much benefit. Similarly, testing act requires us to mock the sources which are mutated. Testing decide is “trivial” as it doesn’t read or write to the outside world.
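To make the “trivial” part concrete, here is a small sketch of testing a decide step with plain values. The Aggregated and Decision types and the message texts are simplified assumptions made up for this illustration.

```fsharp
// Hypothetical, simplified types for illustration only.
type Aggregated = { fullname : string; age : int; canVote : bool }

type Decision =
    | DoNothing
    | Send of message : string

// Pure decision step: no I/O, so no mocks are needed to test it.
let decide (info : Aggregated) : Decision =
    if info.age < 18 then DoNothing
    elif not info.canVote then Send (sprintf "Hi %s. You cannot vote this time." info.fullname)
    else Send (sprintf "Hi %s. Remember to vote!" info.fullname)

// Each check is just data in, data out.
let minorResult = decide { fullname = "Kim Doe"; age = 17; canVote = false }
let voterResult = decide { fullname = "Kim Doe"; age = 30; canVote = true }

printfn "%A" minorResult
printfn "%A" voterResult
```

Since both the input and the output are ordinary values, a test is a direct comparison against the expected Decision case.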

Since aggregate and decide are pure functions, you might have a single function which does both, or you might not have either of them at all. A function which doesn’t do anything and just returns the value passed into it is called the identity function. We can use this to “skip” steps where we don’t need to look at or change the data.

You can run gather >> aggregate >> decide until hell freezes over, and you won’t have had any effect on the world! This is a really nice property.
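One practical use of this property is a dry run: compose everything except act and inspect what would happen. The sketch below uses made-up stand-in steps and a hypothetical plan function, purely to show the shape.

```fsharp
// Hypothetical stand-in steps; a real gather would read from a database or API.
let gather (userId : int) : int * string = (userId, sprintf "user-%d" userId)

let aggregate (userId : int, name : string) : string = name.ToUpper()

// Decide on an optional action, described as plain data.
let decide (name : string) : string option =
    if name.EndsWith("0") then None else Some (sprintf "Greet %s" name)

// Everything except `act`: safe to run repeatedly, since nothing is written.
let plan = gather >> aggregate >> decide

// Inspect what *would* happen for a range of users without doing any of it.
let planned = [ 9 .. 11 ] |> List.map plan
```

Here planned is a list of decisions that can be logged, reviewed or diffed before anything is handed to act.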

Let’s look at some silly examples to show that our makeProcess is able to describe regular functions.

  // We decide that + should be performed
  let myadd = makeProcess id id (+) id
  myadd 1 2 // Returns 3

  // The caller supplies the operation to perform as the act step
  let perform act = makeProcess id id id act
  let add a b = a + b
  perform add 1 2 // Returns 3

  // All the work happens in the act step
  let mysum = makeProcess id id id (List.fold (+) 0)
  mysum [1 .. 3] // Returns 6

  // Any single step can ignore its input and produce a constant
  let const' x = makeProcess (fun _ -> x) id id id
  let const'' x = makeProcess id (fun _ -> x) id id
  let const''' x = makeProcess id id (fun _ -> x) id
  let const'''' x = makeProcess id id id (fun _ -> x)

  // val const': x: 'a -> ('b -> 'a)
  // val const'': x: 'a -> ('b -> 'a)
  // val const''': x: 'a -> ('b -> 'a)
  // val const'''': x: 'a -> ('b -> 'a)

Let’s look at how we could split these steps out of sendVotingReminder, but first we need to convert it to F#.

  let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId

    // aggregate
    let name = user.firstName + " " + user.lastName

    // decide
    let msg =
      if user.age < 18
      then null
      else if not user.canVote
      then sprintf "Hi %s %s" name whenCannotVote
      else sprintf "Hi %s. Remember to vote!" name

    // act
    if isNull msg
    then ()
    else sendMail msg

Encoding the possible decisions as a closed set is good both for documentation and robustness.

  type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

Remember that act shouldn’t query the outside world, so the only information it has available is what is available in Action. We could drop the NoActionBecauseUserTooYoung by using an Option if we need 0 or 1 action, or support 0 to many actions by returning a list of actions.
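As a sketch of the list-based alternative, assuming a trimmed-down Action type without the no-op case and an act that receives its mail function as a parameter (both are assumptions for this illustration, as are the message texts):

```fsharp
// Hypothetical variant of the Action type without an explicit no-op case.
type Action =
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

// Returning a list covers zero, one, or many actions; [] means "do nothing".
let decide (age : int) (canVote : bool) (name : string) : Action list =
    if age < 18 then []
    elif not canVote then [ SendCannotVoteMessage (sprintf "Hi %s. You cannot vote." name) ]
    else [ SendReminder (sprintf "Hi %s. Remember to vote!" name) ]

// `act` takes the effectful operation as a parameter (an assumption for this sketch)
// and runs every decided action in order.
let act (sendMail : string -> unit) (actions : Action list) : unit =
    for action in actions do
        match action with
        | SendCannotVoteMessage message -> sendMail message
        | SendReminder message -> sendMail message

// Usage: collect "sent" mail in memory instead of really sending it.
let sent = ResizeArray<string>()
decide 17 true "Kim" |> act sent.Add
decide 40 true "Kim" |> act sent.Add
```

The trade-off is that an empty list no longer says why nothing happened, which is exactly the information the explicit NoActionBecauseUserTooYoung case carries.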

Sometimes it makes sense to let aggregate return more information to decide like the fact that a user is too young. But having a “no-op” case is often a very useful feature (like the NullObject pattern in OOP, the identity function or empty for Monoid), so we’ll leave it in.

  let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId

    // aggregate
    let name = user.firstName + " " + user.lastName

    // decide
    let action =
      if user.age < 18
      then NoActionBecauseUserTooYoung
      else if not user.canVote
      then SendCannotVoteMessage (sprintf "Hi %s %s" name whenCannotVote)
      else SendReminder (sprintf "Hi %s. Remember to vote!" name)

    // act
    match action with
    | NoActionBecauseUserTooYoung ->
      ()
    | SendCannotVoteMessage message ->
      sendMail message
    | SendReminder message ->
      sendMail message

We can start by creating inner functions for the parts we wish to extract.

  type Gathered =
      { user : User
        whenCannotVote : string
      }

  type Aggregated =
      { user : User
        whenCannotVote : string
        fullname : string
      }

  let sendVotingReminder (userId : int) (whenCannotVote : string) =
    let gather (userId : int) (whenCannotVote : string) : Gathered =
      { user = getUser userId; whenCannotVote = whenCannotVote }

    let aggregate (gathered : Gathered) : Aggregated =
      let name = gathered.user.firstName + " " + gathered.user.lastName
      { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }

    let decide (aggregated : Aggregated) : Action =
      if aggregated.user.age < 18
      then NoActionBecauseUserTooYoung
      else if not aggregated.user.canVote
      then SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
      else SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)

    // act
    let act (action : Action) : unit =
      match action with
      | NoActionBecauseUserTooYoung ->
        ()
      | SendCannotVoteMessage message ->
        sendMail message
      | SendReminder message ->
        sendMail message

    gather userId whenCannotVote
    |> aggregate
    |> decide
    |> act

This is still the same function, and we can now reduce it to just its parts.

  type Gathered =
      { user : User
        whenCannotVote : string
      }

  type Aggregated =
      { user : User
        whenCannotVote : string
        fullname : string
      }

  type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

  // gather takes its input as a single tuple so the steps compose with >>
  let gather (userId : int, whenCannotVote : string) : Gathered =
    { user = getUser userId; whenCannotVote = whenCannotVote }

  let aggregate (gathered : Gathered) : Aggregated =
    let name = gathered.user.firstName + " " + gathered.user.lastName
    { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }

  let decide (aggregated : Aggregated) : Action =
    if aggregated.user.age < 18
    then NoActionBecauseUserTooYoung
    else if not aggregated.user.canVote
    then SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
    else SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)

  let act (action : Action) : unit =
    match action with
    | NoActionBecauseUserTooYoung ->
      ()
    | SendCannotVoteMessage message ->
      sendMail message
    | SendReminder message ->
      sendMail message

  // sendVotingReminder : int * string -> unit
  let sendVotingReminder = makeProcess gather aggregate decide act

Just looking at the types, we can pretty much guess what’s going on. It’s pretty easy to describe decide to business users, and pretty easy to test it in isolation. It’s actually pretty easy to test each part in isolation as necessary, if the impure steps accept functions for communicating with their dependencies.
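A rough sketch of what passing dependencies as functions could look like follows. The types are redefined in simplified form so the snippet stands alone, the decision logic is deliberately reduced, and fakeGetUser and outbox are hypothetical test doubles.

```fsharp
// Hypothetical minimal types, redefined here so the sketch is self-contained.
type User = { firstName : string; lastName : string; age : int; canVote : bool }

type Action =
    | NoAction
    | SendMail of message : string

// Impure steps take their dependencies as function parameters.
let gather (getUser : int -> User) (userId : int) : User = getUser userId

// Simplified decision logic, only here to connect gather and act.
let decide (user : User) : Action =
    if user.age < 18 || not user.canVote then NoAction
    else SendMail (sprintf "Hi %s %s. Remember to vote!" user.firstName user.lastName)

let act (sendMail : string -> unit) (action : Action) : unit =
    match action with
    | NoAction -> ()
    | SendMail message -> sendMail message

// In a test, the "database" and the "mail server" are plain in-memory values.
let fakeGetUser (_ : int) : User =
    { firstName = "Kim"; lastName = "Doe"; age = 42; canVote = true }

let outbox = ResizeArray<string>()

let sendVotingReminder = gather fakeGetUser >> decide >> act outbox.Add
sendVotingReminder 1
```

In production the same gather and act would be partially applied with the real database lookup and mail sender, while the pipeline itself stays unchanged.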

We’ll look at a final example with just the end result. We create an API which returns dummy data for our example.

  open System

  type User =
      { userId: int
        firstName: string
        lastName: string
      }

  type Profile =
      { address: string
      }

  type Post =
      { userId: int
        published : DateTime
      }

  let getUser (userId : int) : User =
      { userId    = userId
        firstName = sprintf "firstname %d" userId
        lastName  = sprintf "lastname %d" userId
      }

  let getProfile (userId : int) : Profile =
      { address = sprintf "address for %d" userId }

  let getPosts () : Post list =
      [
          { userId = 1
            published = DateTime.Today
          }
      ]

Now we’re ready to build our process, and the first step is to gather all the data needed.

  let gather (userId : int) =
    let user = getUser userId
    let profile = getProfile userId
    let posts = getPosts ()
    (user, profile, posts)

After all data is gathered, we need to process it. It is often useful to create a new structure to hold our information. This is pure, so given the same arguments, it will always return the same result, and it will never have any effects on the outside world.

  type TodayDigestInfo = { userId: int; fullname: string; address: string; numBlogsToday: int }

  let aggregate ((user, profile, blogs) : (User * Profile * Post list)) : TodayDigestInfo =
    { userId = user.userId
      fullname = sprintf "%s, %s" user.lastName user.firstName
      address = profile.address.ToUpper()
      // Note: DateTime.Today reads the clock; for strict purity, pass the date in from gather
      numBlogsToday =
        blogs
        |> Seq.filter (fun b -> b.userId = user.userId && b.published.Date = DateTime.Today)
        |> Seq.length
    }

When we have our data, we’re ready to make decisions about what to do. Making the decision, the important business logic, is pure, and all possible outcomes are typed in the result of the function.

  type Action =
    | SendCongratulationCard of name : string * address: string * message : string
    | ShameUser of userId : int * why : string

  let dailyDigest (info : TodayDigestInfo) : Action =
    if info.numBlogsToday = 0
    then ShameUser (info.userId, "Booo. You didn't write any posts!")
    else SendCongratulationCard (info.fullname, info.address, (sprintf "You wrote %d posts" info.numBlogsToday))

Pure functions don’t actually “do” anything, so given our decisions, we need to modify the world. Everything we need to execute the decision should be stored in the data passed to our execute function from the decision.

  let executeAction (action : Action) =
    match action with
    | SendCongratulationCard (name, address, message) ->
        sprintf "UPS.sendCard %A %A %A" name address message
    | ShameUser (userId, why) ->
        sprintf "Shaming %A -- %A" userId why

And finally, we’ll create our process. Our process will then have the type userId: int -> actionResult: string

  let sendDailyDigest = makeProcess gather aggregate dailyDigest executeAction

Let’s test our code

  sendDailyDigest 1
  // val it: string =
  //   "UPS.sendCard "lastname 1, firstname 1" "ADDRESS FOR 1" "You wrote 1 posts""

  sendDailyDigest 2
  // Returns: Shaming 2 -- "Booo. You didn't write any posts!"

All this might look like complete overkill, and in many cases it is. But recognizing that processes, from the smallest + function, to the largest business processes, all share the same general steps with the same properties is powerful knowledge. It makes it easier to extract parts that can be reused by other processes, parts that should be tested more thoroughly and so on.

In many cases you only want to extract a single part for some reason, like the business logic. The important thing to remember is that these are common boundaries that are often quite natural to extract, and that extracting them often yields benefits as processes become more complex. Just having these distinct blocks in functions can be beneficial, as the code becomes easier to reason about and less prone to turning into spaghetti.