Migrating your data with sanity exec

Ever wished you chose a slightly different data structure for your structured content? Let's learn how to fix it!

Sanity is a great way to manage your content. You define your schema by focusing on the content, not how it's going to be presented. That makes your content super flexible when you're say, redesigning your site, or coming up with new ways to present it.

However, sometimes those redesigns and new platforms uncover a a new need, or you realize you didn't make the very best design decisions up front. It happens to the best of us.

But by the time you realize your schema shortcoming, you already have lots of content, and updating your fields manually isn't something you have the time (or patience) to do. So what are your options?

Changing your content in Sanity

Luckily, Sanity is made by developers who are just as bad at structuring data as you are. Therefore, they made two ways for you to safely change your data, without too much fuzz.

The first way to change your content does require you to halt any editing while your change your content. You download a copy of your dataset, change it in your editor however you like (like search replacing strings) and upload a new copy of your dataset. Any changes made in the meantime will be lost – but if you're the sole editor, or you're few enough to tell people to take a break for a few while you do your work, this is a totally viable approach.

You can read more about that approach in the docs, but this article is going to focus on the second, much more powerful way to change your content – sanity exec.

Meet sanity exec

Sanity ships with a very nice utility called sanity exec, which makes it possible to write and run scripts that change your dataset directly, without setting up a ton of boilerplate to authenticate first. You can change data while people are actively working on the content as well, and base your changes on GROQ lookups, for instance!

I'm going to share one of the migrations we did on this very site, and use it as an example.

Our use case

We have a lot of content on this site, both in English and Norwegian. We wanted a way to set the correct <html lang /> tags and make the content filterable by language. Unfortunately, we didn't have a field for that.

We started by adding a field to our schema type, like so:

{
  title: "Language",
  name: "language",
  type: "string",
  options: {
    list: [
      { title: "English", value: "en-US" },
      { title: "Norwegian (Bokmål)", value: "nb-NO" },
      { title: "Norwegian (Nynorsk)", value: "nn-NO" },
    ],
  },
}

So now we have a place to save this information, and all new content we create will require you to fill it out.

Creating the boilerplate

To get started, let's create a new file in your studio folder called called migrations/fill-out-languages.js, and give it some empty methods for us to fill out later.

import sanityClient from "part:@sanity/base/client";

const client = sanityClient.withConfig({ 
  apiVersion: '2021-08-21'
});

// Get the remaining documents you want to change
const fetchDocuments = () => null; // TODO

// Given the documents you want to change, 
// define what changes you want to perform
const buildPatches = () => null; // TODO

// Start saving to the database
const createTransactions = () => null; // TODO

// Commit changes to the database
const commitTransaction = () => null; // TODO

// Orchestrate the migration
const migrateNextBatch = () => null; // TODO

The strategy will be – create a GROQ query that specifies the n first elements we want to change, create the change we want to perform, perform that change, and loop this until our GROQ query doesn't return any elements.

Fetching documents

First, let's create a query that finds the first 100 posts that doesn't have a language specified.

const fetchDocuments = () => {
  return client.fetch(
    `*[_type == 'post' && !defined(language)][0...100] 
    { _id, _rev, 'plaintextContent': pt::text(content) }`
  );
}

If you're not very familiar with GROQ, there's a lot of great resources out there for you to learn it. I'd recommend reading the official documentation and the GROQ cheat sheet!In this query, I specify that I want all documents that has the type post, and that has a language field that is not defined. I then limit it to the first 100 records (that's the [0...100] part), and fetch the two fields _id, _rev and the plaintext content of our post.

Note - if you haven't seen the pt::text function before, that's fine - it's not a widely used feature of GROQ. You can read more about it (and similar functions) here.

Creating the change set

The next thing we want to implement, is to specify what changes we want to do with all of those documents we want to change. We do that by turning our documents into objects, with a predefined set of fields. Let's look at the code.

const buildPatches = (docs) => {
  return docs.map((doc) => ({
    id: doc._id,
    patch: {
      set: {
        language: figureOutLanguage(doc.plaintextContent)
      },
      ifRevisionID: doc._rev
    }
  }));
}

We set the ID of the document we want to change, and specify the change – or patch – we want to apply to that document.

In this particular case, we want to set the language field to the correct value. You can also unset fields, by passing them as an array to the unset property.

We also specify the revision ID, which makes the transaction retry if a document has been changed since it was fetched. This is pretty cool, and makes these kinds of changes safe to do in a live studio environment with active editors. If somebody just edited a piece of content in the second between fetching records and processing them, we'll just try again the next round of fetching documents!

You might have been wondering how that figureOutLanguage function works. In our particular use case, I was able to just write a "simple" regular expression checking for the Norwegian special characters æ, ø and å. If they were found, we assume that the text is in Norwegian, and if they were not, we assume the content is in English. This won't be a 100 % match, but it will match enough records that a quick manual scan of all titles and languages will let us find the mismatches quickly enough. I asked Twitter what their preferred methods were, and got some really interesting approaches – like using the Google Translate APIs, or searching for the most common words – and if I hadn't done the migration by then, I probably would have followed their advice instead. However, quick and dirty did the job!

Migrating records recursively 😱

The next function we want to implement is the one that will orchestrate calling the functions we've implemented so far, until we're out of records.

Let's look at some code:

const migrateNextBatch = async () => {
  const documentsToUpdate = await fetchDocuments();
  if (documentsToUpdate.length === 0) {
    console.info("Migration is finished!")
    return null;
  }
  
  const patches = await createPatches(documentsToUpdate);
  
  console.info(`Migrating ${documentsToUpdate.length} document(s)…`);
  
  const transaction = createTransaction(patches);
  await transaction.commit();
  return migrateNextBatch();
}

We fetch the documents, check if there are more documents to migrate, and if not, exit.

If there are more documents left to migrate, we create the changes (or patches) we want to run.

Now, we have to create what's called a transaction. A transaction is a way to bundle several changes together, so that if one of them fail, we roll back the changes of all the changes bundled together. It's an old school database concept, so it's fine if you haven't encountered it before.

Now, to create these transactions, we are going to create a transaction through the sanity client, add all our changes to that transaction, and then return our transaction. There are several ways to do that, but I'm going to use a for loop.

const createTransaction = (patches) => {
  const transaction = client.transaction();
  for (const patch in patches) {
    transaction.patch(patch.id, patch.patch);
  }
  return transaction;
}

Then, when we're done creating this transaction, we can wait for all of these transactions to finish by awaiting transaction.commit().

Note, that if you ever want to do a "dry run" of your migration (you probably should), you only need to comment out the transaction commit part of your code!

All together now

That's it - you're basically finished! Here's the complete script:

import sanityClient from "part:@sanity/base/client";

const client = sanityClient.withConfig({ 
  apiVersion: '2021-08-21'
});

// Get the remaining documents you want to change
const fetchDocuments = () => {
  return client.fetch(
    `*[_type == 'post' && !defined(language)][0...100] 
    { _id, _rev, 'plaintextContent': pt::text(content) }`
  );
}

// Given the documents you want to change, 
// define what changes you want to perform
const buildPatches = (docs) => {
  return docs.map((doc) => ({
    id: doc._id,
    patch: {
      set: {
        language: figureOutLanguage(doc.plaintextContent)
      },
      ifRevisionID: doc._rev
    }
  }));
}

// Start saving to the database
const createTransaction = (patches) => {
  const transaction = client.transaction();
  for (const patch in patches) {
    transaction.patch(patch.id, patch.patch);
  }
  return transaction;
}

// Commit changes to the database
const commitTransaction = (transaction) => transaction.commit();

// Orchestrate the migration
const migrateNextBatch = async () => {
  const documentsToUpdate = await fetchDocuments();
  if (documentsToUpdate.length === 0) {
    console.info("Migration is finished!")
    return null;
  }
  
  const patches = await createPatches(documentsToUpdate);
  
  console.info(`Migrating ${documentsToUpdate.length} document(s)…`);
  
  const transaction = createTransaction(patches);
  await commitTransaction(transaction);
  return migrateNextBatch();
}

// Finally, let's kick off our migration!
try {
  await migrateNextBatch();
} catch (e) {
  console.error("Something failed", e);
}

Running the migration

We've now written a great migration script. But how do we run it? Luckily, that's the easy party.

$ sanity exec migrations/fill-out-languages.js --with-user-token

With any luck, this will run successfully, and data will be updated without disturbing your editors. Pretty cool!

Changing your data in Sanity shouldn't be a scary thing to do, and luckily, it isn't. The sanity exec method lets us create really nice looking migration scripts that in large part are reusable between migrations.

Thanks for reading!