FactoryBot with an External API Data Source

No Model, No Problem!

Header image by Alexander Sinn on Unsplash

Rails and RSpec are great, however… fixtures are, well, fixed. While they can be great for one-off stuff that doesn’t change, working with an unruly set of fixtures feels cumbersome compared to the freedom that comes with factories. FactoryBot allows great flexibility while using your models to generate nice little sets of test data that can be defined inline with the tests. There is more up-front setup, but the pay-off is like picking that perfect piece of low-hanging fruit that fills your soul with its sweet juice. Maybe I drink too much FactoryBot Kool-Aid. But what happens when there is no model or class to build a factory from?

Where was I… oh right, I was recently working on a project where we had an application that connected to a rather large GraphQL API to gather and translate the data. Then it would store some of it and send some to different applications that need to consume the data. We started with fixtures but that quickly made me curse the software engineering gods. I had too many use cases and special data situations I needed to validate.

None of these data points from the API had any class or model representation in our code, as it was all transitional data. Because of this, FactoryBot wasn’t an obvious fit or an easy thing to transition to, but it was definitely what I wanted to use.


Refactor All the Things!

Feeling the need for a refactor, I iterated through several different solutions, from simple inline classes to more complicated setups, but none of them felt right. There was a lovely write-up from thoughtbot that covered how factories for non-ORM-backed data structures could be accomplished, but I wanted an easier way to maintain the attributes since we didn’t control the API. I also wanted something that would be usable by any JSON API (or really any kind of API data structure).

Given these constraints and the constraints of the glorious FactoryBot, some patterns emerged. If you want to look at the code in this example, you can check it out here on GitHub.


The API Data

Let's start with the data coming from the external source.

-- CODE language-json --{
 "data": {
   "user": {
     "emails": [
       {
         "email": "monkey@banana.com",
         "is_primary": true
       },
       {
         "email": "monkey@ape.com",
         "is_primary": false
       }
     ],
     "id": "12345",
     "name": "Monkey"
   }
 }
}

Nothing special here; the piece to point out is how the API stores email addresses. What if our internal user data structure has one email per user? We would need to translate this. So what might that test look like?
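
To make the translation concrete, here is a minimal sketch in plain Ruby. `primary_email_for` is a hypothetical helper invented for illustration (it is not from the project); it assumes the parsed `user` hash from the payload above and returns the address flagged `is_primary`, or nil when there isn't one.

```ruby
require 'json'

# Hypothetical helper, for illustration only: pick the primary address out of
# the API's list of emails. Returns nil when the list is missing, empty, or
# has no entry flagged as primary.
def primary_email_for(api_user)
  primary = Array(api_user['emails']).find { |entry| entry['is_primary'] }
  primary && primary['email']
end

payload = <<~JSON
  {
    "data": {
      "user": {
        "emails": [
          { "email": "monkey@banana.com", "is_primary": true },
          { "email": "monkey@ape.com", "is_primary": false }
        ],
        "id": "12345",
        "name": "Monkey"
      }
    }
  }
JSON

api_user = JSON.parse(payload).dig('data', 'user')
puts primary_email_for(api_user) # prints "monkey@banana.com"
```

The edge cases (no primary email, no emails at all) are exactly the situations that are painful to cover with fixtures.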

The Tests

After evaluating the external data source, I had an idea of what the test should look like in an ideal world. I wanted the tests to feel like the data is being used like any other factory in the system.

-- CODE language-ruby line-numbers --RSpec.describe User, type: :model do
 describe '.import' do
   subject { described_class.import(user) }
   let(:email_primary) { FactoryBot.build(:api_email, is_primary: true) }
   let(:email) { FactoryBot.build(:api_email, is_primary: false) }
   let(:emails) { [email, email_primary] }
   let(:user) { FactoryBot.build(:api_user, emails: emails) }
   it { is_expected.to be_an_instance_of(User) }
   it 'creates a record' do
     expect { subject }.to change(User, :count).by 1
   end
   it 'set email' do
     expect(subject.email).to eq(email_primary.email)
   end
   it 'set name' do
     expect(subject.name).to eq(user.name)
   end
 end
end

In the RSpec describe block on line 1 you can see we are testing our `User` model, and then its class-level method `import`. Because of this, the setup data will all be the expected incoming data from the external source.

The `api_user` and `api_email` factories on lines 4 - 7 are where the data structures are built. I am using `FactoryBot.build` over `create` here since the data is only being passed into the import method. Not only that, there is nowhere to persist this data.

These factories give us some powerful tools for our tests. First they let us specify the exact data we need to focus on (primary email), while letting the less important data be defined for us (name).

This would allow us to easily come up with different tests where there may not be a primary email address, or no email addresses at all.

The Factories

Here are the happy little factories.

-- CODE language-ruby line-numbers --FactoryBot.define do
 factory :api_user, parent: :json_base do
   json_reference_string do
     '
       {
         "emails": [],
         "id": "397542",
         "name": "Monkey"
       }
     '
   end
   sequence(:id, &:to_s)
   emails { [association(:api_email)] }
   name { Faker::Name.name }
 end
end


-- CODE language-ruby line-numbers --FactoryBot.define do
 factory :api_email, parent: :json_base do
   json_reference_string do
     '
       {
         "email": "monkey@banana.com",
         "is_primary": true
       }
     '
   end
   email { Faker::Internet.safe_email }
   is_primary { true }
 end
end

While these mostly look like regular factories, there are three main differences. The most glaring difference is the definition of a `json_reference_string` on line 3 of both. While it does look like any other attribute, this one is special. It tells the factory which attributes are valid for this factory and which are not.

The reason I wanted this reference string is so I could very easily copy a JSON structure from Insomnia (or wherever) and paste it here. This allows for a quick change of the definition based on any API changes.

The next difference, which is not apparent here, is that there is no backing `APIUser` or `APIEmail` model/class. That leads to the last important piece.

Each factory is established as a JSON factory by setting its parent to `json_base` on line 2 of both. This base is what allows the `json_reference_string` to be special and fills the need for a model.

All Your Base are Belong to Us

This is where the magic happens.

-- CODE language-ruby line-numbers --FactoryBot.define do
 factory :json_base do
   skip_create
   initialize_with { new(attributes, json_reference_string) }
   transient do
     json_reference_string { nil }
   end
 end
end

Starting at the top, the parent factory has some special FactoryBot built-in stuff going on. First on line 3 is the `skip_create`. Since our class doesn’t have a `save!` method, as there is nowhere to persist, this tells FactoryBot not to try to call it. Of course we could implement one, but since we can skip it, let’s do that.

The `initialize_with` on line 4 defines how the JSONBase class will be called. We will use `new`, passing in the attributes and the `json_reference_string`. FactoryBot gives us all the defining attributes in a nice little hash. That said, it is the transient wrapper around `json_reference_string` on line 5 that pulls that one out of the attributes hash and allows it to be passed by itself to JSONBase. While there are other ways to do this, I think this is the clearest way to get the intent across. It also allows us to assign a default value of nil, which we will use in a bit.

-- CODE language-ruby line-numbers --class JSONBase
 def initialize(args, json_reference_string)
   raise JSONBaseError, 'Missing json_reference_string' if json_reference_string.nil?
   @keys = JSON.parse(json_reference_string).keys
   undefined_attributes = args.keys - keys.map(&:to_sym)
   raise JSONBaseError, "#{undefined_attributes} not defined in json_reference_string" if undefined_attributes.present?
   keys.each do |key|
     instance_variable_set("@#{key}", args.fetch(key.to_sym, nil))
   end
 end
 attr_reader :keys
 def method_missing(method, *args, &block)
   return instance_variable_get("@#{method}") if respond_to_missing?(method)
   super
 end
 def respond_to_missing?(method, _include_all = nil)
   keys.include?(method.to_s)
 end
end
class JSONBaseError < StandardError; end

Moving on to JSONBase, the initializer on line 2 takes in the attributes and `json_reference_string`. The first thing we do is verify there is a reference string on line 3; otherwise this class is useless.

Next we set our local keys instance variable to the keys in the reference string on line 4. Yes, the values of those keys are never used. They are there in the child factories for quick copy and pasting from our source JSON data. In this class we throw them out as we are really only concerned with the keys. The attributes in the child factories will set the values.

Since the `json_reference_string` is the source of truth for what is allowed, we verify that no additional keys are defined in the attributes being passed in on lines 5 & 6.

The last thing we do in the initializer is crawl over the keys setting all the attributes on lines 7 - 9. If anything is not in the attributes we default to nil.

Now… I usually avoid metaprogramming like I avoid technical interviews (the plague… THE PLAGUE!). I am usually the first to say, “there are better ways to do this,” like my teacher Mrs. MacNemare used to do with her finger shaking at me. However, there are always exceptions (how perfectly self-serving of me), for instance libraries, and I consider this an extension of FactoryBot.

Anyways... we implement `method_missing` on line 12, but I don’t want to allow just any method call. I only want to implement getters for the attributes defined in the `json_reference_string`, so the `respond_to_missing?` method is used to allow only those keys.

There is a lot going on here, but I put the factory and the JSONBase class in one file so it is self-contained. That way any adjustments can be made here. Also I didn’t want to worry about where to put or how to load the JSONBase class since it is only used for this and should live somewhere in the spec directory.
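
To see the validation and the getters in isolation, here is a condensed, standalone sketch that runs without Rails or FactoryBot. `StrictJSONObject` is a hypothetical stand-in name for JSONBase, and `.any?` replaces ActiveSupport's `.present?`; otherwise it follows the same steps described above.

```ruby
require 'json'

class JSONBaseError < StandardError; end

# Hypothetical stand-in for JSONBase that only needs the standard library.
class StrictJSONObject
  attr_reader :keys

  def initialize(args, json_reference_string)
    raise JSONBaseError, 'Missing json_reference_string' if json_reference_string.nil?

    # Only the keys of the reference string matter; its values are discarded.
    @keys = JSON.parse(json_reference_string).keys
    undefined = args.keys - keys.map(&:to_sym)
    raise JSONBaseError, "#{undefined} not defined in json_reference_string" if undefined.any?

    # Every key in the reference gets a value, defaulting to nil.
    keys.each { |key| instance_variable_set("@#{key}", args[key.to_sym]) }
  end

  # Getters exist only for keys named in the reference string.
  def method_missing(name, *args, &block)
    return instance_variable_get("@#{name}") if respond_to_missing?(name)

    super
  end

  def respond_to_missing?(name, _include_all = false)
    keys.include?(name.to_s)
  end
end

reference = '{ "email": "monkey@banana.com", "is_primary": true }'

email = StrictJSONObject.new({ email: 'monkey@ape.com' }, reference)
puts email.email              # prints "monkey@ape.com"
puts email.is_primary.inspect # prints "nil" (in the reference, but not passed in)

begin
  StrictJSONObject.new({ emial: 'oops' }, reference)
rescue JSONBaseError => e
  puts e.message # prints "[:emial] not defined in json_reference_string"
end

begin
  email.emails
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}" # prints "NoMethodError: emails"
end
```

Both kinds of typo, in the attributes passed in and in the getter being called, fail loudly instead of quietly producing nil.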

But, But, OpenStruct

But Kevin… Why didn’t you just use OpenStruct? It could solve all the missing method stuff without having to create your own custom class. And since we are using OpenStruct to convert the JSON into a usable object in our production code, that would make sense. But a great darkness lies down that path.

First, OpenStruct won’t tell me if I do something wrong. If I add a variable to the attributes of a factory that doesn’t belong, OpenStruct will be like, “yup, I like that”.

Second, and far worse, OpenStruct will respond to anything. So if I typed `user.supercalifragilisticexpialidocious` it would be like “yup, that is nil”. While this may not seem like a big deal, if someone mistypes `emails` as `email` and it only ever returns nil, that is a bigger deal.
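
Both problems are easy to reproduce with nothing but the standard library. In this small sketch, `favorite_fruit` is a made-up attribute standing in for something the API never defined.

```ruby
require 'ostruct'

user = OpenStruct.new(name: 'Monkey', emails: [], favorite_fruit: 'banana')

# An attribute the API never defined is accepted without complaint.
puts user.favorite_fruit # prints "banana"

# A mistyped getter (email instead of emails) quietly returns nil.
puts user.email.inspect # prints "nil"
```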


Enhance

On top of what you see above, I have added a few more features, which you can find in the example repo: additional allowed attributes for duck typing, handling factory testing with `FactoryBot.lint`, both object and hash access to attributes, and camel and snake casing for the `json_reference_string`.

It would be nice to use the GraphQL interface to establish the available params so everything is automatically kept up to date. That would mean building something specific to that particular interface, but it is something to possibly explore in the future.
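
As a rough idea of what the casing feature involves, here is a minimal sketch. `snake_case_keys` is a hypothetical helper and is not taken from the example repo; it assumes simple camelCase keys and turns a reference string's keys into snake_case symbols.

```ruby
require 'json'

# Hypothetical helper: read the keys of a JSON reference string and convert
# simple camelCase names to snake_case symbols.
def snake_case_keys(json_reference_string)
  JSON.parse(json_reference_string).keys.map do |key|
    key.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase.to_sym
  end
end

puts snake_case_keys('{ "isPrimary": true, "email": "monkey@banana.com" }').inspect
# prints "[:is_primary, :email]"
```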


Conclusion

This is one solution to build test data using a familiar system like FactoryBot while there are no backing classes/models for the data structure. It requires some setup, but the pay-off is a more flexible system that can grow as the involved applications change over time. Hopefully it is straightforward enough that it is easily accessible for all to adopt.