Using DynamoDB for Dynamic Configuration

Kevin Wu

Software Engineer

5-minute read

Posted on March 28, 2019

When it comes to coding, who doesn't love reliability and flexibility? Here's how we use dynamic configuration to improve performance without extra operational costs.

Deploying code has a cost. It can take time, may require an application restart, and (hopefully) goes through safeguarding processes that add friction. That’s where dynamic configuration comes into play. As the name suggests, dynamic configuration is the ability to change the behavior of a system on the fly. This is incredibly useful for things like feature flags, dev-ops switches, network routing, and customizing behavior for different customers. Most companies have dynamic configuration of some kind, and some even develop their own systems for it, such as Netflix’s Archaius or Twitter’s ConfigBus.

In this blog post, I’ll talk about one particular tool we’ve used for dynamic configuration since very early on at Amplitude. We lovingly refer to it as DynConf.

What is DynConf?

At its core, DynConf is a wrapper around reads from a DynamoDB table of string key-value pairs. The wrapper simply adds a layer of local caching, periodically refreshing any keys it has fetched, and provides some type casting and defaulting of the fetched string values.
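For illustration, here is a minimal Python sketch of that kind of wrapper. It is not Amplitude’s code. The `DynConf` class shape, the method names, the 60-second refresh interval, and the accepted boolean spellings are all assumptions. The `fetch` callable stands in for the DynamoDB read (for example, a boto3 `get_item` call), so a plain dict can play the table in the usage lines. Keys are also refreshed lazily on read rather than by a background thread.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DynConf:
    """Hypothetical sketch of a caching wrapper over a string key-value store.

    `fetch` stands in for the DynamoDB read; it returns the stored string or
    None when the key is absent. Each key is re-fetched lazily once its cached
    copy is older than `refresh_secs`.
    """

    def __init__(self, fetch: Callable[[str], Optional[str]],
                 refresh_secs: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._refresh_secs = refresh_secs
        self._clock = clock
        # key -> (cached value, or None if the key was missing; fetch time)
        self._cache: Dict[str, Tuple[Optional[str], float]] = {}

    def get_string(self, key: str, default: Optional[str] = None) -> Optional[str]:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is None or now - cached[1] >= self._refresh_secs:
            try:
                value = self._fetch(key)
            except Exception:
                if cached is None:
                    return default
                # Store unreachable: keep the last known value and retry
                # after another refresh interval.
                value = cached[0]
            self._cache[key] = (value, now)
        else:
            value = cached[0]
        return default if value is None else value

    def get_bool(self, key: str, default: bool = False) -> bool:
        raw = self.get_string(key)
        if raw is None:
            return default
        normalized = raw.strip().lower()
        if normalized in ("true", "1", "yes", "on"):
            return True
        if normalized in ("false", "0", "no", "off"):
            return False
        return default  # unparseable values fall back to the default

    def get_number(self, key: str, default: float = 0.0) -> float:
        raw = self.get_string(key)
        if raw is None:
            return default
        try:
            return float(raw)
        except ValueError:
            return default  # unparseable values fall back to the default


# Usage with an in-memory dict standing in for the DynamoDB table
# (the key names are made up for illustration).
table = {"ingest.useBackupKafka": "true", "query.timeoutMs": "2500"}
conf = DynConf(table.get)
print(conf.get_bool("ingest.useBackupKafka"))       # True
print(conf.get_number("query.timeoutMs", 1000.0))   # 2500.0
print(conf.get_number("query.missingKey", 1000.0))  # 1000.0
```

Serving the last known value when the store is unreachable favors availability over freshness. That is one reasonable choice for a configuration system that many services read on hot paths.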

Besides the fantastic wordplay opportunity, there were several reasons for implementing DynConf this way:

  1. Reliability: Because it’s just a wrapper, DynConf inherits all the robustness of DynamoDB for free.

  2. Code Simplicity: There was existing code to communicate with DynamoDB, making DynConf very simple to write, use, and think about.

  3. Operational Simplicity: We didn’t have to worry about scaling or managing any clusters of hosts.

  4. Flexibility: With creativity, a generic key-value store can be adapted for almost every dynamic configuration use case (albeit not optimally).

[Image: DynConf diagram] The initial design of DynConf was pretty simple.

Overall, we valued the simplicity and low operational overhead for the very small team at the time.

How is it Used?

At Amplitude, DynConf settings usually fall into one of the following patterns:

  • General toggles for behavior
  • Numeric configuration
  • Targeted toggles
  • Customized configuration (depending on access pattern)
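As a rough illustration, all four patterns can be layered over plain string values. The sketch below assumes a raw getter that returns the stored string or `None`. The helper names, the key names, the comma-separated ID list for targeted toggles, and the `<key>.<customer_id>` override convention are hypothetical encodings, not necessarily the ones DynConf uses.

```python
from typing import Callable, Optional

# Hypothetical raw getter: returns the stored string for a key, or None.
Getter = Callable[[str], Optional[str]]


def general_toggle(get: Getter, key: str) -> bool:
    # General toggle: one key turns a behavior on or off for everyone.
    return (get(key) or "").strip().lower() == "true"


def numeric_config(get: Getter, key: str, default: float) -> float:
    # Numeric configuration: the string value is parsed as a number.
    raw = get(key)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default


def targeted_toggle(get: Getter, key: str, customer_id: str) -> bool:
    # Targeted toggle (assumed encoding): a comma-separated list of the
    # customer IDs the behavior is enabled for.
    enabled = {part.strip() for part in (get(key) or "").split(",") if part.strip()}
    return customer_id in enabled


def customized_config(get: Getter, key: str, customer_id: str, default: str) -> str:
    # Customized configuration (assumed encoding): a per-customer key
    # "<key>.<customer_id>" overrides the shared key, which overrides the default.
    for candidate in (f"{key}.{customer_id}", key):
        value = get(candidate)
        if value is not None:
            return value
    return default


# Usage with a dict standing in for the table (hypothetical keys).
table = {
    "query.newPlanner.enabled": "true",
    "query.maxRows": "50000",
    "query.betaCustomers": "org-1, org-7",
    "export.format": "csv",
    "export.format.org-7": "parquet",
}
get = table.get
```

Which encoding fits customized configuration depends on the access pattern. If a setting is read for many customers at once, a single key holding a JSON map may be cheaper than one key per customer.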

Here are some of the most valuable use cases we were able to cover using these patterns.

Feature Flags for New Behavior

General toggles in DynConf are extremely helpful when releasing a new feature or behavior, since they allow quick and easy rollback without waiting several minutes for a redeploy.

We also often use targeted toggles to enable behavior for internal dogfooding and early beta testing.

Multi-stage Rollouts

Migrating or upgrading services often involves a staged rollout of behavior, where redeploying at every stage may be disruptive.

For example, if load is a concern for a new service, we use a numeric configuration to control the rate at which requests are redirected to it, and observe performance metrics under production load.
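One common way to turn a numeric setting into a traffic split is stable hashing, sketched below in Python. The post doesn’t say how Amplitude routes requests. The bucketing scheme, the use of a customer ID as the hash input, and the function and key names are assumptions.

```python
import hashlib


def in_rollout(stable_id: str, percent: float) -> bool:
    """Return True if `stable_id` falls in the first `percent`% of 10,000 buckets.

    Hashing a stable ID (rather than picking at random per request) keeps each
    ID's decision consistent. Raising `percent` only adds IDs to the rollout;
    it never moves an ID back out.
    """
    digest = hashlib.sha256(stable_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # e.g. 12.5% -> buckets 0..1249


# Hypothetical call site: the percentage would come from a numeric setting,
# e.g. a key like "migration.newService.percent" (name made up).
def choose_backend(customer_id: str, percent: float) -> str:
    return "new-service" if in_rollout(customer_id, percent) else "old-service"
```

Because decisions are sticky per ID, a customer doesn’t flip between the old and new service on consecutive requests while the percentage is being raised.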

Tuning Performance Configuration

When tuning performance, it isn’t always clear what the best cutoffs or settings will be, especially under production load and data patterns. By making certain timeouts, limits, and thresholds numeric configurations, we can find the right trade-offs more quickly.
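Here is a small sketch of that idea. It assumes a hypothetical typed getter shaped like `DynConf.getNumber`, which takes a key and a default and returns a number. Reading the value at each call lets tuning take effect without a restart. Clamping it to bounds limits the damage from a fat-fingered value. Neither detail is taken from the post, and the key name and bounds are made up.

```python
import math
from typing import Callable

# Hypothetical typed getter, in the spirit of DynConf.getNumber: (key, default) -> value.
GetNumber = Callable[[str, float], float]


def bounded_setting(get_number: GetNumber, key: str, default: float,
                    lo: float, hi: float) -> float:
    """Read a numeric setting and clamp it to [lo, hi].

    Calling this on every request (instead of once at startup) lets a new value
    take effect without a restart. The clamp keeps a typo such as an extra zero
    from producing an unreasonable timeout or limit.
    """
    value = get_number(key, default)
    if math.isnan(value):
        return default  # NaN would slip through min/max, so fall back explicitly
    return min(max(value, lo), hi)


# Usage with a dict standing in for the configuration store.
settings = {"query.timeoutSecs": 3000.0}


def get_number(key: str, default: float) -> float:
    return settings.get(key, default)


timeout = bounded_setting(get_number, "query.timeoutSecs", 30.0, lo=1.0, hi=300.0)
print(timeout)  # 300.0
```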

Customer One-Offs

As a B2B company, we often have to support special behavior for a very small set of customers, such as special query semantics or temporary overrides. Targeted toggles and customized configuration are great for this.


What We Learned

After using DynConf for many years, we’ve realized some important strengths and weaknesses.

Managing dynamic configuration settings is hard

In general, it’s easy to accumulate random dynamic configuration scattered around the code. As a result, important information like what values keys are set to, what a given key does, or even which keys exist and matter can become tribal (or lost) knowledge.

[Image: DynConf management] Answer: DynConf.getNumber(“haoMuch.cheezburgerCanHaz”)

Because DynConf is incredibly simple, it doesn’t really organize this information. This prompted us to make a basic internal admin tool for listing keys, finding keys, and setting configuration values. In addition to reducing the likelihood of mistakes when setting values, the tool also records changes to a MySQL database to track historical setting values.
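The admin tool’s core operations might look something like the following Python sketch. It is hypothetical throughout. SQLite stands in for MySQL and a dict stands in for the DynamoDB table. The table schema and the function names are invented for illustration.

```python
import sqlite3
import time
from typing import Dict, List, Optional, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    # SQLite stands in for the MySQL database that records changes.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS dynconf_history ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " setting_key TEXT NOT NULL, old_value TEXT, new_value TEXT NOT NULL,"
        " changed_by TEXT NOT NULL, changed_at REAL NOT NULL)"
    )
    return db


def find_keys(store: Dict[str, str], fragment: str) -> List[str]:
    # "Finding keys": case-insensitive substring search over existing keys.
    return sorted(k for k in store if fragment.lower() in k.lower())


def set_with_history(db: sqlite3.Connection, store: Dict[str, str],
                     key: str, new_value: str, changed_by: str) -> None:
    # Record the change before writing it, so no write happens without a
    # history row (a failed write may leave one extra row behind).
    old_value: Optional[str] = store.get(key)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO dynconf_history"
            " (setting_key, old_value, new_value, changed_by, changed_at)"
            " VALUES (?, ?, ?, ?, ?)",
            (key, old_value, new_value, changed_by, time.time()),
        )
    store[key] = new_value  # a real tool would write to DynamoDB here


def history(db: sqlite3.Connection, key: str) -> List[Tuple[Optional[str], str, str]]:
    # Historical values for one key, oldest first.
    return db.execute(
        "SELECT old_value, new_value, changed_by FROM dynconf_history"
        " WHERE setting_key = ? ORDER BY id",
        (key,),
    ).fetchall()
```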

The admin tool isn’t perfect though. We still occasionally run into issues where developers are confused by “abnormal” behavior caused by a DynConf setting they didn’t know about. Truly solving the problem would require investing in a more sophisticated system for managing dynamic configuration information.


Not everything makes sense as a DynConf

DynConf is incredibly flexible, but a specialized tool can often be a better choice.

You’ll get it into the wall, but there might be a better way…

There are several broad types of feature flags and toggles, and DynConf’s simplicity and lack of structure make it unsuitable for some of them. For example, we’ve created a separate system for releasing new features to end users and for managing the complexities of which feature offerings a customer might have.

On the other hand, the reliability of DynConf continues to make it the tool of choice for dev-ops kill switches in emergencies. For example, we have a general toggle to disable real-time computation under heavy query load and a toggle to swap over to a backup Kafka cluster in our ingestion pipeline.

DynConf’s value lies in speed of iteration and peace of mind

Due to its reliability and flexibility in handling complex rollouts, DynConf makes it easy to release new behavior with a built-in rollback switch. This greatly mitigates risk when making changes to critical services, which means worriers like me can skip several days or weeks of over-validating and over-testing before feeling confident enough to deploy.

Similarly, because we are a small team with limited resources, dynamically tuning performance configurations against conditions in the production environment lets us quickly find “good enough” settings and move on, knowing we can always re-tune them if needed.

And, best of all, there’s been little overhead around scaling or availability of DynConf. The only major change has been extra caching layers using Redis or DAX to reduce DynamoDB costs.
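To show why an extra caching tier cuts DynamoDB reads, here is a hypothetical Python sketch of a layered read path in which several hosts share one cache. The dict standing in for Redis, the class name, and the lack of expiry are simplifications. This is not a description of Amplitude’s setup.

```python
from typing import Callable, Dict, Optional


class TieredReader:
    """Hypothetical read path: in-process cache -> shared cache -> backing store.

    `shared` stands in for a shared cache such as Redis (a plain dict here), and
    `fetch` stands in for the DynamoDB read. Expiry is omitted for brevity; a
    real setup needs a TTL at each tier so that changed values still propagate.
    """

    def __init__(self, shared: Dict[str, str],
                 fetch: Callable[[str], Optional[str]]) -> None:
        self.local: Dict[str, str] = {}
        self.shared = shared
        self.fetch = fetch

    def get(self, key: str) -> Optional[str]:
        if key in self.local:
            return self.local[key]
        value = self.shared.get(key)
        if value is None:
            value = self.fetch(key)  # only a miss in both caches reaches the store
            if value is not None:
                self.shared[key] = value
        if value is not None:
            self.local[key] = value
        return value


# Usage: two hosts share one cache, so the backing store is read only once.
backing = {"query.maxRows": "50000"}
store_reads = []


def fetch(key: str) -> Optional[str]:
    store_reads.append(key)  # record every read that reaches the backing store
    return backing.get(key)


shared_cache: Dict[str, str] = {}
host_a = TieredReader(shared_cache, fetch)
host_b = TieredReader(shared_cache, fetch)
host_a.get("query.maxRows")
host_b.get("query.maxRows")
print(len(store_reads))  # 1
```

DAX works differently: it is API-compatible with DynamoDB and caches reads itself, so adopting it is largely a client swap rather than new application logic.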

[Image: dynamic configuration is faster] While a bike isn’t as nice as a car, it’ll still get you from A to B way faster than walking. It’s easier to assemble too!

All things considered, DynConf gave us a massive boost in development velocity for a very small amount of investment.

Closing thoughts

Over time, we’ll probably keep moving configurations off of DynConf and will need to put further effort into managing its growing complexity. Still, this simple, early investment in dynamic configuration has had a massive impact on the engineering team’s velocity in getting to where we are today.

So, if you don’t have anything yet, adding something simple like Dyn(amic)Conf(ig) backed by your favorite reliable key-value store might be worth your time.


Kevin Wu, a software engineer at Amplitude, is passionate about using analytics to drive growth.
