Planning on resilience

That thing you're launching: what if it fails to function?

The challenge of doing something for a crowd in real time is that if it doesn't work, you're busted. You have no way to alert people, to spread out demand, to reprocess inquiries.

Batch processes give you a fallback. If the first printing is a little off, you can fix it in the second (if the first printing is small enough). When you know the email addresses of the people you're dealing with, for example, you can easily reroute people and change expectations. If you know how to contact the ticket holders, you can let them know in advance that the theater roof is under repair. You can fix things today and get them right for tomorrow without disappointing a mob of people in real time.

There's a huge difference between interacting with customers one at a time, one after another, and learning as you go, vs. interacting with everyone, all at once, in parallel.

The arrogance of most web launches (from hip new sites to healthcare signups) is that they assume that nothing will go wrong if they do it live. So they try to do it live for everyone, at once.

When someone bounces and you have no data on them, you have no way to ask them to come back.

The only part of a launch that should be live is the part that benefits from being live. Everything else ought to be in a batch, reserved, asynchronous and capable of recovery.
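
For the technically inclined, here is one way the batch-and-recover idea might look in code. It's a minimal sketch, not a recipe: the Signup record, the process_in_batches function, the flaky_handler and the example.com addresses are all made up for illustration. The point is only that when you hold on to contact details and work through people a few at a time, a failure goes back in the line instead of becoming a lost customer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Signup:
    """Hypothetical record: we keep the email so we can follow up later."""
    email: str
    attempts: int = 0


def process_in_batches(signups, handler, batch_size=2, max_attempts=3):
    """Work through signups a few at a time, requeueing failures."""
    queue = deque(signups)
    done, gave_up = [], []
    while queue:
        # Take a small batch instead of serving everyone at once.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for signup in batch:
            signup.attempts += 1
            try:
                handler(signup)
                done.append(signup.email)
            except Exception:
                if signup.attempts < max_attempts:
                    queue.append(signup)  # recoverable: try again later
                else:
                    gave_up.append(signup.email)  # still contactable by hand
    return done, gave_up


def flaky_handler(signup):
    # Assumed failure for the demo: the first attempt for one person breaks.
    if signup.email == "b@example.com" and signup.attempts == 1:
        raise RuntimeError("service unavailable")


people = [Signup("a@example.com"), Signup("b@example.com"), Signup("c@example.com")]
done, gave_up = process_in_batches(people, flaky_handler)
print(done, gave_up)
```

In this run the one failed signup is retried in a later batch and succeeds, so nobody is lost. A live, all-at-once version would have had nothing to retry and no one to write to.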

It's a journey, not an event, and working in asynchronous batches is a smart way to stay resilient.