Email Jammed Up? Two Ways We Overcome Email Bottlenecks

Have you ever heard the term “critical path”?

In the world of email sending, you can think of a critical path as a bottleneck. If your email server isn’t running as fast as you’d like, it’s probably because one or more bottlenecks are gumming up the works. There’s a chance your email could be slow due to warming up new IPs, ISP spam filters, or how your delivery throttling rules are configured (you know, the deliverability stuff). But if all of those are in order, then you’re down to how fast your software and your server can run. So that’s what we’re covering today: how we make the software a speed demon.

To give a real-life example, one of my colleagues at GreenArrow moved into a new home last month. But because the ground was too hard (read: frozen) for the builder to install her new mailbox, the post office agreed to hold all of her mail until the mailbox is installed this spring. It seems reasonable, except that they only allow pick-ups between 12 and 5 PM, Monday through Saturday. For most working people, this means picking up their mail maybe once a week. And sure enough, yesterday she showed up with an empty file box just to carry it all home! For my colleague, the post office pickup hours have created a critical path that is severely affecting her mail delivery. (Hang in there, Nicole!)

Today, we’ll cover two features that have helped us overcome our own email sending bottlenecks: Direct Injection and our newest performance-enhancing feature, greenarrow-remote. Neither will help my colleague get her two-day Amazon Prime shipments any faster, but both can help you get better email results.

[Image: FedEx Kinko’s store]
Our CEO David wrote the first iteration of the RAM queue while traveling in the early 2000s. In those days, public WiFi hotspots were hard to come by, so the only high-speed Internet connection he could find was a Kinko’s. When they closed for the day he wasn’t done, so, undeterred, he finished developing the first iteration of the RAM queue using their WiFi from his car in the parking lot!
Image Credit: Wikipedia


Bottlenecks are the Mother of Innovation

Before he created GreenArrow, our CEO David Harris was a qmail consultant. He found that qmail’s critical path was often its disk IO requirements. A qmail server’s disk might be thrashing away, trying to keep up with the load, while its CPU was almost idle. This was because qmail needed to write each new message to disk to queue it. Then it deleted the message from disk once the message was either delivered or bounced. David reduced this overhead by creating a queue in RAM, which allowed qmail to avoid writing the message to disk if it could be delivered on the first attempt. Later, when David created GreenArrow Engine, he applied what he’d learned to create GreenArrow’s RAM queue.
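
To make the RAM queue idea concrete, here is a minimal sketch in Go. It is not GreenArrow’s or qmail’s actual code: the Message and Queue types, the deliver and spill callbacks, and errTemporary are all hypothetical names invented for this illustration. It only shows the core trade described above: a message that is delivered on its first attempt never touches the disk, and only a failed first attempt pays for a write.

```go
package main

import (
	"errors"
	"fmt"
)

// errTemporary is a hypothetical stand-in for a deferred delivery.
var errTemporary = errors.New("temporary delivery failure")

// Message is a hypothetical queued email.
type Message struct {
	ID   string
	Body []byte
}

// Queue keeps new messages in RAM and only persists a message
// when its first delivery attempt fails.
type Queue struct {
	deliver func(Message) error // first delivery attempt, straight from memory
	spill   func(Message) error // persist for a later retry (for example, write to disk)
	Spilled int                 // how many messages needed a disk write
}

// Submit tries to deliver m from memory. A successful first attempt
// costs no disk IO at all; a failed one falls back to spill.
func (q *Queue) Submit(m Message) error {
	if err := q.deliver(m); err == nil {
		return nil
	}
	q.Spilled++
	return q.spill(m)
}

func main() {
	disk := map[string][]byte{} // stand-in for an on-disk queue
	q := &Queue{
		deliver: func(m Message) error {
			if m.ID == "deferred" {
				return errTemporary
			}
			return nil
		},
		spill: func(m Message) error {
			disk[m.ID] = m.Body
			return nil
		},
	}
	for _, id := range []string{"a", "deferred", "b"} {
		if err := q.Submit(Message{ID: id, Body: []byte("hello")}); err != nil {
			fmt.Println("submit failed:", err)
		}
	}
	fmt.Println("messages written to disk:", q.Spilled)
}
```

A real mail queue also has to decide what happens to in-memory messages if the server crashes. The sketch leaves that out entirely.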

After David created GreenArrow Engine, a customer asked him to look into some reliability issues (and just plain old bugs) they were having with their Interspire Email Marketer (IEM) installation. What came out of this initially was an IEM patch, but what we gained from the experience was much more: David’s vision for a reliable, high-performing platform took shape, and GreenArrow Studio, our email marketing studio, was born.

GreenArrow Engine and Studio each had a number of critical paths that we addressed in the years that followed. For example, Studio’s PostgreSQL database was a bottleneck, so we optimized its configuration and tuned Studio’s schema. Engine’s DKIM signing process was a bottleneck, so we optimized it too.

The Critical Path In Action

Last year, we found ourselves in a place where our latest critical path wasn’t Engine or Studio by itself. Instead, it was the interface between the two. Here’s how emails were being transferred from Studio to Engine’s RAM queue:

[Diagram: Studio → forward → qmail-dk → qmail-queue → RAM queue]

As you can see, Studio generated an email, then launched a new copy of the forward program and piped the email into it. The forward program processed the message, then piped it to the qmail-dk program to perform DKIM signing. qmail-dk then piped the DKIM signed message into qmail-queue, which wrote the message to GreenArrow’s RAM queue.

The above path was inspired by the Unix “do one thing and do it well” philosophy. The forward, qmail-dk and qmail-queue programs were each written to perform a simple task, and they each do that one simple task very well. All these handoffs between the programs were creating extra overhead, though. (Incidentally, emails injected into Postfix from other mailing list managers can take an approach that’s similar to what’s described above.)

The Solution: Direct Injection

From the very beginning, GreenArrow has been about delivering email, and delivering it fast. Every year, we’re improving the software to make it faster. And the interface between Studio and Engine scaled well, up to a point. In 2016, launching these three new processes for each message was our biggest bottleneck, and it was taking a toll on CPU resources. We had discovered a new critical path.

Since we wanted to send emails faster, we needed to optimize the process of getting messages from Studio to Engine. This is where Direct Injection comes in. Direct Injection (our internal codename) allows Studio to perform DKIM signing internally, then write each message straight to the RAM queue without having to spawn any new processes for each message:

[Diagram: Studio → RAM queue, with DKIM signing performed inside Studio]

This simplification improved performance dramatically. It was a key step to getting customers sending up to 6 million messages per hour! That’s a lot of email, and it’s only possible because we were able to identify the critical path that was slowing us down.
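
The difference between the two paths can be sketched in a few lines of Go. This is an illustration under stated assumptions, not Studio’s actual code: Stage, RAMQueue, Inject, and placeholderSign are hypothetical names, and placeholderSign is deliberately not DKIM (it just prepends a made-up header with a truncated SHA-256 digest so there is something to run). The point is the shape: every step that used to be a separate program connected by pipes becomes a function call in the sending process, so no processes are spawned per message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Stage transforms a message in memory. In the older design each step
// was its own program; here each step is a plain function call.
type Stage func(msg []byte) []byte

// placeholderSign is NOT DKIM. It prepends a made-up header holding a
// truncated SHA-256 digest, purely to stand in for in-process signing.
func placeholderSign(msg []byte) []byte {
	sum := sha256.Sum256(msg)
	header := "X-Placeholder-Signature: " + hex.EncodeToString(sum[:8]) + "\r\n"
	return append([]byte(header), msg...)
}

// RAMQueue is a hypothetical in-memory queue.
type RAMQueue struct{ msgs [][]byte }

// Write appends a finished message to the queue.
func (q *RAMQueue) Write(msg []byte) { q.msgs = append(q.msgs, msg) }

// Inject runs every stage inside the calling process, then queues the
// result. No child processes and no pipes are involved.
func Inject(q *RAMQueue, msg []byte, stages ...Stage) {
	for _, s := range stages {
		msg = s(msg)
	}
	q.Write(msg)
}

func main() {
	q := &RAMQueue{}
	for i := 0; i < 3; i++ {
		body := fmt.Sprintf("Subject: hello %d\r\n\r\nHi!\r\n", i)
		Inject(q, []byte(body), placeholderSign)
	}
	fmt.Println("queued:", len(q.msgs))
}
```

With the process-per-step design, the loop above would have started three programs for every message. Here the per-message cost is a handful of function calls.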

(Note: Direct Injection can be used to improve transactional email performance as well. Messages injected into GreenArrow using SimpleMH use Direct Injection by default with GreenArrow Engine 4.1.121-0 and later.)

Our Next Bottleneck and greenarrow-remote

The thing about optimizing the critical path is, every time you address one bottleneck, another one pops up to take its place. So, what’s next?

SSD technology has been advancing at a much faster pace than CPU technology in recent years. As a result, we now see something that used to be extremely unusual: GreenArrow servers with high-speed NVMe drives and Direct Injection enabled can become bottlenecked on CPU resources due to MTA overhead. (In the past, mail servers would get bottlenecked on disk IO long before CPU overhead became an issue.)

Our development team used a Flame Graph to look at what the MTA was spending CPU time on and found that a significant amount of it was being spent delivering email via SMTP. Here’s a different but similar CPU bottleneck:

[Image: GreenArrow email sending flame graph]

Our developers discovered that each time an SMTP delivery attempt occurred, a significant amount of CPU time was spent creating and maintaining a new process to deliver each email. On servers sending millions of emails per hour, this meant that millions of processes were being launched each hour, and thousands were running at any given moment to keep up with email deliveries.

If you’re thinking this story sounds familiar, then you’re absolutely correct. Direct Injection’s performance improvements come primarily from not having to launch as many processes to complete the same amount of work. greenarrow-remote (another internal codename) essentially uses the same trick on the SMTP delivery side of things.

With greenarrow-remote, a single persistent process performs the same SMTP delivery work previously performed by millions of processes being launched, run, and torn down each hour. We decided to write greenarrow-remote in the Go programming language, so it benefits from Go’s built-in concurrency support. (Our development team loves Go.) In addition, using some of the concurrency advantages of the Go programming language, we built a much more efficient DNS cache, which reduced CPU utilization even further.

Our primary goal with greenarrow-remote was to reduce GreenArrow’s CPU requirements, and on that point, it does exceptionally well. An unintended benefit is that the new code also happens to have much lower memory requirements than the code it replaces. Win-win! (We recently released a closed beta of greenarrow-remote, and are hoping to make it available soon.)

What’s Next for Your Team?

If you’re like us, performance optimization is an ongoing process. And right now, we’re riding the crest of the wave, but I can only imagine what our next critical path will be and how we’ll tackle it once we find it. After all, the very nature of technology compels us to push the proverbial envelope, break barriers, and do things better, faster, and smarter than before.

What is your current critical path? What bottlenecks are you facing, and what next-level solutions are you employing to surmount them? I’d love to hear from you about what you are working on, whether it’s in the realm of email performance or something totally unrelated (like renting a temporary mailbox for your new home so you can get your two-day Amazon Prime deliveries on time). Anyway, good luck, and keep pushing that envelope!
