Case Study: Proving a WordPress Staging Site Is Production-Ready

Performance Tuning, a Cache That Wasn’t Caching, and a 4-Gigabyte Ghost — What It Took to Make a Staging Site Production-Ready


Introduction: The Difference Between “Running” and “Ready”

In Part 2a of this series, I described building the new server for a nonprofit ministry’s WordPress site: standing up the infrastructure, deploying the staging copy, and locking down every external-facing connection so the clone couldn’t accidentally process real donations, send real emails, or leak data. By the end of that work, the staging site was running. Pages loaded, the admin panel was accessible, and the caching layers were in place.

But…running isn’t the same as ready.

My first real look at the staging dashboard told me we weren’t done:

  • The admin panel was cluttered with nag banners from plugins that had lost their license keys during the move.

  • Initial page load measurements confirmed what I suspected — the site was technically functional, but far from optimized.

  • And behind the scenes, I had a list of unanswered questions that would determine whether this server could actually handle production traffic, or whether it would buckle under the weight of the same technical debt we’d been excavating since Part 1.

What followed was five sessions of methodical investigation: ripping out a caching layer that wasn’t doing its job and replacing it with one that was, tuning the database and PHP configuration, profiling every millisecond of page render time to find the real bottleneck, cleaning up a database bloated by a years-old bug, discovering a plugin that was silently adding thirty seconds to every admin operation, and then recovering from a server crash caused by a PHP feature that had no business being enabled in the first place.

This is the story of making the build production-ready — not by assuming things worked, but by measuring, breaking, fixing, and measuring again.


Part One: The Nag Banners and What They Meant

As I alluded to above, when I logged into the staging dashboard for the first time after deployment, I was greeted by a wall of admin notices. Every one of them was telling a story, if you knew how to read them.

Object Cache Pro was demanding a license token. Redis was running perfectly but the WordPress plugin that connected to Redis was a commercial product, and its license hadn’t made the trip from the old server. It was the equivalent of a car with a working engine that won’t stop beeping because the satellite radio subscription lapsed.

WP Rocket had two complaints: the website domain had changed (to staging.example.org), and plugins had been enabled or disabled since its last cache generation. WP Rocket is a page caching plugin, and when it detects a domain change, it stops caching entirely. Every page load was hitting the full PHP stack.

WPForms reported an expired license. This one was genuinely concerning because an expired license blocks security updates for a plugin that handles form submissions, including donation forms.

Each of these notices was a clue. Together, they told me the caching architecture needed to be rebuilt.

The Object Cache Swap

Object Cache Pro is a premium WordPress plugin that acts as the bridge between WordPress and the Redis server. It’s excellent software, but the client didn’t have a license for it, and there was no evidence they ever had one. On the old managed hosting platform, the hosting provider had likely bundled it.

The fix was straightforward: replace the commercial plugin with the free Redis Object Cache plugin, written by the same developer. Same Redis server underneath, same functionality at this site’s scale, no license nag.

The execution was less straightforward. The standard wp redis enable command failed silently because the web server user didn’t have write permission to the wp-content directory where the drop-in file needed to live. I had to manually copy the drop-in file and set the correct ownership. This is the kind of small permission mismatch that’s easy to miss and produces symptoms that look like the plugin simply didn’t work.
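For reference, the manual workaround looked roughly like the following. The paths and the www-data web server user are assumptions that match a typical Debian setup; adjust both to your environment.

```shell
# Manual drop-in install when `wp redis enable` can't write to wp-content.
# The free Redis Object Cache plugin ships its drop-in inside its own directory.
cp wp-content/plugins/redis-cache/includes/object-cache.php wp-content/object-cache.php
chown www-data:www-data wp-content/object-cache.php

# Verify the drop-in is picked up and Redis is reachable
wp redis status
```

The key detail is ownership: the drop-in must be readable (and ideally owned) by the PHP-FPM user, or WordPress will silently fall back to its default object cache.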

With the swap complete, Redis status confirmed connected, the drop-in valid, and metrics enabled. Object Cache Pro was deleted from disk, so that was one nag banner down.

Killing WP Rocket and Building Something Better

The WP Rocket situation required a decision; the client’s license was expired, and they didn’t want to renew it. I didn’t push back, because I had a better plan.

WP Rocket is a PHP-level page cache. When a visitor requests a page, WordPress boots PHP, WP Rocket checks if it has a cached copy, and if so, serves the cached HTML. The problem is that “boots PHP” part. Even on a cache hit, every request passes through the PHP interpreter, which takes time and consumes memory.

The new server was running OpenResty (a high-performance web server built on top of Nginx). OpenResty supports FastCGI caching, which stores rendered pages at the web server level. When a cached page is requested, OpenResty serves it directly from disk or memory without ever touching PHP. It’s like the difference between calling a restaurant to ask if they’re open versus just reading the sign on the door: one path wakes up the whole system to answer a simple question, while the other reads a pre-prepared answer directly.

Removing WP Rocket should have been simple, but…it wasn’t (whatever is, right?). The wp plugin delete command hung indefinitely because WP Rocket’s uninstall hook was trying to clean up its own cache files and transients in an infinite loop. I killed the process and deleted the plugin directory manually, then swept up the artifacts it left behind: a stale wp-cache-config.php, a cache/wp-rocket/ directory, and an advanced-cache.php drop-in that was now pointing at software that no longer existed.

Building the FastCGI Cache

The replacement caching system had three components.

First, a cache zone definition in the OpenResty configuration: 128 megabytes of RAM for cache keys, 256 megabytes of disk for cached pages, and a 60-minute expiration for inactive entries. The cache key was built from the request scheme, method, host, and URI — meaning https://staging.example.org/donate/ and https://staging.example.org/about/ would each get their own cached copy.

Second, the bypass logic—the rules for when not to serve cached content. This is where the real thinking happens. POST requests skip cache (form submissions should never be served from cache), requests with query strings skip cache (they often indicate unique content), logged-in users skip cache (they see personalized dashboards, not the public page), and critically, every GiveWP and WPForms endpoint skips cache (donation processing and form submissions must always hit the live application).
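Put together, the zone definition and bypass rules might look like the sketch below. The zone name, cache path, and the exact GiveWP/WPForms URL patterns are illustrative assumptions, not the production file.

```
# Cache zone: 128 MB of shared memory for keys, 256 MB of disk for pages,
# entries expire after 60 minutes of inactivity
fastcgi_cache_path /var/cache/openresty/fastcgi levels=1:2
                   keys_zone=WORDPRESS:128m max_size=256m inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";

server {
    set $skip_cache 0;

    # Bypass rules: never cache writes, query strings, logged-in users,
    # or donation/form endpoints (URL patterns are hypothetical)
    if ($request_method = POST)                { set $skip_cache 1; }
    if ($query_string != "")                   { set $skip_cache 1; }
    if ($http_cookie ~* "wordpress_logged_in") { set $skip_cache 1; }
    if ($request_uri ~* "/(give|donations|wpforms)/") { set $skip_cache 1; }

    location ~ \.php$ {
        fastcgi_cache        WORDPRESS;
        fastcgi_cache_valid  200 60m;
        fastcgi_cache_bypass $skip_cache;  # don't serve from cache
        fastcgi_no_cache     $skip_cache;  # don't store in cache
        # ...fastcgi_pass and the usual parameters...
    }
}
```

Note the pair of directives at the end: fastcgi_cache_bypass controls serving and fastcgi_no_cache controls storing; you almost always want them keyed to the same variable.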

Third, a cache purge system. When content changes in WordPress, the cache needs to be invalidated. I wrote a must-use plugin that hooks into WordPress’s content lifecycle and flushes the entire cache directory on any change. It also adds a “Purge Cache” button to the admin bar for manual clearing.

Technical Note: A must-use plugin (or MU-plugin) is a PHP file in a special mu-plugins directory that WordPress loads automatically on every request, before regular plugins. It can’t be deactivated through the admin panel, which makes it ideal for infrastructure-level code like cache purging.

The MU-plugin crashed the site on the first attempt. The constructor called a WordPress function, is_admin_bar_showing(), that internally checks whether a user is logged in. But MU-plugins load before WordPress initializes its user authentication system, so the function didn’t exist yet when the code tried to call it, and every page produced a fatal error.

The fix was a single line: move the admin bar code into a callback on WordPress’s init hook, which fires after the authentication system is ready. Even simple code needs to understand the boot sequence of the platform it’s running on.
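A minimal sketch of the corrected MU-plugin, with the admin-bar registration deferred to init. The cache path, hook list, and class name are illustrative assumptions; request handling for the purge button is omitted for brevity.

```php
<?php
/* Plugin Name: FastCGI Cache Purge (mu-plugin sketch) */

class FCGI_Cache_Purge {
    const CACHE_DIR = '/var/cache/openresty/fastcgi'; // assumption

    public function __construct() {
        // Content-lifecycle hooks: flush the whole cache on any change.
        add_action( 'save_post',    [ $this, 'purge_all' ] );
        add_action( 'switch_theme', [ $this, 'purge_all' ] );
        // Admin-bar code must wait for init -- auth isn't loaded yet
        // when MU-plugins are constructed.
        add_action( 'init', [ $this, 'register_admin_bar_button' ] );
    }

    public function register_admin_bar_button() {
        if ( ! is_admin_bar_showing() ) { // safe to call at init
            return;
        }
        add_action( 'admin_bar_menu', function ( $bar ) {
            $bar->add_node( [
                'id'    => 'purge-fcgi-cache',
                'title' => 'Purge Cache',
                'href'  => wp_nonce_url( admin_url( '?purge_fcgi=1' ), 'purge_fcgi' ),
            ] );
        }, 100 );
    }

    public function purge_all() {
        // Recursively delete cached pages; OpenResty repopulates on the next MISS.
        $it = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator( self::CACHE_DIR, FilesystemIterator::SKIP_DOTS ),
            RecursiveIteratorIterator::CHILD_FIRST
        );
        foreach ( $it as $file ) {
            $file->isDir() ? rmdir( $file->getPathname() ) : unlink( $file->getPathname() );
        }
    }
}
new FCGI_Cache_Purge();
```

The design choice that matters: everything that touches user state goes through the init callback, while the filesystem purge hooks are safe to register in the constructor because they fire much later in the request.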

The Results

With the FastCGI cache in place, I ran the first benchmarks from the server itself:

  • Uncached (MISS): ~675 milliseconds — PHP rendering the page from scratch
  • Cached (HIT): ~25 milliseconds — OpenResty serving HTML from memory

A 96% reduction in response time. For visitors arriving through Cloudflare’s CDN, total page delivery was under 300 milliseconds including the network hop from their location to the origin server. Not bad, but…there was a problem I hadn’t found yet.


Part Two: The Cache That Wasn’t Caching

The benchmark numbers looked great when tested from the server. Testing through Cloudflare, though, told a different story: every request was a cache MISS, which meant the FastCGI cache never served a HIT to real-world traffic.

I started debugging systematically. The bypass logic was clean: no request arriving through Cloudflare matched the cookie or URL patterns that would trigger a skip. The access logs showed requests coming through Cloudflare normally, and the SSL configuration was correct (Full (Strict) mode, no protocol mismatches).

Then I found it: a single line buried in the site configuration file.

fastcgi_cache_bypass $http_pragma;

This directive tells OpenResty to bypass the cache for any request that includes a Pragma: no-cache header. Browsers send this header on refresh, and Cloudflare, acting as a reverse proxy, faithfully forwards it on every request it passes to the origin server.

Every single visitor request coming through Cloudflare was bypassing the cache and hitting the full PHP stack. The FastCGI cache had been effectively useless for all real-world traffic since the moment I configured it.

I confirmed it with a targeted test: a curl request with -H "Pragma: no-cache" returned BYPASS; the same request without the header returned HIT. So, I removed the line, reloaded OpenResty, then tested again through Cloudflare and…HIT on the second request, consistently.
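The test itself was two curl invocations against the same URL. This sketch assumes the OpenResty config exposes the cache status via an X-Cache-Status header (add_header X-Cache-Status $upstream_cache_status;), which is a common pattern but not something the stock config provides.

```shell
# With the Pragma header: cache is bypassed
curl -sI https://staging.example.org/ -H "Pragma: no-cache" | grep -i x-cache-status
# X-Cache-Status: BYPASS

# Without it (second request, after the cache is warm): served from cache
curl -sI https://staging.example.org/ | grep -i x-cache-status
# X-Cache-Status: HIT
```

Exposing the cache status header, even temporarily, turns an invisible caching layer into something you can actually verify from the outside.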

This is a good example of why benchmarking from the server itself isn’t enough. The cache looked perfectly healthy in local tests while serving zero cached pages to actual visitors coming through Cloudflare. Without testing through the real traffic path, I would have carried this invisible performance hole into production.


Part Three: Chasing the Real Bottleneck

With the cache working correctly for anonymous visitors, the next question was: what about that 675-millisecond uncached render time? Could it come down?

The site had 46 active plugins. Conventional WordPress wisdom says fewer plugins means faster page loads. After all, each plugin runs initialization code on every request, adding overhead, so I expected to find at least a few expensive plugins dragging the render time up.

But before testing that hypothesis, I wanted to eliminate the infrastructure as a variable. If the database or PHP runtime were misconfigured, plugin benchmarks would be meaningless—I’d be measuring bottlenecks that had nothing to do with plugin overhead.

Tuning the Infrastructure

I started with the database and PHP runtime.

MariaDB was running with default settings on an 8GB server. The InnoDB buffer pool was set to 128 megabytes, far too small for a database this size. Disk read statistics showed the database was constantly evicting data from the buffer pool and re-reading it from disk. I increased the buffer pool to 512MB, bumped the log file sizes to reduce write pressure, raised the temporary table size so complex queries could be sorted in memory instead of hitting disk, and reduced the maximum connection limit from 151 (appropriate for a shared server with dozens of sites) to 30 (one site, twelve PHP workers, with headroom).
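As a my.cnf fragment, the changes looked roughly like this. The buffer pool and connection values come from the text; the log file and temporary table sizes are assumptions, since the exact figures weren’t recorded.

```
[mysqld]
innodb_buffer_pool_size = 512M   # up from the 128M default
innodb_log_file_size    = 256M   # assumption: reduce write pressure
tmp_table_size          = 64M    # assumption: sort complex queries in memory
max_heap_table_size     = 64M    # must match tmp_table_size to take effect
max_connections         = 30     # one site, twelve PHP workers, headroom
```

One easy trap here: tmp_table_size is capped by max_heap_table_size, so raising only one of the pair does nothing.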

PHP-FPM needed revisiting. During the initial server build in Part 2a, I had configured the pool to allow up to 30 workers, estimating roughly 50MB per process based on the pre-migration audit data. The real number turned out to be much higher. With the full Avada theme, Fusion Builder, GiveWP, and 46 active plugins loaded in a PHP 8.4 environment, each worker was actually consuming around 230MB of RAM, over four times the original estimate. That made the 30-worker ceiling dangerous: a traffic spike filling the pool would push PHP memory consumption to nearly 7GB, almost the entire server. One burst of concurrent requests could exhaust all memory and crash the system. I therefore reduced the maximum to 12 workers (a 2.76GB ceiling that left comfortable headroom), set them to recycle every 500 requests (preventing memory leaks from accumulating over time), and enabled the slow request log to catch any PHP operation taking longer than 3 seconds.
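In the pool configuration, those three changes map to a handful of directives. The log path is an assumption; the numbers are the ones from the text (12 workers × ~230MB ≈ 2.76GB).

```
pm = dynamic
pm.max_children = 12          ; ~230 MB each -> ~2.76 GB worst-case ceiling
pm.max_requests = 500         ; recycle workers to contain slow memory leaks

request_slowlog_timeout = 3s  ; dump a backtrace for any request over 3 seconds
slowlog = /var/log/php-fpm/slow.log   ; path is an assumption
```

The slow log turned out to be the most valuable line of the five: it’s what later surfaced the wp-admin operations described in Part Four.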

The Plugin Batches

After that, I moved on to deactivating plugins in batches, flushing the Redis and FastCGI caches after each batch, and benchmarking the uncached response time.

The first batch made no measurable difference. Expected. The second batch took the average from about 1.06 seconds to 1.00 seconds. The third batch: no change. The fourth batch: a small improvement, to 0.95 seconds. The fifth batch: the time bounced back up to 1.06 seconds.

After deactivating twelve plugins, the cumulative improvement was roughly 100 milliseconds, with high variance between runs. Plugin count wasn’t the bottleneck.

The Profiler

To find out what was actually consuming the render time, I needed to look inside PHP’s execution. I wrote a lightweight profiling MU-plugin that hooked into WordPress’s shutdown sequence and logged the total PHP execution time, query count, and peak memory usage for each request.
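The core of such a profiler fits in a few lines. This is a minimal sketch, not the plugin I shipped; it logs to the PHP error log, and capturing individual query text (as the upgraded version below does) additionally requires define('SAVEQUERIES', true) in wp-config.php.

```php
<?php
/* Plugin Name: Request Profiler (mu-plugin sketch) */

add_action( 'shutdown', function () {
    global $wpdb;

    // Wall-clock time since PHP started handling this request
    $ms  = ( microtime( true ) - $_SERVER['REQUEST_TIME_FLOAT'] ) * 1000;
    $mem = memory_get_peak_usage( true ) / 1048576; // bytes -> MB

    error_log( sprintf(
        '[profile] %s | %.0f ms | %d queries | %.1f MB peak',
        $_SERVER['REQUEST_URI'] ?? '(cli)',
        $ms,
        $wpdb->num_queries,
        $mem
    ) );
}, PHP_INT_MAX ); // run after everything else on shutdown
```

Hooking shutdown at the lowest priority means the measurement includes nearly the full render, including other plugins’ shutdown work that fired before it.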

The first pass was revealing: the homepage required 901 database queries, consumed 50MB of RAM, and took about 1,000 milliseconds to render. That seemed like a database problem…until I looked more carefully.

With Redis object cache warm, only 97 of those 901 queries actually hit the database. The rest were served from Redis’s in-memory cache. Redis was already eliminating 89% of the database load. The database wasn’t the bottleneck either.

I upgraded the profiler to log the actual queries, capturing the 20 slowest and 15 most repeated patterns. The breakdown became clear:

  • Database time: 150 milliseconds total, with a single GiveWP query accounting for 109 of those milliseconds
  • Pure PHP execution: ~585 milliseconds — the Avada theme’s Fusion Builder rendering 106 shortcodes into 379 kilobytes of HTML
  • Peak RAM: 52MB per request

Technical Note: A shortcode is a WordPress convention where content like [gallery ids="1,2,3"] gets replaced at render time with the actual HTML for an image gallery. The Avada theme’s Fusion Builder uses shortcodes extensively — every column, row, container, text block, image, title, and separator on the page is a shortcode that Fusion Builder parses and renders into HTML on every uncached request.

The homepage had 27 columns, 13 containers, 13 rows, 11 text blocks, 10 images, 9 titles, 8 separators, 3 global elements, 3 accordions, and more. Each one was a PHP function call parsing a shortcode string, processing attributes, and outputting HTML. The accumulated cost of 106 render operations was ~585 milliseconds of pure CPU time.

This was the rendering floor — the architectural minimum for this page’s complexity. No amount of PHP tuning, OPcache optimization, or JIT compilation would meaningfully reduce it, because the bottleneck was the page builder doing its fundamental job: turning 98KB of shortcode content into 379KB of rendered HTML.

I confirmed this by enabling OPcache optimizations (increasing the bytecode cache to 256MB) and PHP’s JIT compiler; both had zero measurable impact on the uncached render time. The JIT compiler would become relevant later (just not in the way I’d hoped).

The Conclusion That Matters

The real performance solution had already been implemented: the FastCGI page cache serving anonymous visitors at 25 milliseconds. For the vast majority of traffic to a nonprofit’s website, every visitor after the first one gets the cached page instantly. The ~950ms uncached render time only matters for the first visitor after a content change, and for logged-in administrators.

This was a valuable lesson in methodical diagnosis. I could have spent days trying to squeeze another 50 milliseconds out of PHP configuration. Instead, the profiler told me exactly where the time was going, and the answer was: somewhere I couldn’t optimize without redesigning the page. Time to focus on what I could fix.


Part Four: The Admin Panel and Its Hidden Costs

If the frontend render time was an architectural fact of life, the admin panel was a crime scene. The PHP-FPM slow log was filling up with entries from wp-admin operations.

The GiveWP donation management plugin was the worst offender. Its dashboard widgets made four concurrent REST API calls to generate reports (total income, total donors, average donation, total refunds), each taking roughly 1.4 seconds, executing 30-50 database queries, and consuming 48-52 megabytes of memory. Loading the WordPress dashboard meant waiting for all four reports to complete before the page would render.

The Avada theme’s Fusion Core added its own widget that fetched an external RSS feed: the SimplePie library making blocking HTTP requests to load ThemeFusion news. The WordPress Events & News widget did the same. Every dashboard load was making multiple outbound HTTP requests and waiting for responses.

I wrote another MU-plugin that surgically removed all of these widgets from the dashboard. GiveWP’s reports are still accessible through their dedicated admin pages; they just no longer block the main dashboard from loading.
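The mechanism is WordPress’s remove_meta_box() called during dashboard setup. In this sketch, dashboard_primary is the real ID of the WordPress Events & News widget; the GiveWP and Fusion widget IDs are hypothetical placeholders — the real ones have to be read out of each plugin’s source.

```php
<?php
/* Plugin Name: Dashboard Widget Cleanup (mu-plugin sketch) */

add_action( 'wp_dashboard_setup', function () {
    // WordPress Events & News (fetches an external feed on load)
    remove_meta_box( 'dashboard_primary', 'dashboard', 'side' );

    // Hypothetical IDs for the plugin widgets described above
    remove_meta_box( 'give_reports_widget',    'dashboard', 'normal' );
    remove_meta_box( 'fusion_news_widget',     'dashboard', 'side' );
}, 99 ); // run late so the widgets exist before we remove them
```

The late priority is the subtle part: if the callback runs before the plugins register their widgets, there is nothing to remove and the call silently does nothing.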

The Database Bloat

GiveWP stores its donation form data using WordPress’s post type system, with form metadata in a table called wp_give_formmeta. This table had swollen to 323,387 rows.

Investigating the distribution of metadata rows revealed a bug: a single meta key — _give_recurring_goal_format — had 53,307 entries for 2,062 forms. That’s roughly 26 duplicate entries per form. A GiveWP process was inserting this value on a recurring basis without checking whether it already existed, creating duplicate rows every time it ran. Over the site’s lifetime, this had silently accumulated into a significant performance drag.

The cleanup was methodical:

  1. Backup the table. Always, before modifying production data.

  2. Delete orphaned rows — 1,193 metadata entries pointing to forms that no longer existed in the database.

  3. Deduplicate the recurring goal format entries — I kept only the newest row per form (identified by the highest metadata ID) and deleted the remaining 51,237 duplicates.

  4. Purge stale slug history — 26,162 rows of _wp_old_slug entries tracking every URL change every form had ever undergone.

  5. Optimize the table — an InnoDB rebuild that reclaims disk space and updates index statistics.

Final result: 323,387 rows reduced to 244,795. The GiveWP query that had been taking 109 milliseconds on the frontend would benefit proportionally.
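The deduplication step (3) can be sketched in SQL. Column names follow the standard WordPress meta-table schema that wp_give_formmeta uses (meta_id, form_id, meta_key, meta_value); treat this as an illustration of the approach, not the exact statements run.

```sql
-- 1. Backup first, always
CREATE TABLE wp_give_formmeta_backup AS SELECT * FROM wp_give_formmeta;

-- 3. Keep only the newest _give_recurring_goal_format row per form
--    (highest meta_id), delete every older duplicate
DELETE m
FROM wp_give_formmeta m
JOIN (
    SELECT form_id, MAX(meta_id) AS keep_id
    FROM wp_give_formmeta
    WHERE meta_key = '_give_recurring_goal_format'
    GROUP BY form_id
) k ON m.form_id = k.form_id
WHERE m.meta_key = '_give_recurring_goal_format'
  AND m.meta_id <> k.keep_id;

-- 5. Rebuild the table to reclaim space and refresh index statistics
OPTIMIZE TABLE wp_give_formmeta;
```

The self-join-on-MAX pattern is the safe way to deduplicate: it never relies on row order, and the backup table makes the whole operation reversible.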

The Thirty-Second Surprise

While deleting redundant plugins from disk, every WP-CLI command started hanging. The wp plugin delete command would freeze, then wp plugin list froze, then even wp --info (a command that should return instantly) hung for thirty seconds before responding.

Something was blocking WordPress’s entire initialization process.

I used WP-CLI’s --skip-plugins flag to binary-search the culprit:

  • wp --skip-plugins --info — instant response. A plugin was responsible.
  • wp --skip-plugins="zoho-salesiq" eval "echo 'test';" — still hung. Not Zoho SalesIQ.
  • wp --skip-plugins="zoho-flow" eval "echo 'test';" — instant response. Zoho Flow.

Zoho Flow is an integration connector that links WordPress to the Zoho ecosystem. During WordPress’s init phase, which runs on every single request (including WP-CLI commands), the plugin makes an outbound HTTP request, likely checking API keys or webhook registrations against the site’s domain. But the staging domain wasn’t registered with the client’s Zoho account, so that request timed out every single time.

This meant that every uncached admin page load, every WP-CLI command, and every cron execution had been silently waiting 30 seconds for a timeout that would never resolve. The sluggish admin experience I’d been measuring was partly this — stacked on top of the GiveWP widget delays and RSS feed fetches, a single admin page load could have been waiting a minute or more before rendering.

Deactivating Zoho Flow on staging was the single largest admin performance improvement of the entire engagement. It would be reactivated on production, where the domain registration would be correct and the HTTP check would resolve normally.

The CAPTCHA Audit

While investigating the plugin stack, I found two Cloudflare Turnstile plugins active simultaneously: simple-cloudflare-turnstile (a general-purpose CAPTCHA plugin) and give-cloudflare-turnstile (a GiveWP-specific add-on).

Investigating which one was actually protecting the donation forms required tracing the configuration through three different storage locations. The general-purpose plugin had site keys configured, but every protection toggle was blank, which meant it was loaded on every page but protecting nothing.

GiveWP, it turned out, stores its own Turnstile keys directly in its settings table, independent of either plugin. The donation forms were protected by GiveWP’s own native integration, not by either of the Turnstile plugins.

The general-purpose plugin was fully redundant: loading JavaScript on every page, consuming render time, and protecting nothing. I deactivated it. Meanwhile, WPForms was using reCAPTCHA v3 (not Turnstile) through its own dedicated CAPTCHA plugin. Three separate anti-spam layers, three separate configurations, in three separate places: the kind of sprawl that accumulates on a site managed by multiple people over multiple years.

The Tracking Script Graveyard

During the isolation audit in Part 2a, I’d found a Google Ads conversion pixel firing from within the Avada theme’s settings, embedded deep enough that removing it would have required re-adding it at cutover, but the theme wasn’t the only place tracking code was hiding.

The insert-headers-and-footers plugin contained an entirely separate layer of tracking scripts, and a layered history of the site’s tracking evolution.

The header had a working Google Tag Manager container, a Google Ads conversion tag loaded twice (redundant), a third-party lead attribution script from GAconnector, and a dead event listener for a contact form plugin that had already been deactivated.

The footer was worse: Universal Analytics calls that had been sending data into the void since Google sunset the platform in July 2023, a legacy Google Ads remarketing script superseded by the tag already in the header, dead form redirect code, and jQuery-based event tracking for PDF downloads and Mailchimp signups…all calling the defunct ga() function.

None of this code was throwing errors. It was just silently executing on every page load, making HTTP requests to endpoints that either ignored the data or didn’t exist anymore, adding kilobytes of JavaScript to every page, and on staging sending test traffic data to the client’s production Google Ads account.

I flagged that the entire plugin could potentially be replaced if the client’s Google Tag Manager container was properly configured to handle all the conversion tracking. That determination would require input from whoever manages their Google Ads account.


Part Five: When the Server Crashed

With the plugin cleanup nearly complete and performance dramatically improved, I was working through the final checklist items when the site went down.

I was browsing the Avada theme builder in the admin panel when WordPress threw its “critical error” screen. The PHP error log told the story:

PHP Fatal error: Allowed memory size of 268435456 bytes exhausted
(tried to allocate 4295229440 bytes) in wp-includes/theme.php on line 325

The memory limit was 256 megabytes. PHP was trying to allocate 4,295,229,440 bytes in a single operation. No legitimate WordPress function requests 4GB of memory. This wasn’t a memory shortage; it was corrupted data being fed to PHP’s unserialize() function, which was dutifully trying to reconstruct whatever malformed structure it had been given.

The Diagnostic Process

I went through the standard triage sequence:

  1. PHP-FPM status — still running. Workers were active and the process hadn’t crashed system-wide.

  2. Debug log — the fatal error was repeating on every request. I also found deprecation errors from easy-social-share-buttons, a plugin that wasn’t in the documented active list but was somehow enabled.

  3. Theme options — checked the serialized data for the Avada theme and all installed themes. Nothing oversized; the largest theme options were 912 bytes.

  4. Deactivated the mystery plugin — still crashed.

  5. Deactivated ALL plugins — still crashed.

  6. Switched to a different theme via direct database update, but that theme also crashed, this time because it used create_function(), a PHP function removed in PHP 8.0.

I was running out of standard explanations. The error was happening in WordPress core’s theme loading code, with all plugins disabled, on two different themes. The corrupted data wasn’t coming from the database either; I had verified the stored theme options were clean.

Then I looked at the caching layer.

The Ghost in the Machine

PHP’s OPcache is a built-in performance feature that compiles PHP source code into bytecode and caches the compiled version, so subsequent requests don’t need to reparse the same files. PHP 8.4 added a JIT (Just-In-Time) compiler that goes a step further: it compiles frequently-executed bytecode into native machine code at runtime, theoretically making it even faster.

I had enabled JIT in an earlier session during performance tuning, specifically in tracing mode, where the JIT monitors code execution paths and compiles the hottest ones. The benchmark at the time showed no measurable improvement for WordPress (because WordPress is I/O-bound, not CPU-bound), but it hadn’t caused any problems, so I left it enabled…until now.

The 4-gigabyte allocation attempt was the signature of a JIT corruption: the tracing compiler had produced a corrupted compiled cache entry for a chunk of theme-loading code. When PHP loaded that corrupted cache entry, it deserialized garbage data that specified an absurd memory allocation. The first time the Avada theme builder loaded the specific code path that triggered it, the corrupted entry entered the cache. Every subsequent request that hit the same code path would crash.

The fix was layered: flush the OPcache (to purge the corrupted entry), flush Redis, flush the FastCGI cache, and restart the PHP-FPM process. Then permanently disable JIT by overwriting the configuration file:

opcache.jit=disable
opcache.jit_buffer_size=0

Technical Note: I verified the JIT was actually disabled by creating a temporary PHP file served through the web server, not through the command line. PHP-FPM (which serves web requests) and PHP-CLI (which runs WP-CLI commands) have separate configuration directories. Checking via wp eval would have shown CLI’s config, not the web server’s — a subtle distinction that had already produced misleading results earlier in the engagement.
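The verification file itself is tiny. opcache_get_status() reports the JIT state under a jit key in PHP 8+; the point is to request this file over HTTP so you read PHP-FPM’s configuration, then delete it immediately.

```php
<?php
// jit-check.php -- drop in the web root, request via HTTP, then delete.
// Running `php jit-check.php` from a shell would show PHP-CLI's config instead.
$status = opcache_get_status( false ); // false: skip the per-script list
header( 'Content-Type: text/plain' );
echo 'JIT enabled: ', var_export( $status['jit']['enabled'] ?? false, true ), "\n";
echo 'JIT buffer:  ', $status['jit']['buffer_size'] ?? 0, " bytes\n";
```

After the fix, the expected output over the web is enabled: false with a zero buffer, while the bytecode cache portion of the same status array stays populated.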

After the fix, the site recovered immediately. I retested the Avada theme builder page that had triggered the crash and…it loaded cleanly.

JIT compilation in PHP is designed for CPU-intensive workloads: mathematical simulations, image processing, machine learning. WordPress is the opposite: it spends its time waiting for database responses and HTTP requests, not crunching numbers. The OPcache bytecode cache (which remained fully active at 256MB with a 98.9% hit rate) is where the real performance benefit lives for WordPress. JIT added instability risk for zero measurable benefit.

The Collateral Discovery

During the crash investigation, I discovered that easy-social-share-buttons — version 1.4.5, years out of date — was actively running on the site despite not appearing on any documented plugin list. It was throwing PHP 8.4 deprecation errors on every page load. How it got activated is unclear — possibly a leftover from a previous administrator’s work that was never cleaned up. It went on the deletion list.

I also discovered that the four inactive themes on the server (charity, generatepress, hestia, vw-charity-ngo) were all likely incompatible with PHP 8.4. The charity theme proved it by crashing with a create_function() error when I tried to switch to it during the JIT investigation. The client chose to keep them — a decision I documented and respected, noting that inactive themes with known PHP incompatibilities are a minor security surface but not a functional risk.


Part Six: Locking the Last Doors

With performance tuned and the JIT crash resolved, the final work was ensuring every external connection was properly sandboxed before the client started active testing on staging.

Payment Gateway Hardening

The Authorize.net gateway plugin had been deactivated during the initial isolation audit in Part 2a — that was the first line of defense. But the underlying GiveWP payment configuration still had test mode disabled and live API credentials active in the database. If the gateway plugin were reactivated at cutover without first enabling test mode, it would immediately start processing live transactions against the real Authorize.net account. One missed step in the cutover checklist, and staging becomes a payment-processing site.

I enabled GiveWP’s test mode via WP-CLI as an additional safeguard. The sandbox credentials were mostly empty, which meant donation forms would now fail gracefully rather than charging real cards. The live credentials remained in the database (they’d be needed at production cutover) but were now behind two independent blocks: the plugin deactivation and the test mode flag.
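For the record, the WP-CLI incantation looked approximately like this. It assumes GiveWP keeps its settings in a give_settings option array with a test_mode key, which matches GiveWP’s conventions but should be verified against the installed version before relying on it.

```shell
# Flip GiveWP into test mode without touching the admin panel
wp option patch update give_settings test_mode enabled

# Confirm the flag actually changed
wp option pluck give_settings test_mode
```

Using option patch instead of rewriting the whole serialized array means the live API credentials stored alongside it are never touched.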

The client confirmed that Authorize.net was their only active card processor; Stripe had residual configuration from an old integration attempt but was never used. PayPal Standard was removed from the active gateways (PayPal Commerce remained as a secondary option).

Email Isolation

Transactional email was already double-blocked from an earlier session: WP Mail SMTP was deactivated (so WordPress couldn’t send through Gmail OAuth), and the server’s mail transfer agent (Exim4, installed by default on Debian) had been stopped and disabled at the system level. Even if something reactivated the mail plugin, the underlying mail transport was gone.

The production email plan was documented: reactivate WP Mail SMTP at cutover (the Gmail OAuth credentials were already configured), leave Exim4 permanently disabled (WP Mail SMTP bypasses it entirely), and test donation receipts and form notifications immediately after going live.

The Deactivated Plugin Roster

Over the course of the engagement, I had accumulated a list of ten plugins that were deactivated on staging and needed client decisions before production cutover. Each one represented a different kind of risk:

Some, like the Authorize.net gateway and the Zapier integration, needed to be reactivated on production because the client actively used them. Others, like the Google Analytics dashboard plugin, the client confirmed they didn’t need. UpdraftPlus was being replaced with a proper server-level backup strategy. WP Mail SMTP needed reactivation for transactional email, and the WPForms add-ons for geolocation and form abandonment tracking were features the client wanted to keep.

Every one of these decisions was documented, with the rationale for each.


Lessons for Site Owners

Your page cache might not be caching. The Pragma header bypass bug meant the FastCGI cache was “working” in every test I ran from the server, while serving zero cached pages to real visitors. Testing performance from inside your own infrastructure isn’t enough; you need to test through the same path your visitors use.

Plugin count isn’t always the bottleneck. Twelve plugins were deactivated for barely 100 milliseconds of improvement. The real performance cost was the page builder rendering 106 shortcodes. That’s a design problem, not a plugin problem. Profiling (not assumptions) tells you where the time actually goes.

Database bloat accumulates silently. A GiveWP bug creating 26 duplicate rows per form on every recurring process had generated over 50,000 unnecessary database entries over the site’s lifetime. No one noticed because each individual insertion was tiny, but the cumulative effect dragged a key query from milliseconds into triple digits.
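Cleanup like this boils down to a dedup query. A hedged sketch for MariaDB, assuming the default `wp_` prefix and a GiveWP-style donation meta table with `meta_id`, `donation_id`, `meta_key`, and `meta_value` columns (verify the actual table and column names against your schema, and back up before deleting anything):

```sql
-- Count the duplicate groups first, before deleting anything.
SELECT donation_id, meta_key, COUNT(*) AS copies
FROM wp_give_donationmeta
GROUP BY donation_id, meta_key, meta_value
HAVING copies > 1
ORDER BY copies DESC;

-- Keep the lowest meta_id of each duplicate group, delete the rest.
DELETE m1 FROM wp_give_donationmeta m1
JOIN wp_give_donationmeta m2
  ON m1.donation_id = m2.donation_id
 AND m1.meta_key    = m2.meta_key
 AND m1.meta_value  = m2.meta_value
 AND m1.meta_id     > m2.meta_id;
```

Running the SELECT first also tells you whether the bug is still active: if the duplicate counts keep climbing after cleanup, the deletion treats the symptom, not the cause.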

Staging environments need active isolation, not assumptions. A staging copy of a production site contains live API credentials, live webhook URLs, live email configurations, and live payment gateway settings. Each of those must be actively disabled or pointed at a sandbox, and then verified; assuming isolation is not the same as having it.

Obscure features create obscure failures. PHP’s JIT compiler produced a corrupted cache entry that crashed the site with a 4-gigabyte memory allocation. The crash had no obvious cause: disabling every plugin and switching themes didn’t stop it, because the corruption lived in a compiled-code cache, not in any PHP source file or database record. Features that provide no measurable benefit but add complex runtime behavior are liabilities, not optimizations.
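Turning the JIT off is a small php.ini change plus an FPM reload. A sketch assuming PHP 8.x with OPcache enabled (the drop-in path and PHP version are illustrative):

```ini
; /etc/php/8.2/fpm/conf.d/99-disable-jit.ini  (path is illustrative)
; OPcache itself stays on -- it is only the JIT layer on top that gets disabled.
opcache.enable = 1
opcache.jit = off
opcache.jit_buffer_size = 0   ; a zero buffer also disables the JIT
```

After writing the file, reload the pool (e.g. `systemctl reload php8.2-fpm`) and confirm with `php -i | grep jit` or a `phpinfo()` page that the JIT reports disabled.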

Every optimization is a measurement. The work in this phase was defined by a rhythm: change something, flush the caches, measure the result, decide whether the change mattered. MariaDB tuning, OPcache tuning, and JIT didn’t move the TTFB needle; plugin deactivation barely moved it. The profiler showed me why: 585 milliseconds of Avada shortcode rendering that no amount of infrastructure tuning could touch. Without measuring, I would have kept optimizing things that didn’t matter.
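That change-flush-measure rhythm only works if the measurement itself is repeatable. A minimal TTFB sampler in Python, stdlib only (`measure_ttfb` and the median-of-runs approach are my own illustration, not the exact tooling used in this engagement):

```python
import http.client
import time
from urllib.parse import urlsplit

def measure_ttfb(url: str, runs: int = 5) -> float:
    """Median time-to-first-byte in seconds over `runs` GET requests."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection
                if parts.scheme == "https"
                else http.client.HTTPConnection)
    samples = []
    for _ in range(runs):
        conn = conn_cls(parts.netloc, timeout=10)
        start = time.perf_counter()
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()      # returns once status + headers arrive
        resp.read(1)                   # force the first body byte
        samples.append(time.perf_counter() - start)
        resp.read()                    # drain so the connection closes cleanly
        conn.close()
    samples.sort()
    return samples[len(samples) // 2]  # median (upper middle for even runs)
```

Taking the median rather than a single sample smooths out network jitter; run it against the same URL, through the public path, before and after each change.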


Conclusion: Proving Readiness

Over the course of this work, the staging site went from “technically running” to genuinely production-ready:

  • Anonymous visitor performance: From ~675ms uncached (with a broken cache serving zero hits) to 25ms cached, with the cache actually working for real traffic
  • Admin dashboard: From a multi-second wait (with 30-second Zoho timeouts, blocking REST API calls, and external feed fetches) to under 2 seconds
  • Database: 78,592 rows of bloat removed from the donation metadata table
  • Plugin stack: From 46 active plugins with redundancies, dead code, and mystery activations to a clean, documented list of 31 with clear rationale for each
  • Caching architecture: From a broken WP Rocket installation serving nothing to a three-layer stack (Cloudflare edge → FastCGI page cache → Redis object cache) with automatic purging
  • Infrastructure stability: JIT disabled, cron running on system schedule, memory-safe FPM pool, tuned database buffers

But numbers only tell part of the story. The harder work was the investigation that produced them: discovering that a cache wasn’t caching, that a plugin was adding 30 silent seconds to every operation, that a PHP feature was quietly corrupting compiled code, that a GiveWP process was duplicating metadata rows 26 times over. Each of these findings required a different diagnostic approach, and each would have been invisible without systematic measurement.

The server was built, the site was deployed, the isolation was verified, and the performance was proven, but the work wasn’t finished. The 10GB media library still needed to be transferred; a final fresh database export from production would need to happen close to cutover to capture recent donations and content changes; the backup strategy replacing UpdraftPlus still needed to be designed and tested; and then came the cutover itself: the moment when DNS changes, production traffic starts flowing, and every assumption gets tested by real visitors making real donations.

That work continues in Part 3.


This case study describes real work performed by Stonegate Web Security. Client details have been anonymized and certain identifying specifics altered. Technical details, methodologies, and findings are reported accurately.


Related Reading