Case Study: Hardening, Backups, and the Hidden Problems Waiting in Production
A Nonprofit Migration Reaches the Point of No Return — and Production Starts Talking
Introduction: The Uncomfortable Middle
There’s a phase in every server migration that nobody talks about. The exciting work is done: the new server is built, the site is deployed, the performance numbers look good. But you’re not live yet, and between “staging works” and “production is cutover” lies a stretch of work that is almost entirely about discipline: hardening the server so it can survive being on the internet, building the backup infrastructure that will catch you when something goes wrong, transferring the assets that haven’t moved yet, and, inevitably, discovering things about the production environment that nobody knew were broken.
In Part 2b of this series, I’d finished the performance tuning work on a nonprofit ministry’s staging server: ripping out a caching layer that wasn’t doing its job, replacing it with one that was, tuning the database and PHP configuration, profiling every millisecond of page render time, cleaning up a database bloated by a years-old bug, discovering a plugin that was silently adding thirty seconds to every admin operation, and recovering from a server crash caused by a PHP feature that had no business being enabled in the first place. The server was fast, the site rendered correctly, and the staging environment was isolated from production.
But it wasn’t ready. Not even close.
The server had no hardening beyond the basics I’d set during the initial build. There was no backup strategy in place (the old UpdraftPlus configuration had been removed and nothing had replaced it yet); nineteen gigabytes of media files were still being proxied from the old hosting provider; and production, which I’d been carefully not touching while staging work progressed, was about to reveal problems of its own.
This is the story of the work that turns a fast staging environment into a production-ready server, and the discovery that the production database had become a problem no simple import could solve.
Part One: Hardening the Stack
When I built this server in the earlier phases, the priority was getting the stack functional — OpenResty serving pages, PHP-FPM processing requests, MariaDB answering queries, Redis caching objects, CrowdSec watching the perimeter. The configuration choices I made during the build were sound, but they were focused on making things work correctly. Now, with staging proven and the performance tuning complete, it was time to come back and apply the production-grade hardening layer: the security tightening that I’d deliberately deferred until the foundation was solid.
Server hardening isn’t glamorous; it’s a methodical, layer-by-layer process of reducing the attack surface of every service running on the box: closing doors that don’t need to be open, removing capabilities that don’t need to exist, and making sure that if something does get compromised, the blast radius is as small as possible.
The Operating System
SSH was the first target. I deployed a drop-in configuration file that disabled root login, disabled password authentication entirely (key-only access), restricted login to a single named user, limited authentication attempts to three per session, and turned off every forwarding feature the daemon offers (TCP forwarding, agent forwarding, X11 forwarding). The configuration was validated with a dry-run syntax check before the service was reloaded. A syntax error in an SSH configuration file can lock you out of your own server, and there’s no “undo” button when you’re working on a remote machine.
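As a sketch, a drop-in file along these lines captures those settings. The filename and the allowed username are illustrative placeholders, not the actual values from the engagement:

```
# /etc/ssh/sshd_config.d/99-hardening.conf  (example path and username)
PermitRootLogin no
PasswordAuthentication no
AllowUsers deploy
MaxAuthTries 3
AllowTcpForwarding no
AllowAgentForwarding no
X11Forwarding no
```

Running `sshd -t` before reloading is the dry-run syntax check described above: it parses the full effective configuration and exits non-zero on any error, without touching the running daemon.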
The firewall didn’t need any changes. I’d already locked it down during the initial build, restricting SSH to a single IP address and limiting ports 80 and 443 to Cloudflare’s published IP ranges across fifteen CIDR blocks. This means the server refuses direct connections from the public internet. All web traffic must flow through Cloudflare first, which provides DDoS protection and hides the server’s real IP address.
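The article doesn’t name the firewall tool, so the following is only an illustration of the policy shape using ufw; the admin IP is a placeholder and only one of Cloudflare’s published IPv4 ranges is shown:

```
# Illustrative only -- the actual tool and full range list may differ
ufw default deny incoming
ufw allow from 203.0.113.10 to any port 22 proto tcp        # single admin IP (example)
ufw allow from 173.245.48.0/20 to any port 443 proto tcp    # one Cloudflare range; repeat for all fifteen
```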
Kernel-level network hardening came next: reverse-path filtering to block spoofed source addresses, ICMP hardening to prevent broadcast amplification and redirect attacks, SYN flood protection with tuned backlog and retry parameters, and martian packet logging for forensic visibility. These are the kinds of protections that never show up in a site speed test but make a real difference when someone decides to probe your server.
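A sysctl drop-in covering those four categories might look like the sketch below. The directives are real kernel parameters; the tuned numeric values are representative examples, not the exact ones used:

```
# /etc/sysctl.d/99-network-hardening.conf  (values are representative)
net.ipv4.conf.all.rp_filter = 1              # reverse-path filtering: drop spoofed sources
net.ipv4.icmp_echo_ignore_broadcasts = 1     # no broadcast (smurf) amplification
net.ipv4.conf.all.accept_redirects = 0       # ignore ICMP redirect attacks
net.ipv4.tcp_syncookies = 1                  # SYN flood protection
net.ipv4.tcp_max_syn_backlog = 4096          # tuned backlog (example value)
net.ipv4.tcp_synack_retries = 2              # tuned retry count (example value)
net.ipv4.conf.all.log_martians = 1           # log martian packets for forensics
```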
I configured unattended security updates — security patches only, no feature updates, no automatic reboots. The server applies critical fixes on its own, but anything that could change behavior waits for a maintenance window.
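On Debian this policy maps onto unattended-upgrades roughly as follows; this excerpt is a sketch of the relevant stanzas, not the deployed file:

```
// /etc/apt/apt.conf.d/50unattended-upgrades (excerpt)
Unattended-Upgrade::Origins-Pattern {
    // security repository only -- no feature updates
    "origin=Debian,codename=${distro_codename}-security,label=Debian-Security";
};
Unattended-Upgrade::Automatic-Reboot "false";   // reboots wait for a maintenance window
```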
The box was already clean of unnecessary services. The usual suspects that sometimes ship with default Linux installations (rpcbind, avahi, cups, bluetooth) were never installed in the first place. I’d started from a minimal Debian image specifically to avoid this kind of bloat.
The Application Layer
PHP-FPM, the process manager that handles all PHP execution for the site, got the most attention. I deployed a hardening configuration that hid the PHP version from response headers, disabled twelve dangerous system-level functions like exec and shell_exec that a compromised plugin could use to execute arbitrary commands on the server, restricted filesystem access via open_basedir so PHP could only read from the WordPress directory and temporary paths, and locked down session cookies with httponly, secure, and samesite attributes. One thing worth noting: disabling exec and shell_exec can theoretically break ImageMagick, which some WordPress sites use for image processing. But this stack uses the GD library instead, so the restriction was safe.
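In configuration terms, those changes correspond to directives like these. The function list is a partial sketch (the actual deployment disabled twelve), and the open_basedir paths are placeholders:

```
; PHP hardening excerpt (paths illustrative; function list abbreviated)
expose_php = Off
disable_functions = exec,shell_exec,system,passthru,popen,proc_open
open_basedir = /var/www/site:/tmp
session.cookie_httponly = 1
session.cookie_secure = 1
session.cookie_samesite = Lax
```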
MariaDB needed only two finishing touches: disabling local-infile (which prevents the MySQL LOAD DATA LOCAL command from being used to read arbitrary files) and symbolic-links (which prevents symlink-based path traversal attacks). Marginal improvements, but the kind of thing that closes a door you never want opened.
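Both settings are single-line options in a server config drop-in; the filename here is illustrative:

```
# /etc/mysql/mariadb.conf.d/99-hardening.cnf  (example filename)
[mysqld]
local-infile   = 0   # block LOAD DATA LOCAL file reads
symbolic-links = 0   # block symlink-based path traversal
```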
Redis needed its command vocabulary restricted. By default, Redis exposes commands like FLUSHALL (which wipes the entire cache), CONFIG (which can change the server’s configuration at runtime), DEBUG (which can crash the process), and SHUTDOWN (which does exactly what it sounds like). WordPress only needs basic key-value operations — GET, SET, DELETE, and a few multi-key reads. I disabled eight dangerous commands by renaming them to empty strings, effectively removing them from the command set.
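Renaming a command to an empty string is Redis’s classic mechanism for removing it from the command table. A representative subset of the eight (newer Redis versions would express the same policy through ACLs instead):

```
# redis.conf excerpt -- renaming to "" removes the command entirely
rename-command FLUSHALL ""
rename-command FLUSHDB  ""
rename-command CONFIG   ""
rename-command DEBUG    ""
rename-command SHUTDOWN ""
```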
OpenResty, the web server, already had blocks in place for xmlrpc.php and PHP execution in the uploads directory, both from earlier phases. The additions here were security headers and server token suppression. I created a hardening configuration with server_tokens off (hiding the OpenResty version number), plus five security headers: X-Content-Type-Options to prevent MIME-type sniffing, X-Frame-Options to block clickjacking, Referrer-Policy to control information leakage, a Permissions-Policy denying access to camera, microphone, geolocation, and payment APIs, and HSTS to enforce HTTPS.
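The hardening file amounts to a handful of directives; the header values shown here are representative choices, not necessarily the exact policies deployed:

```
# conf.d/security-headers.conf  (values representative)
server_tokens off;
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Permissions-Policy "camera=(), microphone=(), geolocation=(), payment=()" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```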
Getting these headers to load required a small surgical edit to the main nginx configuration file. The conf.d directory wasn’t being automatically included; only CrowdSec had an explicit include line. I added one more, validated the full configuration with a syntax check, and confirmed all five headers were present in the response.
There’s a subtlety here worth mentioning for anyone managing an OpenResty or Nginx server: header inheritance. If any location block in the site configuration contains its own add_header directives (and this one had two, for cache status and static asset expiry), then the security headers defined at the server or http level silently disappear for those specific locations. Nginx doesn’t merge headers across scopes; it replaces them. I flagged this as a known gap but not urgent, since all external traffic passes through Cloudflare, which adds its own set of security headers.
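A minimal illustration of the inheritance trap, with placeholder paths:

```
server {
    add_header X-Frame-Options "SAMEORIGIN" always;   # inherited by locations...

    location /static/ {
        # ...except this one: a single add_header at this level discards
        # ALL inherited headers, so X-Frame-Options vanishes for /static/.
        add_header Cache-Control "public, max-age=31536000";
    }
}
```

The usual workaround is to repeat the header set (often via an include file) inside every location that declares its own add_header.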
Part Two: Building the Safety Net
The site’s previous backup solution had been UpdraftPlus, a WordPress plugin that was no longer active on the new server. A production server without backups is a server waiting for a catastrophe with no recovery plan. I designed a three-layer backup architecture that wouldn’t depend on any WordPress plugin and wouldn’t store sensitive data on the server itself.
Layer 1: Nightly database dumps pulled to my local workstation over SSH. The key design decision was that no backup files would ever sit on the server. The script SSHes into the droplet, runs mysqldump remotely, pipes the output through gzip compression, and writes the compressed file directly to my local machine. If the server is compromised, the attacker never finds a convenient copy of the database waiting for them. A rotation scheme keeps seven daily backups and four weekly snapshots.
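A sketch of what such a pull-style job can look like, run from the workstation rather than the server. The hostname, database name, and paths are placeholders, and credential handling is omitted:

```
#!/bin/sh
# Nightly database pull -- runs on the LOCAL workstation, not the server.
set -eu

HOST="deploy@droplet.example.com"      # placeholder SSH target
DB="wordpress"                         # placeholder database name
BACKUP_DIR="$HOME/backups/db"
mkdir -p "$BACKUP_DIR"

# Dump remotely, compress in the pipe, write only to the local disk:
# no backup file ever touches the server's filesystem.
ssh "$HOST" "mysqldump --single-transaction --quick $DB" \
    | gzip > "$BACKUP_DIR/$(date +%F).sql.gz"

# Rotation sketch: keep the seven newest daily dumps.
ls -1t "$BACKUP_DIR"/*.sql.gz | tail -n +8 | xargs -r rm --
```

The `--single-transaction` flag gives a consistent InnoDB snapshot without locking tables, which matters when dumping a live donation database.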
Layer 2: Nightly file sync via rsync. This captures everything the database dump doesn’t: theme files, plugin code, uploaded media, configuration files, custom code. The rsync runs in mirror mode, meaning the local copy is always an exact replica of what’s on the server. After the initial sync, nightly runs only transfer whatever changed, usually a handful of files in a few seconds.
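The mirror behavior comes from rsync’s `--delete` flag, which removes local files that no longer exist on the server. A one-line sketch with placeholder host and paths:

```
# Nightly mirror (host and paths are placeholders)
rsync -az --delete deploy@droplet.example.com:/var/www/site/ "$HOME/backups/site/"
```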
Layer 3: DigitalOcean droplet snapshots. A full disk image of the entire server — not just WordPress but the operating system, all configurations, all services. If the server is destroyed, a snapshot can spin up an identical replacement in minutes.
Three layers, three different failure scenarios covered: database corruption (Layer 1), file-level damage or ransomware (Layer 2), and total server loss (Layer 3). None of them depend on a WordPress plugin, none of them store sensitive data on the server, and none of them require the site to be functional to restore from.
Part Three: The Nineteen-Gigabyte Problem
The staging site had been serving media files by proxying requests back to the old hosting provider. When a visitor loaded a page on staging and that page included an image, the staging server would fetch the image from the production server in real time and pass it through. This worked for testing purposes, but it meant the staging environment wasn’t truly independent.
The media library was nineteen gigabytes. Transferring it meant pulling it from the old host to my local workstation first (since I couldn’t establish a direct connection between the two servers), then pushing it up to the new droplet.
The pull from the old host completed first, but before pushing anything to the new server, I analyzed what I’d downloaded. Nearly half of it (9.3 gigabytes) was a directory called ShortpixelBackups. ShortPixel is an image optimization plugin that compresses images to reduce page load times, and it keeps the original uncompressed versions as a backup in case you ever want to revert. The site never serves these originals to visitors; they’re purely insurance.
I made the decision to exclude the ShortPixel backups from the upload to the new server. The originals would stay on my local workstation as a safety net, but putting 9.3 gigabytes of files that are never served onto a 160-gigabyte SSD didn’t make sense. This cut the upload in half.
The upload to the droplet hit three permission issues in sequence. First, the SSH user couldn’t write to the uploads directory because it was owned by the web server user. Fix: add the SSH user to the web server’s group and set group-writable permissions. Second, rsync could transfer the files but couldn’t set group ownership on the destination. Fix: add --no-group --no-owner flags to skip ownership changes during transfer. Third, a batch of files transferred successfully but rsync complained about permissions and timestamps it couldn’t set. The files themselves were all present; a follow-up rsync showed a speedup ratio of 1,366x, meaning zero bytes of actual data needed to be transferred. A single recursive ownership and permissions command cleaned up the metadata.
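A sketch of the three fixes in command form; the user, group, and path names are illustrative stand-ins:

```
# Fix 1: let the SSH user write where the web server user owns files
usermod -aG www-data deploy
chmod -R g+w /var/www/site/wp-content/uploads

# Fix 2: transfer without trying to set ownership on the destination
rsync -az --no-group --no-owner \
    uploads/ deploy@droplet.example.com:/var/www/site/wp-content/uploads/

# Fix 3: one recursive pass to normalize ownership after transfer
chown -R www-data:www-data /var/www/site/wp-content/uploads
```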
With uploads now local to the droplet, I removed the proxy configuration from OpenResty entirely. The staging site was now fully self-contained, with zero dependency on the old hosting provider for serving any content.
I updated the nightly file sync script to include the --no-group --no-owner flags and to exclude the ShortPixel backups directory, so future syncs would run cleanly.
While I was working through the WordPress directory, I also cleaned up the standard files that ship with every WordPress installation and serve no purpose in production but do serve attackers: readme.html, which announces the WordPress version, and license.txt, which carries the same information. I also removed wp-trackback.php, an obsolete trackback/pingback endpoint that’s a known abuse vector. The xmlrpc.php file was already blocked at the web server level, which is more durable than deleting it because WordPress core recreates the file on every update.
Part Four: The Drift Problem
With the staging site fully self-contained, hardened, backed up, and independent of the old host, I turned to the last major question before cutover: was the staging database still in sync with production?
The staging site had been deployed from a production snapshot taken on February 6th. Since then, I’d done extensive cleanup work on staging, and a full database reimport from production would destroy that work.
But production hadn’t been frozen either: real donations had come in, content had been edited, and media had been uploaded.
I queried production for everything that had changed since the snapshot date. Nine new donations had been processed; two pages had been edited, along with eight associated reusable content blocks used by the site’s page builder; and about twenty new media files had been uploaded.
The donation data was more complex than a simple row count suggested. The donation platform had migrated from storing metadata in the standard WordPress postmeta table to its own custom tables — give_donationmeta, give_donors, give_donormeta, give_revenue. Three of the nine new donations were from new donors who didn’t exist on staging at all, and those donor records had to be imported before their donation metadata, or the foreign-key relationships would break.
The page edits were straightforward in concept but massive in volume. The site uses Avada with Fusion Builder, which stores its layout data as serialized arrays in postmeta. Two pages and eight reusable content blocks together accounted for over 500 rows of metadata. Those rows had to be exported from production and imported to staging without touching anything else in the database.
This was the moment where a less careful approach would have been tempting: just reimport the whole production database, re-do the cleanup work, move on. But that cleanup work had taken multiple sessions. Re-doing it would mean re-archiving forms, re-deactivating plugins, re-verifying that every change was correct, and it would mean the cutover runbook couldn’t trust that the staging environment matched what had been tested.
Instead, I designed a selective sync strategy: surgical exports of specific records from production, using targeted mysqldump commands with WHERE clauses that would extract only the changed data. Each export would use REPLACE INTO statements, meaning the import on staging would overwrite matching rows by primary key without affecting anything else.
I documented every post ID, every table, every dependency relationship, and mapped out the exact sequence of ten SQL dump files that would bring staging to production parity without destroying a single row of cleanup work.
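The shape of one such export, sketched with illustrative database, table, and ID values (the real commands targeted the actual post IDs mapped above):

```
# Export only the changed rows, written as REPLACE statements
mysqldump --no-create-info --replace \
    --where="post_id IN (1234, 5678)" \
    production_db wp_postmeta > fusion-postmeta.sql

# Import on staging: REPLACE INTO overwrites rows matching by
# primary key and leaves every other row untouched
mysql staging_db < fusion-postmeta.sql
```

`--no-create-info` omits DROP/CREATE TABLE statements so the import can’t clobber the table, and `--replace` emits REPLACE INTO instead of INSERT so re-running an import is idempotent.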
Part Five: Proving It Was Ready
Before any of the data sync work could begin, there was one more thing I needed: confirmation from the client that the staging site was acceptable.
I’d already benchmarked the performance. Time-to-first-byte on the new server averaged 223 milliseconds, roughly half the 447 milliseconds on the old host. Total page load was also down, from 568 milliseconds to 289 milliseconds, a roughly fifty percent improvement across the board.
I sent a message requesting formal user acceptance testing: check the pages, check the navigation, test the donation forms (noting that live payment processing was intentionally disabled on staging), test the contact forms, log into the admin dashboard, and flag anything that didn’t look right or didn’t match their expectations.
The response came back clean. The client’s stakeholder confirmed that pages, content, forms, the admin dashboard, and plugins all appeared correct, loaded well, and displayed properly.
The only thing left was to execute the sync, activate the final plugins, and build the runbook for the moment when DNS changes and production traffic starts flowing to the new server.
That work continues in Part 4.
This case study describes real work performed by Stonegate Web Security. Client details have been anonymized and certain identifying specifics altered. Technical details, methodologies, and findings are reported accurately.
Related Reading
Case Study: Proving a WordPress Staging Site Is Production Ready
Part 2b of this series — ripping out a broken caching layer, profiling every millisecond of page render time, and recovering from a server crash caused by a PHP feature that had no business being enabled.

Case Study: What It Takes to Build and Prove a WordPress Migration Server
Part 2a of this series — building the server from scratch, deploying the site to staging, and finding live payment gateways before a single visitor touches the site.

Case Study: How to Audit WordPress Plugins Before a Server Migration
Part 1 of this series — the plugin audit that mapped 88 plugins, uncovered a three-year-old database bug, and became the foundation for everything that followed.