You are considering re-platforming a major customer-facing website, but you are rightly concerned that the cutover could negatively impact the business. How would you manage this? Here are some learnings from our recent re-platforming to a MACH architecture. These learnings also build largely on my experience from a previous consulting life, where we re-platformed major billion-dollar-plus eCommerce websites. Using this approach, we migrated smoothly and even saw a modest increase in conversion and revenue-per-visitor metrics during and after the migration.
No big bang: Whether it is an eCommerce re-platforming or any other IT re-platforming project, the big-bang approach rarely works because today's complex systems have too many sub-systems and multiple internal and external integrations. Cutting over all of those in one fell swoop requires extensive planning and coordination across many internal and external stakeholders. This is usually where things go wrong, because one small miss can cause a ripple effect leading to significant disruption. The more prudent option is not to bite off more than we can chew and instead migrate one function or sub-system after another. This approach is popularly known as the strangler fig design pattern.

Waves and throttling approach: There are two ways of breaking the quantum of change into bite-sized pieces: splitting it by function, which we called waves, and further splitting it by a percentage of traffic, which we called throttling. We used both simultaneously to manage risk.
Wave 1 consisted of the discovery part of the website. In most eCommerce sites, the majority of page hits go to the discovery area, i.e., the home page, search, and product listing/category/brand pages, where customers browse the product offerings on the site and are enticed to zero in on a product they want to buy. It is also usually the visually rich portion of the website, sometimes with integrations into a CMS, a DAM, content-enrichment plug-ins, and syndicated content such as reviews. Because this section receives high traffic, is style-heavy, and has many front-end integrations, it was easy to split its traffic, and we chose it as our wave 1. The objective was to prove the new visual style guides, layouts, and front-end integrations before we got into the more difficult transactional portion of the website.
Wave 2 consisted of the PDP, cart, checkout, and my-account pages. By this time, the visual styling was all figured out, so the focus was mainly on transactional complexity: handling different payment methods, integrations with fraud verification (Signifyd), the backend, etc. Moving these functionalities to wave 2 also gave us time to work out the transactional elements across all systems. The stakeholders here included an additional set of people, mainly in the organization's order management and fulfillment teams; not having to involve those busy teams during wave 1 made things easier for the project team.
Throttling refers to releasing a feature to a small, select segment of users based on a pre-defined criterion; it reduces the risk of rolling out a feature to a large audience at once. For this purpose, we used Akamai's Audience Segmentation Cloudlet at our CDN layer. Conceptually, this works precisely like Hogwarts' "Sorting Hat" (from Harry Potter), determining which way each new incoming user's request goes. Unlike the Sorting Hat, which works by magic, here we can actually define the segmentation criteria. We used one or more of the following criteria:

a) random allocation,
b) the state from which traffic originated (e.g., NY),
c) the device type (e.g., Mobile or Desktop), or
d) other parameters as shown in the screenshot.
This allowed us to compare the metrics for the old and new systems and narrow any problem down to one of these segments. Once we fixed those problems, we could throttle up those segments. We also considered A/B test tools, such as Google Optimize, for this purpose, but they require that the first page visited on the site always be served from one side. The A/B test tool's script then runs on that side and determines whether this user should stay there or go to the other side. Sending users to the other side becomes a challenge because it causes an inconvenient page refresh; it can also pollute the metrics, not to mention other complications on slow connections. So we settled on Akamai's Audience Segmentation Cloudlet.
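Conceptually, the "sorting hat" decision described above can be sketched as follows. The Cloudlet itself is configured declaratively in Akamai rather than coded by hand, so this is only an illustration of the routing logic; the function name, criteria values, and percentages are all hypothetical.

```python
import random

def route_request(geo_state, device_type, new_site_pct):
    """Decide which platform ('new' or 'old') serves a first-time visitor.

    Illustrative only: the real decision is made by the CDN edge based on
    a declarative ruleset, not application code.
    """
    # Criterion b) pin a chosen state's traffic to the new platform
    if geo_state == "NY":
        return "new"
    # Criterion c) keep a device type on the old platform until it is proven
    if device_type == "mobile":
        return "old"
    # Criterion a) random allocation for everyone else, by throttle percentage
    return "new" if random.random() * 100 < new_site_pct else "old"
```

Throttling up a segment then amounts to raising `new_site_pct` (or relaxing a rule) once its metrics look healthy.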
The above approach required further careful planning to ensure the following:
Force landing / sticky users: Users selected for the new experience must stay with it and not switch back and forth on subsequent visits, at least when visiting from the same device. This is important both for protecting the user experience and for not polluting the metrics. We also needed a back door for testing and troubleshooting. Hence, we implemented a cookie mechanism, along with a URL parameter to set/reset that cookie, which always forced the user to land on the source or target system. This cookie also ensured users stuck to the same side on subsequent visits once the "sorting hat" determined which system they went to on their first visit.
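The stickiness and back-door logic can be sketched roughly like this. The cookie and parameter names are hypothetical, and in practice this check runs at the edge before the sorting-hat assignment:

```python
def decide_side(cookies, query_params, sorting_hat):
    """Return (side, cookies_to_set) for an incoming request.

    Precedence (hypothetical names):
      1. A 'force_side' URL parameter (the testing back door) overrides
         and resets the cookie.
      2. An existing 'platform_side' cookie keeps returning visitors
         on the same side.
      3. Otherwise the sorting hat assigns a side, persisted via cookie.
    """
    forced = query_params.get("force_side")        # e.g. ?force_side=new
    if forced in ("new", "old"):
        return forced, {"platform_side": forced}   # set/reset the cookie
    side = cookies.get("platform_side")
    if side in ("new", "old"):
        return side, {}                            # sticky: no change needed
    side = sorting_hat()                           # first visit: assign a side
    return side, {"platform_side": side}
```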
URL structure: One of the advantages of the BigCommerce platform is that it lets us define the URL structure for any page to be whatever we want. We maintained the same URLs across both platforms throughout the migration, which made throttling much simpler because the same URL works on both platforms. It also de-risked the migration from any negative SEO impact. If the target platform does not support this feature, you can still manage it through 302 redirects (on the target platform) at the cost of additional latency. The 302s are temporary redirects, which must be replaced with 301 (permanent) redirects after the migration is complete.
SEO, robots.txt, and sitemap.xml: During our migration, we directed all bots to the old platform; except for the selected user segments, no other traffic saw the target platform until the end stages of the migration. This approach mitigated any negative SEO impact.
Analytics: One of the critical prerequisites to the migration was identifying the KPIs and metrics related to each page being throttled. The corresponding analytics tags were instrumented to capture these metrics across the customer journey on both the source and target platforms. Yes, we had to do some tagging on the source platform to measure the migration's success. Every day, we compared the relative performance of both platforms along several dimensions for each metric. These comparisons helped us identify several minor issues and tweak the target system until it performed the same as or better than the source, at which point we throttled the page up to 100%. This approach was the real secret to our success in achieving better business performance, even during the migration.
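The daily "same or better" gate described above can be expressed as a simple check. The function, metric names, and tolerance are hypothetical; in practice the comparison was done across analytics dashboards rather than code like this:

```python
def ready_to_throttle_up(source_metrics, target_metrics, tolerance=0.0):
    """Compare per-page KPIs (e.g. conversion rate, revenue per visitor)
    between platforms. Returns (ok, regressions): ok is True only when the
    target performs the same as or better than the source on every metric,
    within an optional tolerance; regressions maps each failing metric to
    its (source, target) values for investigation.
    """
    regressions = {
        name: (source_metrics[name], target_metrics.get(name, 0.0))
        for name in source_metrics
        if target_metrics.get(name, 0.0) < source_metrics[name] * (1 - tolerance)
    }
    return len(regressions) == 0, regressions
```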
Troubleshooting: We used tools like FullStory and Quantum Metric to troubleshoot issues by replaying precisely what the user had seen when the metrics were not trending in the right direction and we could not find any problems in the logs. These tools let us visually experience the session exactly as the customer saw it and provided visibility into browser-side errors. They were invaluable in troubleshooting browser-side issues.
Microservices versioning: To support throttling and multiple waves at the same time, most services or APIs had to support multiple versions in production. This also ensures that consumers of a service do not need to synchronize their releases with the rollout of the service: the service can be rolled out on its own schedule without impacting others, consuming applications can be modified later and tested against the new version, and consumers can quickly roll back to the older version if problems arise during adoption. Amazon pioneered this best practice, and it is a foundation of composable (MACH) architectures.
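The idea of running multiple versions of a service contract side by side can be sketched minimally as follows. The class, version names, and handlers are all hypothetical; real services would route by URL path (e.g., /v1/...) or header rather than an in-process registry:

```python
class VersionedService:
    """Minimal sketch: serve multiple API versions concurrently so each
    consumer adopts (or rolls back to) a version on its own schedule."""

    def __init__(self):
        self._handlers = {}

    def register(self, version, handler):
        """Publish a version's handler without retiring the older ones."""
        self._handlers[version] = handler

    def call(self, version, payload):
        """Dispatch a request to the contract the consumer asked for."""
        if version not in self._handlers:
            raise ValueError(f"unsupported version: {version}")
        return self._handlers[version](payload)

# Both contracts stay live in production during the migration:
svc = VersionedService()
svc.register("v1", lambda p: {"total": p["price"]})                    # legacy contract
svc.register("v2", lambda p: {"total": p["price"] + p.get("tax", 0)})  # new contract
```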
Monitoring and alerts: We instrumented monitoring for errors in the logs in addition to key business metrics. Automated alerts were sent to the corresponding stakeholders if the metrics breached the established thresholds. Proactive alerts helped us address issues before they could negatively impact the business.
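A threshold-breach check of the kind described above might look like this sketch; the function, metric names, and threshold bands are hypothetical (real deployments would use an APM or alerting tool rather than hand-rolled code):

```python
def check_thresholds(metrics, thresholds):
    """Return alert messages for every metric outside its (min, max) band.

    metrics:    {"metric_name": current_value, ...}
    thresholds: {"metric_name": (lo, hi), ...}; unlisted metrics never alert.
    """
    alerts = []
    for name, value in metrics.items():
        lo, hi = thresholds.get(name, (float("-inf"), float("inf")))
        if not (lo <= value <= hi):
            alerts.append(f"{name}={value} outside [{lo}, {hi}]")
    return alerts
```

Each returned message would be routed to the stakeholders who own that metric.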
Environments: We needed four environments for each of the source and target stacks (Dev, UAT1, UAT2, and Production). DevSecOps supported this further with CI/CD pipelines that ensured a repeatable process and code integrity. Developers initially worked on their local environments, and their pull request (PR) was merged into the Dev environment after a code review. Typically, each pull request corresponded to a particular feature, which was then deployed to either UAT1 or UAT2, depending on the wave the feature belonged to. A collection of such PRs comprised a release. Finally, the release was deployed to production after automated regression testing, manual testing, and performance testing.
Automated testing: We used an automated testing framework integrated into our CI/CD pipeline, which was critical in rolling out these changes quickly. This was an automated suite of about a thousand test cases. Every change had to go through manual testing of that feature along with automated regression testing, which ensured we did not break anything else in the complex ecosystem. We used the RIO framework for test automation and analytics.
Finally, the attitude and approach: This is an obvious but extremely critical part. "It takes a village to raise a child." Many different internal and external organizations were associated with this project, and all operated as one team. We had daily sync-up calls where we discussed issues, reviewed metrics, and made decisions. We had a bias for action and celebrated small successes. If we hit a wall, we looked for ways around it and, in some cases, bulldozed the wall. We believed "perfect" is the enemy of "good": as long as the new site's metrics were better than or equal to the old site's, we moved forward (after creating a ticket for ourselves to come back and "perfect" it later). None of this could have happened without a large team of committed individuals (employees, partners, vendors, contractors) who invested their time, intellect, and energy, and above all, a very supportive C-suite. So make sure you have a strong team, build a culture of teamwork, and engage the right partners to turn to when you are in a pickle.
The Project: Conn's is a publicly traded $1.5 billion retailer and lender with 160+ stores in 15 states in the southern USA. Conn's offers a wide range of in-house financing options, as well as next-day delivery and/or same-day pickup, as indicated on the website based on geo-fencing the user to the nearest DC and store. Pre-migration, the technology stack consisted of Magento 2.4.3 deployed on AWS and monitored by New Relic, search on Klevu, DAM on Cloudinary, fraud verification by Signifyd, and integrations into the in-house backend that acts as our ERP, OMS, and POS system, all behind the Akamai CDN and WAF. This Magento implementation was migrated to the SaaS-based BigCommerce (BC) platform, with search and browse (PLP and search pages) served by Klevu on BC. Other technologies had to be retrofitted to the BC SaaS platform, and the site continues to improve with better features. The MVP (home page) was launched rapidly, within three months from the start of development. Over the next six months, the remaining features were developed in parallel with the actual migration process using this throttling and wave approach.
The Author: Prasad Tangirala is the Vice President of eCommerce Engineering at Conn's HomePlus and a MACH Ambassador. He brings over two decades of digital transformation experience, having held engineering leadership roles at the likes of Amazon, Apple, Cognizant, Fidelity, Wells Fargo, and Silicon Valley startups. Prasad has a master's in computer science from IIT Kanpur and a certificate in strategy management from Wharton, and is a Scrum Master. Prasad loves to share his experiences building large customer-facing systems at www.linktr.ee/tvprasad/.