Magento 2 and AJAX with optimistic session locking
Recently after pouring into an optimization project, our team discovered something quite interesting about PHP session and Magento 2. Turns out this actually is going to be a relatively serious performance bottleneck with Magento 2 with PHP. This article tries to explain the issue with either built-in session or Redis session storage and Magneto 2. One way to tell if you have this issue and are using Redis session storage you may see a large amount of unexplained mysterious 503 errors showing up on the Magento 2 frontend (for that you can adjust the redis session settings). So we noticed an issue with Magento being a bit slow with a client and there was one transaction as tracked through NewRelic that was slow and it was a client-side service /customer/section/load which gets called several times on a Magento 2 page, typically for the shopping cart AJAX refresh. So here is what we saw. So what is going on here? My first thought when my team showed me this was, strange something must be messed up with Redis as a read should never take this long. Well turns out Redis was working fine so we dug into the code a bit to take a look at the read function.
public function read($sessionId) { // Get lock on session. Increment the "lock" field and if the new value is 1, we have the lock. $sessionId = self::SESSION_PREFIX.$sessionId; $tries = $waiting = $lock = 0; $lockPid = $oldLockPid = null; // Restart waiting for lock when current lock holder changes $detectZombies = false; $breakAfter = $this->_getBreakAfter(); $timeStart = microtime(true); $this->_log(sprintf("Attempting to take lock on ID %s", $sessionId)); $this->_redis->select($this->_dbNum); while ($this->_useLocking) { // Increment lock value for this session and retrieve the new value $oldLock = $lock; $lock = $this->_redis->hIncrBy($sessionId, 'lock', 1); // Get the pid of the process that has the lock if ($lock != 1 && $tries + 1 >= $breakAfter) { $lockPid = $this->_redis->hGet($sessionId, 'pid'); } // If we got the lock, update with our pid and reset lock and expiration if ( $lock == 1 // We actually do have the lock || ( $tries >= $breakAfter // We are done waiting and want to start trying to break it && $oldLockPid == $lockPid // Nobody else got the lock while we were waiting ) ) { $this->_hasLock = true; break; } // Otherwise, add to "wait" counter and continue else if ( ! $waiting) { $i = 0; do { $waiting = $this->_redis->hIncrBy($sessionId, 'wait', 1); } while (++$i < $this->_maxConcurrency && $waiting < 1); } // Handle overloaded sessions else { // Detect broken sessions (e.g. caused by fatal errors) if ($detectZombies) { $detectZombies = false; // Lock shouldn't be less than old lock (another process broke the lock) if ($lock > $oldLock // Lock should be old+waiting, otherwise there must be a dead process && $lock + 1 < $oldLock + $waiting ) { // Reset session to fresh state $this->_log( sprintf( "Detected zombie waiter after %.5f seconds for ID %s (%d waiting)", (microtime(true) - $timeStart), $sessionId, $waiting ), LoggerInterface::INFO ); $waiting = $this->_redis->hIncrBy($sessionId, 'wait', -1); continue; } } // Limit concurrent lock waiters to prevent server resource hogging if ($waiting >= $this->_maxConcurrency) { // Overloaded sessions get 503 errors $this->_redis->hIncrBy($sessionId, 'wait', -1); $this->_sessionWritten = true; // Prevent session from getting written $writes = $this->_redis->hGet($sessionId, 'writes'); $this->_log( sprintf( 'Session concurrency exceeded for ID %s; displaying HTTP 503 (%s waiting, %s total ' . 'requests)', $sessionId, $waiting, $writes ), LoggerInterface::WARNING ); throw new ConcurrentConnectionsExceededException(); } } $tries++; $oldLockPid = $lockPid; $sleepTime = self::SLEEP_TIME; // Detect dead lock waiters if ($tries % self::DETECT_ZOMBIES == 1) { $detectZombies = true; $sleepTime += 10000; // sleep + 0.01 seconds } // Detect dead lock holder every 10 seconds (only works on same node as lock holder) if ($tries % self::DETECT_ZOMBIES == 0) { $this->_log( sprintf( "Checking for zombies after %.5f seconds of waiting...", (microtime(true) - $timeStart) ) ); $pid = $this->_redis->hGet($sessionId, 'pid'); if ($pid && ! $this->_pidExists($pid)) { // Allow a live process to get the lock $this->_redis->hSet($sessionId, 'lock', 0); $this->_log( sprintf( "Detected zombie process (%s) for %s (%s waiting)", $pid, $sessionId, $waiting ), LoggerInterface::INFO ); continue; } } // Timeout if ($tries >= $breakAfter + $this->_failAfter) { $this->_hasLock = false; $this->_log( sprintf( 'Giving up on read lock for ID %s after %.5f seconds (%d attempts)', $sessionId, (microtime(true) - $timeStart), $tries ), LoggerInterface::NOTICE ); break; } else { $this->_log( sprintf( "Waiting %.2f seconds for lock on ID %s (%d tries, lock pid is %s, %.5f seconds elapsed)", $sleepTime / 1000000, $sessionId, $tries, $lockPid, (microtime(true) - $timeStart) ) ); usleep($sleepTime); } } $this->failedLockAttempts = $tries; // Session can be read even if it was not locked by this pid! $timeStart2 = microtime(true); list($sessionData, $sessionWrites) = $this->_redis->hMGet($sessionId, array('data','writes')); $this->_log(sprintf("Data read for ID %s in %.5f seconds", $sessionId, (microtime(true) - $timeStart2))); $this->_sessionWrites = (int) $sessionWrites; // This process is no longer waiting for a lock if ($tries > 0) { $this->_redis->hIncrBy($sessionId, 'wait', -1); } // This process has the lock, save the pid if ($this->_hasLock) { $setData = array( 'pid' => $this->_getPid(), 'lock' => 1, ); // Save request data in session so if a lock is broken we can know which page it was for debugging if (empty($_SERVER['REQUEST_METHOD'])) { $setData['req'] = $_SERVER['SCRIPT_NAME']; } else { $setData['req'] = "{$_SERVER['REQUEST_METHOD']} {$_SERVER['SERVER_NAME']}{$_SERVER['REQUEST_URI']}"; } if ($lock != 1) { $this->_log( sprintf( "Successfully broke lock for ID %s after %.5f seconds (%d attempts). Lock: %d\nLast request of ' . 'broken lock: %s", $sessionId, (microtime(true) - $timeStart), $tries, $lock, $this->_redis->hGet($sessionId, 'req') ), LoggerInterface::INFO ); } } // Set session data and expiration $this->_redis->pipeline(); if ( ! empty($setData)) { $this->_redis->hMSet($sessionId, $setData); } $this->_redis->expire($sessionId, 3600*6); // Expiration will be set to correct value when session is written $this->_redis->exec(); // Reset flag in case of multiple session read/write operations $this->_sessionWritten = false; return $sessionData ? (string) $this->_decodeData($sessionData) : ''; }
Well upon first look I was like interesting why is it locking things in the read function? Then I figured out that is why they call it “Optimistic” session locking. It turns out this sort of logic basically was built in a day were PHP request basically come from one HTTP request and that was about it (not AJAX where you may have 10 in the one-page request, more on that later). Effectively this is done so that when a page may write to the session it will know they have a lock, and they own it and can write immediately without issue. Why does this not work so well with AJAX? Well, Magento 2 uses AJAX to get around full page cache private data issues so that it can cache the full page, and fill in the private data later with AJAX. This is a good and typical tactic but what happens when a page fires off 5-10 AJAX requests. You can see the ones here made by our client’s site.
As you can see there are 11 AJAX calls here. Now many are from the theme code, as they properly privatized the data with an AJAX call according to the Magento developer documentation. So this brings us back to the graph above. If you notice, since it’s read locking, and each has to complete its read, then release the lock for the other request to obtain its lock and get it’s read lock. If you look at the new relic graph, you will see now why a read takes so long, it’s simply waiting for one of the other AJAX requests to finish before it gets it’s read lock. It’s also why the graph has a nice little waterfall effect, effectively with this mechanism it will execute (or at least finish executing) sequentially. That’s a problem in my book.
If it’s going to take to long to fully AJAXify a page, users will drop off and the pages may seem very slow to load. I also suspect, that many of these ajax request, for things like product stock etc, could even have a cache header put on them. Whaaaaat? Cache your ajax!!! but that defeats the purpose. I suspect that even putting up a 5 minute TTL on these would save lots, and lots of traffic and make the user experience so much better. (so ok absolutly not for the cart!)
Anyway, with the move toward PWA as a frontend I am not sure how much longer the full frontend of Magento will be around, but for sure this is an issue with PHP and Magneto that could use some creative solutions, I think Colin Mollenhour Redis module has added read-only support for Magento controllers, but I have not seen this version show up in Magento just yet.