INTRODUCTION
While working on a core ledger modernization project for a FinTech client specializing in high-frequency peer-to-peer payments, we encountered a concurrency challenge that standard testing initially missed. The system was designed to handle thousands of concurrent transactions per second, ensuring strict ACID compliance for every fund transfer. However, during a load simulation mirroring “Black Friday” traffic volumes, we noticed a non-trivial percentage of transactions failing not due to logic errors, but due to database timeouts.
We realized that as concurrency scaled, the database was aggressively terminating transactions to protect itself. The root cause wasn’t hardware capacity or index inefficiency; it was a fundamental architectural oversight in how resources were being locked. That challenge inspired this article, which dissects how we moved from erratic deadlocks to a stable, deterministic locking mechanism. It serves as a guide for engineering leaders looking to stabilize high-volume transactional systems.
PROBLEM CONTEXT
The application in question was a double-entry bookkeeping system serving a digital wallet platform. In this domain, a single “transfer” operation is actually three distinct database operations wrapped in a single transaction context:
- Debit the Sender’s account balance.
- Credit the Receiver’s account balance.
- Insert a ledger entry recording the movement.
The business requirement demanded strict consistency; money could not be created or destroyed. Consequently, we utilized pessimistic locking (`SELECT FOR UPDATE`) to ensure that no other transaction could modify an account’s balance while a transfer was in progress. Under low to moderate load, this architecture performed flawlessly. The latency was low, and data integrity was 100%.
However, the issue surfaced when the system scaled to support a surge in user activity where localized clusters of users were transferring funds back and forth rapidly.
WHAT WENT WRONG
The failures appeared in the application logs as `Deadlock found when trying to get lock; try restarting transaction`. In the database monitoring tools, we observed a spike in “rolled back” transactions.
The architectural oversight was the order in which locks were acquired. Consider two users, User A and User B, initiating transfers simultaneously:
- Transaction 1 (A pays B): Locks Record A, then attempts to lock Record B.
- Transaction 2 (B pays A): Locks Record B, then attempts to lock Record A.
If these two transactions execute at the exact same millisecond, Transaction 1 holds the lock on A and waits for B. Transaction 2 holds the lock on B and waits for A. Neither can proceed. The database deadlock detector eventually steps in and kills one of the transactions to let the other proceed.
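The circular wait described above can be reproduced in plain Java with two in-memory locks. The `DeadlockDemo` class below is an illustrative sketch, not the production code; its `tryLock` timeout plays the role of the database engine's deadlock detector aborting a victim transaction:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockDemo {

    // Two threads acquire the same pair of locks in opposite order.
    // Returns how many second-lock acquisitions had to be aborted.
    static int runCircularWait() {
        ReentrantLock lockA = new ReentrantLock();
        ReentrantLock lockB = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        AtomicInteger timeouts = new AtomicInteger();

        Thread t1 = new Thread(worker(lockA, lockB, bothHoldFirst, timeouts)); // A, then B
        Thread t2 = new Thread(worker(lockB, lockA, bothHoldFirst, timeouts)); // B, then A
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timeouts.get();
    }

    private static Runnable worker(ReentrantLock first, ReentrantLock second,
                                   CountDownLatch barrier, AtomicInteger timeouts) {
        return () -> {
            first.lock();
            try {
                barrier.countDown();
                barrier.await(); // both threads now hold their first lock
                // tryLock-with-timeout stands in for the engine's deadlock detector.
                if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                    second.unlock();
                } else {
                    timeouts.incrementAndGet(); // this "transaction" is the victim
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                first.unlock();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("lock acquisitions aborted by timeout: " + runCircularWait());
    }
}
```

At least one of the two threads is always forced to abort, exactly as one database transaction is always rolled back in the production scenario.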
In a high-velocity environment, simply retrying the transaction (the standard advice) created a “retry storm,” further bogging down the database with failed lock acquisition attempts. This is a classic scenario where companies realize the need to hire software developer teams with deep backend architectural experience rather than just feature implementation skills.
HOW WE APPROACHED THE SOLUTION
We gathered the engineering team to analyze the deadlock graphs provided by the database engine. We evaluated three potential solutions:
1. Optimistic Locking:
Instead of locking rows, we could use a version column. If the version changed between read and write, the transaction fails.
Tradeoff: Under high contention (hot accounts), this leads to excessive retries and poor user experience.
2. Queue-Based Serialization:
Push all transfers into a single queue and process them sequentially.
Tradeoff: This destroys scalability. The throughput is limited by the processing speed of a single consumer.
3. Deterministic Resource Ordering:
Enforce a rule where locks are always acquired in a specific mathematical order, regardless of the transaction direction.
Decision: We chose this approach. It maintains parallelism while mathematically guaranteeing that circular dependencies (deadlocks) cannot occur.
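For contrast with the approach we chose, the version check at the heart of option 1 (optimistic locking) can be sketched in memory. `OptimisticAccount` is an illustrative stand-in: its `compareAndSet` plays the role of a conditional database update that matches on the version column and fails when another writer got there first:

```java
import java.math.BigDecimal;
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticAccount {

    // Immutable snapshot: balance plus a version that changes on every write.
    record Snapshot(BigDecimal balance, long version) {}

    private final AtomicReference<Snapshot> state =
            new AtomicReference<>(new Snapshot(BigDecimal.ZERO, 0));

    // Succeeds only if nobody wrote between our read and our write.
    // On failure the caller must re-read and retry -- the source of the
    // excessive retries on "hot" accounts noted above.
    public boolean tryCredit(BigDecimal amount) {
        Snapshot read = state.get();
        Snapshot updated = new Snapshot(read.balance().add(amount), read.version() + 1);
        return state.compareAndSet(read, updated);
    }

    public BigDecimal balance() {
        return state.get().balance();
    }
}
```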
FINAL IMPLEMENTATION
The fix involved refactoring the service layer to implement a canonical sorting strategy before interacting with the database repository. We mandated that whenever a transaction involves multiple resources (accounts), the IDs of those resources must be sorted, and locks must be acquired in that sorted order.
Here is a sanitized logic representation of the fix:
// Generic representation of the Transfer Service Logic
public void executeTransfer(Long sourceId, Long targetId, BigDecimal amount) {
    // Guard: a self-transfer would attempt to lock the same row twice.
    if (sourceId.equals(targetId)) {
        throw new IllegalArgumentException("Source and target accounts must differ");
    }

    Long firstLockId;
    Long secondLockId;

    // DETERMINISTIC SORTING
    // Always lock the smaller ID first, then the larger ID.
    if (sourceId < targetId) {
        firstLockId = sourceId;
        secondLockId = targetId;
    } else {
        firstLockId = targetId;
        secondLockId = sourceId;
    }

    transactionManager.executeInTransaction(() -> {
        // Acquire locks in strict ascending-ID order
        Account first = accountRepo.findByIdAndLock(firstLockId);
        Account second = accountRepo.findByIdAndLock(secondLockId);

        // The business logic must still map source/target correctly,
        // regardless of which account was locked first.
        if (first.getId().equals(sourceId)) {
            first.debit(amount);
            second.credit(amount);
        } else {
            second.debit(amount);
            first.credit(amount);
        }

        accountRepo.save(first);
        accountRepo.save(second);
    });
}
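The same canonical-sort rule generalizes beyond two accounts, for example to batch transfers touching many rows. The `LockOrdering` helper below is a minimal sketch of that generalization (the class name is ours, not from the production codebase): sort the IDs once, deduplicate them, and acquire locks in that sequence.

```java
import java.util.Comparator;
import java.util.List;

public class LockOrdering {

    // Canonical lock order for any number of accounts. Every transaction
    // touching the same set of rows walks the same sequence, so a circular
    // wait can never form.
    public static List<Long> lockOrder(List<Long> accountIds) {
        return accountIds.stream()
                .distinct()                       // never lock the same row twice
                .sorted(Comparator.naturalOrder())
                .toList();
    }
}
```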
Validation:
We redeployed the service and re-ran the “Black Friday” load test. The deadlock exceptions dropped to zero. While individual transaction latency increased slightly (microseconds) due to the wait times for locks on hot accounts, the overall system throughput stabilized because we eliminated the rollback-and-retry overhead.
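The two properties this load test verified, deadlock freedom and conservation of money under heavy contention, can be demonstrated in miniature with in-memory locks. `OrderedLockLoadTest` below is a hypothetical stand-in for the real harness, assuming four "hot" accounts hammered by many threads:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class OrderedLockLoadTest {

    static class Account {
        final ReentrantLock lock = new ReentrantLock();
        BigDecimal balance = new BigDecimal("1000");
    }

    // Ordered locking: lower index first, so no circular wait is possible.
    static void transfer(List<Account> accounts, int from, int to, BigDecimal amount) {
        int first = Math.min(from, to);
        int second = Math.max(from, to);
        accounts.get(first).lock.lock();
        accounts.get(second).lock.lock();
        try {
            accounts.get(from).balance = accounts.get(from).balance.subtract(amount);
            accounts.get(to).balance = accounts.get(to).balance.add(amount);
        } finally {
            accounts.get(second).lock.unlock();
            accounts.get(first).lock.unlock();
        }
    }

    // Hammers four hot accounts from many threads, then returns the total
    // balance; money must be neither created nor destroyed.
    static BigDecimal run(int threads, int transfersPerThread) {
        List<Account> accounts =
                List.of(new Account(), new Account(), new Account(), new Account());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < transfersPerThread; i++) {
                    int from = ThreadLocalRandom.current().nextInt(accounts.size());
                    // Pick a target guaranteed to differ from the source.
                    int to = (from + 1
                            + ThreadLocalRandom.current().nextInt(accounts.size() - 1))
                            % accounts.size();
                    transfer(accounts, from, to, BigDecimal.ONE);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return accounts.stream().map(a -> a.balance)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
    }
}
```

If the two locks were taken in source/target order instead of index order, this harness would hang on a circular wait rather than terminate.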
This implementation proved critical for the client’s stability. When you hire backend developers for financial systems, ensuring they understand concurrency patterns like this is non-negotiable.
LESSONS FOR ENGINEERING TEAMS
Reflecting on this implementation, here are the key takeaways for technical leaders:
- Database constraints are not enough: Foreign keys ensure referential integrity, but application-level logic is required to ensure transactional consistency in multi-row updates.
- Reproducibility requires load: Concurrency bugs rarely show up in unit tests or local dev environments. You must test with high concurrency simulation.
- Deadlocks are usually architectural: If you see deadlocks, do not just increase timeout thresholds. Analyze the lock acquisition order.
- Deterministic ordering is powerful: Simple sorting of resource IDs is a lightweight, robust way to prevent circular dependencies in distributed systems.
- Expertise matters: Complex locking strategies require engineers who understand database internals. If your team lacks this depth, it may be time to hire database architects for high-concurrency apps to audit your core transaction paths.
WRAP UP
Handling high-concurrency transactions requires looking beyond the code and understanding how the database engine manages resources. By switching to a deterministic locking strategy, we eliminated deadlocks and ensured the reliability of a critical financial ledger.
Social Hashtags
#DatabaseDeadlocks #HighConcurrency #FinTechEngineering #BackendArchitecture #ACIDTransactions #DistributedSystems #LedgerSystems #DatabaseDesign #ScalableSystems #EngineeringLeadership
If you are facing stability issues in your high-scale applications, contact us to discuss how our dedicated engineering teams can help.
Frequently Asked Questions
What is the difference between pessimistic and optimistic locking?
Pessimistic locking locks the record in the database immediately when it is read, preventing others from modifying it until the transaction ends. Optimistic locking allows multiple users to read the record but checks if the data has changed before saving updates, failing if a conflict is detected.
How does deterministic lock ordering prevent deadlocks?
Deadlocks occur when two processes hold a resource the other wants (a circular wait). By forcing all processes to acquire locks in the same order (e.g., low ID to high ID), you make a circular wait mathematically impossible.
Does deterministic lock ordering hurt performance?
It can slightly increase latency for specific "hot" records because transactions must wait their turn rather than failing immediately. However, it significantly improves overall system throughput by eliminating the heavy cost of rolling back and retrying failed transactions.
Does this approach work in distributed or microservices architectures?
This strategy works best within a single database. In a microservices architecture using distributed transactions (Sagas), you need different patterns like semantic locking or a dedicated coordination service, as you cannot easily lock rows across different database instances.