INTRODUCTION
While working on a core ledger modernization project for a FinTech client, we encountered a critical stability issue that only surfaced during peak load. The system, designed to handle high-frequency peer-to-peer transfers, began throwing sporadic transaction failures during our stress testing phase, specifically when simulating high-volume concurrent transfers between a dense graph of users.
The application was a robust double-entry bookkeeping system running on a microservices architecture. Initially, the errors seemed random—occasional 500s that would vanish upon retry. However, as we scaled the load to match production expectations, the database layer began to choke, not on CPU or memory, but on lock contention.
We realized that our standard ORM handling of transactions, while sufficient for low-volume apps, was creating race conditions at the database level. This challenge inspired this article, where we detail how we moved from implicit locking to deterministic ordering to guarantee data integrity in a financial environment.
PROBLEM CONTEXT
The system in question was a digital wallet platform responsible for real-time fund transfers. The core requirement was strict ACID compliance: money deducted from Wallet A must be simultaneously added to Wallet B.
The architecture utilized a Node.js backend interacting with a PostgreSQL database. The data model relied on a `Balances` table where every user had a record. A typical transfer operation involved:
- Starting a transaction (`BEGIN`).
- Reading the sender’s balance (lock row).
- Verifying funds.
- Updating the sender’s balance.
- Updating the receiver’s balance.
- Committing the transaction (`COMMIT`).
For a standard operational load, this logic held up well. The database handled row-level locking efficiently. However, the issue appeared when we introduced high concurrency involving reciprocal transfers—where User A sends to User B, while User B simultaneously sends to User A.
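The failure mode above can be reproduced without a database at all. The following is a minimal in-memory simulation (illustrative names, not the production code) of two reciprocal transfers that acquire row locks in payload order, showing how they end up blocked on each other:

```javascript
// Simulates two transactions locking rows in the order they appear in the
// payload: tx1 is Alice (row 10) -> Bob (row 20), tx2 is Bob -> Alice.
function simulatePayloadOrderLocking() {
  const held = new Map();       // rowId -> txId currently holding the lock
  const waitingFor = new Map(); // txId -> rowId it is blocked on

  function tryLock(txId, rowId) {
    const owner = held.get(rowId);
    if (owner === undefined || owner === txId) {
      held.set(rowId, txId);    // lock acquired
      waitingFor.delete(txId);
      return true;
    }
    waitingFor.set(txId, rowId); // blocked: record the wait
    return false;
  }

  tryLock('tx1', 10); // tx1 locks its sender first (Alice, row 10)
  tryLock('tx2', 20); // tx2 locks its sender first (Bob, row 20)

  // Each transaction now needs the row the other one holds:
  const tx1Blocked = !tryLock('tx1', 20);
  const tx2Blocked = !tryLock('tx2', 10);

  // Deadlock: each waits on a row held by the other.
  return tx1Blocked && tx2Blocked &&
    held.get(waitingFor.get('tx1')) === 'tx2' &&
    held.get(waitingFor.get('tx2')) === 'tx1';
}
```

Running the simulation returns `true`: both transactions block and neither can ever proceed, which is exactly the cycle PostgreSQL's deadlock detector resolves by killing one of them.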
WHAT WENT WRONG
The root cause was identified through the database error logs, which began flooding with `deadlock detected` messages. Specifically, PostgreSQL’s deadlock detector was killing processes to protect the integrity of the data.
The logs revealed the following pattern:
Process 12345 waits for ShareLock on transaction 98765; blocked by process 12346.
Process 12346 waits for ShareLock on transaction 98764; blocked by process 12345.
HINT: See server log for query details.
Here is what was happening architecturally:
- Transaction 1 (Alice to Bob): Locks Alice’s row, then attempts to lock Bob’s row.
- Transaction 2 (Bob to Alice): Locks Bob’s row, then attempts to lock Alice’s row.
If these two transactions occurred within milliseconds of each other, Transaction 1 would hold Alice and wait for Bob, while Transaction 2 held Bob and waited for Alice. Neither could proceed. The database engine, detecting the cycle, would arbitrarily kill one transaction to let the other survive.
While retries could mitigate this, they introduced unacceptable latency and user friction in a real-time payment ecosystem. Relying on retries for architectural flaws is not a strategy we accept when clients hire backend developers for fintech systems.
HOW WE APPROACHED THE SOLUTION
Our initial investigation involved reviewing the ORM-generated SQL. We confirmed that the application logic did not enforce a specific order for updating records. The code simply processed the `sender` and `receiver` as they appeared in the payload.
We considered three potential solutions:
1. Optimistic Locking: Using a version column. This would eliminate database locks but would result in a high number of application-level failures requiring complex retry logic on the client side.
2. Serializable Isolation Level: Setting the database isolation level to `SERIALIZABLE`. While safe, this drastically reduced throughput and caused excessive serialization errors under load.
3. Deterministic Locking (Pessimistic): Enforcing a strict rule on the order in which resources are locked, regardless of the transaction direction.
We chose the third approach. It provided the strongest guarantee of stability without the performance overhead of full serialization or the complexity of client-side retries.
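For contrast, here is a rough in-memory sketch of the optimistic approach we rejected (the helper names are illustrative; a real implementation compares the version column in the UPDATE's WHERE clause):

```javascript
// A stand-in for a Balances row with a version column.
function makeAccount(id, balance) {
  return { id, balance, version: 1 };
}

// Returns true if the write succeeded, false if another writer got there
// first -- the caller must then re-read the row and retry the whole transfer.
function optimisticUpdate(account, expectedVersion, newBalance) {
  if (account.version !== expectedVersion) {
    return false; // conflict detected at commit time
  }
  account.balance = newBalance;
  account.version += 1;
  return true;
}

const alice = makeAccount(10, 100);
const v = alice.version;
optimisticUpdate(alice, v, 80); // first writer wins
optimisticUpdate(alice, v, 50); // second writer holds a stale version: fails
```

Under the hot reciprocal-transfer pattern we were testing, nearly every concurrent write would take the failure path, which is why this option pushed unacceptable retry complexity onto clients.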
FINAL IMPLEMENTATION
The solution was to refactor the transfer service to sort the resources by their primary key (ID) before initiating any database locks. By ensuring that every transaction locks resources in the exact same order (e.g., lowest ID to highest ID), we made deadlocks mathematically impossible in this context.
If Alice (ID: 10) sends to Bob (ID: 20), the system locks 10 then 20.
If Bob (ID: 20) sends to Alice (ID: 10), the system also locks 10 then 20.
In the second scenario, Bob’s transaction attempts to lock Alice (ID: 10) first. If Alice’s transaction already holds the lock on 10, Bob’s transaction simply waits. It does not hold a lock on 20 yet, so no circular dependency is created.
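The ordering rule itself reduces to a one-line helper (shown here as an illustrative standalone function, not the production service code): both transfer directions must resolve to the same lock sequence.

```javascript
// Deterministic lock ordering: always lowest ID first, regardless of
// which party is the sender.
function lockOrder(senderId, receiverId) {
  return [Math.min(senderId, receiverId), Math.max(senderId, receiverId)];
}

// Alice (10) -> Bob (20) and Bob (20) -> Alice (10) lock in the same order:
// lockOrder(10, 20) and lockOrder(20, 10) both yield [10, 20].
```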
We implemented a utility function within the service layer to handle this logic:
// Generic representation of the sorting logic
async function performTransfer(senderId, receiverId, amount) {
  const db = await pool.connect();
  // Deterministic ordering: always lock the lowest ID first
  const firstLockId = Math.min(senderId, receiverId);
  const secondLockId = Math.max(senderId, receiverId);
  try {
    await db.query('BEGIN');
    // SELECT ... FOR UPDATE locks each row, preventing concurrent modifications.
    // Note: firstLockId may be either the sender or the receiver, so the rows
    // must be mapped back to their roles before the balance check.
    const accountA = await db.query(
      'SELECT * FROM accounts WHERE id = $1 FOR UPDATE',
      [firstLockId]
    );
    const accountB = await db.query(
      'SELECT * FROM accounts WHERE id = $1 FOR UPDATE',
      [secondLockId]
    );
    // Perform balance checks and updates here...
    // (Logic omitted for brevity)
    await db.query('COMMIT');
  } catch (e) {
    await db.query('ROLLBACK');
    throw e;
  } finally {
    db.release();
  }
}

After deploying this fix, we re-ran the load tests. The `deadlock detected` errors dropped to zero. Throughput remained high because lock waits were brief and strictly queued, rather than blocking indefinitely until the deadlock detector killed a transaction.
LESSONS FOR ENGINEERING TEAMS
When building high-concurrency systems, relying on default database behaviors is rarely enough. Here are the key takeaways from this implementation:
- Understand Database Locks: Developers must know the difference between lock modes such as ROW SHARE and ROW EXCLUSIVE, and the row-level locks taken by `SELECT ... FOR UPDATE`. ORMs often abstract this away to the detriment of performance.
- Enforce Determinism: In any system where multiple resources are accessed atomically, enforce a strict locking order to prevent circular dependencies.
- Load Test for Concurrency: Functional tests verify logic; load tests verify architecture. Deadlocks usually only appear under specific concurrent conditions.
- Observability is Key: Without access to the `pg_stat_activity` and deadlock logs, diagnosing this would have been a guessing game.
- Avoid “Try-Catch” Architecture: Don’t use error handling to mask architectural issues. If you are catching deadlocks, you have a design problem, not a runtime glitch.
WRAP UP
Solving concurrency issues requires a deep understanding of database internals and transaction isolation levels. By shifting from arbitrary locking to deterministic ordering, we secured the reliability of our client’s financial platform. Whether you are looking to hire postgres experts for database tuning or need to hire software developer teams for building scalable ledgers, architectural discipline is what separates a prototype from a production system.
Social Hashtags
#FinTech #DatabaseEngineering #PostgreSQL #HighConcurrency #BackendEngineering #DistributedSystems #LedgerSystems #ScalableSystems #SoftwareArchitecture #TransactionManagement #FinTechEngineering #PaymentSystems
If you are facing similar stability challenges in your application, contact us to discuss your architecture.
Frequently Asked Questions
What is the difference between optimistic and pessimistic locking?
Optimistic locking assumes conflicts are rare and checks for data integrity only upon commit (usually via version numbers). Pessimistic locking locks the record as soon as it is read, preventing others from modifying it until the transaction is complete.
Why does sorting IDs prevent deadlocks?
Deadlocks occur due to circular dependencies (A waits for B, B waits for A). By sorting IDs, you ensure that all transactions traverse the resource graph in the same direction, making cycles impossible.
Does deterministic locking hurt performance?
It can slightly increase latency for individual transactions that must wait in a queue for a lock. However, it significantly improves overall system throughput by eliminating failed transactions and deadlock retries.
Is this technique specific to PostgreSQL?
No. The concept of deterministic locking (or resource ordering) is a universal computer science principle applicable to MySQL, SQL Server, Oracle, and even multi-threaded application programming.
When should you bring in database specialists?
You should consider bringing in specialists when your application moves from simple CRUD operations to complex transactional workflows, high-concurrency environments, or when you notice performance degradation during peak loads.