    INTRODUCTION

    During a recent project for a mid-sized retail client, we were tasked with building a rapid reporting dashboard. The goal was to aggregate order data from a third-party ERP system and push it into a Google Sheet for the finance team to analyze daily trends. It seemed like a straightforward ETL (Extract, Transform, Load) task perfect for a low-code automation tool like n8n.

    However, during the User Acceptance Testing (UAT) phase, we noticed a significant data discrepancy. While the workflow ran successfully without errors, the finance team reported that the sheet was missing recent transactions. Upon investigation, we realized the API paginated its results, returning only the first 100 records per call. The workflow was ingesting the first page and ignoring the rest, leaving critical data behind.

    This challenge—automating the traversal of paginated API endpoints—is a frequent stumbling block in integration projects. This article outlines how we re-engineered the workflow to handle dynamic pagination and bulk data insertion, ensuring complete data visibility.

    PROBLEM CONTEXT

    The client operated a high-volume e-commerce environment where hundreds of orders were processed daily. They needed a lightweight solution to bridge their legacy ERP system with Google Sheets, avoiding the overhead of a full data warehouse implementation for this specific report.

    The architecture was simple:
    1. Trigger the workflow on a schedule (every morning).
    2. Fetch orders from the ERP REST API.
    3. Transform the JSON structure to match the spreadsheet columns.
    4. Append the data to Google Sheets.

    The issue surfaced in step two. Most enterprise-grade APIs enforce pagination to manage server load. Whether using offset-based pagination (e.g., ?page=1) or cursor-based pagination (e.g., ?after=token_xyz), the consuming client must explicitly request subsequent pages. Our initial implementation treated the endpoint as a single-fetch source, resulting in incomplete datasets and inaccurate financial reporting.
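
    For illustration, a single page from an offset-paginated endpoint might look like the object below. The field names mirror the metadata described later in this article but are hypothetical, not the ERP's actual schema:

    ```javascript
    // Hypothetical shape of one page from an offset-paginated endpoint.
    // A naive single fetch sees only `data` from page 1 and stops there.
    const pageOne = {
      data: [
        { orderId: "A-1001", total: 49.99 },
        { orderId: "A-1002", total: 120.0 }
        // ...up to 100 records per page
      ],
      page: 1,
      total_pages: 45 // the other 44 pages must be requested explicitly
    };
    ```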

    WHAT WENT WRONG

    The failure was not in the tool, but in the logic configuration. When we examined the execution logs, we saw the following behaviors:

    • Static Retrieval: The HTTP Request node was configured to hit the endpoint once. It received a 200 OK response containing 100 items and a metadata field indicating total_pages: 45.
    • Silent Data Loss: Since no error occurred, the workflow proceeded to write those 100 rows to Google Sheets. The remaining 4,400 records were simply never requested.
    • Memory & Timeouts: When we initially attempted a brute-force approach (cloning nodes), we risked hitting execution timeouts and memory limits on the n8n instance, as holding thousands of individual JSON objects in memory requires efficient state management.

    HOW WE APPROACHED THE SOLUTION

    To solve this, we needed a loop mechanism that would continue requesting data until the API stopped returning results. We evaluated three approaches:

    • Recursive Workflows: Calling the workflow from within itself. We discarded this because it makes debugging difficult and obscures the lineage of the data.
    • SplitInBatches Node (Legacy): The older method of handling loops in n8n. While functional, it is often more verbose and harder to maintain for simple pagination.
    • Loop Over Items / Conditional Loop: This was our chosen approach. We decided to construct a “Do-While” loop logic using n8n’s core nodes to check for the existence of a “next page” indicator before making the next call.

    We also had to address the writing side. Writing 4,500 rows one by one to Google Sheets would trigger the Google API rate limiter (429 Too Many Requests). We needed to aggregate the full dataset first and then write it in batches.
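
    The write-side pacing can be sketched as follows. Here appendRows is a placeholder for the actual Google Sheets write, and the one-second delay is an illustrative default rather than Google's documented quota:

    ```javascript
    // Pace write calls so bursts of appends don't trip the 429 limiter.
    // appendRows is a stand-in for the real Google Sheets append call.
    async function writeWithPacing(batches, appendRows, delayMs = 1000) {
      for (const batch of batches) {
        await appendRows(batch);
        // Wait between calls to stay under the destination's rate limit
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
    ```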

    FINAL IMPLEMENTATION

    Below is the architectural logic we implemented to resolve the pagination and writing challenges.

    1. Initialization

    First, we set up variables to track the current page and the aggregated data container. We use the Edit Fields (Set) node to initialize the loop context.

    // Pseudo-configuration for Set Node
    {
      "page": 1,
      "hasMore": true,
      "accumulatedData": []
    }
    

    2. The Loop Structure

    We utilized a logical loop. In newer versions of n8n, this can often be handled directly inside the HTTP Request node if “Pagination” is enabled, but for complex APIs (like this legacy ERP), building an explicit graph is often more robust.

    • Step A (HTTP Request): We configured the URL to use an expression: https://api.generic-erp.com/orders?page={{ $json.page }}.
    • Step B (Data Accumulation): We used a Code node to merge the new batch of 100 records into our accumulatedData array.
    • Step C (Condition Check): An “If” node checks the response body. If the API returns an empty array OR if the current page number equals the total_pages metadata, we set hasMore to false.
    • Step D (Increment): If hasMore is true, we increment the page variable by 1 and route the workflow back to the HTTP Request node.
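
    Collapsed out of the node graph, steps A through D implement the following loop. This is a plain-JavaScript sketch, not the workflow itself: fetchPage stands in for the HTTP Request node, and the field names match the hypothetical ERP response used above:

    ```javascript
    // Plain-JavaScript sketch of the loop the node graph implements.
    async function fetchAllOrders(fetchPage) {
      let page = 1;
      let hasMore = true;
      const accumulatedData = [];

      while (hasMore) {
        const response = await fetchPage(page);    // Step A: request one page
        accumulatedData.push(...response.data);    // Step B: accumulate
        hasMore =
          response.data.length > 0 &&
          page < response.total_pages;             // Step C: condition check
        page += 1;                                 // Step D: increment
      }
      return accumulatedData;
    }
    ```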

    3. Batch Writing to Google Sheets

    Once the loop finishes (the “False” output of the If node), we have a single large array of data. We pass this to the Google Sheets node.

    Crucial Configuration: Instead of "Append," we often use "Clear and Append" or manage specific ranges to ensure we aren't creating duplicates if the workflow runs twice. We ensured the data structure was flattened correctly to match the sheet headers.
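
    Before the write, the accumulated array can be split into fixed-size batches so each append call stays well under the Sheets API's limits. The 500-row batch size here is illustrative, not a documented quota:

    ```javascript
    // Split an accumulated array into fixed-size chunks so each
    // Google Sheets append call carries a bounded number of rows.
    function chunk(rows, size) {
      const batches = [];
      for (let i = 0; i < rows.length; i += size) {
        batches.push(rows.slice(i, i + size));
      }
      return batches;
    }

    // 4,500 accumulated rows → 9 append calls of 500 rows each
    const batches = chunk(new Array(4500).fill({}), 500);
    ```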

    4. Validated Logic Example

    Here is a sanitized snippet of the logic used to determine whether the loop should continue. It runs in a JavaScript Code node after the HTTP request:

    // Runs in a Code node ("Run Once for All Items") after the HTTP Request.
    // Check whether the API returned data for the current page; guard
    // against a missing `data` field on the final response.
    const newItems = items[0].json.data || [];

    if (newItems.length > 0) {
      // Data came back: signal the If node to continue the loop
      // and request the next page
      return [{
        json: {
          continue: true,
          page: items[0].json.page + 1
        }
      }];
    }

    // Empty page: signal the If node to exit the loop
    return [{
      json: {
        continue: false
      }
    }];
    

    LESSONS FOR ENGINEERING TEAMS

    Implementing this fix highlighted several key practices for teams looking to hire workflow automation engineers:

    • Assume Pagination by Default: Never build an API integration assuming the data will fit in one response. Always inspect the API documentation for limit, offset, or cursor parameters.
    • Batch Operations are Mandatory: When you hire Python developers or n8n experts, ensure they understand the difference between iterative writing (slow, error-prone) and batch processing. Writing 5,000 rows in one request is vastly better than making 5,000 separate requests.
    • Memory Management: n8n stores execution data in memory or on disk. Infinite loops can crash an instance. Always implement a "safety brake" (e.g., stop after 50 pages) to prevent runaway processes during development.
    • Rate Limiting Strategies: When connecting to Google Sheets, Slack, or CRMs, always implement wait/sleep logic if your loop processes data faster than the destination API allows.
    • Error Handling: If page 45 fails, does the whole workflow fail? We added “On Error Continue” logic to retry specific pages without losing the data from the previous 44 pages.
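
    The "safety brake" from the list above can be as simple as a hard page cap folded into the loop's exit condition. MAX_PAGES here is an arbitrary illustrative limit:

    ```javascript
    // Defensive loop cap: stop even if the API keeps returning data,
    // so a misbehaving endpoint cannot run the workflow forever.
    const MAX_PAGES = 50;

    function shouldContinue(page, lastBatchSize) {
      if (page >= MAX_PAGES) return false; // safety brake
      return lastBatchSize > 0;            // normal exit condition
    }
    ```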

    WRAP UP

    By restructuring the workflow to handle dynamic pagination and batch processing, we turned a fragile 100-row report into a robust analytics tool capable of processing thousands of orders daily. Automation is rarely just “drag and drop”—it requires architectural thinking regarding data flow, API limits, and state management.

    Social Hashtags

    #n8n #APIPagination #WorkflowAutomation #LowCode #NoCode #DataEngineering #GoogleSheetsAutomation #IntegrationEngineering #AutomationBestPractices #EnterpriseAutomation #SaaSIntegrations #ETLPipelines

    If you are looking to scale your internal automation or need to hire dedicated developers to build resilient data pipelines, contact us to discuss your engineering needs.

    Frequently Asked Questions