INTRODUCTION
While working on a telehealth platform, we were tasked with building an AI-driven clinical assistant. The goal was to capture audio from virtual doctor-patient consultations, transcribe it in real time, and automatically extract medical symptoms and diseases to pre-populate clinical notes.
To achieve this, we set up a robust streaming architecture using SocketIO for ingesting microphone audio and OpenAI’s Whisper model for continuous transcription. For the natural language processing (NLP) layer, MedSpaCy emerged as the obvious choice due to its clinical text processing capabilities, particularly its context and negation detection components.
However, during initial integration, we encountered a significant architectural hurdle. MedSpaCy is designed primarily for processing completed, static clinical documents, and its entity recognition relies heavily on predefined TargetRules. Manually hardcoding these rules in a Python file for thousands of known diseases and their associated symptoms was completely unscalable. Additionally, running clinical NLP on fragmented, live text chunks from a streaming transcriber risked breaking the clinical context, such as separating a negation (“patient denies”) from a symptom (“chest pain”).
This challenge inspired this article. By sharing how we automated rule generation from a database and managed the streaming text context, we hope other engineering teams can avoid the scalability traps of hardcoded NLP pipelines.
PROBLEM CONTEXT
In our architecture, the client applications continuously streamed audio buffers to our backend via SocketIO. A dedicated thread handled the accumulation of these audio chunks, resampling them to 16kHz, and passing them to a Whisper transcription model once a specific time threshold was reached. The resulting text was then emitted back to the frontend and persisted in a MongoDB database.
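The accumulation logic in that dedicated thread can be sketched as follows. This is a simplified, hypothetical version (the class name, the 5-second threshold, and the injected `transcribe_fn` are illustrative, not our production code), showing only the buffering and flush decision; real resampling would use an audio library.

```python
SAMPLE_RATE = 16000          # Whisper expects 16 kHz mono audio
FLUSH_THRESHOLD_SECONDS = 5  # hypothetical utterance interval

class AudioAccumulator:
    """Collects incoming audio chunks and flushes them to a
    transcription callback once enough audio has accumulated."""

    def __init__(self, transcribe_fn, threshold=FLUSH_THRESHOLD_SECONDS):
        self.transcribe_fn = transcribe_fn
        self.threshold = threshold
        self.chunks = []
        self.buffered_samples = 0

    def add_chunk(self, samples):
        # `samples` is assumed to already be resampled to 16 kHz
        self.chunks.append(samples)
        self.buffered_samples += len(samples)

        buffered_seconds = self.buffered_samples / SAMPLE_RATE
        if buffered_seconds >= self.threshold:
            # Concatenate buffered chunks, reset, and transcribe
            audio = [s for chunk in self.chunks for s in chunk]
            self.chunks = []
            self.buffered_samples = 0
            return self.transcribe_fn(audio)
        return None
```

Until the threshold is crossed, `add_chunk` returns `None`; once enough audio has accumulated, the buffer is flushed to the transcriber in one call.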
The business use case required us to intercept this transcribed text in real-time and pass it through an NLP pipeline to identify structured medical data. Our target database schema for diseases was straightforward:
{
  "_id": "uuid",
  "disease": "Hypertension",
  "symptoms": ["headache", "shortness of breath", "nosebleeds"]
}
The core problem appeared when we initialized MedSpaCy. The standard approach requires instantiating the NLP pipeline and manually feeding TargetRule objects into the target_matcher component. When you are dealing with comprehensive medical ontologies, maintaining this in code leads to massive deployment payloads, merge conflicts, and the inability to update the medical dictionary without a full code redeployment.
WHAT WENT WRONG
Our early prototypes highlighted two critical architectural oversights:
1. The Scalability of Hardcoded Rules: Defining symptoms and diseases programmatically meant our NLP microservice was tightly coupled to medical data. Every time a medical subject matter expert updated our clinical taxonomy, a developer had to translate those updates into Python TargetRule objects. This creates an immediate bottleneck when you scale. Companies that recognize this anti-pattern often hire python developers for scalable data systems to decouple configuration from application logic.
2. Context Fragmentation in Streaming: Whisper was configured to decode audio based on fixed utterance intervals. This resulted in fragmented transcription chunks. For example, Chunk A might be “The patient reports experiencing severe”, and Chunk B might be “chest pain but denies nausea.” If we passed these chunks individually to MedSpaCy, it would fail to recognize “severe chest pain” as a continuous entity and would completely miss the negation applied to “nausea”. MedSpaCy’s context detection works on cues within a sentence, so it needs complete sentences to function accurately.
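The failure mode can be illustrated without MedSpaCy at all. The toy matcher below uses naive substring checks as a stand-in for real entity matching: the entity split across the chunk boundary is never found when chunks are processed independently, but reappears once the chunks are joined into a full sentence.

```python
def toy_extract(text, symptoms=("severe chest pain", "nausea")):
    """Naive stand-in for an NLP pass: find known symptoms and check
    for a preceding negation cue in the same text."""
    found = []
    lowered = text.lower()
    for symptom in symptoms:
        idx = lowered.find(symptom)
        if idx != -1:
            negated = "denies" in lowered[:idx]
            found.append((symptom, negated))
    return found

chunk_a = "The patient reports experiencing severe"
chunk_b = "chest pain but denies nausea."

# Processing chunks independently: "severe chest pain" straddles the
# boundary and is never matched in either chunk.
per_chunk = toy_extract(chunk_a) + toy_extract(chunk_b)

# Processing the joined sentence recovers the full entity and keeps
# the negation scope over "nausea" intact.
joined = toy_extract(chunk_a + " " + chunk_b)
```

In the real pipeline the stakes are higher still: the negation cue itself can land in a different chunk from the symptom it governs, which no per-chunk pass can recover.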
HOW WE APPROACHED THE SOLUTION
To resolve the hardcoded rule issue, we designed a dynamic rule builder. Instead of defining TargetRule objects in the source code, we would query the MongoDB collection during the application’s startup phase (or via a triggered cache refresh), iterate through the disease and symptom records, and construct the MedSpaCy rules programmatically in memory.
To address the streaming context fragmentation, we implemented a sliding text buffer. Rather than processing raw Whisper chunks immediately through MedSpaCy, we accumulated the text into a rolling buffer. We then used basic sentence boundary detection to determine when a complete thought was formed, passing only complete sentences to the NLP model. The remainder of the text stayed in the buffer for the next transcription event.
FINAL IMPLEMENTATION
Below is the sanitized and abstracted implementation of our solution, demonstrating how to bridge a database of diseases with a MedSpaCy streaming pipeline.
Automating MedSpaCy TargetRules
First, we created an initialization function that fetches the ontology from MongoDB and dynamically registers the rules into the pipeline.
import medspacy
from medspacy.ner import TargetRule

def initialize_nlp_pipeline(db_collection):
    # Load the base MedSpaCy pipeline
    nlp = medspacy.load()
    target_matcher = nlp.get_pipe("medspacy_target_matcher")

    # Fetch all disease records from the database
    cursor = db_collection.find({}, {"disease": 1, "symptoms": 1})

    dynamic_rules = []
    for doc in cursor:
        disease_name = doc.get("disease")
        symptoms = doc.get("symptoms", [])

        if disease_name:
            # Add rule for the disease
            dynamic_rules.append(
                TargetRule(disease_name, category="DISEASE")
            )

        for symptom in symptoms:
            # Add rule for each symptom
            dynamic_rules.append(
                TargetRule(symptom, category="SYMPTOM")
            )

    # Add dynamically generated rules to the matcher
    target_matcher.add(dynamic_rules)
    return nlp
Managing the Streaming NLP Context
Next, we modified the event handler that receives text from Whisper. We introduced a session-based buffer to accumulate text and safely process sentences.
import re

class StreamingNLPProcessor:
    def __init__(self, nlp_pipeline):
        self.nlp = nlp_pipeline
        # Store buffers by session ID
        self.session_buffers = {}

    def process_chunk(self, sid, new_text):
        if sid not in self.session_buffers:
            self.session_buffers[sid] = ""

        # Append new chunk
        self.session_buffers[sid] += " " + new_text.strip()
        buffer_text = self.session_buffers[sid]

        # Look for sentence boundaries (basic implementation)
        # In production, consider more robust boundary detection
        sentences = re.split(r'(?<=[.!?])\s+', buffer_text.strip())

        extracted_entities = []

        # If we have more than one segment, it means we have at least
        # one complete sentence to process.
        if len(sentences) > 1:
            # Process all complete sentences
            for complete_sentence in sentences[:-1]:
                doc = self.nlp(complete_sentence)
                for ent in doc.ents:
                    # Capture the entity, its category, and whether the
                    # ConText component attached a negation modifier
                    is_negated = any(
                        mod.category == "NEGATED_EXISTENCE"
                        for mod in ent._.modifiers
                    )
                    extracted_entities.append({
                        "entity": ent.text,
                        "label": ent.label_,
                        "negated": is_negated
                    })

            # Keep the incomplete remainder in the buffer
            self.session_buffers[sid] = sentences[-1]

        return extracted_entities
By decoupling the rules from the application code and buffering the transcription chunks, we achieved a scalable, context-aware NLP extraction pipeline that runs smoothly in real-time.
LESSONS FOR ENGINEERING TEAMS
When dealing with real-time NLP and clinical data extraction, architectural foresight is critical. Here are the key takeaways our team extracted from this project:
- Decouple Configuration from Code: Never hardcode domain taxonomies (like medical symptoms) into your application logic. Always drive NLP entity matching dynamically via a persistent data store.
- Manage Streaming Context: NLP models rely on syntactic dependencies. Processing arbitrary chunks of a stream will break negation and context detection. Always implement a sliding window or sentence boundary buffer.
- Implement Cache Strategies: Querying a database to build MedSpaCy rules is fine on startup, but if the database updates frequently, implement an in-memory caching layer (like Redis) and use pub/sub to trigger pipeline refreshes.
- Resource Isolation: Running Whisper inference and MedSpaCy pipeline processing on the same node can lead to CPU starvation. Decouple these processes using an event bus (e.g., Kafka or RabbitMQ) and deploy them independently. If you need help architecting this, hire ai developers for production deployment to ensure resilient infrastructure.
- Graceful Degradation: If the NLP pipeline lags behind the streaming audio, ensure the raw transcripts are still saved to the database. You can always run a batch NLP job asynchronously to backfill missing entity extractions.
- Handle Custom Terminology: MedSpaCy’s base vocabulary might not cover localized slang or emerging clinical terms. Be prepared to implement custom tokenizer exceptions and spelling normalizers before the text hits the TargetMatcher. For complex domain adaptations, organizations often hire nlp developers for healthcare automation to refine custom NER models.
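The cache-refresh pattern from the takeaways above can be sketched in a few lines. This is a minimal, hypothetical holder (the name and `build_fn` parameter are illustrative; in practice `build_fn` would wrap `initialize_nlp_pipeline` and the refresh would be triggered by a pub/sub message rather than a direct call). The key design choice is rebuilding outside the lock so in-flight requests keep using the old pipeline until the swap.

```python
import threading

class PipelineHolder:
    """Holds the current NLP pipeline and supports atomic hot-swaps
    when the underlying rule database changes."""

    def __init__(self, build_fn):
        self.build_fn = build_fn
        self._lock = threading.Lock()
        self._pipeline = build_fn()

    def get(self):
        # Readers always see a fully built pipeline
        with self._lock:
            return self._pipeline

    def refresh(self):
        # Build outside the lock so requests are never blocked on a
        # slow rebuild; only the reference swap is synchronized.
        new_pipeline = self.build_fn()
        with self._lock:
            self._pipeline = new_pipeline
```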
WRAP UP
Building a live medical symptom extraction system requires marrying continuous streaming technologies with complex natural language processing requirements. By dynamically generating MedSpaCy rules from MongoDB and implementing a robust sliding sentence buffer, we successfully bridged the gap between Whisper’s chunked outputs and MedSpaCy’s contextual requirements.
Whether you are modernizing clinical workflows or building real-time voice analytics, ensuring your architecture scales efficiently without rigid hardcoding is the mark of a mature engineering team. If you are looking to scale your engineering capabilities and need to hire software developer expertise to build resilient backend and AI systems, contact us to discuss how our dedicated remote engineering teams can accelerate your project.
Social Hashtags
#AIHealthcare #MedSpaCy #ClinicalNLP #HealthcareAI #MedicalAI #NLPDevelopment #AIForHealthcare #WhisperAI #HealthTech #MachineLearning #PythonAI #RealTimeAI #AIEngineering #AIArchitecture #HealthTechInnovation
Frequently Asked Questions
Why use MedSpaCy’s rule-based approach instead of training a custom NER model?
Training a custom Named Entity Recognition (NER) model requires a massive amount of highly annotated clinical data, which is expensive and time-consuming to acquire. MedSpaCy’s rule-based approach using target matching combined with context rules (like negation and historical mapping) provides highly accurate clinical results out-of-the-box without requiring custom model training.
Does building TargetRules from the database at startup add noticeable latency?
Fetching several thousand records from MongoDB and instantiating TargetRule objects usually takes only a few seconds. Since this is done once during service initialization or via a background thread during a cache refresh, it does not impact the latency of the live streaming transcription.
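The rule-construction cost is easy to sanity-check without a database. The sketch below (synthetic records with hypothetical names) flattens 5,000 documents into rule specs; actual `TargetRule` instantiation adds some overhead on top, but the loop shape is identical to the one in `initialize_nlp_pipeline`.

```python
import time

def build_rule_specs(documents):
    """Flatten disease documents into (literal, category) pairs,
    mirroring the loop in the pipeline initializer."""
    specs = []
    for doc in documents:
        if doc.get("disease"):
            specs.append((doc["disease"], "DISEASE"))
        for symptom in doc.get("symptoms", []):
            specs.append((symptom, "SYMPTOM"))
    return specs

# Synthetic stand-in for 5,000 database records
synthetic = [
    {"disease": f"disease_{i}",
     "symptoms": [f"symptom_{i}_{j}" for j in range(5)]}
    for i in range(5000)
]

start = time.perf_counter()
specs = build_rule_specs(synthetic)
elapsed = time.perf_counter() - start  # typically a small fraction of a second
```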
How well does Whisper handle specialized medical terminology?
Whisper performs exceptionally well generally, but for highly specialized medical terminology, it may struggle compared to domain-specific ASR models. Providing initial prompts with medical context to Whisper or utilizing a specialized medical ASR model as the upstream provider can significantly improve transcription accuracy.
What happens if the transcribed speech never contains sentence-ending punctuation?
In cases of continuous speech without clear punctuation (which Whisper sometimes outputs), the buffer could grow indefinitely, leading to delayed NLP processing. To mitigate this, you should implement a maximum token or time-based threshold in the StreamingNLPProcessor to force an evaluation and flush the buffer.
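A minimal version of that safeguard might look like this (the 500-character threshold is illustrative): split on punctuation when possible, but force-flush an oversized buffer so latency stays bounded.

```python
import re

MAX_BUFFER_CHARS = 500  # illustrative threshold

def split_for_processing(buffer_text, max_chars=MAX_BUFFER_CHARS):
    """Return (text_to_process, remaining_buffer). Normally waits for
    a sentence boundary, but force-flushes an oversized buffer."""
    sentences = re.split(r'(?<=[.!?])\s+', buffer_text.strip())
    if len(sentences) > 1:
        # At least one complete sentence: process it, keep the tail
        return " ".join(sentences[:-1]), sentences[-1]
    if len(buffer_text) >= max_chars:
        # No punctuation arrived; flush everything to bound latency
        return buffer_text.strip(), ""
    return "", buffer_text
```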
Can this architecture safely handle protected health information (PHI)?
Yes, provided the database and surrounding infrastructure are configured to meet compliance standards (such as HIPAA). This includes utilizing Field Level Encryption (FLE), enforcing TLS in transit, maintaining strict access controls, and ensuring comprehensive audit logging is enabled.
Success Stories That Inspire
See how our team takes complex business challenges and turns them into powerful, scalable digital solutions. From custom software and web applications to automation, integrations, and cloud-ready systems, each project reflects our commitment to innovation, performance, and long-term value.

California-based SMB Hired Dedicated Developers to Build a Photography SaaS Platform

Swedish Agency Built a Laravel-Based Staffing System by Hiring a Dedicated Remote Team