Object-Oriented Thinking in an AI World
Why good software architecture principles still matter when building with LLMs
The Prompt-Driven Chaos
I've been building software for over 25 years. I've seen frameworks come and go. I've watched entire paradigms shift—from procedural to object-oriented, from monoliths to microservices, from REST to GraphQL.
But nothing has made developers abandon good design principles faster than LLMs.
I get it. The first time you see GPT-4 write working code from a natural language prompt, it feels like magic. Why bother with clean architecture when you can just ask the AI to "make it work"?
Here's why: because six months from now, when that AI-generated code is in production and breaking in weird ways, you're going to wish you'd treated it like the complex, stateful, non-deterministic component it actually is.
LLMs aren't magic. They're APIs that return text. And like any external service you integrate into your system, they need to be wrapped, abstracted, and managed with discipline.
Let me show you how.
The False Choice: AI vs. Architecture
There's this weird narrative emerging that says: "In the age of AI, traditional software engineering is obsolete."
That's nonsense.
What's actually happening is that AI is making good architecture more important, not less.
Why? Because LLMs introduce:
- Non-determinism - Same input, different output
- Latency - Network calls that can take seconds
- Cost - Every API call costs money
- Versioning complexity - Model updates can break your app
- Debugging nightmares - How do you debug a prompt?
If you don't have clean separation of concerns, clear interfaces, and testable components, you're building a house of cards.
Principle #1: Encapsulation Isn't Optional
The first mistake I see: developers putting raw OpenAI API calls directly into their business logic.
# DON'T DO THIS
def process_user_request(user_input):
    response = openai.OpenAI().chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )
    return response.choices[0].message.content
This looks simple. It's also a disaster waiting to happen.
What happens when:
- OpenAI changes their API?
- You want to switch to Anthropic's Claude?
- You need to add retry logic?
- You want to cache responses?
- You need to A/B test different prompts?
Better approach: Encapsulate the LLM behind an interface.
from abc import ABC, abstractmethod

import openai


class LLMProvider(ABC):
    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str:
        pass

    @abstractmethod
    def stream(self, prompt: str, **kwargs):
        pass


class OpenAIProvider(LLMProvider):
    def __init__(self, model: str = "gpt-4", api_key: str | None = None):
        self.model = model
        self.client = openai.OpenAI(api_key=api_key)

    def generate(self, prompt: str, **kwargs) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return response.choices[0].message.content

    def stream(self, prompt: str, **kwargs):
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,
            **kwargs
        )
        for chunk in stream:
            if chunk.choices[0].delta.content:
                yield chunk.choices[0].delta.content
Now your business logic depends on LLMProvider, not OpenAI specifically. You can swap providers, add caching layers, implement fallbacks—all without touching your core logic.
This is the Adapter Pattern, and it's as relevant today as it was in 1994.
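Once everything depends on the interface, composition comes almost for free. Here's a minimal sketch of a fallback wrapper built on that idea; `FallbackProvider` and the two toy providers are illustrative names I'm introducing here, not part of the interface above, and the ABC is redeclared so the snippet stands alone:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Minimal version of the interface above."""
    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str: ...


class FallbackProvider(LLMProvider):
    """Tries each wrapped provider in order until one succeeds."""
    def __init__(self, *providers: LLMProvider):
        self.providers = providers

    def generate(self, prompt: str, **kwargs) -> str:
        last_error = None
        for provider in self.providers:
            try:
                return provider.generate(prompt, **kwargs)
            except Exception as e:
                last_error = e
        raise RuntimeError("All providers failed") from last_error


class FlakyProvider(LLMProvider):
    """Stand-in for a provider whose API is currently down."""
    def generate(self, prompt: str, **kwargs) -> str:
        raise ConnectionError("service unavailable")


class EchoProvider(LLMProvider):
    """Stand-in for a healthy provider."""
    def generate(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"


# The business logic only sees an LLMProvider; the fallback is invisible to it.
llm = FallbackProvider(FlakyProvider(), EchoProvider())
print(llm.generate("hello"))  # falls through to the healthy provider
```

In production the stand-ins would be real adapters (OpenAI, Claude, a local model), but the composition logic stays identical.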
Principle #2: Prompts Are Code
Here's another thing I see constantly: prompts scattered throughout the codebase as string literals.
# ALSO DON'T DO THIS
def summarize_document(doc):
    return llm.generate(f"Summarize this document: {doc}")

def extract_entities(text):
    return llm.generate(f"Extract named entities from: {text}")
Prompts are logic. They determine behavior. They need versioning, testing, and management.
Better: Use the Strategy Pattern for prompt management.
class PromptStrategy(ABC):
    @abstractmethod
    def build_prompt(self, **context) -> str:
        pass


class DocumentSummaryPrompt(PromptStrategy):
    def __init__(self, style: str = "concise"):
        self.style = style

    def build_prompt(self, document: str, max_length: int = 100) -> str:
        return f"""Summarize the following document in a {self.style} style.
Maximum length: {max_length} words.

Document:
{document}

Summary:"""


class EntityExtractionPrompt(PromptStrategy):
    def __init__(self, entity_types: list[str]):
        self.entity_types = entity_types

    def build_prompt(self, text: str) -> str:
        types = ", ".join(self.entity_types)
        return f"""Extract the following entity types from the text: {types}

Text:
{text}

Return as JSON with entity type as key and list of entities as value."""
Now you can:
- Version your prompts independently
- A/B test different prompt strategies
- Unit test prompt generation
- Keep prompt engineering separate from business logic
This is how you build systems that don't fall apart when you need to optimize prompt performance.
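Because prompt construction is pure string building, it unit tests without any network calls or mocks. A sketch, using a pared-down version of the DocumentSummaryPrompt class above (redeclared here so the snippet stands alone; the test function name is illustrative):

```python
from abc import ABC, abstractmethod


class PromptStrategy(ABC):
    @abstractmethod
    def build_prompt(self, **context) -> str: ...


class DocumentSummaryPrompt(PromptStrategy):
    def __init__(self, style: str = "concise"):
        self.style = style

    def build_prompt(self, document: str, max_length: int = 100) -> str:
        return (
            f"Summarize the following document in a {self.style} style.\n"
            f"Maximum length: {max_length} words.\n\n"
            f"Document:\n{document}\n\nSummary:"
        )


# Plain string assertions: no API key, no mocking, runs in milliseconds.
def test_summary_prompt_includes_constraints():
    prompt = DocumentSummaryPrompt(style="formal").build_prompt(
        document="Q3 revenue grew 12%.", max_length=50
    )
    assert "formal" in prompt
    assert "Maximum length: 50 words" in prompt
    assert "Q3 revenue grew 12%." in prompt


test_summary_prompt_includes_constraints()
```

Tests like this catch prompt regressions (a template edit that drops a constraint) before they ever reach a model.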
Principle #3: Chain of Responsibility for Complex Workflows
Most interesting LLM applications aren't single API calls. They're multi-step workflows:
1. Classify user intent
2. Extract parameters
3. Generate response
4. Validate output
5. Format result
If you write this as a linear script, you'll end up with spaghetti code.
Better: Use Chain of Responsibility.
from typing import Optional


class LLMHandler(ABC):
    def __init__(self):
        self._next_handler: Optional['LLMHandler'] = None

    def set_next(self, handler: 'LLMHandler') -> 'LLMHandler':
        self._next_handler = handler
        return handler

    @abstractmethod
    def handle(self, context: dict) -> dict:
        pass

    def _pass_to_next(self, context: dict) -> dict:
        if self._next_handler:
            return self._next_handler.handle(context)
        return context


class IntentClassifier(LLMHandler):
    def __init__(self, llm: LLMProvider):
        super().__init__()
        self.llm = llm

    def handle(self, context: dict) -> dict:
        user_input = context['user_input']
        intent = self.llm.generate(f"Classify intent: {user_input}")
        context['intent'] = intent
        return self._pass_to_next(context)


class ParameterExtractor(LLMHandler):
    def __init__(self, llm: LLMProvider):
        super().__init__()
        self.llm = llm

    def handle(self, context: dict) -> dict:
        if context.get('intent') == 'search':
            params = self.llm.generate(
                f"Extract search parameters: {context['user_input']}"
            )
            context['parameters'] = params
        return self._pass_to_next(context)


class ResponseGenerator(LLMHandler):
    def __init__(self, llm: LLMProvider):
        super().__init__()
        self.llm = llm

    def handle(self, context: dict) -> dict:
        response = self.llm.generate(
            f"Generate response for {context['intent']}"
        )
        context['response'] = response
        return self._pass_to_next(context)


# Usage (llm is any LLMProvider, e.g. the OpenAIProvider from Principle #1)
classifier = IntentClassifier(llm)
extractor = ParameterExtractor(llm)
generator = ResponseGenerator(llm)
classifier.set_next(extractor).set_next(generator)
result = classifier.handle({'user_input': 'Find me Italian restaurants nearby'})
Each handler:
- Has a single responsibility
- Can be tested independently
- Can be reordered or replaced
- Passes context down the chain
This is how you build LLM workflows that don't become unmaintainable messes.
Principle #4: Observability Through the Observer Pattern
LLM calls are expensive and slow. You need visibility into what's happening.
The Observer Pattern gives you clean hooks for logging, monitoring, and debugging without polluting your core logic.
import logging

logger = logging.getLogger(__name__)
# `metrics` stands in for your stats client (e.g., statsd or Datadog)


class LLMObserver(ABC):
    @abstractmethod
    def on_request(self, prompt: str, metadata: dict):
        pass

    @abstractmethod
    def on_response(self, response: str, metadata: dict):
        pass

    @abstractmethod
    def on_error(self, error: Exception, metadata: dict):
        pass


class LoggingObserver(LLMObserver):
    def on_request(self, prompt: str, metadata: dict):
        logger.info(f"LLM Request: {metadata.get('model')}")
        logger.debug(f"Prompt: {prompt[:100]}...")

    def on_response(self, response: str, metadata: dict):
        logger.info(f"LLM Response received: {len(response)} chars")

    def on_error(self, error: Exception, metadata: dict):
        logger.error(f"LLM Error: {error}")


class MetricsObserver(LLMObserver):
    def on_request(self, prompt: str, metadata: dict):
        metrics.increment('llm.requests', tags=[f"model:{metadata.get('model')}"])

    def on_response(self, response: str, metadata: dict):
        metrics.histogram('llm.response_length', len(response))

    def on_error(self, error: Exception, metadata: dict):
        metrics.increment('llm.errors')


class ObservableLLMProvider(LLMProvider):
    def __init__(self, provider: LLMProvider):
        self.provider = provider
        self.observers: list[LLMObserver] = []

    def attach(self, observer: LLMObserver):
        self.observers.append(observer)

    def generate(self, prompt: str, **kwargs) -> str:
        metadata = {'model': kwargs.get('model', 'unknown')}
        for observer in self.observers:
            observer.on_request(prompt, metadata)
        try:
            response = self.provider.generate(prompt, **kwargs)
            for observer in self.observers:
                observer.on_response(response, metadata)
            return response
        except Exception as e:
            for observer in self.observers:
                observer.on_error(e, metadata)
            raise

    def stream(self, prompt: str, **kwargs):
        # LLMProvider declares stream() as abstract, so the wrapper must
        # implement it too; here it simply delegates to the inner provider.
        return self.provider.stream(prompt, **kwargs)
Now you can attach logging, metrics, cost tracking, or any other cross-cutting concern without modifying your LLM provider.
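The same wrapping trick gives you caching. Here's a sketch of a Decorator over any LLMProvider; `CachingProvider` and `CountingProvider` are illustrative names, and the in-memory dict is a stand-in for whatever cache (Redis, an LRU with TTL) you'd actually use:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Minimal version of the interface from Principle #1."""
    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str: ...


class CachingProvider(LLMProvider):
    """Decorator: returns a cached response for repeated identical requests."""
    def __init__(self, provider: LLMProvider):
        self.provider = provider
        self._cache: dict = {}

    def generate(self, prompt: str, **kwargs) -> str:
        # Include kwargs in the key: the same prompt at a different
        # temperature is a different request.
        key = (prompt, tuple(sorted(kwargs.items())))
        if key not in self._cache:
            self._cache[key] = self.provider.generate(prompt, **kwargs)
        return self._cache[key]


class CountingProvider(LLMProvider):
    """Stand-in provider that counts how often it is actually called."""
    def __init__(self):
        self.calls = 0

    def generate(self, prompt: str, **kwargs) -> str:
        self.calls += 1
        return f"response to: {prompt}"


inner = CountingProvider()
llm = CachingProvider(inner)
llm.generate("summarize X")
llm.generate("summarize X")  # served from cache; inner.calls stays at 1
```

Because the decorator implements the same interface it wraps, it stacks freely with the observability wrapper above: cache outside, observers inside, or vice versa, depending on whether you want to count cache hits.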
The Bigger Picture: Systems Thinking
Here's what I really want you to take away:
LLMs are components, not solutions.
The best AI applications I've seen aren't the ones with the cleverest prompts. They're the ones with the cleanest architecture.
When you treat LLMs like any other external service—with proper interfaces, error handling, observability, and testing—you build systems that:
- Scale - You can optimize, cache, and parallelize intelligently
- Evolve - You can swap models, update prompts, and add features without rewriting everything
- Debug - You can trace issues through clear component boundaries
- Cost-optimize - You know exactly where tokens are being spent
The developers who win in the AI era won't be the ones who abandon software engineering principles. They'll be the ones who apply those principles to new problems.
Practical Takeaways
If you're building with LLMs, here's what to do:
1. Never call LLM APIs directly from business logic - Always wrap them in an interface/adapter
2. Treat prompts as first-class code - Version them, test them, manage them systematically
3. Use established patterns for common problems:
   - Adapter for provider abstraction
   - Strategy for prompt management
   - Chain of Responsibility for multi-step workflows
   - Observer for monitoring and logging
   - Decorator for caching and rate limiting
4. Design for failure - LLMs will timeout, hallucinate, and change behavior. Your architecture should handle this gracefully.
5. Make it testable - If you can't unit test your LLM integration, your architecture is wrong.
6. Think in systems - The LLM is one component. How does it interact with your database, your cache, your API layer? Design those boundaries clearly.
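On the testability point: once business logic depends only on the LLMProvider interface, unit tests can inject a canned fake. A sketch, where `FakeLLMProvider` and `answer_user` are hypothetical names introduced for illustration:

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Minimal version of the interface from Principle #1."""
    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str: ...


class FakeLLMProvider(LLMProvider):
    """Test double: returns a scripted response and records every prompt."""
    def __init__(self, canned_response: str):
        self.canned_response = canned_response
        self.prompts: list[str] = []

    def generate(self, prompt: str, **kwargs) -> str:
        self.prompts.append(prompt)
        return self.canned_response


# Hypothetical business logic that depends only on the interface.
def answer_user(llm: LLMProvider, question: str) -> str:
    return llm.generate(f"Answer briefly: {question}")


# The test asserts on both the output and the prompt that was sent,
# with no network, no API key, and fully deterministic results.
fake = FakeLLMProvider("42")
assert answer_user(fake, "What is the meaning of life?") == "42"
assert fake.prompts == ["Answer briefly: What is the meaning of life?"]
```

If writing a fake like this for your code is painful, that's usually the architecture telling you the LLM calls are tangled into the business logic.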
References & Further Reading
- Design Patterns: Elements of Reusable Object-Oriented Software by the Gang of Four - Still the definitive reference. The patterns are 30 years old and still relevant.
- Martin Fowler's Refactoring - refactoring.com - Great resource on keeping code maintainable as requirements change.
- Simon Willison's Blog - simonwillison.net - One of the best practitioners writing about LLM engineering in production.
- The Twelve-Factor App - 12factor.net - Principles for building maintainable services. Apply them to your LLM integrations.
- Anthropic's Prompt Engineering Guide - docs.anthropic.com - A good technical resource on structured prompting.
Final Thought
I've been in this industry long enough to see a lot of "this changes everything" moments.
The web. Mobile. Cloud. Microservices. Now AI.
And here's what I've learned: The fundamentals don't change. They just get applied to new problems.
Object-oriented thinking isn't about classes and inheritance. It's about managing complexity through clear boundaries, single responsibilities, and composable components.
That's as true for LLM-powered applications as it was for the software we were building 25 years ago.
The developers who understand this will build the AI applications that actually last.
About the Author: Shekhar is a startup founder and product leader with 25+ years of experience building 0→1 companies. He's raised funding from Founders Fund and Sequoia, and was among the first developers to publish apps on the iOS App Store. He currently advises founders on product architecture and AI integration strategies.