1 Introduction
With large language models taking the world by storm since the release of ChatGPT, one framework has been ubiquitous: LangChain. Recently, I was building an economic chatbot with the framework and wanted to look into the source code to see what goes on inside it. In this blog post, we start small: we pick the simplest use case, LLMChain, and read through the source code to understand what happens inside the framework.
Let’s say that we want to hear a joke about some product. We can use LLMChain for this.
# https://python.langchain.com/docs/modules/chains/foundational/llm_chain#get-started
from langchain import LLMChain, OpenAI, PromptTemplate
= "Tell me a joke that includes {product}?"
prompt_template = OpenAI(temperature=0, openai_api_key=<openai_api_key>)
llm = LLMChain(
llm_chain =llm,
llm=PromptTemplate.from_template(prompt_template),
prompt=True,
return_final_only
)print(llm_chain("colorful socks")['text'])
The above code internally calls the OpenAI completion API to tell a joke about “colorful socks”. Here is the output from the model.
Q: What did the sock say to the other sock when it was feeling blue?
A: "Cheer up, it could be worse, at least we're not white socks!"
And with that joke, let’s start looking into the source code of LangChain and understand everything there is to know about LLMChain.
All code below has been copied from LangChain. At the time of writing, this was at Git commit 24c165420827305e813f4b6d501f93d18f6d46a4. LangChain’s code might change in the future.
2 Code: Deep-dive
Calling an instance of a Python class requires the __call__ method to be implemented. LLMChain is a subclass of Chain, which implements __call__ as shown below:
# https://github.com/langchain-ai/langchain/blob/24c165420827305e813f4b6d501f93d18f6d46a4/langchain/chains/base.py#L185-L250
def __call__(
    self,
    inputs: Union[Dict[str, Any], Any],
    return_only_outputs: bool = False,
    callbacks: Callbacks = None,
    *,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    include_run_info: bool = False,
) -> Dict[str, Any]:
    inputs = self.prep_inputs(inputs)
    callback_manager = CallbackManager.configure(
        callbacks,
        self.callbacks,
        self.verbose,
        tags,
        self.tags,
        metadata,
        self.metadata,
    )
    new_arg_supported = inspect.signature(self._call).parameters.get("run_manager")
    run_manager = callback_manager.on_chain_start(
        dumpd(self),
        inputs,
    )
    try:
        outputs = (
            self._call(inputs, run_manager=run_manager)
            if new_arg_supported
            else self._call(inputs)
        )
Looking at the above code, we can see that it calls self.prep_inputs(inputs) and then calls the self._call method. We will ignore the self.prep_inputs part for now, as otherwise this blog post will become too long. The self._call method inside the Chain class is an abstract method. Therefore, it must be implemented in LLMChain.
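In other words, Chain follows the classic template-method pattern: the base class owns __call__ and delegates the real work to an abstract _call that each subclass must implement. Here is a minimal, self-contained sketch of that pattern (hypothetical MiniChain/MiniLLMChain classes, not LangChain code):

from abc import ABC, abstractmethod
from typing import Any, Dict

class MiniChain(ABC):
    def __call__(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # In the real Chain, prep_inputs and callback setup happen here.
        return self._call(inputs)

    @abstractmethod
    def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        ...

class MiniLLMChain(MiniChain):
    def _call(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # In the real LLMChain, this is where self.generate gets called.
        return {"text": f"(would call the LLM with {inputs})"}

print(MiniLLMChain()({"product": "colorful socks"}))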
Let’s look at its definition in LLMChain.
# https://github.com/langchain-ai/langchain/blob/24c165420827305e813f4b6d501f93d18f6d46a4/langchain/chains/llm.py#L87-L93
def _call(
    self,
    inputs: Dict[str, Any],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, str]:
    response = self.generate([inputs], run_manager=run_manager)
    return self.create_outputs(response)[0]
Okay, great! Now we see that the _call method calls self.generate, passing in [inputs]. Remember, the inputs were prepared in the self.prep_inputs(inputs) step inside Chain’s __call__.
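We are skipping prep_inputs, but based on what we pass in and the dictionary we will see in a moment, for a single-input chain like ours it effectively does something like the sketch below (the real implementation also loads memory variables and validates the inputs):

def prep_inputs_sketch(chain_input, input_key="product"):
    # A bare value such as "colorful socks" gets wrapped into a dict
    # keyed by the chain's single input variable.
    if not isinstance(chain_input, dict):
        return {input_key: chain_input}
    return chain_input

print(prep_inputs_sketch("colorful socks"))  # {'product': 'colorful socks'}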
Below I have shared the source code of the generate method from LLMChain.
# https://github.com/langchain-ai/langchain/blob/24c165420827305e813f4b6d501f93d18f6d46a4/langchain/chains/llm.py#L95-L107
def generate(
    self,
    input_list: List[Dict[str, Any]],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> LLMResult:
    """Generate LLM result from inputs."""
    prompts, stop = self.prep_prompts(input_list, run_manager=run_manager)
    return self.llm.generate_prompt(
        prompts,
        stop,
        callbacks=run_manager.get_child() if run_manager else None,
        **self.llm_kwargs,
    )
So, it’s calling self.llm.generate_prompt. Great! At this point, I am starting to wonder what the point of LLMChain is at all. Let’s also skip the self.prep_prompts part, otherwise the blog post will be too long. Rather than reading the source code of self.prep_prompts, let’s just look at its output.
>> input_list
[{'product': 'colorful socks'}]

>> prompts, stop
([StringPromptValue(text='Tell me a joke that includes colorful socks?')], None)
So all that has happened is that the above code has substituted the product value into our input prompt. Why is this not just an f-string, I wonder?
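For comparison, the same substitution with a plain f-string would be (just an illustration, not how LangChain does it):

product = "colorful socks"
prompt = f"Tell me a joke that includes {product}?"
print(prompt)  # Tell me a joke that includes colorful socks?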
Before we go any further, since we are now starting to look at self.llm’s source code rather than the Chains, let’s look at the response from response = self.generate([inputs], run_manager=run_manager). Below is what it looks like:
>> response
LLMResult(generations=[[Generation(text='\n\nQ: What did the sock say to the other sock when it was feeling blue?\nA: "Cheer up, it could be worse, at least we\'re not white socks!"', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'prompt_tokens': 9, 'total_tokens': 49, 'completion_tokens': 40}, 'model_name': 'text-davinci-003'}, run=[RunInfo(run_id=UUID('5afa5d25-802a-49eb-b147-0f708d207a33'))])
Now we are ready to look into self.llm.generate_prompt: how did this method create the response we see above?
3 OpenAI LLM
So far we have been looking at the LLMChain source code. But internally, that in itself calls self.llm.generate_prompt. If you remember from the top of the blog post, self.llm was an instance of the OpenAI class.
The OpenAI class is a subclass of BaseOpenAI, which in turn is a subclass of BaseLLM, and the generate_prompt method that was called from inside the generate method of LLMChain is implemented in BaseLLM. A bit complicated, isn’t it?
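To keep the hierarchy straight, here is my simplified mental model of which class defines what (a sketch, not the real definitions):

# BaseLLM                        (langchain/llms/base.py)
#   .generate_prompt(...)        turns PromptValues into strings, calls .generate
#   .generate(...)               caching + callback plumbing, then ._generate_helper
#   ._generate(...)              abstract: provider-specific work goes here
#
# BaseOpenAI(BaseLLM)            (langchain/llms/openai.py)
#   ._generate(...)              the method that actually hits the OpenAI API
#
# OpenAI(BaseOpenAI)             as far as I can tell, mostly configuration on top

With that map in mind, here is the actual generate_prompt from BaseLLM: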
# https://github.com/langchain-ai/langchain/blob/24c165420827305e813f4b6d501f93d18f6d46a4/langchain/llms/base.py#L178-L186
def generate_prompt(
    self,
    prompts: List[PromptValue],
    stop: Optional[List[str]] = None,
    callbacks: Callbacks = None,
    **kwargs: Any,
) -> LLMResult:
    prompt_strings = [p.to_string() for p in prompts]
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
So now we can see that the generate_prompt method calls self.generate again. Remember, the last time we saw self.generate it was on the LLMChain, but this time it is on the OpenAI LLM. The p.to_string() part? That just converts our prompt to a string. The output of prompt_strings looks like this:
>> prompt_strings
['Tell me a joke that includes colorful socks?']
🤔 So far so good? Yes and no. I am a bit baffled by the amount of complexity in the code. I still don’t know why we need both a Chain and an LLMChain. Then again, we are looking at just one use case of LLMChain, which is generation; LLMChains might also support other use cases where this complexity is needed.
Time to look at the generate method. It is again implemented in BaseLLM.
# https://github.com/langchain-ai/langchain/blob/24c165420827305e813f4b6d501f93d18f6d46a4/langchain/llms/base.py#L233-L302
def generate(
    self,
    prompts: List[str],
    stop: Optional[List[str]] = None,
    callbacks: Callbacks = None,
    *,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> LLMResult:
    """Run the LLM on the given prompt and input."""
    if not isinstance(prompts, list):
        raise ValueError(
            "Argument 'prompts' is expected to be of type List[str], received"
            f" argument of type {type(prompts)}."
        )
    params = self.dict()
    params["stop"] = stop
    options = {"stop": stop}
    (
        existing_prompts,
        llm_string,
        missing_prompt_idxs,
        missing_prompts,
    ) = get_prompts(params, prompts)
    disregard_cache = self.cache is not None and not self.cache
    callback_manager = CallbackManager.configure(
        callbacks,
        self.callbacks,
        self.verbose,
        tags,
        self.tags,
        metadata,
        self.metadata,
    )
    new_arg_supported = inspect.signature(self._generate).parameters.get(
        "run_manager"
    )
    if langchain.llm_cache is None or disregard_cache:
        if self.cache is not None and self.cache:
            raise ValueError(
                "Asked to cache, but no cache found at `langchain.cache`."
            )
        run_managers = callback_manager.on_llm_start(
            dumpd(self), prompts, invocation_params=params, options=options
        )
        output = self._generate_helper(
            prompts, stop, run_managers, bool(new_arg_supported), **kwargs
        )
        return output
    if len(missing_prompts) > 0:
        run_managers = callback_manager.on_llm_start(
            dumpd(self), missing_prompts, invocation_params=params, options=options
        )
        new_results = self._generate_helper(
            missing_prompts, stop, run_managers, bool(new_arg_supported), **kwargs
        )
        llm_output = update_cache(
            existing_prompts, llm_string, missing_prompt_idxs, new_results, prompts
        )
        run_info = (
            [RunInfo(run_id=run_manager.run_id) for run_manager in run_managers]
            if run_managers
            else None
        )
    else:
        llm_output = {}
        run_info = None
    generations = [existing_prompts[i] for i in range(len(prompts))]
    return LLMResult(generations=generations, llm_output=llm_output, run=run_info)
Okay, well, let’s look at the outputs step by step. This again is a massively over-complicated piece of code. But for what? Why do we need such complexity?
>> params
{'model_name': 'text-davinci-003', 'temperature': 0.0, 'max_tokens': 256, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'request_timeout': None, 'logit_bias': {}, '_type': 'openai', 'stop': None}
Honestly, looking at the above source code, I am still a bit confused about the point at which we actually call the model via the create API. I have lost count of how many layers down in the code we are, and we still haven’t called the openai.Completion.create method, even though generating an output from a prompt should be as simple as:
import openai
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a tagline for an ice cream shop.",
)
I believe the part where the generations actually happen is inside the following piece of code.
output = self._generate_helper(
    prompts, stop, run_managers, bool(new_arg_supported), **kwargs
)

# Values of `output` (this is the same as `response` from before)
>> output
LLMResult(generations=[[Generation(text='\n\nQ: What did the sock say to the other sock when it was feeling blue?\nA: "Cheer up, it could be worse, at least we\'re not white socks!"', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'prompt_tokens': 9, 'total_tokens': 49, 'completion_tokens': 40}, 'model_name': 'text-davinci-003'}, run=[RunInfo(run_id=UUID('b32869ec-43a3-4780-805e-be482f1fe05b'))])
So now we need to look at yet another method, self._generate_helper, and who knows what other methods that one will call. Let’s dig a bit further into the source code and look at self._generate_helper.
# https://github.com/langchain-ai/langchain/blob/24c165420827305e813f4b6d501f93d18f6d46a4/langchain/llms/base.py#L200-L231
def _generate_helper(
    self,
    prompts: List[str],
    stop: Optional[List[str]],
    run_managers: List[CallbackManagerForLLMRun],
    new_arg_supported: bool,
    **kwargs: Any,
) -> LLMResult:
    try:
        output = (
            self._generate(
                prompts,
                stop=stop,
                # TODO: support multiple run managers
                run_manager=run_managers[0] if run_managers else None,
                **kwargs,
            )
            if new_arg_supported
            else self._generate(prompts, stop=stop)
        )
    except (KeyboardInterrupt, Exception) as e:
        for run_manager in run_managers:
            run_manager.on_llm_error(e)
        raise e
    flattened_outputs = output.flatten()
    for manager, flattened_output in zip(run_managers, flattened_outputs):
        manager.on_llm_end(flattened_output)
    if run_managers:
        output.run = [
            RunInfo(run_id=run_manager.run_id) for run_manager in run_managers
        ]
    return output
Unbelievable! self._generate_helper in turn calls the self._generate method, which is an abstract method in BaseLLM but implemented in BaseOpenAI!
😡 Are you still with me? We’ve got to finish this. But yes, I am a bit frustrated. This whole part is so complex and so hard to follow and explain. Let’s recap what has happened so far. We wanted to look at what goes on inside LangChain to create outputs from a prompt. What we have discovered is extremely puzzling. And I am being euphemistic here.
We started with LLMChain and called it using the __call__ method implemented in Chain, which called the _call method of LLMChain, which in turn called self.generate, which further called self.llm.generate_prompt.
self.llm is an instance of the OpenAI class, which is a subclass of BaseOpenAI, which is a subclass of BaseLLM, and the generate_prompt method is implemented there. We are not done yet.
The generate_prompt implemented in BaseLLM calls self.generate, which in turn calls self._generate_helper, which in turn calls self._generate of BaseOpenAI!
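Written out as a call stack, the journey so far looks roughly like this (callback and caching plumbing omitted):

# llm_chain("colorful socks")
#   Chain.__call__                    prep_inputs, callback setup
#     LLMChain._call
#       LLMChain.generate             prep_prompts
#         BaseLLM.generate_prompt     self.llm is the OpenAI instance
#           BaseLLM.generate
#             BaseLLM._generate_helper
#               BaseOpenAI._generate  somewhere below here the API finally gets called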
Isn’t that a lot of code? And we still haven’t called the OpenAI API yet.
Okay, I am breathing normally again. Let’s continue. We are still not done and haven’t generated our results. Let’s look at the self._generate method, which is implemented in BaseOpenAI.
# https://github.com/langchain-ai/langchain/blob/24c165420827305e813f4b6d501f93d18f6d46a4/langchain/llms/openai.py#L272-L325
def _generate(
    self,
    prompts: List[str],
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> LLMResult:
    params = self._invocation_params
    params = {**params, **kwargs}
    sub_prompts = self.get_sub_prompts(params, prompts, stop)
    choices = []
    token_usage: Dict[str, int] = {}
    # Get the token usage from the response.
    # Includes prompt, completion, and total tokens used.
    _keys = {"completion_tokens", "prompt_tokens", "total_tokens"}
    for _prompts in sub_prompts:
        if self.streaming:
            if len(_prompts) > 1:
                raise ValueError("Cannot stream results with multiple prompts.")
            params["stream"] = True
            response = _streaming_response_template()
            for stream_resp in completion_with_retry(
                self, prompt=_prompts, **params
            ):
                if run_manager:
                    run_manager.on_llm_new_token(
                        stream_resp["choices"][0]["text"],
                        verbose=self.verbose,
                        logprobs=stream_resp["choices"][0]["logprobs"],
                    )
                _update_response(response, stream_resp)
            choices.extend(response["choices"])
        else:
            response = completion_with_retry(self, prompt=_prompts, **params)
            choices.extend(response["choices"])
        if not self.streaming:
            # Can't update token usage if streaming
            update_token_usage(_keys, response, token_usage)
    return self.create_llm_result(choices, prompts, token_usage)
Do we get to the part where we have results yet? Yes! The completion_with_retry function looks like it. Notice also that _generate supports streaming; a quick sketch of how to turn that on follows below, and then we will look at the inputs to completion_with_retry.
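Based on the LangChain docs at the time, turning streaming on looks roughly like this (a sketch; verify the exact imports against your version of the library):

from langchain.llms import OpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Tokens are pushed to the callback handler (here, printed to stdout) as they arrive.
streaming_llm = OpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0,
)
streaming_llm("Tell me a joke that includes colorful socks?")

Back to the main thread: here are the inputs that completion_with_retry receives.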
>> _prompts
['Tell me a joke that includes colorful socks?']

>> params
{'model': 'text-davinci-003', 'api_key': '<openai_api_key>', 'api_base': '', 'organization': '', 'temperature': 0.0, 'max_tokens': 256, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'request_timeout': None, 'logit_bias': {}}
We are now ready to call completion_with_retry, passing in the above as inputs. The _prompts is just the prompt template rendered to a string, and the params are the defaults defined in BaseOpenAI.
def completion_with_retry(llm: Union[BaseOpenAI, OpenAIChat], **kwargs: Any) -> Any:
    """Use tenacity to retry the completion call."""
    retry_decorator = _create_retry_decorator(llm)

    @retry_decorator
    def _completion_with_retry(**kwargs: Any) -> Any:
        return llm.client.create(**kwargs)

    return _completion_with_retry(**kwargs)
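_create_retry_decorator is not shown above, but it presumably builds a tenacity decorator along these lines (a rough sketch under assumptions, not the actual LangChain implementation, which also restricts retries to specific OpenAI error types):

from tenacity import retry, stop_after_attempt, wait_exponential

def create_retry_decorator_sketch(llm):
    # Retry up to llm.max_retries times with exponential back-off between attempts.
    return retry(
        reraise=True,
        stop=stop_after_attempt(llm.max_retries),
        wait=wait_exponential(multiplier=1, min=4, max=10),
    )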
Now, self gets passed as an argument to the function, and finally we call llm.client.create(**kwargs).
Below is what the kwargs look like. Remember, they are just _prompts and params merged together into a single dictionary.
>> kwargs
{'prompt': ['Tell me a joke that includes colorful socks?'], 'model': 'text-davinci-003', 'api_key': '<api_key>', 'api_base': '', 'organization': '', 'temperature': 0.0, 'max_tokens': 256, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'request_timeout': None, 'logit_bias': {}}
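In other words, for our example the whole stack finally boils down to roughly this direct call (retry logic and callbacks aside; assumes openai.api_key is already set):

import openai

response = openai.Completion.create(
    prompt=["Tell me a joke that includes colorful socks?"],
    model="text-davinci-003",
    temperature=0.0,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    n=1,
)
print(response["choices"][0]["text"])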
4 Complexity and Readability in LangChain
Going through the LangChain code, I find myself filled with a mix of admiration and confusion. It’s clear that the creators of LangChain put a lot of thought into building a flexible architecture. The design, which involves multiple layers of abstraction and many separate components, suggests that the tool is built to handle a wide range of use cases beyond the one we’ve examined today. However, in this particular scenario, we’ve seen how complexity can make code harder to follow and understand.
Let’s start with the positives. The modular design of LangChain allows for easy extension and modification: the prompt, the LLM, and the callbacks are separate components that can be swapped out without touching the rest of the chain.
On the other hand, the complexity of the code made it difficult to trace the flow of execution. We found ourselves diving deeper and deeper into the call stack, and it took a considerable amount of time just to locate the point where the OpenAI API is actually called.
Another point of confusion was the usage of the self.dict()
method. It seems that this method is intended to create a dictionary representation of an object’s attributes, but it’s not immediately clear why this is necessary. In some cases, it seemed that a simpler approach, such as using f-strings for prompt generation, could have achieved the same result with less code.
In conclusion, examining the LangChain code has provided valuable insights into the design decisions that go into creating a complex tool like this. While the abstraction and modularity are commendable, the complexity of the code can potentially make it harder for anyone to understand.
What are your thoughts?