AI Applications with Langchain and Gemini


GitHub Link
 

Gemini + Langchain

AI is everywhere, and it seems like everyone is leveraging it to create value for themselves or their businesses. Do you feel a bit out of place in this AI-driven world and want to take your first step toward integrating AI to benefit you? If so, keep reading! In this blog, I'll guide you on how to get started easily with Langchain, a powerful tool for AI developers. Together, we'll build a practical, real-world use case: a Product Review Analyser. And the best part? It’s completely free, as we’ll be using Google Gemini's 1.5 Flash Model, which is perfect for beginner-friendly projects.


This blog focuses on three key aspects:


  1. Generating API Keys for Gemini Models

  2. Structuring prompts with Langchain

  3. Generating Outputs


Yes, it's really that straightforward! You don’t need to be an AI expert to build simple applications like this. However, a basic understanding of Python will definitely help you grasp the concepts more effectively and get more out of the process.


Generating API Keys for Gemini Models


To get started, we'll need to use one of the available Large Language Models (LLMs). In this case, we'll leverage Google's Gemini 1.5 Flash model, which is free to use with some rate limits: 1,500 requests per day, 1 million tokens per minute, and a maximum of 15 requests per minute. For more details on other Gemini models and their pricing, you can refer to the link here.


To generate your Google API key, visit Google's AI Studio using the link and sign up with your Google account.


Please note: There are currently some issues with Google AI Studio setup in the EU and UK regions, which may make it inaccessible. In such cases, you can use alternative API endpoints like OpenAI, Llama, or others.


Once logged into Google AI Studio, click on the "Get API Key" option in the left pane. You can create an API key by selecting the Google Cloud Console project where you want the key to be enabled. If you need to create a new project, refer to this short video, or you can get started using the default "MyFirstProject."


With the API key at our disposal, we can begin writing some code: installing and importing the required packages, entering our API key in the Google Colaboratory environment, and initialising the LLM model. All of this in merely 10 lines of code, shown below.


# Installing Necessary Packages
!pip install langchain
!pip install langchain_google_genai
!pip install python-dotenv

# Setting up Gemini Model
import os
import getpass
from langchain_google_genai import ChatGoogleGenerativeAI

# Entering your API Key
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API key: ")

# Initialising the LLM
llm_model = ChatGoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=os.environ["GOOGLE_API_KEY"], temperature=0.0)

Importing Data


Additionally, since we're using AI to analyse customer product reviews, we'll need to feed the reviews into the model. I'll upload a sample Google Sheets file containing 10 product reviews of some of my favourite shoes from Amazon. We'll import this file from Google Sheets, convert the data into a pandas DataFrame, and use it for further analysis. Here's another snippet, roughly 10 lines of code, that handles the authentication and import:


# Google Drive and sheets authentication
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

import pandas as pd

# Importing data to convert and store as pandas dataframe
worksheet = gc.open('Sample Product Reviews').sheet1

# Get all values from the worksheet
rows = worksheet.get_all_values()

# Convert to a DataFrame, with the first row as the header
reviews_data = pd.DataFrame(rows[1:], columns=rows[0])


Structuring Prompts with Langchain


With our systems ready, we can move on to the most crucial aspect of any AI application—the prompts. The quality of the outputs your model generates is directly tied to the quality of the prompts you provide. To enhance this, we can adopt a "chain of thought" approach using Langchain and define a prompt template.


Before writing the prompt, we need to consider a couple of key questions:

  1. What output do we want to generate?

  2. In what format do we need the output?


Since LLMs typically handle text input and output, we need to ensure the output is structured in a format suitable for further data operations. This process of converting outputs into formats beyond simple text, known as "Output Parsing," can be easily managed with Langchain functions.


Answering the two key questions from earlier, we can now design a prompt to extract the following information from the customer reviews:


  1. Review Language: Identify the language of the review.

  2. Review Sentiments: Analyse the sentiment (positive, negative, neutral).

  3. Translated Review: Provide a translation for reviews in any language other than English.

  4. Formal Response: Generate a formal response to the review.

  5. Translated Response: Translate the formal response into the original language of the review.


All of this information will be structured in JSON format to allow easy collection, storage, and data cleaning operations.
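For illustration, the parsed output for a single review might look like the following Python dictionary. The keys match the schema we'll define below; the review itself and the values are invented for this sketch:

```python
# Hypothetical parsed output for one Spanish review (illustrative values only)
sample_output = {
    "review_language": "Spanish",
    "review_sentiment": "positive",
    "translated_review": "Great shoes, very comfortable for running.",
    "review_response": "Gracias por sus amables comentarios. Nos alegra que disfrute de sus zapatos.",
    "translated_response": "Thank you for your kind feedback. We are glad you are enjoying your shoes.",
}
```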


We'll build up a simple prompt template to achieve this in two steps: first the output parser, then the template itself.


We can define the output structure using ResponseSchema and utilise the StructuredOutputParser function to ensure the model generates responses in the desired format. By doing this, we can extend the prompt to produce structured outputs, like JSON, that match our schema.


from langchain.prompts import ChatPromptTemplate
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

## Output Parser Schema Design
review_language_schema = ResponseSchema(
    name="review_language",
    description="Which language the review was written in?")
review_sentiment_schema = ResponseSchema(
    name="review_sentiment",
    description="What was the sentiment of the review")
translated_review_schema = ResponseSchema(
    name="translated_review",
    description="Reviews in other languages translated to English")
review_response_schema = ResponseSchema(
    name="review_response",
    description="Response to the review in the original language of the review")
translated_response_schema = ResponseSchema(
    name="translated_response",
    description="Response to the review translated to English.")

response_schemas = [review_language_schema,
                    review_sentiment_schema,
                    translated_review_schema,
                    review_response_schema,
                    translated_response_schema]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

format_instructions = output_parser.get_format_instructions()

To consolidate the overall structured prompt, we combine the task instructions for the LLM, the text input from customer reviews in the database, and the formatting instructions defined earlier. This will ensure the model processes the reviews according to our requirements and produces outputs in the specified structured format.


## Output Template
output_template= """
For the following text, extract the following information:

review_language: Identify the language in which the review is written like English, Spanish, French, etc.

review_sentiment: Tell me whether the customer review for the product was positive or negative. If unclear, label it as Unidentified.

translated_review: If the review is in a language other than English, give me the review translated into English. If it's already in English, don't return anything.

review_response: Give me a reply to the review in less than 30 words, in the language of the original review. Keep a formal tone. Respond to both positive and negative reviews.

Tailor responses according to the sentiment of the review. I need a response for every review.

translated_response: Give me the above review_response translated into English. If it's already in English, don't return anything.

Do not make stuff up.

Format the output as JSON with the following keys:
review_language
review_sentiment
translated_review
review_response
translated_response

Just give me the final JSON as output and nothing else.

Return Null if no appropriate answer is available. Do not hallucinate.

text: {text}

{format_instructions}

"""

Note: The variables enclosed in {} are placeholders used to insert dynamic values into a string. In the code above, placeholders are utilised to incorporate customer reviews and format instructions into the prompts sent to the LLM.
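Conceptually, this substitution works like plain Python string formatting, which is a reasonable mental model for what ChatPromptTemplate does with the placeholders (the example values here are invented):

```python
# A tiny stand-in for our prompt template, with two placeholders
template = "text: {text}\n{format_instructions}"

# Filling the placeholders, just as format_messages does for the real template
filled = template.format(text="Great shoes!", format_instructions="Return JSON.")
```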


Generating Outputs


Let’s integrate everything we’ve done so far by iterating over each customer review in our data. We’ll pass each review as context to our prompt and construct a consolidated prompt template. This template will be used to make calls to the LLM model. We’ll then store the resulting JSON outputs as parsed key-value pairs (Python dictionaries) in the Output Column.


chat = llm_model  # Initialised above

# Create the Output column before filling it
reviews_data['Output'] = None

# Using the Chat Prompt Template from above
prompt = ChatPromptTemplate.from_template(template=output_template)

for i in range(len(reviews_data)):
    # Building a prompt for each customer review one by one
    messages = prompt.format_messages(
        text=reviews_data['Customer_Reviews'].iloc[i],
        format_instructions=format_instructions)

    response = chat.invoke(messages)

    # Getting the response and parsing it into a Python dictionary
    output_dict = output_parser.parse(response.content)

    # Now store this dictionary output back in the Output field
    reviews_data.at[i, 'Output'] = output_dict
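One practical caveat: LLM responses occasionally drift from the requested schema and fail to parse, which would crash the loop above. A defensive option is to wrap the parsing step in a small helper. This is a sketch under my own assumptions: the helper name safe_parse is invented, and json.loads stands in for output_parser.parse:

```python
import json

def safe_parse(raw_text):
    """Try to parse a model response as JSON; return None instead of crashing."""
    # Strip the markdown code fences models sometimes wrap around JSON
    cleaned = raw_text.strip().removeprefix("```json").removesuffix("```").strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None
```

Inside the loop you could then call safe_parse(response.content) and, for rows that come back as None, retry the request or flag the review for manual handling.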

Voilà! In just about 30-35 lines of code and an enormous amount of help from AI, we have achieved the desired output, as illustrated below:


Now, let’s apply our Python data cleaning skills to extract individual key-value pairs from the Output column in the pandas DataFrame and organise them into a more convenient format for analysis. In this case, I’ve separated the Output column into multiple fields.


# Creating new Empty columns
for col in ['Review_Language', 'Review_Sentiment', 'Translated_Review','Review_Response','Translated_Response']:
    if col not in reviews_data.columns:
        reviews_data[col] = None

# Splitting key-value pairs into separate fields
for i in range(len(reviews_data)):
    output = reviews_data['Output'].iloc[i]
    reviews_data.at[i, 'Review_Language'] = output.get('review_language')
    reviews_data.at[i, 'Review_Sentiment'] = output.get('review_sentiment')
    reviews_data.at[i, 'Translated_Review'] = output.get('translated_review')
    reviews_data.at[i, 'Review_Response'] = output.get('review_response')
    reviews_data.at[i, 'Translated_Response'] = output.get('translated_response')

reviews_data
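As an aside, pandas can also expand a column of dictionaries in a single step with pd.json_normalize, as an alternative to the explicit loop. A minimal sketch, assuming every row of Output holds a complete dictionary (the toy data here is invented):

```python
import pandas as pd

# Toy stand-in for reviews_data with a dictionary-valued Output column
reviews_data = pd.DataFrame({
    "Customer_Reviews": ["Great shoes!", "Zapatos malos."],
    "Output": [
        {"review_language": "English", "review_sentiment": "positive"},
        {"review_language": "Spanish", "review_sentiment": "negative"},
    ],
})

# Expand each dictionary into its own set of columns alongside the originals
expanded = reviews_data.join(pd.json_normalize(reviews_data["Output"].tolist()))
```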

Here's the final output after the above data cleaning steps:


I would encourage you to explore further applications of AI with these simple methods to automate tasks in your field and improve your business operations. Irrespective of the domain you wish to rock with AI, I hope this guide serves as a helpful starting point.


Signing Off,

Yash

