
Llama 3: Get building with LLMs in 5 minutes

Get started building transformative AI-powered features within 5 minutes using Llama 3, Ollama, and Python.

Getting started with LLMs locally, Llama 3 and Ollama 🦙

To many developers, LLMs offer the opportunity to create AI-powered features that were previously unimaginable, but there is also a sense of scrambling to avoid being left behind in the persistent wave of AI hype. This article demonstrates how easy it is to start building transformative AI-backed features with LLMs using Ollama and Llama 3.


What is Llama 3?

Large Language Models (LLMs) have taken the AI world by storm, capable of generating human-quality text, translating languages, and writing creative content in a variety of formats. Among these, Llama 3 shines as a powerful open-source option developed by Meta AI.

One key advantage of Llama 3 is its accessibility. Unlike some proprietary LLMs with limited access and hefty fees, Llama 3 embraces an open-source approach. This allows developers to experiment, fine-tune the model for specific tasks, and integrate it within their projects without barriers. Additionally, Llama 3 boasts impressive performance on various benchmarks, rivaling or exceeding the capabilities of commercial models.

What is Ollama?

Ollama is a software tool specifically designed to streamline the process of running Large Language Models (LLMs) locally. Think of it as the bridge between complex AI models and your computer. With Ollama, you can take advantage of powerful LLMs like Llama 3 without needing to navigate complicated technical setups.

Here's how it works:

  1. Ollama downloads and manages the model weights for you; a single command is all it takes to fetch a model like Llama 3.
  2. It runs the model locally and gives you an interactive chat interface in your terminal.
  3. It also exposes a local HTTP API (on port 11434 by default), so your own applications can talk to the model programmatically.
  4. It lets you customise models via simple Modelfiles, which we'll explore later in this article.

With Ollama and Llama 3 combined, we can quickly get to work building compelling solutions backed by a powerful LLM. Let's get to it.

Get Llama 3 running in 3 easy steps

Follow these simple steps to get Ollama and Llama 3 running on your local machine so we can get to work.

  1. Install Ollama: Ollama is available for macOS, Windows, Linux, and Docker. Follow the appropriate link to install the software.
  2. Launch Llama 3: Once installed, run ollama run llama3 from a terminal. This will download the model (on first run) and launch an interactive interface to the LLM, plus a local API.
  3. Test the API: Run the following command to check the API is available:
curl http://localhost:11434/api/generate -d '{"model": "llama3","prompt":"Who is Frodo Baggins?", "stream": false}'

If you get something that looks like the following, we're ready to go!
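The exact timing values and response text will differ, but a trimmed, non-streaming response should look roughly like this (the response text below is purely illustrative):

{
  "model": "llama3",
  "created_at": "2024-05-01T10:00:00.000000Z",
  "response": "Frodo Baggins is a fictional character from J.R.R. Tolkien's The Lord of the Rings...",
  "done": true
}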

Testing the Ollama API to make sure everything is running correctly 🚀

Using Python to interface with the Llama 3 Large Language Model

Now that we have the API running locally, we can build an application that utilises the LLM's capabilities. Ollama provides easy-to-use Python and JavaScript libraries to make integration simple. However, we'll use plain HTTP requests, as this approach is more back-end agnostic, which is useful if we later decide to move away from our local Ollama environment.
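For comparison, here's roughly what a single generation looks like using Ollama's official Python library (a minimal sketch, assuming you've installed it with pip install ollama; the exact return type may vary between library versions):

import ollama

# Ask the locally running Llama 3 model a question and print its answer
result = ollama.generate(model='llama3', prompt='Who is Frodo Baggins?')
print(result['response'])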

To give this example more substance, we'll also interface with a news API, fetch the latest news stories, and then ask the LLM to summarise them.

  1. Grab an API key from newsdata.io so we can use their API to fetch news (it's completely free).
  2. Create a new Python file we can run from the CLI: get_started.py
  3. Enter the following code, replacing [API_KEY] with the API key you've just acquired:
# Import libraries to make the API requests and parse JSON
import json
import requests

# Make a GET request to the news API to fetch the latest technology news
news_api_url = 'https://newsdata.io/api/1/news'
url_params = {
    'apikey': '[API_KEY]',    # API key
    'language': 'en',         # Restrict to English only
    'category': 'technology'  # Limit to the technology category
}
# Make the GET request
news_api_response = requests.get(news_api_url, params=url_params)
# Parse the JSON response and extract the articles
data = news_api_response.json()
articles = data['results']

print('Fetched {} articles, now asking the LLM to summarise them. This could take a minute or two...\n'.format(len(articles)))

# Loop through the articles and construct the prompt asking the LLM to summarise them
prompt = "Summarise all these news articles in a single paragraph. Don't use bullet points:\n\n"
# We're just pulling out the most relevant parts and formatting them in a way that's easier for the model to understand
for article in articles:
    prompt += 'Title: {}\n'.format(article['title'])
    prompt += 'Description: {}\n\n'.format(article['description'])
# Now we make a POST request to the LLM to generate the summary
post_data = {
    'prompt': prompt,
    'model': 'llama3',
    'stream': False  # Disable streaming to get the entire response at once
}
ollama_url = 'http://localhost:11434/api/generate'
# Make the POST request
llm_response = requests.post(ollama_url, data=json.dumps(post_data))
# Parse the JSON response and extract the summary
result = llm_response.json()
print('News Summary:\n\n', result['response'])
  4. In a terminal, run your script with python get_started.py. After a minute or so, you should get a summary of the latest tech news:
Latest news summary using the Llama 3 Large Language Model (LLM)

Let's break down the more interesting parts of the code and take a closer look.

news_api_url = 'https://newsdata.io/api/1/news'
url_params = {
    'apikey': '[API_KEY]',
    'language': 'en',
    'category': 'technology'
}
news_api_response = requests.get(news_api_url, params=url_params)
data = news_api_response.json()
articles = data['results']

Firstly, we're making a GET request to the news API to fetch the latest news within the technology category, limiting it to English content only. We're then extracting the resulting news articles into an articles variable for use later.

prompt = "Summarise all these news articles in a single paragraph. Don't use bullet points:\n\n"
for article in articles:
    prompt += 'Title: {}\n'.format(article['title'])
    prompt += 'Description: {}\n\n'.format(article['description'])

Next, we're setting up our prompt to send to the LLM. The main element of the prompt instructs the LLM to summarise the news articles we're going to give it, and we also give it a strict instruction not to use bullet points, an instruction it regularly likes to ignore 🤬.

We're then looping over the articles variable, pulling out each article's title and description, and appending them to the prompt string, making it clear where each article starts and finishes. If you were to print out the prompt variable at this point, it'd look something like this:

Summarise all these news articles in a single paragraph. Don't use bullet points:

Title: Google looking into reports of Phone Hub not working for some ChromeOS users
Description: ChromeOS users have been frustrated by a persistent bug causing their Android phones to not be detected by the Phone Hub. This issue, which started plaguing users around the release of ChromeOS 121, disrupts one of Google's key efforts to create a seamless integration between Chromebooks and Android devices akin to what Apple has […] The post Google looking into reports of Phone Hub not working for some ChromeOS users appeared first on PiunikaWeb.

Title: I woke up & found my dream car gone from my driveway – even though I paid down my loan and never missed a payment
Description: None

Title: Rajasekhar's remake film to stream on OTT soon?
Description: None
...etc.
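You may notice some articles come back with a description of None. If you'd prefer cleaner prompts, one small tweak (a sketch, assuming you only want articles with usable descriptions) is to skip those entries when building the prompt:

for article in articles:
    # Skip articles that have no usable description
    if not article.get('description'):
        continue
    prompt += 'Title: {}\n'.format(article['title'])
    prompt += 'Description: {}\n\n'.format(article['description'])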

We then make a POST request to the Ollama API with the constructed prompt.

post_data = {
    'prompt': prompt,
    'model': 'llama3',
    'stream': False
}
ollama_url = 'http://localhost:11434/api/generate'
llm_response = requests.post(ollama_url, data=json.dumps(post_data))
result = llm_response.json()
print('News Summary:\n\n', result['response'])

As well as specifying the prompt as an argument in the POST request, we specify the model we want to use as llama3, and we set streaming to False so we get the response back all at once. You can read the full documentation on the Ollama API for more info.
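If you'd rather see the output appear token by token, you can leave streaming enabled instead; Ollama then returns a series of newline-delimited JSON chunks, each carrying a partial response. A minimal sketch, reusing the post_data and ollama_url from above:

post_data['stream'] = True  # Ask for a streamed response instead
# Each streamed line is a JSON object containing a fragment of the response
with requests.post(ollama_url, data=json.dumps(post_data), stream=True) as streamed_response:
    for line in streamed_response.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk['response'], end='', flush=True)
print()  # Final newline once the stream completes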

We then finish by printing the response. That's it! This demonstrates how easy it is to get started building things with LLMs thanks to the accessibility of Llama 3 via Ollama. But we're not done yet; keep reading to see how you can fine-tune the model... and have a bit of fun at the same time.

Fine-tune the Llama 3 model to customise its capabilities

The above example is great, but of course, it'd be better if the LLM summarised our news as a pirate 🏴‍☠️🦜!

Ollama allows us to easily tune a model to tailor its responses to our needs (strictly speaking, a Modelfile sets a system prompt and parameters on top of the base model rather than retraining its weights). We could just update our prompt above and ask the LLM to respond as a pirate every time, but if we want a consistent experience across all prompts as the application grows, this is the way to go.

  1. Run ollama pull llama3 to make sure you've got the llama3 model locally
  2. Create a new Modelfile called pirate-news-Modelfile and add the following contents:
FROM llama3

PARAMETER temperature 1

SYSTEM """
You are a pirate from the Caribbean, you're always trying to get one over on anyone you meet. Always answer as a pirate.
"""
  3. Create a new model with the new Modelfile: ollama create llama3-pirate -f ./pirate-news-Modelfile
  4. Run the new model: ollama run llama3-pirate
  5. Update the Python script to interface with the new model:
post_data = {
    'prompt': prompt,
    'model': 'llama3-pirate',  # <-- Specify the new model we've just created
    'stream': False
}
  6. Run your Python script: python get_started.py. You should get something like this:
Things are just better in "pirate" 🤣

Obviously, this is a silly (but fun!) example. We could instead have used this method to instruct the model never to use bullet points, rather than adding that instruction to every prompt, if we felt it was behaviour we wanted across all prompts; a sketch of what that might look like follows below.
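For example, a minimal Modelfile for that might look like the following (a hypothetical no-bullet-points-Modelfile, following the same pattern as the pirate example):

FROM llama3

SYSTEM """
Never use bullet points in your responses. Always answer in plain paragraphs.
"""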

One final thing before we conclude: you may have noticed the temperature parameter in the Modelfile from earlier. By tweaking this, we can control how creative the LLM gets with its responses. The higher the number, the more creative the LLM will be; the lower the number, the more focused and predictable it'll be.
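The temperature doesn't have to live in the Modelfile either; the Ollama generate API also accepts per-request options. A quick sketch, again reusing the post_data and ollama_url from the earlier script:

post_data['options'] = {'temperature': 0.1}  # Per-request override: lower = more predictable
llm_response = requests.post(ollama_url, data=json.dumps(post_data))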

Running with a temperature of 10, I get something like the following; note the sudden use of emojis:

Let the LLM be more creative and it'll start using emojis!

Using a temperature of 0.1, I get something more mundane, like this:

Tweaking the temperature allows you to change how creative the LLM will be

Hopefully, that gives you greater insight into how you can tune a model's behaviour to your specific needs. Take a look at the full documentation around Modelfiles for more info.

Conclusion

As this article demonstrates, building compelling AI features with LLMs has never been easier. Tools like Ollama simplify the process of running these powerful models locally, enabling developers to experiment and unlock new possibilities without complex technical setups. By combining Ollama's ease of use with the power of Llama 3, you can start integrating intelligent features into your applications today. Whether you want to summarise news articles, enhance customer support, or build entirely new AI-driven experiences, LLMs provide fertile ground for innovation.

Opportunities for Further Exploration

This article provided a foundational introduction. Here are a few exciting directions you can explore next:

  1. Try Ollama's official Python and JavaScript libraries instead of raw HTTP requests.
  2. Experiment with streaming responses for a more interactive experience.
  3. Explore other open models available through Ollama and compare their output.
  4. Dig deeper into Modelfile parameters to further tailor a model's behaviour.

The Future is Bright

The field of LLMs is evolving rapidly. Stay curious, continue experimenting with tools like Ollama, and push the boundaries of what's possible. The true potential of LLMs in software development is only just beginning to be revealed.
