
Gemini 1.5 Flash better than RAG? Let’s check it out in R!

Vaseline August 18, 2024

Overall, I am quite impressed with the responses! With minimal prompt engineering and document cleaning, it was able to give accurate answers, and even separate different conditions and provide appropriate treatment options. It was also able to give the right answer to tricky questions that our RAG could not. It definitely has potential!

Objectives

Gemini 1.5 Flash replaces RAG?

Gemini 1.5 Flash is a high-performance model designed for high-volume, low-latency AI tasks. It features a massive 1 million token context window, which allows it to process large inputs such as long documents, videos, or codebases. It also means that you can put documents directly into the prompt and not worry about RAG, where we would typically embed the documents with an embedding model, store the embeddings in a vector store, and then have a retriever run a similarity search using whatever parameters we set. But what’s the catch? That’s sending a lot of tokens at once! That can be quite expensive, right? Let’s take a look at the price of the free version.
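For comparison, the retrieval step that a long context window lets us skip can be sketched in a few lines of base R. This is a toy illustration, not the pipeline from our previous post: the three-dimensional vectors, the chunk names, and the `top_k` parameter are all made up, and a real setup would get its embeddings from an embedding model.

```r
# Toy "vector store": one embedding (row) per document chunk.
# The numbers are invented for illustration only.
chunk_embeddings <- rbind(
  cre   = c(0.9, 0.1, 0.0),
  esbl  = c(0.1, 0.9, 0.1),
  steno = c(0.0, 0.2, 0.9)
)

# Cosine similarity between a query vector and every row of a matrix
cosine_similarity <- function(query, embeddings) {
  as.vector(embeddings %*% query) /
    (sqrt(rowSums(embeddings^2)) * sqrt(sum(query^2)))
}

# Return the names of the top_k most similar chunks
retrieve <- function(query, embeddings, top_k = 2) {
  scores <- cosine_similarity(query, embeddings)
  names(scores) <- rownames(embeddings)
  names(sort(scores, decreasing = TRUE))[seq_len(top_k)]
}

# A hypothetical query embedding that sits closest to the "esbl" chunk
query_embedding <- c(0.2, 0.8, 0.1)
retrieve(query_embedding, chunk_embeddings, top_k = 2)
```

Only the retrieved chunks would then be sent to the model, which is why `top_k` (and the similarity measure) decides what the model gets to see. With a long context window, the whole document goes in instead.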

Here’s the link in case they changed their prices.

For our purposes, it seems like the FREE version would work just fine! There are limits: 15 requests per minute, 1 million tokens per minute, and 1,500 requests per day. That’s a lot of tokens!
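To get a feel for those limits, here is a rough back-of-the-envelope check in base R. The limits are the numbers quoted above; the token count per request and the number of questions are hypothetical, so swap in your own.

```r
# Free-tier limits quoted above
limits <- list(
  requests_per_minute = 15,
  tokens_per_minute   = 1e6,
  requests_per_day    = 1500
)

# How many requests per minute fit once both per-minute limits are applied?
# tokens_per_request is the full prompt size (document + instructions + question).
max_requests_per_minute <- function(tokens_per_request, limits) {
  by_tokens <- floor(limits$tokens_per_minute / tokens_per_request)
  min(limits$requests_per_minute, by_tokens)
}

# Minimum number of minutes needed to send n_questions, one request each
minutes_needed <- function(n_questions, tokens_per_request, limits) {
  per_minute <- max_requests_per_minute(tokens_per_request, limits)
  if (per_minute < 1) stop("A single request exceeds the per-minute token limit")
  ceiling(n_questions / per_minute)
}

# Hypothetical example: a 100,000-token document sent with each of 20 questions
max_requests_per_minute(1e5, limits)  # the token limit binds before the request limit
minutes_needed(20, 1e5, limits)
```

The point of the sketch is that when the whole document rides along with every question, the tokens-per-minute limit can bite before the requests-per-minute limit does.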

But is this really better than RAG? Let’s find out. If you want to see our previous play with Llama 3, check it out here. We covered how RAG works in that post, and you can find the previous responses there. Here, we’re using an LLM as a judge to assess the relevance, factual accuracy, and brevity of the responses that Gemini 1.5 Flash generated. How well do you think it’s going to perform?

Before you start, don’t forget to get your Gemini API key
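One way to keep the key out of your script, sketched in base R: put it in a `.env`-style file and load it with `readRenviron()`. The variable name `GEMINI_API_KEY` and the placeholder value are assumptions for this example, and a temporary file stands in for your real `.env` so the snippet runs as is.

```r
# Write a throwaway .env-style file (in practice this file already exists
# in your project and is kept out of version control)
env_file <- tempfile(fileext = ".env")
writeLines("GEMINI_API_KEY=not-a-real-key", env_file)

# Load KEY=value pairs from the file into this R session's environment
readRenviron(env_file)

# Read the key back and fail early if it is missing
api_key <- Sys.getenv("GEMINI_API_KEY")
if (!nzchar(api_key)) stop("GEMINI_API_KEY is not set")
```

Environment variables set this way are inherited by the Python session that reticulate starts, as long as they are set before Python is initialized.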

Minimally reproducible code

    library(tidyverse)
    library(reticulate)

    # Step 1: Create a virtual environment. If you've already created one, move on to step 2. This is a best practice.
    virtualenv_create(envname = "gemini")

    ## Step 1.1: Install the appropriate modules
    py_install(c("google-generativeai", "langchain", "langchain-community", "pypdf", "python-dotenv"), pip = TRUE, envname = "gemini")

    # Step 2: Use the virtual environment
    use_virtualenv("gemini")

    # Step 3: Import installed modules
    dotenv

Response

OK, we don’t really need step 10, it’s actually more for show 🤪 But, what do you think? Is it better than our prior RAG?

Let’s take a closer look here:

What is the preferred treatment of CRE?
- For infections caused by Enterobacterales isolates that are NOT carbapenemase producing that exhibit susceptibility to meropenem and imipenem (i.e., MICs ≤1 µg/mL), but are not susceptible to ertapenem (i.e., MICs ≥1 µg/mL), the use of extended-infusion meropenem (or imipenem-cilastatin) is suggested.
- For infections caused by Enterobacterales isolates that are NOT carbapenemase producing and that do not exhibit susceptibility to any carbapenem, ceftazidime-avibactam, meropenem-vaborbactam, and imipenem-cilastatin-relebactam are preferred treatment options.
- For infections caused by CRE if KPC production is present: Meropenem-vaborbactam, ceftazidime-avibactam, and imipenem-cilastatin-relebactam are preferred treatment options. Cefiderocol is an alternative option.
- For infections caused by CRE if NDM or other MBL production is present: Ceftazidime-avibactam in combination with aztreonam, or cefiderocol as monotherapy, are preferred treatment options.
- For infections caused by CRE if OXA-48-like production is present: Ceftazidime-avibactam is the preferred treatment option. Cefiderocol is an alternative treatment option.

First of all, I didn’t even specify which conditions, and Gemini was still able to return responses for the different conditions, something we definitely did not see with our previous WizardLM model. The response overall also appeared to be quite accurate.

What is the preferred treatment of ESBL-E?
- For uncomplicated cystitis: Nitrofurantoin and TMP-SMX are preferred treatment options.
- For pyelonephritis or cUTI: TMP-SMX, ciprofloxacin, or levofloxacin are preferred treatment options. Ertapenem, meropenem, and imipenem-cilastatin are preferred agents when resistance or toxicities preclude the use of TMP-SMX or fluoroquinolones.
- For infections outside of the urinary tract: Meropenem, imipenem-cilastatin, or ertapenem are preferred. For critically ill patients and/or those with hypoalbuminemia, meropenem or imipenem-cilastatin are preferred.

Impressive! It separated the different conditions and accurately returned the treatments for uncomplicated cystitis vs. pyelonephritis. It was also impressive that it flagged the caution around ertapenem in critically ill patients and those with hypoalbuminemia, and offered the appropriate alternatives. Wow! I’m starting to like what I’m seeing so far.

Can we use fosfomycin in ESBL Klebsiella?
- No, fosfomycin is not suggested for the treatment of infections caused by Klebsiella pneumoniae.

Correct!

Can we use fosfomycin in ESBL Ecoli?
- Yes, oral fosfomycin is an alternative treatment option exclusively for uncomplicated cystitis caused by E. coli, including if carbapenem resistant.

Yup!

What is the preferred treatment of stenotrophomonas?
- Two approaches are preferred:
  - The use of two of the following agents: cefiderocol, minocycline, TMP-SMX, or levofloxacin.
  - The combination of ceftazidime-avibactam and aztreonam.

This response is quite amazing! Our previous RAG could not return an accurate response unless we removed the references and used the proper term, such as “S. maltophilia”, but Gemini was able to return the correct response without requiring any additional cleaning!

What is the preferred treatment of DTR Pseudomonas?
- Ceftolozane-tazobactam, ceftazidime-avibactam, and imipenem-cilastatin-relebactam are preferred options for infections outside of the urinary tract. Cefiderocol is an alternative treatment option.
- For uncomplicated cystitis: Ceftolozane-tazobactam, ceftazidime-avibactam, imipenem-cilastatin-relebactam, and cefiderocol are the preferred treatment options. Tobramycin or amikacin (as a single dose) are alternative treatment options.
- For pyelonephritis or cUTI: Ceftolozane-tazobactam, ceftazidime-avibactam, imipenem-cilastatin-relebactam, and cefiderocol are preferred treatment options. Once-daily tobramycin or amikacin are alternative agents.
- For DTR P. aeruginosa that produce metallo-β-lactamase enzymes: Cefiderocol is the preferred treatment.

Looks about right.

Which organisms require two active agents when susceptibility is known?
- The document suggests using two active agents for the treatment of S. maltophilia infections, at least until clinical improvement is observed.
- It also suggests using at least two agents for the treatment of CRAB infections, at least until an appropriate clinical response is observed.

OK, this is a tricky one! No matter how I tweaked our RAG, I couldn’t get it to return both Stenotrophomonas and CRAB, but Gemini was able to! Truly impressive!

Can we use gentamicin in pseudomonas infection?
- Yes, gentamicin is an alternative option for the treatment of DTR P. aeruginosa pyelonephritis or cUTI. However, as of 2023, there are no longer breakpoints for gentamicin for P. aeruginosa.

Wait a minute, why is it “yes” for this question? It is interesting that it was able to return the second sentence.

Can we use tobramycin to treat pseudomonas infection?
- Yes, tobramycin is an alternative option for the treatment of DTR P. aeruginosa pyelonephritis or cUTI. It may also be helpful for completing treatment courses (e.g., transitioning from another agent for terminal doses) given its prolonged duration of activity in the renal cortex and the convenience of once daily dosing.

Cool beans!

Why is there carbapenemase non-producing organism?
- Carbapenemase non-producing CRE may be the result of amplification of non-carbapenemase β-lactamase genes (e.g., ESBL genes) with concurrent outer membrane porin disruption.

A bit confused. Why ESBL in CRE? 🧐

Can we use oral antibiotics for any of these MDRO?
- Yes, oral antibiotics can be used for some MDRO infections, but only after appropriate clinical response is achieved and if susceptibility to an appropriate oral agent is demonstrated.
- For example, transitioning to oral TMP-SMX, ciprofloxacin, or levofloxacin should be considered for ESBL-E, AmpC-E, and CRE infections outside of the urinary tract.

Sure

What is the preferred treatment of MRSA?
- The document does not discuss MRSA treatment.

Good return! Did not hallucinate or try to return an answer, with very minimal prompt engineering!
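For reference, a minimal prompt along these lines could look like the sketch below. The wording is hypothetical, not the exact prompt used for the responses above. The idea is simply to pin the model to the supplied document and give it a way out when the answer is not there.

```r
# Build a grounded prompt: instructions + document text + question.
# The instruction wording is an illustrative assumption.
build_prompt <- function(document_text, question) {
  paste(
    "Answer the question using only the document below.",
    "If the document does not discuss the topic, say so instead of guessing.",
    "",
    "Document:",
    document_text,
    "",
    paste("Question:", question),
    sep = "\n"
  )
}

prompt <- build_prompt(
  document_text = "(full text of the guidance document goes here)",
  question      = "What is the preferred treatment of MRSA?"
)
cat(prompt)
```

The resulting string is what would be handed to the model in a single request, with the real document text in place of the placeholder.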

What is the preferred treatment of CRAB?
- The preferred regimen is sulbactam-durlobactam in combination with a carbapenem (i.e., imipenem-cilastatin or meropenem). An alternative regimen is high-dose ampicillin-sulbactam (total daily dose of 9 grams of the sulbactam component) in combination with at least one other agent (i.e., polymyxin B, minocycline > tigecycline, or cefiderocol), if sulbactam-durlobactam is not available.

Not bad!

Can fosofmycin be used for pyelonephritis?
- No, fosfomycin is not suggested for the treatment of pyelonephritis or cUTI given its limited renal parenchymal concentrations.

Alright!

Is IV antibiotics better than oral antibiotics?
- The document does not explicitly state that IV antibiotics are better than oral antibiotics. However, it does suggest transitioning to oral therapy whenever possible, particularly if the following criteria are met: (1) susceptibility to an appropriate oral agent is demonstrated, (2) the patient is hemodynamically stable, (3) reasonable source control measures have occurred, and (4) concerns about insufficient intestinal absorption are not present.

Not too shabby either!

LLM As A Judge

Using an LLM as a judge to evaluate other LLMs’ responses involves leveraging advanced language models to assess output quality across various dimensions. Key aspects to evaluate include relevance, coherence, factual accuracy, completeness, language quality, reasoning, creativity, safety, and task-specific criteria. The process requires careful prompt engineering, model selection, and consistency checks. Evaluators should consider relevance to the query, logical structure, factual correctness, comprehensiveness, grammar, reasoning quality, originality, ethical considerations, and metacognitive awareness. Implementing this approach necessitates designing clear evaluation criteria, using few-shot examples, and developing a robust scoring system while remaining mindful of potential biases in the judge model itself.

In our use case, we will assess the relevance, factual accuracy, and succinctness of the response. The whole point of using an LLM as a tool to chat with a document is to get the essence of the context, so I do not want it to return the whole text. I want a concise output that helps me either gain knowledge efficiently or ask more questions. Either way, that’s great for life-long learning!

Let’s use Anthropic’s Claude 3.5 Sonnet to assess Gemini 1.5 Flash’s responses 🤣

Wow, not too shabby! Even Claude agreed for the most part. I basically attached the PDF to Claude 3.5 Sonnet and then wrote the prompt below:

you are an LLM as a judge. Evaluate the answers here and provide a score from 0 to 1 of the relevance, factual accuracy, succinct, of the response to the question. Assess it 3 times for each metrics and average out the scores. Attached is the context being used to answer the questions.

I then pasted Gemini’s responses, had Claude output the scores in a schema that converts to a data.frame, and used a DT heatmap, hence the colored DT datatable! ❤️
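The "assess it 3 times and average" step is also easy to reproduce in R if you ask the judge for the raw scores instead. Here is a minimal sketch with made-up scores (not Claude’s actual output), assuming a long-format data frame with one row per question, metric, and repeat:

```r
# Hypothetical judge output: 3 repeated scores per question and metric
judge_scores <- data.frame(
  question = rep(c("CRE", "MRSA"), each = 9),
  metric   = rep(rep(c("relevance", "accuracy", "succinctness"), each = 3), times = 2),
  score    = c(1.0, 0.9, 1.0,   0.9, 0.9, 0.9,   0.8, 0.7, 0.9,
               1.0, 1.0, 1.0,   1.0, 1.0, 1.0,   1.0, 0.9, 0.8)
)

# Average the repeats for every question/metric pair
avg_scores <- aggregate(score ~ question + metric, data = judge_scores, FUN = mean)

# Reshape to one row per question and one column per metric
wide_scores <- reshape(avg_scores, idvar = "question", timevar = "metric",
                       direction = "wide")
wide_scores
```

A wide table like `wide_scores` is the kind of input you could pass to `DT::datatable()` and color by score. Doing the averaging yourself also lets you check how much the three repeats disagree.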

Limitations

- The free version of Gemini 1.5 Flash uses your data to further improve their product.
- The answers are quite short. In my experience, using Claude 3.5 Sonnet with RAG allowed it to provide more context and better process the retrieved document.
- There is no native R version, but we can certainly see the power of reticulate to access Python modules.

Lessons Learned

- Learned how to use Gemini 1.5 Flash in R via reticulate
- Learned dotenv in Python
- Learned how to use an LLM as a judge

Overall, I am quite impressed with the responses! With minimal prompt engineering and document cleaning, it was able to give accurate answers, and even separate different conditions and provide appropriate treatment options. It was also able to give the right answer to tricky questions that our RAG could not. It definitely has potential!

We haven’t explored context caching here yet, but if you have a long context, you can upload the file and use context caching for a lower price. See this

And finally, is it better than RAG? Well, it depends. I think that in most cases docs + prompt + query won’t need more than 1 million tokens per minute, so this approach can work. Otherwise, RAG seems to be more effective, and it also gives you a better way to verify what context was fetched. So, there you have it: if you want plug and play without understanding what’s going on under the hood, this might be a good one for you! Otherwise, if you’re like me and are curious and want to know what’s going on, you might want to stick with RAG for a while. 🤣
