
Refuel Updates (Oct '24)

👋 We’re back with another edition of Refuel updates!

Here’s what’s new with Refuel from this past month:

  1. Task chaining

Task chaining makes it simple to build Compound AI systems.

Solving meaningful business problems requires building complex, multi-step workflows that pull in relevant information from internal or external data sources, helping an LLM (or a set of models) produce high-quality outputs.

A single-prompt LLM call is insufficient: accuracy isn't good enough, and complex prompts are hard to manage and improve. Instead, teams need a seamless way to define workflows that leverage pre-built templates for common task types ("classification", "extraction", "web search", etc.), saving them weeks of effort.

Take, for example, the task of categorizing the risk level of a business or a transaction.

A risk team would approach this by looking up a knowledge base of known risky businesses, performing a number of Google Searches, reading the company's website, and reviewing previous similar decisions to eventually reach an output.

Contrast this with making a single LLM call where the model relies solely on its training data. LLMs by themselves are not connected to your evolving enterprise data, and haven't yet learnt from business decisions made by your team in the past.

We’ve now made it easier to build Compound AI Systems with Task Chaining.
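
To make this concrete, here's a minimal sketch of what a chained workflow for the risk example could look like in Python. Everything in it (the `Task`/`Chain` classes and the step functions) is a hypothetical stand-in for illustration, not Refuel's API; in Refuel you assemble these steps from the pre-built task templates instead of writing them by hand.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[dict], dict]  # reads the shared context, returns updates

@dataclass
class Chain:
    tasks: list[Task] = field(default_factory=list)

    def execute(self, context: dict) -> dict:
        # Each step reads from and writes to a shared context, so later
        # steps (e.g. the final LLM call) see all earlier evidence.
        for task in self.tasks:
            context.update(task.run(context))
        return context

# Hypothetical step implementations; stubs stand in for real integrations.

def lookup_known_risky(ctx: dict) -> dict:
    """Check an internal knowledge base of known risky businesses."""
    known_risky = {"acme-crypto-llc"}  # stand-in for an internal KB
    return {"kb_hit": ctx["business_id"] in known_risky}

def search_web(ctx: dict) -> dict:
    """Stand-in for a web-search task (news, the company's website, etc.)."""
    return {"search_snippets": [f"No adverse news for {ctx['business_id']}"]}

def classify_risk(ctx: dict) -> dict:
    """Final step: an LLM classification call that sees all gathered
    evidence. A trivial rule stands in for the model here."""
    return {"risk_level": "high" if ctx["kb_hit"] else "low"}

chain = Chain([
    Task("kb_lookup", lookup_known_risky),
    Task("web_search", search_web),
    Task("classification", classify_risk),
])

result = chain.execute({"business_id": "acme-crypto-llc"})
print(result["risk_level"])  # -> high
```

The key design point is the shared context: each step deposits its evidence (knowledge-base hits, search snippets) so the final classification step sees everything gathered before it.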

  2. Distillation using teacher LLMs

You can now distill larger “teacher” LLMs into smaller “student” models to increase throughput and reduce latency for your data tasks. And it takes just 3 clicks in Refuel.

Sometimes the task at hand is simple enough to be tackled by a smaller model, especially when you have sufficient data for training (which might be ~100 rows). Larger models can be slow and expensive, especially at significant data volumes. Refuel offers smaller models in three sizes: 47B, 8B, and 1.5B parameters.

We’re excited to launch model distillation: the simplest way to execute your task with higher throughput, lower latency, and lower costs, without degrading accuracy.
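
For the curious, here's a toy sketch of what distillation does conceptually, written in PyTorch with tiny stand-in models. Refuel manages all of this behind those 3 clicks, so nothing below is Refuel code; it only illustrates the core mechanic: the teacher's softened output distribution becomes the training target for the student.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, DIM, TEMP = 4, 32, 2.0  # toy sizes; TEMP softens the targets

# Stand-ins: a "teacher" with more capacity, a much smaller "student".
teacher = nn.Sequential(nn.Linear(DIM, 128), nn.ReLU(), nn.Linear(128, NUM_CLASSES))
student = nn.Linear(DIM, NUM_CLASSES)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(16, DIM)  # stand-in for a batch of task inputs

    with torch.no_grad():  # the teacher only provides soft targets
        teacher_probs = F.softmax(teacher(x) / TEMP, dim=-1)

    student_log_probs = F.log_softmax(student(x) / TEMP, dim=-1)

    # KL divergence pulls the student's output distribution toward the
    # teacher's; no human labels are needed for these training steps.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```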

  3. Usage Monitoring

Monitor request volume, latency, and token usage all in one panel.

This lets you keep an eye on performance drops and cost estimates, and unlocks full observability for production traffic.

Let us know if you have questions or feature suggestions. See you next month!