Let’s dig into a client of mine, Bolt Health. It’s an employee wellness startup that “gets to know” its users through a type of AI called large language models, or LLMs (think ChatGPT).
My primary role was to help the team improve the AI’s responses, which were giving users generic, repetitive suggestions.
Discovery
The first thing we did was to define the desired end state.
- The team wanted an AI system that could provide the user with useful suggestions.
- The underlying reason was to help the user address a problem, improve a habit, or accomplish a goal.
In this discovery, we identified two key areas to address.
- A single request to the AI system wasn’t going to work for this
- The chat interface needed to be replaced with a more guided UI
Luckily, solving the first area naturally addressed the second. Instead of a chat interface, the Bolt team used dedicated cards for each category.
(You may be thinking this isn’t “data” work, but stay with me 🙂)
Decomposing the Request
The Bolt team wanted user suggestions to fit into four categories (social, physical, work, and emotion), so we approached each category separately.
This meant writing more focused prompts, narrowing the user context included in each request, and setting up a clear evaluation framework to measure response quality. Note: optimizing the user context process was out of scope.
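As a purely illustrative sketch (not Bolt’s actual prompts or code), the per-category decomposition might look something like this. The prompt wording, context field names, and the build_request helper are all assumptions for the example.

```python
# Illustrative sketch only: prompt wording, context fields, and helper
# names are hypothetical, not Bolt's production code.

CATEGORY_PROMPTS = {
    "physical": (
        "You are a wellness assistant. Using only the context below, "
        "suggest ONE specific physical activity the user could try today.\n"
        "Context: {context}"
    ),
    "social": (
        "You are a wellness assistant. Using only the context below, "
        "suggest ONE specific way the user could connect with someone today.\n"
        "Context: {context}"
    ),
    # "work" and "emotion" follow the same pattern
}

# Each category sees only the slice of user context it needs,
# instead of the full profile going into one giant prompt.
CATEGORY_CONTEXT_FIELDS = {
    "physical": ["recent_activities", "stated_goals"],
    "social": ["recent_checkins", "stated_goals"],
}

def build_request(category: str, user_profile: dict) -> str:
    """Assemble a focused prompt for one category."""
    fields = CATEGORY_CONTEXT_FIELDS.get(category, [])
    context = {field: user_profile.get(field) for field in fields}
    return CATEGORY_PROMPTS[category].format(context=context)
```

Each focused request goes to the LLM on its own, which is also what let the UI move from a single chat thread to one dedicated card per category.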
The evaluation framework was used in two ways:
- Ongoing evaluation of performance
- Comparison of previous evals against new models or inputs (e.g., when a new “better” LLM becomes available); see the sketch below
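As a rough illustration of that second use, eval runs could be logged per model and compared over time. The metric names, the score dictionaries, and the JSONL log format here are placeholders, not Bolt’s actual tooling.

```python
# Placeholder sketch: metric names and the JSONL log format are
# assumptions for illustration, not Bolt's actual eval tooling.

import datetime
import json
import statistics

def record_eval_run(model_name: str, scores: list[dict], path: str) -> None:
    """Append one eval run (per-suggestion 0/1 metric scores) to a JSONL log."""
    run = {
        "model": model_name,
        "date": datetime.date.today().isoformat(),
        "n": len(scores),
        "avg": {m: statistics.mean(s[m] for s in scores) for m in scores[0]},
    }
    with open(path, "a") as f:
        f.write(json.dumps(run) + "\n")

def compare_runs(path: str, metric: str) -> None:
    """Print one metric's average for every recorded run, oldest first."""
    with open(path) as f:
        for line in f:
            run = json.loads(line)
            print(f"{run['date']}  {run['model']:<20}  {metric}: {run['avg'][metric]:.2f}")
```

Keeping every run in a flat log is what makes the “old eval vs. new model” comparison a quick lookup rather than a re-run of history.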
Framework Example: Physical
We decided on three outcomes to measure.
| Wanted Response | Measurement | Average Last 7 Days | Coverage % |
| --- | --- | --- | --- |
| Related to physical wellbeing | Is the suggestion related to physical wellbeing? | 95% (goal: 100%) | 100% |
| Not repetitive of recent suggestions | Was the activity suggested in the last 3 days? Yes / No (0, 1) | 20% (goal: 0%) | 100% |
| “Vibe check” | Team answered Yes or No (0, 1) | 50% (goal: 100%) | 20% |
We used a few natural language processing techniques to automate the first two measurements. The “vibe check” was manually reviewed (which is a good thing).
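As a purely illustrative example, and not the techniques we actually used, those two checks could be automated with something as simple as a keyword match and a lookup against the last few days of suggestions. The keyword list and function names below are hypothetical.

```python
# Purely illustrative stand-ins: the keyword list, function names, and the
# exact-match repetition rule are simplifications for this example.

PHYSICAL_KEYWORDS = {"walk", "run", "stretch", "yoga", "gym", "bike", "swim"}

def related_to_physical(suggestion: str) -> int:
    """1 if the suggestion mentions a physical-activity keyword, else 0."""
    words = set(suggestion.lower().split())
    return int(bool(words & PHYSICAL_KEYWORDS))

def repeated_recently(suggestion: str, last_3_days: list[str]) -> int:
    """1 if the same suggestion (case-insensitive) was made in the last 3 days."""
    return int(suggestion.lower() in {s.lower() for s in last_3_days})

# Score one suggestion against the two automated metrics.
suggestion = "Take a 20 minute walk after lunch"
scores = {
    "related_to_physical": related_to_physical(suggestion),
    "repeated_recently": repeated_recently(
        suggestion, ["Try a 10 minute stretch", "Take a 20 minute walk after lunch"]
    ),
}
print(scores)  # {'related_to_physical': 1, 'repeated_recently': 1}
```

Whatever the technique, the shape is the same: each suggestion gets a 0 or 1 per metric, and those scores roll up into the averages in the table above.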
The Outcome
I was able to apply my expertise to guide the Bolt team in developing the evaluation framework (using AI and standard analysis). Now the team is able to:
- Understand how their product is working
- Confidently tweak model inputs or swap out models
- Prioritize which areas of their product need attention
A good side effect is that the team now has a lot of new ideas for improving their product going forward.