As it turns out, many activities in life lack unambiguous feedback, and technology is well placed to bridge that gap.
This is where EV Dojo comes in. We want dojos — training arenas where we can fail and improve — so we can hone the expected value of our skills where a feedback loop may not exist otherwise.
We already have a useful way of solving the problem of a lack of clear feedback: coaches. Experts in a specific topic are great for giving insightful comments, but they tend to be costly and too individualized to scale to a wider public.
We can, however, leverage the implicit knowledge that experts and coaches already have by distilling it into our own mini‑judges for each topic. Since our goal is a generalized way to learn these judging functions for any topic and let people interact with them in real time, we need a way for experts to collaboratively train a function that approximates their decision‑making.
This is where the Bradley‑Terry (BT) pairwise preference model comes into play. Given two choices i and j in some set D, BT models the probability that i is preferred to j as sigmoid(s_i − s_j), where s_i is the score of item i. We can therefore recover an approximate ranking over all of D by optimizing the individual scores s_i so that the model's predicted preferences match our crowdsourced ones.
(We also weight each comparison by the rater's "trust value" to make the fit more robust.)
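As a sketch of how those scores might be fit, here is a small gradient-ascent loop on the trust‑weighted BT log‑likelihood. The update rule, learning rate, regularization, and toy data below are illustrative assumptions, not the production system:

```python
import math

def fit_bt_scores(comparisons, items, lr=0.1, reg=0.01, epochs=500):
    """Fit Bradley-Terry scores by gradient ascent on the
    trust-weighted log-likelihood, with light L2 regularization
    so scores stay bounded when preferences are fully consistent."""
    s = {item: 0.0 for item in items}
    for _ in range(epochs):
        grad = {item: -reg * s[item] for item in items}
        for winner, loser, trust in comparisons:
            # BT: P(winner beats loser) = sigmoid(s_winner - s_loser)
            p = 1.0 / (1.0 + math.exp(s[loser] - s[winner]))
            grad[winner] += trust * (1.0 - p)
            grad[loser] -= trust * (1.0 - p)
        for item in items:
            s[item] += lr * grad[item]
    return s

# Toy crowdsourced preferences: (winner, loser, rater's trust value)
scores = fit_bt_scores(
    [("A", "B", 1.0), ("A", "C", 0.8), ("B", "C", 1.0), ("A", "B", 0.9)],
    items=["A", "B", "C"],
)
# Recovered ranking: A above B, B above C
```

The trust value simply scales each comparison's contribution to the gradient, so low‑trust raters pull the scores less.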
With an s_i score for each input, we then turn each score into a probability. Roughly, this says "input i is good about p(i) percent of the time," which lets us treat the top and bottom of all possible inputs as the "ticks" that matter most for feedback.
To turn a score into a probability, we use Platt scaling.
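A minimal sketch of Platt scaling: fit a slope a and offset b so that sigmoid(a · score + b) matches held-out good/bad labels, via gradient ascent on the log‑likelihood. The parameter values and toy data are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def platt_fit(raw_scores, labels, lr=0.05, epochs=2000):
    """Fit Platt scaling parameters (a, b) so that
    sigmoid(a * score + b) approximates P(input is good)."""
    a, b = 1.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(raw_scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (y - p) * s  # d(log-likelihood)/da
            grad_b += (y - p)      # d(log-likelihood)/db
        a += lr * grad_a
        b += lr * grad_b
    return a, b

# Toy data: inputs with negative BT scores were labeled bad (0),
# positive scores good (1)
a, b = platt_fit([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
p_good = sigmoid(a * 2.0 + b)  # high probability for a high score
```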
Now that we have learned these mini‑judges, we can use them to evaluate people on their topics in real time. To do this, we use a classification algorithm (specific to the modality: video, audio, text, etc.) to estimate the probability that a given cue is being exhibited, then multiply that confidence by our internal probability to alert the user when we are confident they are doing something notably good or bad.
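To make the combination step concrete, here is one way it could look. The `feedback_signal` helper and its cutoff values are hypothetical, not the shipped logic:

```python
def feedback_signal(cue_confidence, goodness, good_cut=0.8, bad_cut=0.2):
    """Decide whether to alert the user.

    cue_confidence: the modality classifier's belief the cue is present.
    goodness: the mini-judge's probability that the cue is good.
    The cutoff values are illustrative assumptions.
    """
    # Notably good: confident the cue happened AND the judge rates it highly.
    if cue_confidence * goodness >= good_cut:
        return "good"
    # Notably bad: confident the cue happened AND the judge rates it poorly.
    if cue_confidence >= good_cut and goodness <= bad_cut:
        return "bad"
    return None  # not confident enough either way; stay quiet

feedback_signal(0.95, 0.9)  # -> "good"
feedback_signal(0.9, 0.1)   # -> "bad"
feedback_signal(0.5, 0.5)   # -> None
```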
To further improve robustness, we borrow an idea from neuronal action potentials: an event requires persistence (consecutive inputs above threshold) and hysteresis (after firing, the threshold adjusts slightly) so users aren't spammed when signals hover around a boundary.
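One way to sketch that persistence-plus-hysteresis gate; the `EventDetector` class, its default values, and the rule for resetting the threshold are my own illustrative choices:

```python
class EventDetector:
    """Fire an event only after `persistence` consecutive samples cross
    the threshold; after firing, raise the threshold slightly
    (hysteresis) so a signal hovering at the boundary can't re-trigger
    immediately."""

    def __init__(self, threshold=0.8, persistence=3, hysteresis=0.05):
        self.base = threshold
        self.threshold = threshold
        self.persistence = persistence
        self.hysteresis = hysteresis
        self.streak = 0

    def update(self, value):
        if value >= self.threshold:
            self.streak += 1
            if self.streak >= self.persistence:
                self.streak = 0
                self.threshold = self.base + self.hysteresis  # harder to re-fire
                return True
        else:
            self.streak = 0
            self.threshold = self.base  # signal dropped; restore base threshold
        return False

det = EventDetector()
fires = [det.update(v) for v in [0.9, 0.9, 0.9, 0.82, 0.82]]
# fires -> [False, False, True, False, False]
```

The third sample fires the event; the fourth falls under the raised threshold (0.85) and resets it, so the fifth starts a fresh streak instead of re-firing.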
Choose a writing topic, then receive continuous edits while you write.
Label A/B pairs quickly to train the topic fitness function. This is where we see how inputs get transformed into scores behind the scenes. These scores are then used to provide feedback and improve the model, eventually picking the good/bad input criteria used in both the writing coach and interview analysis tools.
Practice with live feedback on smiles, nods, and other cues. This input-agnostic system uses a classification algorithm to get inputs from your live video feed, then picks the most relevant cues from our A/B expert-trained model to give you real-time feedback on your interview presence.
Preview compare mode and debug metrics (no auth in debug). This is not fleshed out in the current version.