As it turns out, many activities in life lack unambiguous feedback, and technology is well placed to bridge that gap.
This is where EV Dojo comes in. We want dojos — training arenas where we can fail and improve — so we can hone the expected value of our skills where a feedback loop may not exist otherwise.
We already have a useful way of solving the problem of a lack of clear feedback: coaches. Experts in a specific topic are great for giving insightful comments, but they tend to be costly and too individualized to scale to a wider public.
We can, however, leverage the implicit knowledge that experts and coaches already have by distilling it into our own mini‑judges for each topic. Since our goal is a generalized way to learn these judging functions for any topic and let people interact with them in real time, we need a way for experts to collaboratively train a function that approximates their decision‑making.
This is where the Bradley‑Terry (BT) pairwise preference model comes into play. Given two choices i and j in some set D, BT models the probability that i is preferred to j as sigmoid(s_i − s_j), where s_i is the score of item i. We can therefore recover an approximate ranking over all of D by optimizing the individual scores s_i so that the model's predicted preferences match our crowdsourced ones.
(We also weight each comparison by the rater's "trust value" to make the fit more robust.)
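As a sketch of how those scores might be fit, here is a small gradient-ascent loop on the trust‑weighted BT log‑likelihood. The update rule, learning rate, regularization, and toy data below are illustrative assumptions, not the production system:

```python
import math

def fit_bt_scores(comparisons, items, lr=0.1, reg=0.01, epochs=500):
    """Fit Bradley-Terry scores by gradient ascent on the
    trust-weighted log-likelihood, with light L2 regularization
    so scores stay bounded when preferences are fully consistent."""
    s = {item: 0.0 for item in items}
    for _ in range(epochs):
        grad = {item: -reg * s[item] for item in items}
        for winner, loser, trust in comparisons:
            # BT: P(winner beats loser) = sigmoid(s_winner - s_loser)
            p = 1.0 / (1.0 + math.exp(s[loser] - s[winner]))
            grad[winner] += trust * (1.0 - p)
            grad[loser] -= trust * (1.0 - p)
        for item in items:
            s[item] += lr * grad[item]
    return s

# Toy crowdsourced preferences: (winner, loser, rater's trust value)
scores = fit_bt_scores(
    [("A", "B", 1.0), ("A", "C", 0.8), ("B", "C", 1.0), ("A", "B", 0.9)],
    items=["A", "B", "C"],
)
# Recovered ranking: A above B, B above C
```

The trust value simply scales each comparison's contribution to the gradient, so low‑trust raters pull the scores less.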
With an s_i score for each input, we then turn each score into a probability. Roughly, this says "input i is good about p(i) percent of the time," which lets us treat the top and bottom of all possible inputs as the "ticks" that matter most for feedback.
To turn a score into a probability, we use Platt scaling.
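A minimal sketch of Platt scaling: fit a slope a and offset b so that sigmoid(a · score + b) matches held-out good/bad labels, via gradient ascent on the log‑likelihood. The parameter values and toy data are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def platt_fit(raw_scores, labels, lr=0.05, epochs=2000):
    """Fit Platt scaling parameters (a, b) so that
    sigmoid(a * score + b) approximates P(input is good)."""
    a, b = 1.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(raw_scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (y - p) * s  # d(log-likelihood)/da
            grad_b += (y - p)      # d(log-likelihood)/db
        a += lr * grad_a
        b += lr * grad_b
    return a, b

# Toy data: inputs with negative BT scores were labeled bad (0),
# positive scores good (1)
a, b = platt_fit([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
p_good = sigmoid(a * 2.0 + b)  # high probability for a high score
```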
Now that we have learned these mini‑judges, we can use them to evaluate people on their topics in real time. To do this, we use a classification algorithm (specific to the modality: video, audio, text, etc.) to estimate the probability that a given cue is being exhibited, then multiply that confidence by our internal probability to alert the user when we are confident they are doing something notably good or bad.
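To make the combination step concrete, here is one way it could look. The `feedback_signal` helper and its cutoff values are hypothetical, not the shipped logic:

```python
def feedback_signal(cue_confidence, goodness, good_cut=0.8, bad_cut=0.2):
    """Decide whether to alert the user.

    cue_confidence: the modality classifier's belief the cue is present.
    goodness: the mini-judge's probability that the cue is good.
    The cutoff values are illustrative assumptions.
    """
    # Notably good: confident the cue happened AND the judge rates it highly.
    if cue_confidence * goodness >= good_cut:
        return "good"
    # Notably bad: confident the cue happened AND the judge rates it poorly.
    if cue_confidence >= good_cut and goodness <= bad_cut:
        return "bad"
    return None  # not confident enough either way; stay quiet

feedback_signal(0.95, 0.9)  # -> "good"
feedback_signal(0.9, 0.1)   # -> "bad"
feedback_signal(0.5, 0.5)   # -> None
```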
To further improve robustness, we borrow an idea from neuronal action potentials: an event requires persistence (consecutive inputs above threshold) and hysteresis (after firing, the threshold adjusts slightly) so users aren't spammed when signals hover around a boundary.
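One way to sketch that persistence-plus-hysteresis gate; the `EventDetector` class, its default values, and the rule for resetting the threshold are my own illustrative choices:

```python
class EventDetector:
    """Fire an event only after `persistence` consecutive samples cross
    the threshold; after firing, raise the threshold slightly
    (hysteresis) so a signal hovering at the boundary can't re-trigger
    immediately."""

    def __init__(self, threshold=0.8, persistence=3, hysteresis=0.05):
        self.base = threshold
        self.threshold = threshold
        self.persistence = persistence
        self.hysteresis = hysteresis
        self.streak = 0

    def update(self, value):
        if value >= self.threshold:
            self.streak += 1
            if self.streak >= self.persistence:
                self.streak = 0
                self.threshold = self.base + self.hysteresis  # harder to re-fire
                return True
        else:
            self.streak = 0
            self.threshold = self.base  # signal dropped; restore base threshold
        return False

det = EventDetector()
fires = [det.update(v) for v in [0.9, 0.9, 0.9, 0.82, 0.82]]
# fires -> [False, False, True, False, False]
```

The third sample fires the event; the fourth falls under the raised threshold (0.85) and resets it, so the fifth starts a fresh streak instead of re-firing.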
Choose a writing topic, then receive continuous edits while you write.
Label A/B pairs quickly to train the topic fitness function. This is where we see how inputs get transformed into scores behind the scenes. These scores are then used to provide feedback and improve the model, eventually picking the good/bad input criteria used in both the writing coach and interview analysis tools.
Practice with live feedback on smiles, nods, and other cues. This input-agnostic system uses a classification algorithm to get inputs from your live video feed, then picks the most relevant cues from our A/B expert-trained model to give you real-time feedback on your interview presence.
Preview compare mode and debug metrics (no auth in debug). This is not fleshed out in the current version.