I've known about this game since 2019. We used to play with cards instead of dice, which ended up stripping out too much of the game's complexity. Now you can play with your friends at
perudoonline.com without owning a truly questionable number of dice.
A little about the game -- it's a survival game where you win by being the last player standing. Here are the
rules if you want to learn more; they're also available on the website. See if you can come up with a strategy to beat your friends!
(01-13-25) My overall goal is to beat
dudo.ai. The core idea is fine-tuning counterfactual regret techniques using PyTorch, which is similar to experts. So far, I've ported some code to a Colab notebook for CUDA (GPU) access and coded up a lot of utilities. I've trained two similar models for a measly 6k iterations, varying only a hyperparameter that I invented. Their outputs make some sense: one model simplifies strategies a little more aggressively than the other.
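For readers who haven't seen counterfactual regret methods, here is a minimal sketch of regret matching, the update that these techniques are built around. It is plain Python on rock-paper-scissors rather than Perudo, it uses exact expected values instead of sampling, and it is not the PyTorch model or the invented hyperparameter described above. The names `PAYOFF`, `regret_matching`, and `train`, and the initial regret values, are all made up for illustration.

```python
# Row player's payoff for rock, paper, scissors (a toy stand-in for Perudo).
# The game is symmetric, so the same table gives either player's payoff for
# playing action a against the other player's action b.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
NUM_ACTIONS = 3


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to uniform play.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def train(iterations, init_regret=(1.0, 0.0, 0.0)):
    """Self-play with regret matching; returns each player's average strategy.

    init_regret is an arbitrary (hypothetical) starting bias for player 0,
    so that the run does not trivially start at the uniform equilibrium.
    """
    regrets = [list(init_regret), [0.0] * NUM_ACTIONS]
    strategy_sum = [[0.0] * NUM_ACTIONS for _ in range(2)]
    for _ in range(iterations):
        # Both strategies are fixed before either player's regrets are updated.
        strategies = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(NUM_ACTIONS))
                for a in range(NUM_ACTIONS)
            ]
            expected = sum(
                strategies[player][a] * values[a] for a in range(NUM_ACTIONS)
            )
            for a in range(NUM_ACTIONS):
                # Regret: how much better action a would have done.
                regrets[player][a] += values[a] - expected
                strategy_sum[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sum]


if __name__ == "__main__":
    for player, avg in enumerate(train(20000)):
        print(player, [round(p, 3) for p in avg])
```

The current strategy can keep cycling, but the average strategy is what approaches equilibrium (uniform play in this toy game). A full counterfactual regret setup applies the same update at every decision point of the game tree, weighted by reach probabilities.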
I've messed with regret-hedge algorithms, including the one at
https://arxiv.org/pdf/0903.2851 that tried to avoid the problem of tuning a learning rate parameter. However, when I implemented it, setting weights in proportion to re^(r^2), where r is an action's regret, majorly amplified how often a policy plays a high-regret action. Combined with our Monte Carlo sampling, calling "lie" (an action that has high regret against the random policies that training starts from) ends up overdone, and the strategy when no calls have been made resembles an almost-deterministic policy that basically calls some held face.
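To make the amplification concrete, here is a small sketch comparing plain regret matching with a weighting of the form (r/c)·e^(r^2/(2c)) on positive regrets. This is a simplification: the scale `c` is a fixed, hypothetical parameter here, whereas the paper re-solves for its scale every round, and the regret values are invented for illustration.

```python
import math


def regret_matching_weights(regrets):
    """Weights proportional to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def normal_hedge_style_weights(regrets, scale=1.0):
    """Weights proportional to (r / c) * exp(r^2 / (2c)) on positive regrets.

    `scale` (c) is held fixed here as a simplifying assumption. Large
    regrets can overflow math.exp, so this is for small toy inputs only.
    """
    positive = [max(r, 0.0) for r in regrets]
    raw = [(r / scale) * math.exp(r * r / (2.0 * scale)) for r in positive]
    total = sum(raw)
    n = len(regrets)
    return [w / total for w in raw] if total > 0 else [1.0 / n] * n


if __name__ == "__main__":
    regrets = [3.0, 2.0, 1.0]  # hypothetical cumulative regrets
    print(regret_matching_weights(regrets))
    print(normal_hedge_style_weights(regrets))
```

With these regrets, regret matching puts half its weight on the top action, while the exponential form puts roughly 94% there. Once sampling noise hands one action an early lead, that gap is enough to push the policy toward near-determinism.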
For the future, there are a lot of design choices to be made. Interestingly, there is the choice of learning rate scheduler, plus an exploration parameter that seems closely related to it. There are competing regret frameworks that I can use. It's also possible to integrate linear programming (LP) into the model. I got some experience with LPs in CMU's 15-451, where I found that making simple observations about a game before applying LP can lead to a lot of speedup. I've found such an opportunity within Perudo. (The dudo guy also mentioned LP, but seems to have applied a general-purpose conversion of sequential games into LPs.)