Gary
Gary (aka Gaming Gary™️) is a Neuro simulator written in Python. Gary allows you to use models downloaded onto your computer for testing Neuro Game API integrations.
Gary is maintained by Govorunb and can be found at https://github.com/Govorunb/gary.
- Clone Gary into a folder:
  git clone https://github.com/Govorunb/gary
- cd into it and sync the uv lockfile:
  cd gary
  uv sync
- Run the uv command:
  uv run gary
  It shouldn’t fully start yet, as some configuration is still required, but this step verifies that everything works and no unexpected errors pop up.
Configuration
Before startup
Gary does not come with any models by default, so you’ll need to download a model yourself.
Once you have your model downloaded, place it in the repository, and rename it to have an _ at the start (so it gets ignored by Git).
You should also copy config.yaml and make a new config file (whose name also starts with _), then customise it to point to your model and set other params.
Finally, start Gary with this command:
uv run gary --config <YOUR_CONFIG>.yaml # optional: --preset <PRESET_NAME>

or configure using a .env file:

GARY_CONFIG_FILE=_your_config.yaml
GARY_CONFIG_PRESET=randy

And it should now launch successfully.
After startup
Gary’s web panel should be accessible at http://localhost:8001 (or, if you changed the port in the config file, that port plus 1). You should see Gary’s configuration panel.
In this web panel, you can see the packets sent by the game and by Gary, as well as the list of actions, with a configuration pane on the right-hand side.
Toggling “Tony Mode” stops using the model and instead behaves like Tony, allowing you to send actions on behalf of your model.
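If you want to sanity-check the whole loop without a real game, you can point a tiny client at Gary’s WebSocket endpoint. The sketch below is illustrative only: it assumes the API listens on port 8000 (one below the default panel port), uses the third-party websockets package, and follows the message shapes from the Neuro API spec - the game name and action are made up.

```python
# Minimal sketch of a test client talking to Gary over the Neuro Game API.
# Assumptions (verify against your config and the API spec):
#   - Gary's WebSocket endpoint is ws://localhost:8000 (panel port minus 1)
#   - messages use the spec's command/game/data envelope
# Requires: pip install websockets
import asyncio
import json

import websockets

GAME = "My Test Game"  # hypothetical game name

async def main():
    async with websockets.connect("ws://localhost:8000") as ws:
        # Announce the game.
        await ws.send(json.dumps({"command": "startup", "game": GAME}))
        # Register a single trivial action.
        await ws.send(json.dumps({
            "command": "actions/register",
            "game": GAME,
            "data": {"actions": [{
                "name": "wave",
                "description": "Wave at the player",
                "schema": {},
            }]},
        }))
        # Print whatever Gary sends back (e.g. an action attempt).
        async for message in ws:
            print(message)

asyncio.run(main())
```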
Known issues
Taken (more or less) verbatim from the repository README.
Context trimming
Trimming context (for continuous generation) only works with the llama_cpp engine. Other engines will instead fully truncate context, and may rarely fail due to overrunning the context window.
Guidance token forwarding
There’s a quirk with the way guidance enforces grammar that can sometimes negatively affect chosen actions.
Basically, if the model wants something invalid, it will pick a similar or seemingly arbitrary valid option. For example:
- The game is about serving drinks at a bar, with valid items to pick up/serve being "vodka", "gin", etc
- The model gets a bit too immersed and hallucinates about pouring drinks into a glass (which is not an action)
- When asked what to do next, the model wants to call e.g. pick up on "glass of wine"
- Since this is not a valid option, guidance picks "gin" because of how token forwarding works (explained below)
For nerds - guidance uses the model to generate the starting token probabilities and forwards the rest as soon as it’s fully disambiguated.
In this case, "g has the highest likelihood of all valid tokens, so it gets picked; then, in" is auto-completed because "gin" is the only remaining option (of all valid items) that starts with "g.
In a case like this, it would have been better to just let it fail and retry - oh well, at least it’s fast.
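To make that concrete, here is a simplified, self-contained sketch of the idea - not guidance’s actual implementation - showing how greedy decoding under a grammar ends up at "gin" when the model really wants "glass of wine".

```python
# Simplified sketch of grammar-constrained decoding with token forwarding.
# This is NOT how guidance works internally; it only illustrates why a
# constrained pick can land on a "wrong but valid" option.

def constrained_pick(valid_options: list[str], token_probs: dict[str, float]) -> str:
    out = ""
    while out not in valid_options:
        remaining = [o for o in valid_options if o.startswith(out)]
        if len(remaining) == 1:
            # Fully disambiguated: forward the rest without asking the model.
            return remaining[0]
        # Keep only tokens that can still lead to some valid option, then
        # greedily take the most likely one. (A real model would be re-queried
        # every step; we reuse one distribution here for brevity.)
        allowed = {t: p for t, p in token_probs.items()
                   if any(o.startswith(out + t) for o in remaining)}
        out += max(allowed, key=allowed.get)
    return out

# The model "wants" to pour a "glass of wine", so "g" is its favourite
# next character - but only "vodka" and "gin" are valid.
probs = {"g": 0.70, "l": 0.15, "v": 0.10, "w": 0.05}
print(constrained_pick(["vodka", "gin"], probs))  # -> "gin"
```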
JSON schema support
Not all JSON schema keywords are supported in Guidance. You can find an up-to-date list here.
Unsupported keywords will produce a warning and be excluded from the grammar.
Following the Neuro API spec is generally safe. If you find an action schema is getting complex or full of obscure keywords, consider logically restructuring it or breaking it up into multiple actions.
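As a hypothetical illustration (the action names and schemas below are invented, and whether a particular keyword is supported depends on your Guidance version), restructuring one keyword-heavy action into several simple ones might look like this:

```python
# Hypothetical example - names and schemas are made up for illustration.
# One action whose schema leans on combinator keywords (which may be
# excluded from the grammar, per the warning above):
order_drink = {
    "name": "order_drink",
    "description": "Order any kind of drink",
    "schema": {
        "type": "object",
        "properties": {
            "kind": {"enum": ["beer", "cocktail"]},
            "details": {
                "oneOf": [  # if unsupported, this part gets dropped from the grammar
                    {"type": "object",
                     "properties": {"brand": {"type": "string"}}},
                    {"type": "object",
                     "properties": {"ingredients": {"type": "array",
                                                    "items": {"type": "string"}}}},
                ],
            },
        },
    },
}

# The same intent expressed as two actions with plain schemas:
order_beer = {
    "name": "order_beer",
    "description": "Order a beer by brand",
    "schema": {"type": "object", "properties": {"brand": {"type": "string"}}},
}
order_cocktail = {
    "name": "order_cocktail",
    "description": "Order a cocktail by listing its ingredients",
    "schema": {
        "type": "object",
        "properties": {"ingredients": {"type": "array", "items": {"type": "string"}}},
    },
}
```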
Miscellaneous jank
- The web interface can be a bit flaky - keep an eye out for any exceptions in the terminal window and, when in doubt, refresh the page
Implementation-specific behaviour
There may be cases where other backends (including Neuro) behave differently.
Differences marked with 🚧 will be resolved or made obsolete by v2 of the API.
- Gary will always be different from Neuro in some aspects, specifically:
  - Processing other sources of information like vision/audio/chat (for obvious reasons)
  - Gary is not real and will never message you on Discord at 3 AM to tell you he’s lonely 😔
  - Myriad other things like response timings, text filters, allowed JSON schema keywords, long-term memories, etc
- 🚧 Registering an action with an existing name will replace the old one (by default, configurable through gary.existing_action_policy)
- Only one active websocket connection is allowed per game; when another tries to connect, either the old or the new connection will be closed (configurable through gary.existing_connection_policy)
- 🚧 Gary sends actions/reregister_all on every connect (instead of just reconnects, as in the spec)
- etc etc, just search for “IMPL” in the code
Remote services? (OpenAI, Anthropic, Google, Azure)
Only local models are supported. Guidance does allow using remote services, but it cannot enforce grammar/structured outputs if it can’t hook itself into the inference process, so it’s more than likely it’ll just throw exceptions because of invalid output instead.

Therefore, they are not exposed as an option at all. You should use Jippity instead anyway.
For more info, check the guidance README or this issue comment.