Gary
Gary (aka Gaming Gary™️) is a Neuro simulator written in Python. Gary allows you to use models downloaded onto your computer for testing Neuro Game API integrations.
Gary is maintained by Govorunb and can be found at https://github.com/Govorunb/gary.
- Clone Gary into a folder:
  git clone https://github.com/Govorunb/gary
- cd into it and sync the uv lockfile:
  cd gary
  uv sync
- Run the uv command:
  uv run gary
  It shouldn’t fully start yet, as some configuration is still required, but this step verifies that everything works and no unexpected errors pop up.
Configuration
Before startup
Gary does not come with any models by default, so you’ll need to download a model yourself.
Once you have your model downloaded, place it in the repository, and rename it to have an _ at the start (so it gets ignored by Git).
You should also copy config.yaml and make a new config file (whose name also starts with _), then customise it to point to your model and set other params.
Finally, start Gary with this command:
uv run gary --config <YOUR_CONFIG>.yaml # optional: --preset <PRESET_NAME>

or configure using a .env file:

GARY_CONFIG_FILE=_your_config.yaml
GARY_CONFIG_PRESET=randy

And it should now launch successfully.
After startup
Gary’s web panel should be accessible at http://localhost:8001 (or, if you changed the port in the config file, that port plus 1). You should see Gary’s configuration panel.
In this web panel, you can see the packets sent by the game and by Gary, as well as the list of actions, with a configuration pane on the right-hand side.
Toggling “Tony Mode” stops using the model and instead behaves like Tony, allowing you to send actions on behalf of your model.
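If you want to sanity-check the whole loop without a real game, you can point a tiny client at Gary’s WebSocket endpoint. The sketch below is illustrative only: it assumes the API listens on port 8000 (one below the default panel port), uses the third-party websockets package, and follows the message shapes from the Neuro API spec - the game name and action are made up.

```python
# Minimal sketch of a test client talking to Gary over the Neuro Game API.
# Assumptions (verify against your config and the API spec):
#   - Gary's WebSocket endpoint is ws://localhost:8000 (panel port minus 1)
#   - messages use the spec's command/game/data envelope
# Requires: pip install websockets
import asyncio
import json

import websockets

GAME = "My Test Game"  # hypothetical game name

async def main():
    async with websockets.connect("ws://localhost:8000") as ws:
        # Announce the game.
        await ws.send(json.dumps({"command": "startup", "game": GAME}))
        # Register a single trivial action.
        await ws.send(json.dumps({
            "command": "actions/register",
            "game": GAME,
            "data": {"actions": [{
                "name": "wave",
                "description": "Wave at the player",
                "schema": {},
            }]},
        }))
        # Print whatever Gary sends back (e.g. an action attempt).
        async for message in ws:
            print(message)

asyncio.run(main())
```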
Known issues
Taken (more or less) verbatim from the repository README.
Context trimming
Trimming context (for continuous generation) only works with the llama_cpp engine. Other engines will instead fully truncate context, and may rarely fail due to overrunning the context window.
Guidance token forwarding
There’s a quirk with the way guidance enforces grammar that can sometimes negatively affect chosen actions.
Basically, if the model wants something invalid, it will pick a similar or seemingly arbitrary valid option. For example:
- The game is about serving drinks at a bar, with valid items to pick up/serve being "vodka", "gin", etc
- The model gets a bit too immersed and hallucinates about pouring drinks into a glass (which is not an action)
- When asked what to do next, the model wants to call e.g. pick up on "glass of wine"
- Since this is not a valid option, guidance picks "gin" because of how token forwarding works (explained below)
For nerds - guidance uses the model to generate the starting token probabilities and forwards the rest as soon as it’s fully disambiguated.
In this case, "g has the highest likelihood of all valid tokens, so it gets picked; then, in" is auto-completed because "gin" is the only remaining option (of all valid items) that starts with "g.
In a case like this, it would have been better to just let it fail and retry - oh well, at least it’s fast.
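To make that concrete, here is a simplified, self-contained sketch of the idea - not guidance’s actual implementation - showing how greedy decoding under a grammar ends up at "gin" when the model really wants "glass of wine".

```python
# Simplified sketch of grammar-constrained decoding with token forwarding.
# This is NOT how guidance works internally; it only illustrates why a
# constrained pick can land on a "wrong but valid" option.

def constrained_pick(valid_options: list[str], token_probs: dict[str, float]) -> str:
    out = ""
    while out not in valid_options:
        remaining = [o for o in valid_options if o.startswith(out)]
        if len(remaining) == 1:
            # Fully disambiguated: forward the rest without asking the model.
            return remaining[0]
        # Keep only tokens that can still lead to some valid option, then
        # greedily take the most likely one. (A real model would be re-queried
        # every step; we reuse one distribution here for brevity.)
        allowed = {t: p for t, p in token_probs.items()
                   if any(o.startswith(out + t) for o in remaining)}
        out += max(allowed, key=allowed.get)
    return out

# The model "wants" to pour a "glass of wine", so "g" is its favourite
# next character - but only "vodka" and "gin" are valid.
probs = {"g": 0.70, "l": 0.15, "v": 0.10, "w": 0.05}
print(constrained_pick(["vodka", "gin"], probs))  # -> "gin"
```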
JSON schema support
Not all JSON schema keywords are supported in Guidance. You can find an up-to-date list here.
Unsupported keywords will produce a warning and be excluded from the grammar.
Following the Neuro API spec is generally safe. If you find an action schema is getting complex or full of obscure keywords, consider logically restructuring it or breaking it up into multiple actions.
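As a hypothetical illustration (the action names and schemas below are invented, and whether a particular keyword is supported depends on your Guidance version), restructuring one keyword-heavy action into several simple ones might look like this:

```python
# Hypothetical example - names and schemas are made up for illustration.
# One action whose schema leans on combinator keywords (which may be
# excluded from the grammar, per the warning above):
order_drink = {
    "name": "order_drink",
    "description": "Order any kind of drink",
    "schema": {
        "type": "object",
        "properties": {
            "kind": {"enum": ["beer", "cocktail"]},
            "details": {
                "oneOf": [  # if unsupported, this part gets dropped from the grammar
                    {"type": "object",
                     "properties": {"brand": {"type": "string"}}},
                    {"type": "object",
                     "properties": {"ingredients": {"type": "array",
                                                    "items": {"type": "string"}}}},
                ],
            },
        },
    },
}

# The same intent expressed as two actions with plain schemas:
order_beer = {
    "name": "order_beer",
    "description": "Order a beer by brand",
    "schema": {"type": "object", "properties": {"brand": {"type": "string"}}},
}
order_cocktail = {
    "name": "order_cocktail",
    "description": "Order a cocktail by listing its ingredients",
    "schema": {
        "type": "object",
        "properties": {"ingredients": {"type": "array", "items": {"type": "string"}}},
    },
}
```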
Miscellaneous jank
- The web interface can be a bit flaky - keep an eye out for any exceptions in the terminal window and, when in doubt, refresh the page
Implementation-specific behaviour
There may be cases where other backends (including Neuro) behave differently.
Differences marked with 🚧 will be resolved or made obsolete by v2 of the API.
- Gary will always be different from Neuro in some aspects, specifically:
  - Processing other sources of information like vision/audio/chat (for obvious reasons)
  - Gary is not real and will never message you on Discord at 3 AM to tell you he’s lonely 😔
  - Myriad other things like response timings, text filters, allowed JSON schema keywords, long-term memories, etc
- 🚧 Registering an action with an existing name will replace the old one (by default, configurable through gary.existing_action_policy)
- Only one active websocket connection is allowed per game; when another tries to connect, either the old or the new connection will be closed (configurable through gary.existing_connection_policy)
- 🚧 Gary sends actions/reregister_all on every connect (instead of just reconnects, as in the spec)
- etc etc, just search for “IMPL” in the code
Remote services? (OpenAI, Anthropic, Google, Azure)
Only local models are supported. Guidance does allow using remote services, but it cannot enforce grammar/structured outputs if it can’t hook itself into the inference process, so it’s more than likely it’ll just throw exceptions because of invalid output instead.

Therefore, they are not exposed as an option at all. You should use Jippity instead anyway.
For more info, check the guidance README or this issue comment.