AI-Adventure

posted 2024-04-06, updated 2024-09-14

AI-Adventure - Teaser

This is my entry to the Olc-Coding-Jam 2024 (version 0.0.4).
The theme of the jam was "Run".
See the Itch.io page.

Idea


The idea of this project was to have an adventure game that is mostly generated with the help of AI.
I wanted a system where a "Game Designer" could simply interact with an AI and shape a fully functional adventure game,
without much direct intervention in the code.
The adventure currently on show is about a restless spirit
that haunts the player through a forest with the intention of keeping the player running.
The player has to figure out why, and how to escape the area alive.

The road to escape


The Engine


First I needed to construct an engine that would allow playing an adventure game.
Since I thought at the time that current AI systems would understand it best and that the implementation would be simpler,
I chose plain HTML as the basis for the engine.
I registered some custom elements that allow defining the attributes and logic I needed.

// Primary elements defining either a location/scene or an interactive object on the scene.
window.customElements.define('game-interactable', HTMLGameInteractable);
window.customElements.define('game-location', HTMLGameLocation);

// Elements inside the objects.
window.customElements.define('game-data', HTMLGameData);
window.customElements.define('game-data-action', HTMLGameDataAction);
window.customElements.define('game-data-property', HTMLGameDataProperty);

// Helper elements to structure the logic and define the interactions.
window.customElements.define('game-logic', HTMLGameLogic);
window.customElements.define('game-logic-action', HTMLGameLogicAction);
window.customElements.define('game-logic-option', HTMLGameLogicOption);
window.customElements.define('game-logic-condition', HTMLGameLogicCondition);
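
For illustration, here is a minimal sketch of what one of these element classes could look like. The real classes contain considerably more logic; the helper shown here is a plausible example, not the engine's actual code.

// Minimal sketch of one element class. Custom elements read their
// declarative attributes once they are attached to the DOM.
class HTMLGameLocation extends HTMLElement {
    connectedCallback() {
        // Pick up the background music declared on the tag
        // (see the music attribute in the scene example below).
        this.music = this.getAttribute('music');
    }

    // Hypothetical helper: all interactable objects placed in this scene.
    get interactables() {
        return [...this.querySelectorAll('game-interactable')];
    }
}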

All the logic and everything the player gets to see is defined in HTML.
Each scene or object is stored in a separate ".xml" file, which can almost fully validate against a DTD schema.
This approach was supposed to give the AI better feedback should it make a syntactic mistake.
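
Browsers do not validate against a DTD on their own, so full schema validation has to happen outside the engine (e.g. with a tool like xmllint). What can be done cheaply at runtime is a well-formedness check; the sketch below assumes scenes are fetched when needed, and the function name is hypothetical.

// Sketch: load a scene file and surface XML syntax errors.
// DOMParser only checks well-formedness, not the DTD itself.
async function loadSceneDocument(url) {
    const text = await (await fetch(url)).text();
    const doc = new DOMParser().parseFromString(text, 'application/xml');
    const error = doc.querySelector('parsererror');
    if (error) {
        // This is the kind of message that can be fed back to the AI.
        console.error(`Syntax error in ${url}: ${error.textContent}`);
    }
    return doc;
}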

Locations, for the most part, simply define their background music and image and include the objects that are inside the scene.
However, they may also contain some logic for what happens when the player enters the scene.
This lets the example below play a first-time comment from the player character.

<!DOCTYPE game-location SYSTEM "http://www.game-object.com/dtd/location">
<game-location id="scene-lost-forest" class="scene" music="title">

    <!--     The first screen the adventure wakes up at    -->

    <img
            class="background"
            src="./images/dummy.png"
            data-src="/assets/images/backgrounds/lost-forest.png"
            alt="The awakening"
    />

    <!--     The interactive logic definition starts here     --> 

    <game-data class="data">
        <game-data-action type="enter">
            <game-logic type="once">
                <game-logic-action type="output">
                    Ouf!
                    ...
                    Where am I?
                </game-logic-action>
                <game-logic-action type="output">
                    I don't remember anything.
                </game-logic-action>
                <game-logic-action type="output">
                    I should look around to figure out what's going on.
                </game-logic-action>
            </game-logic>
        </game-data-action>
    </game-data>

    <!-- This part includes the objects on the scene -->

    <load src="/src/html/objects/lost-forest/lost-forest-2.xml"/>
    <load src="/src/html/objects/lost-forest/lost-forest-3.xml"/>
    <load src="/src/html/objects/lost-forest/leaves.xml"/>
    <load src="/src/html/objects/lost-forest/footprints.xml"/>
</game-location>
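
The <load> tags at the bottom are not standard HTML, so the engine has to resolve them itself. A minimal sketch of such an include step; the function name and details are assumptions, not the engine's actual code:

// Sketch: replace each <load> element with the parsed content of its src.
async function resolveLoads(root) {
    for (const load of [...root.querySelectorAll('load')]) {
        const text = await (await fetch(load.getAttribute('src'))).text();
        const doc = new DOMParser().parseFromString(text, 'application/xml');
        // Import the object into the current document and swap it in.
        load.replaceWith(document.importNode(doc.documentElement, true));
    }
}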

The interactable objects are a little more complex and often contain more logic.
First, they have some attributes that define their size and position on the scene.
These are given in percentages so that screen sizes do not matter.

<!DOCTYPE game-interactable SYSTEM "http://www.game-object.com/dtd/interactable">
<game-interactable
    id="lost-forest/footprints"
    class="object"
    cx="41%"
    cy="73%"
    width="20%"
>
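
The snippet does not show how these attributes end up on screen, but the mapping to CSS could look roughly like this (assuming cx/cy denote the object's centre point, as the names suggest):

// Sketch: position an interactable from its percentage attributes.
// translate(-50%, -50%) re-centres the box on the cx/cy point.
function placeInteractable(el) {
    el.style.position = 'absolute';
    el.style.left = el.getAttribute('cx');    // e.g. "41%"
    el.style.top = el.getAttribute('cy');     // e.g. "73%"
    el.style.width = el.getAttribute('width');
    el.style.transform = 'translate(-50%, -50%)';
}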

Then follows some "game-data" block that contains the interactive logic.

A "game-data-action" element serves as an entrypoint or event trigger.
It's attribute event may have different values like 
These types all refer to actions the user can trigger when clicking on the object.

The "game-logic-action" element then defines what the game engine should do.
It offers things like:

The "game-logic" and "game-logic-condition" element can wrap these actions to play them only if some conditions are met,
or decide whether to play them in sequence forcing the user to click multiple times, rather to trigger them all with a single click.
The game-logic-type "first" is also quite useful to trigger something only the very first time and then never again.
Allowing the game to output a first impression of the character after an action, that should not be repeated every time.
The conditions can refer to global variables or the state of other object and compare them to be true or false.

    <game-data class="data">
        <game-data-action type="interact">
            <game-logic-action type="trigger">inspect</game-logic-action>
        </game-data-action>
        <game-data-action type="inspect">
            <game-logic type="sequence">
                <game-logic-action type="output">
                    Footprints? That must be where I came from.
                </game-logic-action>
            </game-logic>
        </game-data-action>
    </game-data>
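
The snippet above shows "trigger" and a "sequence" but no conditions. A condition check could work roughly like the sketch below; the attribute names are illustrative, not the engine's actual vocabulary:

// Sketch: evaluate <game-logic-condition> children against global state.
const gameState = {}; // global variables set by earlier actions

function evaluateCondition(cond) {
    const variable = cond.getAttribute('variable'); // hypothetical attribute
    const expected = cond.getAttribute('value') === 'true';
    return Boolean(gameState[variable]) === expected;
}

function conditionsMet(logic) {
    const conditions = [...logic.querySelectorAll(':scope > game-logic-condition')];
    return conditions.every(evaluateCondition);
}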

At the end, the object needs to define the image that is shown for it, and an SVG defines the outline where the user can click or hover to target the element.

    <img
            class="image"
            src="./images/dummy.png"
            data-src="/assets/images/objects/footprints/footprints.png"
    />
    <object
            class="click-area"
            data="./images/dummy.png"
            data-data="/assets/images/objects/footprints/footprints.svg"
            type="image/svg+xml"
    > </object>
</game-interactable>
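
Wiring the SVG outline up as a click target could look roughly like this, assuming the SVG is served same-origin so that its document is scriptable:

// Sketch: forward clicks on the SVG outline to the interactable itself.
function wireClickArea(interactable) {
    const area = interactable.querySelector('object.click-area');
    area.addEventListener('load', () => {
        area.contentDocument.addEventListener('click', () => {
            // Hypothetical event name; the engine would route this to the
            // "interact" action defined in the game-data block.
            interactable.dispatchEvent(new CustomEvent('interact'));
        });
    });
}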


The AI-Pipeline


Currently only the asset generation is done using AI.
All attempts at giving the document definitions to ChatGPT did not end well:
it would repeatedly invent new action types, or fail to structure them correctly.
(I have not yet tried o1 though; this was done using GPT-4.)

For the most part I gave ChatGPT a rough description of a scene
and it would give me the prompts for the background image and music.
Then I took those to Midjourney and Suno.ai,
where I repeated and adjusted the prompt until I had something I could use.
Most of the time, though, I had to go back to ChatGPT and adjust the story around the scene,
which was easier than getting the correct image.

For the objects I had to add a manual step in my image-editing program:
cutting out the supposedly transparent background of an image, as Midjourney does not produce transparency by default.
I also had to generate the object's contour manually by first selecting the object,
filling it with black, converting it to a vector graphic, and exporting it as an SVG.

The only "easy" part of the pipeline was the audio generation using chatGPT.
I added some debug mechanic to the engine that would check for all possible "output" text ,
that is defined in the HTML and check for missing audio data.
If some text had no corresponding audio track it would print the missing lines in the console.
I would then copy those lines and feed them to a script which would send api requests to the ChatGPT audio AI
and return the generated voice-lines and add them correctly to the HTML automatically.
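
A condensed sketch of such a script, using OpenAI's documented /v1/audio/speech endpoint; the voice choice, file naming, and surrounding plumbing are placeholders:

// Sketch: generate a missing voice line via OpenAI's text-to-speech API
// (Node.js, using the built-in fetch).
import { writeFile } from 'node:fs/promises';

async function generateVoiceLine(text, outPath) {
    const response = await fetch('https://api.openai.com/v1/audio/speech', {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({ model: 'tts-1', voice: 'alloy', input: text }),
    });
    await writeFile(outPath, Buffer.from(await response.arrayBuffer()));
}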

Behind the Curtain


Of course there is a lot of JavaScript/TypeScript behind it all to make everything work.
At the start of the game all assets are pre-loaded, so the engine can switch smoothly between scenes.
A queue plays the audio in the correct order while still allowing the user to skip some animations/text.
And the logic that is defined in the HTML needs to be followed correctly.
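
As an illustration, here is a stripped-down version of such a queue. The real engine's API surface differs; this only demonstrates the ordering and the skip mechanism:

// Sketch: play queued audio clips in order and let the user skip ahead.
class OutputQueue {
    constructor() {
        this.queue = [];
        this.current = null;
    }

    enqueue(audio) {
        this.queue.push(audio);
        if (!this.current) this.playNext();
    }

    playNext() {
        this.current = this.queue.shift() ?? null;
        if (this.current) {
            this.current.onended = () => this.playNext();
            this.current.play();
        }
    }

    // Called when the user clicks to skip the current voice line.
    skip() {
        if (this.current) {
            this.current.pause();
            this.playNext();
        }
    }
}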

Final Thoughts


The system as it stands still requires a lot of manual labor, and iterating between
a good story or puzzle for the user to solve and getting Midjourney to generate the matching images is quite frustrating.
Before continuing, I would probably first make sure I have fully automated API pipelines to generate all assets,
which would mean using DALL-E for images instead of Midjourney, as Midjourney does not have an API.

I should probably also test whether I can build an automated flow for the AI to generate and iterate on the HTML scripts.
