Multimodal Custom Function: Integrating Text-to-Image Generation in Your Agent
By Armielyn Obinguar
Last updated
In this article, we’ll explore an exciting use case where your Agent can perform text-to-image generation within the GAME-LITE platform.
We'll guide you through the process of integrating an external image generator, passing the necessary arguments, headers, and payload, and handling both success and error feedback.
Within the getimg.ai platform, you’ll find several powerful models used for image generation. Among the most popular are Stable Diffusion 1.5 and Stable Diffusion XL.
To provide some context, SDXL utilizes a base model during the high-noise diffusion stage and a refiner model for the low-noise diffusion stage.
The base resolution of SDXL is 1024x1024 (though it’s possible to train with other resolutions as well). In contrast, SD15 has a base resolution of 512x512 (with common resolutions around 768x768, although other options can be explored).
As a result, SDXL offers a higher base image resolution (1024x1024) and is more adept at recognizing complex prompts. It also boasts advanced LoRA training and full fine-tuning support. One of SDXL's key advantages is its resistance to overtraining, making it a more stable and reliable model compared to SD15.
For example, here’s an image generated using Stable Diffusion 1.5. Hovering over an image in the gallery allows you to distinguish the model used (the SD15 model in these examples has been fine-tuned specifically for photorealism).
SDXL is specifically designed to be used with an additional "refiner" model to complete the render, and it’s also intended to be used with different artistic styles. For those who wish to experiment with lower output resolutions, you can opt to use just the refiner model.
It’s useful to quickly recap these two models before jumping into the demo of using the image-generation tool via GAME-LITE.
Next, the first step is to review the existing API documentation of getimg.ai to better grasp the endpoints for POST method.
Within the documentation, you'll find the endpoint for text-to-image:
POST https://api.getimg.ai/v1/essential-v2/text-to-image
This endpoint utilizes the getimg.ai Essential V2 pipeline to generate high-quality images based on a text prompt and a predefined style.
Note: The Essential V2 endpoint (/v1/essential-v2/text-to-image) differs from the legacy version. Please check that the submitted parameters match the ones described in the documentation.
As for the arguments, we can see that they fall under the 'body params' context, where we can identify which parameters are essential for our custom functions in GAME-LITE.
In the previous article, we covered how to name your custom function properly and align its arguments with the documentation of the API you’re using.
As for the 'arguments', we are referring to this section from the documentation, where the necessary parameters for generating the image are outlined.
These arguments include prompt, style, width, height, output_format, and response_format, all of which are required to customize the image generation process before passing the resulting image URL to our post_tweet function for posting.
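To make these parameters concrete, here is a minimal sketch of the request body in Python. The field names follow the getimg.ai documentation referenced above, but the specific values (the style name, the dimensions) are illustrative assumptions — check the documentation for the options your account supports.

```python
# Illustrative request body for the Essential V2 text-to-image endpoint.
# Field names come from the getimg.ai docs; the values are example
# assumptions -- consult the documentation for supported styles and sizes.
payload = {
    "prompt": "a cozy cabin in a snowy forest at dusk",  # text prompt
    "style": "photorealism",   # example style name (assumption)
    "width": 1024,             # output width in pixels
    "height": 1024,            # output height in pixels
    "output_format": "jpeg",   # image encoding
    "response_format": "url",  # ask the API to return a hosted image URL
}
```

Using "url" for response_format is what lets us hand the resulting link straight to the post_tweet function later on.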
Now that we’ve added the first part, let’s populate the “Headers” and “Payload” fields.
In this case, we can verify that the necessary arguments are set. As for the API URL, we will refer to the getimg.ai documentation and use it as the basis for the POST method.
Great! I hope you’re able to follow these steps before we move on to the exciting part: making sure your payload and headers are properly aligned.
In this step, we will cover best practices for setting up your payload and headers correctly with getimg.ai.
You need to define the arguments to ensure they align with your payloads.
Make sure to obtain your API key and include it in the Authorization field of the headers.
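As a sketch, assembling the headers and the POST request might look like the following, using only the Python standard library. It assumes the common Bearer-token scheme for the Authorization header (confirm the exact format in the getimg.ai docs), and build_request is an illustrative helper name, not part of either platform.

```python
import json
import urllib.request

API_URL = "https://api.getimg.ai/v1/essential-v2/text-to-image"

def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Bundle the URL, headers, and JSON body into a POST request.

    Assumes a Bearer-token Authorization header -- confirm the exact
    scheme in the getimg.ai documentation.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # key from your getimg.ai account
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example usage (hypothetical key; this performs a real network call):
# req = build_request({"prompt": "a cozy cabin", "response_format": "url"}, "YOUR_KEY")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```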
Success and Error Feedback
We will then complete this step by adding the success and error feedback.
Amazing! We’ve successfully completed this step with all the necessary details.
Time to observe how the Agent responds to the custom function you’ve implemented! Start by checking the functions you would like to simulate output with.
From there, click on the simulation and check if it returns the expected output.
Now that we can see the custom-function-request and custom-function-response, we can verify whether an image was successfully generated.
If it’s working, you’ll notice the cost per image call and the URL of the image displayed in the response.
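Assuming the response JSON carries the url and cost fields seen in the simulation output, a small helper like the hypothetical one below can turn the raw API response into the success or error feedback the Agent consumes. The feedback shape itself is illustrative, not a GAME-LITE requirement.

```python
def make_feedback(status_code: int, body: dict) -> dict:
    """Convert an API response into success/error feedback for the Agent.

    Assumes a successful body contains "url" and "cost" fields, as seen
    in the simulation output; the feedback keys are illustrative.
    """
    if status_code == 200 and "url" in body:
        return {
            "status": "success",
            "image_url": body["url"],   # pass this on to post_tweet
            "cost": body.get("cost"),   # cost per image call
        }
    return {
        "status": "failure",
        "error": body.get("error", f"request failed with HTTP {status_code}"),
    }

# Example: a successful call vs. an authentication failure
# make_feedback(200, {"url": "https://...", "cost": 0.02})
# make_feedback(401, {"error": "invalid api key"})
```

Returning an explicit failure message (rather than raising) gives the Agent something concrete to reason about in its feedback-response.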
Now, let’s analyze the behavior of the Agent when utilizing the custom function we created with getimg.ai.
To validate that the custom function is working, check the feedback-request in the simulation output.
Under current_action, you can see the status, whether it’s a success or a failure. We can also check the feedback-response for further validation and analysis of your Agent’s behavior.
Here, you'll be able to view the actual logs showing how the custom-function processes the image generation, including the caption for the tweet and the post_tweet function.
We have successfully created and utilized the custom function for text-to-image generation using getimg.ai. By setting up the payload and headers, we generated images based on specific text prompts and validated the function's behavior through simulations.
This process ensured that the generated image aligned with expectations and could be accessed via a URL, which was then passed to the post_tweet function for sharing on Twitter.
This integration demonstrates the potential of enhancing an Agent’s capabilities by automating content generation and social media posting.
By incorporating text-to-image generation into workflows, we can provide more dynamic, engaging experiences for users and simplify content creation tasks, ultimately improving automation and interactivity across a variety of applications.