Walkthrough: Deploying a Hugging Face Model as a Worker Node on the Allora Network
This guide provides a step-by-step process to deploy a Hugging Face model as a Worker Node within the Allora Network. By following these instructions, you will be able to integrate and run models from Hugging Face, contributing to the Allora decentralized machine intelligence ecosystem.
Prerequisites
Before you start, ensure you have the following:
- A Python environment with pip installed.
- A Docker environment with docker compose installed.
- Basic knowledge of machine learning and the Hugging Face ecosystem.
- Familiarity with the Allora Network documentation on allocmd and on building and deploying a worker node from scratch.
Installing allocmd
First, install allocmd as explained in the documentation:
pip install allocmd==1.0.4
Initializing the worker for development
Initialize the worker with your preferred name and topic ID in a development environment:
allocmd init --name <preferred name> --topic <topic id> --env dev
cd <preferred name>
Note: To deploy on the Allora Network, you will need to pick the topic ID you wish to generate inference for, or create a new topic.
Creating the inference server
We will create a simple Flask application to serve inference from the Hugging Face model. In this example, we will use the ElKulako/cryptobert model, a pre-trained NLP model that analyses the language and sentiment of cryptocurrency-related social media posts and messages.
Here is an example of our newly created app.py:
from flask import Flask, request, jsonify
from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer

# create our Flask app
app = Flask(__name__)

# define the Hugging Face model we will use
model_name = "ElKulako/cryptobert"

# import the model through Hugging Face transformers lib
# https://huggingface.co/docs/hub/transformers
try:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
except Exception as e:
    print("Failed to load model: ", e)

# use a pipeline as a high-level helper
try:
    pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, max_length=64, truncation=True, padding='max_length')
except Exception as e:
    print("Failed to create pipeline: ", e)

# define our endpoint
@app.route('/inference', methods=['POST'])
def predict_sentiment():
    try:
        input_text = request.json['input']
        output = pipe(input_text)
        return jsonify({"output": output})
    except Exception as e:
        return jsonify({"error": str(e)})

# run our Flask app
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8000, debug=True)
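If you want to sanity-check the model before wiring it into Flask, a minimal sketch along these lines can be run in a plain Python shell (it only assumes the transformers and torch packages are installed locally, as listed in requirements.txt below):
# quick local check of the CryptoBERT pipeline, outside of Flask
from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "ElKulako/cryptobert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, max_length=64, truncation=True, padding='max_length')

# prints a list of {"label": ..., "score": ...} dicts, one per input string
print(pipe(["i am so bullish on $ETH: this token will go to the moon"]))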
Modifying requirements.txt
Update requirements.txt to include the necessary packages for the inference server:
flask[async]
gunicorn[gthread]
transformers[torch]
Modifying main.py to call the inference server
Update main.py to integrate with the inference server:
import requests
import sys
import json

def process(argument):
    headers = {'Content-Type': 'application/json'}
    url = "http://host.docker.internal:8000/inference"
    payload = {"input": str(argument)}
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code == 200:
        data = response.json()
        if 'output' in data:
            print(data['output'])
        else:
            print(str(response.text))

if __name__ == "__main__":
    try:
        topic_id = sys.argv[1]
        inference_argument = sys.argv[2]
        process(inference_argument)
    except Exception as e:
        # return the error as a JSON string
        response = json.dumps({"error": str(e)})
        print(response)
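If you want to exercise main.py on its own, the sketch below mimics how the worker runtime invokes it: the first argument is the topic ID and the second is the text to run inference on. It assumes main.py sits in the current directory and that the inference server is reachable at the URL hard-coded above (host.docker.internal resolves from containers on Docker Desktop; adjust it to localhost if you run everything natively):
# hypothetical local check: call main.py the way the worker does
import subprocess

result = subprocess.run(
    ["python", "main.py", "1", "i am so bullish on $ETH: this token will go to the moon"],
    capture_output=True,
    text=True,
)
print(result.stdout)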
Updating the Docker configuration
Modify the generated Dockerfile for the head and worker nodes:
FROM alloranetwork/allora-inference-base:latest
RUN pip install requests
COPY main.py /app/
And create the Dockerfile_inference for the inference server:
FROM amd64/python:3.9-buster
WORKDIR /app
COPY . /app
# Install any needed packages specified in requirements.txt
RUN pip install --upgrade pip \
&& pip install -r requirements.txt
EXPOSE 8000
ENV NAME sample
# Run gunicorn when the container launches and bind port 8000 from app.py
CMD ["gunicorn", "-b", ":8000", "app:app"]
Finally, add the inference service to dev-docker-compose.yaml:
[...]
services:
  inference:
    container_name: inference-hf
    build:
      context: .
      dockerfile: Dockerfile_inference
    command: python -u /app/app.py
    ports:
      - "8000:8000"
    networks:
      b7s-local:
        aliases:
          - inference
        ipv4_address: 172.19.0.4
[...]
Testing our worker node
Now that everything is set up correctly, we can build our containers with the following command:
docker compose -f dev-docker-compose.yaml up --build
After a few minutes, you will see your Flask application running in the logs:
inference-hf | * Serving Flask app 'app'
Let's first test the inference server by querying it directly. To do that, we can issue the following HTTP request:
curl -X POST http://localhost:8000/inference -H "Content-Type: application/json" \
-d '{"input": "i am so bullish on $ETH: this token will go to the moon"}'
And we have a response!
{
  "output": [
    {
      "label": "Bullish",
      "score": 0.7626203298568726
    }
  ]
}
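The same check can be made from Python instead of curl. Here is a minimal sketch using the requests library, assuming the inference container is still publishing port 8000 on localhost:
# Python equivalent of the curl request above
import requests

response = requests.post(
    "http://localhost:8000/inference",
    json={"input": "i am so bullish on $ETH: this token will go to the moon"},
    timeout=30,
)
# expected shape: {"output": [{"label": ..., "score": ...}]}
print(response.json())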
Now that we know our inference server is working as expected, let's ensure it can interact with the Blockless network, which is how Allora nodes respond to inference requests from chain validators. The value of ALLORA_ARG_PARAMS below is passed to main.py as the inference argument. We can issue a Blockless request with:
curl --location 'http://localhost:6000/api/v1/functions/execute' \
--header 'Content-Type: application/json' \
--data '{
  "function_id": "bafybeigpiwl3o73zvvl6dxdqu7zqcub5mhg65jiky2xqb4rdhfmikswzqm",
  "method": "allora-inference-function.wasm",
  "parameters": null,
  "topic": "1",
  "config": {
    "env_vars": [
      {
        "name": "BLS_REQUEST_PATH",
        "value": "/api"
      },
      {
        "name": "ALLORA_ARG_PARAMS",
        "value": "i am so bullish on $ETH: this token will go to the moon"
      }
    ],
    "number_of_nodes": -1,
    "timeout": 2
  }
}' | jq
And here is the response:
{
  "code": "200",
  "request_id": "7a3f25de-d11d-4f55-b4fa-59ae97d9d8e2",
  "results": [
    {
      "result": {
        "stdout": "[{'label': 'Bullish', 'score': 0.7626203298568726}]\n\n",
        "stderr": "",
        "exit_code": 0
      },
      "peers": [
        "12D3KooWJM8cCyVmC45UpSNjBvknqQbsS7HTVx4bWYgxjcbkxxpC"
      ],
      "frequency": 100
    }
  ],
  "cluster": {
    "peers": [
      "12D3KooWJM8cCyVmC45UpSNjBvknqQbsS7HTVx4bWYgxjcbkxxpC"
    ]
  }
}
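Note that the model output comes back as the raw stdout of main.py, i.e. a printed Python list rather than JSON. If you want to consume it programmatically, a small consumer-side sketch (not part of the worker setup) can use ast.literal_eval to handle the single-quoted format:
# parse the "stdout" field of the Blockless response above
import ast

stdout = "[{'label': 'Bullish', 'score': 0.7626203298568726}]\n\n"
predictions = ast.literal_eval(stdout.strip())
print(predictions[0]["label"], predictions[0]["score"])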
Congratulations! Your worker node running the Hugging Face model is now up and running locally on your machine. We've also verified that it can participate in Allora by responding to Blockless requests.
Initializing the worker for production
Your worker node is now ready to be deployed!
Remember that you will need to pick the topic ID you wish to generate inference for, or create a new topic to deploy to in production.
The following command will generate the prod-docker-compose.yaml file, which contains all the keys and parameters needed for your worker to run in production:
allocmd init --env prod
chmod -R +rx ./data/scripts
By running this command, prod-docker-compose.yaml will be generated with the appropriate keys and parameters. You will need to modify this file to add your inference service, as you did for dev-docker-compose.yaml.
You can now run the prod-docker-compose.yaml file with:
docker compose -f prod-docker-compose.yaml up
or deploy the whole codebase on your preferred cloud instance.
At this stage, your worker should be responding to inference requests from the Allora Chain. Congratulations! You can verify this by issuing a Blockless request through the Allora testnet head node:
curl --location 'https://heads.testnet.allora.network/api/v1/functions/execute' \
--header 'Content-Type: application/json' \
--data '{
  "function_id": "bafybeigpiwl3o73zvvl6dxdqu7zqcub5mhg65jiky2xqb4rdhfmikswzqm",
  "method": "allora-inference-function.wasm",
  "parameters": null,
  "topic": "TOPIC_ID",
  "config": {
    "env_vars": [
      {
        "name": "BLS_REQUEST_PATH",
        "value": "/api"
      },
      {
        "name": "ALLORA_ARG_PARAMS",
        "value": "i am so bullish on $ETH: this token will go to the moon"
      }
    ],
    "number_of_nodes": -1,
    "timeout": 2
  }
}' | jq
You should receive a response like:
{
  "code": "200",
  "request_id": "7fd769d0-ac65-49a5-9759-d4cefe8bb9ea",
  "results": [
    {
      "result": {
        "stdout": "[{'label': 'Bullish', 'score': 0.7626203298568726}]\n\n",
        "stderr": "",
        "exit_code": 0
      },
      "peers": [
        "12D3KooWJM8cCyVmC45UpSNjBvknqQbsS7HTVx4bWYgxjcbkxxpC"
      ],
      "frequency": 50
    }
  ],
  "cluster": {
    "peers": [
      "12D3KooWJM8cCyVmC45UpSNjBvknqQbsS7HTVx4bWYgxjcbkxxpC"
    ]
  }
}