Deploying Transformers Model Inference with FastAPI

FastAPI is a modern, high-performance web framework for building APIs with Python 3.6+, based on standard Python type hints. It is efficient and easy to use, and it generates interactive API documentation automatically. In this blog post, we will walk through deploying a Python web service that serves inference from a Transformers model using FastAPI. We will also discuss how to run Uvicorn as a systemd service on Debian.

What is FastAPI?

FastAPI describes itself as a high-performance framework built on standard Python type hints. Its key features include:

  • Fast: Very high performance, on par with NodeJS and Go (thanks to Starlette and Pydantic).
  • Fast to code: Increase the speed to develop features by about 200% to 300% (an estimate from the FastAPI project's own internal tests).
  • Fewer bugs: Reduce about 40% of human (developer) induced errors (also the project's own estimate).
  • Intuitive: Great editor support. Completion everywhere. Less time debugging.
  • Easy: Designed to be easy to use and learn. Less time reading docs.
  • Short: Minimize code duplication. Multiple features from each parameter declaration.
  • Robust: Get production-ready code. With automatic interactive documentation.
  • Standards-based: Based on (and fully compatible with) the open standards for APIs: OpenAPI and JSON Schema.
  • Django-friendly: Can be used alongside the Django ORM, Django Filters, and more, although FastAPI ships no built-in Django integration, so this takes some manual wiring.

Deploying a Python Web Service with FastAPI

Let’s start by creating a new FastAPI project. First, install FastAPI and Uvicorn, an ASGI server, with pip:

pip install fastapi uvicorn

Next, create a new Python file (e.g., main.py, to match the uvicorn main:app command used below), import FastAPI, and register a handler for the root path:

from fastapi import FastAPI

app = FastAPI()

# Register the handler for GET requests to the root path
@app.get("/")
def read_root():
    return {"Hello": "World"}

You can now start the Uvicorn server with:

uvicorn main:app --reload

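This will start a local server at http://127.0.0.1:8000 (Uvicorn's default address), with the interactive API documentation available under /docs. The --reload flag restarts the server whenever the code changes and is meant for development only.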

Hosting Transformers Model Inference

To serve inference from a Transformers model, we first need to install the Transformers library together with a backend such as PyTorch, which the pipeline needs to run the model:

pip install transformers torch

Next, we can load a pre-trained model and create an endpoint for model inference. For example, let’s use the distilbert-base-uncased-finetuned-sst-2-english model for sentiment analysis:

from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()

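# Load the model once at startup so every request reuses it
nlp_model = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')

@app.post("/predict")
def predict_sentiment(text: str):
    # The pipeline returns a list with one dict per input text
    result = nlp_model(text)[0]
    return {"label": result['label'], "score": round(result['score'], 4)}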

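This creates a POST endpoint at /predict. Because text is declared as a plain str parameter, FastAPI reads it from the query string (for example, /predict?text=I+love+this). The endpoint returns the predicted sentiment label and its score, rounded to four decimal places.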
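To make the request shape concrete, here is a minimal client sketch that uses only the standard library. The helper names build_predict_request and predict are hypothetical, and the base URL assumes the service runs locally on Uvicorn's default port; adjust both to your setup.

```python
import json
import urllib.parse
import urllib.request


def build_predict_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request that passes `text` as a query parameter."""
    # urlencode escapes spaces and punctuation so any text is safe in the URL
    query = urllib.parse.urlencode({"text": text})
    url = f"{base_url.rstrip('/')}/predict?{query}"
    return urllib.request.Request(url, method="POST")


def predict(base_url: str, text: str) -> dict:
    """Send the request and decode the JSON reply with its label and score keys."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, predict("http://127.0.0.1:8000", "I love this movie!") should return a dictionary with the label and score keys produced by the endpoint.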

Running Uvicorn Workers as a systemd Service

To run Uvicorn workers as a systemd service on Debian environments, we first need to create a new systemd service file (e.g., /etc/systemd/system/uvicorn.service):

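[Unit]
Description=Uvicorn server instance

[Service]
# Assumption: 0.0.0.0 (listen on all interfaces) fills in the host value; use 127.0.0.1 behind a reverse proxy
ExecStart=/usr/local/bin/uvicorn main:app --host 0.0.0.0 --port 8000

[Install]
WantedBy=multi-user.target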
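A unit file for a real deployment usually needs a few more settings, most importantly a WorkingDirectory so that Uvicorn can import main:app. The following sketch renders such a file from Python. The function name render_unit, the /srv/sentiment directory, and the www-data user are illustrative assumptions, not values from this post.

```python
import configparser
import io


def render_unit(app_dir: str, user: str, host: str = "127.0.0.1", port: int = 8000,
                uvicorn_bin: str = "/usr/local/bin/uvicorn", app: str = "main:app") -> str:
    """Return the text of a systemd unit that runs Uvicorn for `app`."""
    unit = configparser.ConfigParser(interpolation=None)
    unit.optionxform = str  # keep systemd's case-sensitive key names
    unit["Unit"] = {"Description": "Uvicorn server instance", "After": "network.target"}
    unit["Service"] = {
        "User": user,                  # assumption: an unprivileged account
        "WorkingDirectory": app_dir,   # directory that contains main.py
        "ExecStart": f"{uvicorn_bin} {app} --host {host} --port {port}",
        "Restart": "on-failure",       # restart the server if it crashes
    }
    unit["Install"] = {"WantedBy": "multi-user.target"}
    buffer = io.StringIO()
    unit.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

Calling render_unit("/srv/sentiment", "www-data") returns text that can be saved as /etc/systemd/system/uvicorn.service. The default host is the loopback address; pass host="0.0.0.0" to listen on all interfaces.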


Next, reload systemd so that it picks up the new unit file (sudo systemctl daemon-reload), then start the service with:

sudo systemctl start uvicorn

And enable it to start on boot with:

sudo systemctl enable uvicorn

You can check the status of the service with:

sudo systemctl status uvicorn
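The status command shows that the process is running, but not that the API already answers requests, and loading a model can take a while. A small polling check can cover that gap. This is a sketch using only the standard library; the function name wait_until_up and the default timings are assumptions.

```python
import time
import urllib.error
import urllib.request

# Skip any proxy from the environment: the check targets the local machine.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_up(url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll `url` until the server sends any HTTP response or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with _OPENER.open(url, timeout=interval_s):
                return True   # 2xx or 3xx: the server is answering
        except urllib.error.HTTPError:
            return True       # 4xx or 5xx still proves the server is up
        except OSError:
            pass              # connection refused or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For the service above, wait_until_up("http://127.0.0.1:8000/docs") returns True once FastAPI is serving its documentation page.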

And that’s it! You now have a Python web service that serves Transformers model inference with FastAPI, with Uvicorn running as a systemd service on Debian.


