
Providers

Understand how providers connect your models to devices and clouds.

Every embedl-hub command follows the structure:

embedl-hub <command> <toolchain> <provider>

The provider determines where and how the command is executed. Some providers run locally, some dispatch to a managed device cloud, and some connect to your own hardware over SSH.
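If you script these invocations, the three-part structure can be assembled programmatically. A minimal sketch, assuming long-form flags; the helper name and flag handling are illustrative and not part of embedl-hub:

```python
import shlex

def build_command(command: str, toolchain: str, provider: str, **flags: str) -> str:
    """Assemble an embedl-hub invocation of the form
    `embedl-hub <command> <toolchain> <provider> [flags]`.
    Hypothetical helper: keyword names are mapped to long flags
    (e.g. model -> --model); the real CLI also accepts short flags like -m."""
    parts = ["embedl-hub", command, toolchain, provider]
    for flag, value in flags.items():
        parts += [f"--{flag.replace('_', '-')}", value]
    return shlex.join(parts)
```

For example, `build_command("compile", "tflite", "local", model="model.onnx")` yields `embedl-hub compile tflite local --model model.onnx`.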

Available providers

local

Runs the operation on your local machine. No remote device or cloud account is required.

The local provider currently supports TFLite compilation only; it uses onnx2tf to convert an ONNX model to the TFLite format. This is useful for getting a .tflite file quickly without needing cloud access.

CLI:

embedl-hub compile tflite local -m model.onnx

The local provider does not require a device, so there is no corresponding DeviceManager method.
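The conversion corresponds roughly to an onnx2tf CLI call. A sketch of the equivalent invocation; -i and -o are onnx2tf's input/output flags, but the exact arguments embedl-hub passes are an assumption, and the wrapper function is illustrative:

```python
def onnx2tf_command(onnx_path: str, out_dir: str = "saved_model") -> list[str]:
    """Build the onnx2tf CLI call that local TFLite compilation roughly
    corresponds to: convert an ONNX model into an output directory that
    includes .tflite files. (Wrapper and defaults are illustrative.)"""
    return ["onnx2tf", "-i", onnx_path, "-o", out_dir]
```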

qai-hub

Dispatches jobs to Qualcomm AI Hub, a managed cloud service from Qualcomm. Supports compilation (with quantization) and profiling on Snapdragon-powered devices (e.g., Samsung Galaxy S25, Google Pixel 9 Pro) as well as some automotive SoCs.

Requires a Qualcomm AI Hub account and API token. To set it up:

  1. Create an account on Qualcomm AI Hub.

  2. Log in and click Settings in the top-right corner to find your API token.

  3. Configure the token locally (the qai-hub package is installed automatically with embedl-hub):

    qai-hub configure --api_token YOUR_API_TOKEN

CLI:

embedl-hub compile tflite qai-hub \
    -m model.onnx \
    -d "Samsung Galaxy S24" \
    -s 1,3,224,224

Python API:

DeviceManager.get_qai_hub_device("Samsung Galaxy S24", name="galaxy-s24")
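The -s flag in the CLI example supplies comma-separated dimensions, presumably the model's input shape (1,3,224,224 is a typical NCHW image input). A small parser sketch, assuming that format; this is not embedl-hub's own code:

```python
def parse_shape(spec: str) -> tuple[int, ...]:
    """Parse a comma-separated shape string such as "1,3,224,224" into a
    tuple of positive dimensions. Illustrative helper; the CLI performs
    its own validation."""
    dims = tuple(int(d) for d in spec.split(","))
    if any(d <= 0 for d in dims):
        raise ValueError(f"dimensions must be positive integers: {spec!r}")
    return dims
```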

aws

Dispatches profiling jobs to the Embedl device cloud, backed by AWS Device Farm. This is the default cloud for profiling TFLite models and does not require any additional setup beyond your Embedl Hub account.

CLI:

embedl-hub profile tflite aws \
    -m model.tflite \
    -d "Samsung Galaxy S25"

Python API:

DeviceManager.get_aws_device("Samsung Galaxy S25", name="galaxy-s25")

embedl-onnxruntime

Connects to a remote device over SSH and runs the embedl-onnxruntime backend on it. Use this provider to compile, profile, and invoke ONNX models on your own hardware — for example, a Raspberry Pi, Jetson board, or any Linux device you have SSH access to.

The target device must have the embedl-onnxruntime wheel installed. See the embedl-onnxruntime repository for installation instructions.

Together with trtexec, this is one of the fastest backends available.

CLI:

embedl-hub compile onnxruntime embedl-onnxruntime \
    -m model.onnx \
    --host 192.168.1.42 \
    --user pi

Python API:

from embedl_hub.core.device.ssh import SSHConfig
DeviceManager.get_embedl_onnxruntime_device(
    SSHConfig(host="192.168.1.42", username="pi"),
    name="rpi",
)
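Before dispatching work, it can help to verify that the wheel is importable on the target. One way to build such a probe command; the ssh invocation is standard, but the Python import name `embedl_onnxruntime` is an assumption:

```python
import shlex

def wheel_probe_command(host: str, user: str) -> str:
    """Build an ssh one-liner that checks whether the embedl-onnxruntime
    wheel is importable on the remote device. The import name
    `embedl_onnxruntime` is an assumption; adjust to the installed package."""
    probe = "python3 -c 'import embedl_onnxruntime'"
    return shlex.join(["ssh", f"{user}@{host}", probe])
```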

trtexec

Connects to a remote NVIDIA device over SSH and runs NVIDIA’s trtexec tool on it. Use this provider to compile, profile, and invoke TensorRT models on your own GPU-equipped hardware.

Together with embedl-onnxruntime, this is one of the fastest backends available.

CLI:

embedl-hub compile tensorrt trtexec \
    -m model.onnx \
    --host 192.168.1.10 \
    --user nvidia

Python API:

from embedl_hub.core.device.ssh import SSHConfig
DeviceManager.get_tensorrt_device(
    SSHConfig(host="192.168.1.10", username="nvidia"),
    name="jetson",
)
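On the remote side, the provider drives trtexec. The flags below (--onnx, --saveEngine, --fp16) are standard trtexec options; whether embedl-hub uses exactly these is an assumption, and the wrapper is illustrative:

```python
def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = True) -> list[str]:
    """Build a typical trtexec invocation: parse an ONNX model, build a
    TensorRT engine, and optionally enable FP16 precision. Flags are
    standard trtexec options; the wrapper itself is illustrative."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")
    return cmd
```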

You can browse all supported devices on the Supported devices page.

Supported combinations

Not every provider is available for every toolchain and command. The tables below show which combinations are supported.

Compile

              local   qai-hub   embedl-onnxruntime   trtexec
tflite          ✓        ✓              ✗               ✗
onnxruntime     ✗        ✗              ✓               ✗
tensorrt        ✗        ✗              ✗               ✓

Profile

              qai-hub   aws   embedl-onnxruntime   trtexec
tflite           ✓       ✓            ✗               ✗
onnxruntime      ✗       ✗            ✓               ✗
tensorrt         ✗       ✗            ✗               ✓

Invoke

              qai-hub   embedl-onnxruntime   trtexec
tflite           ✓              ✗               ✗
onnxruntime      ✗              ✓               ✗
tensorrt         ✗              ✗               ✓

Note: The local provider does not require a device name. All other providers require a --device flag or device configuration.

Choosing a provider

  • Just need a TFLite file? Use compile tflite local for quick local conversion with FP16 quantization — no device or cloud account needed.
  • Want INT8-quantized, device-optimized models? Use qai-hub for Qualcomm devices, embedl-onnxruntime for ONNX Runtime on your own hardware, or trtexec for TensorRT on NVIDIA GPUs.
  • Profiling on real devices? Use aws for the Embedl device cloud, or qai-hub for Qualcomm AI Hub devices.
  • Have your own hardware? Use embedl-onnxruntime or trtexec to compile, profile, and invoke models on any device you can reach over SSH.
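The guidance above can be condensed into a small lookup. The keys are informal summaries of each use case; only the values are actual provider names, and the support tables remain the authoritative matrix:

```python
# Informal summary of the provider-selection guidance above.
# Keys are descriptive labels, not CLI values; values are provider names.
PROVIDER_FOR = {
    "quick local tflite conversion": "local",
    "qualcomm devices (compile/profile)": "qai-hub",
    "default cloud profiling": "aws",
    "own hardware via ssh (onnxruntime)": "embedl-onnxruntime",
    "own nvidia gpu via ssh (tensorrt)": "trtexec",
}
```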