Quickstart

Run a model from start to finish with embedl-hub.

This guide shows you how to go from having an idea for an application to profiling a model on remote hardware. To showcase this, we will compile and profile a model for a Samsung Galaxy S24 mobile phone.

You will learn how to compile and profile a model using the Embedl Hub CLI and Python API.

Prerequisites

If you haven’t already done so, follow the instructions in the setup guide to:

  • Create an Embedl Hub account
  • Install the Embedl Hub Python library
  • Configure an API Key

Creating a project

Create a project for the application:

embedl-hub init \
    --project "Quickstart"

This also sets the project as default for future commands. You can view your current settings at any time:

embedl-hub show

For alternative ways to configure project context, see the configuration guide.

Configuring the artifact directory

Compilation and profiling produce artifacts such as compiled models, logs, and performance reports. The artifact directory is where these are stored on disk.

In the CLI, the artifact directory is configured globally using embedl-hub init and persisted across sessions. If not set, it defaults to a platform-specific data directory.

embedl-hub init \
    --artifact-dir ~/my-artifacts

This is important for the CLI because later commands may need access to previously produced artifacts — for example, profiling a model that was compiled in an earlier step.

Selecting a provider and target device

Embedl Hub supports running operations on a variety of backends, from your local machine to managed device clouds and your own hardware over SSH. Which backend is used is determined by the provider you select. Some providers also require a target device — the specific hardware the model will run on.

Compile, profile, and invoke commands follow the structure embedl-hub <command> <toolchain> <provider>, where you select a provider as the last positional argument:

embedl-hub compile tflite local
embedl-hub compile tflite qai-hub
embedl-hub profile tflite aws

Some providers (such as qai-hub and aws) require you to select a target device by name — for example, "Samsung Galaxy S24". You can list the available target devices using list-devices:

embedl-hub list-devices

You can also browse the full list on the Supported devices page.

In this guide we’ll use local for local compilation, qai-hub for device-targeted compilation, and aws for profiling on the Embedl device cloud. For the full list of providers and supported combinations, see the providers guide.

Compiling a model

The first step is to compile the model into a format that can run on the target hardware.

The compile step expects an ONNX file. You can save your existing PyTorch model in ONNX format using torch.onnx.export.

For this guide, we will export the Torchvision MobileNet V2 model to ONNX using the TorchScript-based exporter:

import torch
from torchvision.models import mobilenet_v2
# Define the model and example input
model = mobilenet_v2(weights="IMAGENET1K_V2")
example_input = torch.rand(1, 3, 224, 224)
# Save model in ONNX format
torch.onnx.export(
    model,
    example_input,
    "path/to/mobilenet_v2.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=18,
    external_data=False,
    dynamo=False,
)

Compile the saved model to TFLite format using local compilation:

embedl-hub compile tflite local \
    --model /path/to/mobilenet_v2.onnx

Since we haven’t set an output name, the compiled model will be saved as mobilenet_v2.tflite in the artifact directory configured by embedl-hub init --artifact-dir.

Quantizing a model

Every provider quantizes the model as part of compilation — lowering the precision of weights and activations to reduce memory and compute on the target device. The local provider applies FP16 quantization by default. Cloud and SSH providers such as qai-hub, embedl-onnxruntime, and trtexec go further with INT8 quantization, which requires a device name and input shape.
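To build intuition for what INT8 quantization does to weights and activations, here is a toy NumPy sketch. It is purely illustrative and not how embedl-hub or any provider implements quantization: it maps a tensor's value range onto 8-bit integers with a scale and zero point, then measures the round-trip error.

```python
import numpy as np

# Toy affine INT8 quantization, for intuition only (not embedl-hub internals).
def quantize_int8(x: np.ndarray):
    # Map [x.min(), x.max()] onto the int8 range [-128, 127].
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(3, 224)).astype(np.float32)

q, scale, zp = quantize_int8(weights)
roundtrip = dequantize(q, scale, zp)

# Each value now costs 1 byte instead of 4, at the price of a small
# per-element error (at most a couple of quantization steps).
print("scale:", scale)
print("max abs error:", float(np.abs(weights - roundtrip).max()))
```

The error per element is bounded by the scale, which is why the value range matters: calibration data (covered below) helps providers pick ranges that fit real inputs instead of worst-case ones.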

To compile with quantization, provide the model, device name, and input shape. The example below uses qai-hub, but the same flags apply to other providers:

embedl-hub compile tflite qai-hub \
    --model /path/to/mobilenet_v2.onnx \
    --device "Samsung Galaxy S24" \
    --input-shape 1,3,224,224

Providing calibration data

Although quantization reduces the model’s precision, you can mitigate the accuracy loss by providing calibration data — a small set of representative input samples. You don’t need a large dataset; usually, a few hundred samples are more than enough. If no calibration data is provided, random data is used.

embedl-hub compile tflite qai-hub \
    --model /path/to/mobilenet_v2.onnx \
    --device "Samsung Galaxy S24" \
    --input-shape 1,3,224,224 \
    --calibration-data /path/to/dataset

The --calibration-data flag points to a directory of .npy files, one file per sample. For single-input models, place the files directly in the directory. For multi-input models, create one subdirectory per input tensor (named after the input), each with the same number of files.
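As a sketch of this layout, the following writes a few .npy samples for a single-input model. The directory name, file names, sample count, and shape are illustrative; for real calibration you would save preprocessed samples from a representative dataset rather than random data (random data is what the CLI falls back to anyway).

```python
import numpy as np
from pathlib import Path

# Illustrative --calibration-data layout for a single-input model:
# one .npy file per sample, placed directly in the directory.
calib_dir = Path("calibration_data")
calib_dir.mkdir(exist_ok=True)

rng = np.random.default_rng(0)
for i in range(8):  # a few hundred samples is usually plenty; 8 just to illustrate
    sample = rng.random((1, 3, 224, 224), dtype=np.float32)  # matches --input-shape
    np.save(calib_dir / f"sample_{i:04d}.npy", sample)

# For a multi-input model, use one subdirectory per input tensor instead,
# named after the input (e.g. calibration_data/input_ids/*.npy and
# calibration_data/pixel_values/*.npy), with the same number of files in each.
print(sorted(p.name for p in calib_dir.glob("*.npy")))
```

You would then pass this directory via --calibration-data as shown above.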

Note: Some models have operations that are notoriously difficult to quantize, which can lead to a huge drop in accuracy. One example is the softmax function in the attention layers of large language models (LLMs).

Profiling a model

Let’s evaluate how well the model performs on remote hardware:

embedl-hub profile tflite aws \
    --model /path/to/mobilenet_v2.tflite \
    --device "Samsung Galaxy S24"

You can also profile a model from a previous compile run without specifying the model path:

embedl-hub profile tflite aws \
    --from-run latest \
    --device "Samsung Galaxy S24"

Profiling the model gives useful information such as the model’s latency on the hardware platform, which layers are slowest, the number of layers executed on each compute unit type, and more! We can use this information for advanced debugging and for iterating on the model’s design. We can answer questions like:

  • Can we optimize the slowest layer?
  • Why aren’t certain layers executed on the expected compute unit?

Viewing your runs

Every compile, profile, and invoke operation is recorded as a run. Each run is stored locally in the artifact directory and automatically synced to your Hub page, where you can inspect results, compare runs side-by-side, and share them with your team.

On the Hub page

Open your project on hub.embedl.com to see a timeline of all runs. Each run shows the component type (compiler, profiler, invoker), status, device, metrics such as latency and memory usage, and the artifacts that were produced. You can click into any run to see layer-by-layer breakdowns and download compiled models.

From the CLI

Use embedl-hub log to list recent runs directly in your terminal:

embedl-hub log

embedl-hub log reads runs from the artifact directory configured with embedl-hub init --artifact-dir. It only shows runs stored in that directory, so if you used a different artifact directory for some runs — or created runs via the Python API with a custom path — they won’t appear unless you point --artifact-dir to the same location.

Each entry shows the run ID, model name, component type, timestamp, status, and a direct link to the run on your Hub page:

abc12345  mobilenet_v2  (TFLiteCompiler)
  2026-03-25 14:19:24 · FINISHED · qai-hub
  url: https://hub.embedl.com/projects/.../runs/abc12345
  dir: artifacts/TFLiteCompiler_20260325_141924/
  $ embedl-hub compile tflite qai-hub --model mobilenet_v2.onnx …

Run embedl-hub log --help for the full list of options.

Naming and tagging runs

Run names

Every run gets a name that appears in the Hub. By default the name is the component class name (e.g. TFLiteCompiler). You can override it to make runs easier to identify:

Pass --run-name (or -rn) to any command:

embedl-hub compile tflite qai-hub \
    --model model.onnx \
    --device "Samsung Galaxy S24" \
    --run-name "mobilenet-int8"

Tags

You can also attach tags to runs as key-value pairs. Tags are useful for organizing runs by model variant, dataset, experiment, or any other dimension you care about.

Pass one or more --tag flags to any compile or profile command:

embedl-hub compile tflite qai-hub \
    --model model.onnx \
    --device "Samsung Galaxy S24" \
    --tag model=mobilenet_v2 \
    --tag dataset=imagenet

On the Hub page, tags appear on each run’s detail page and power the Benchmarks view. The benchmarks page collects all runs in a project and plots them in a scatter chart where you can color-code data points by any tag — for example, coloring by model to compare latency across model variants, or by dataset to see how calibration data affects accuracy. You can also filter runs by tag values to focus on specific experiments.

Next steps

  • See the profiling models guide to learn how to interpret profiling results — layerwise latency, compute unit breakdown, and more.
  • See the providers guide for the full list of supported providers, toolchains, and how to connect your own hardware over SSH.