External Data#

Loading an ONNX Model with External Data#

  • [Default] If the external data is in the same directory as the model, simply use onnx.load().

import onnx

onnx_model = onnx.load("path/to/the/model.onnx")
  • If the external data is in another directory, call onnx.load() with load_external_data=False, then call load_external_data_for_model() with the directory path.

import onnx
from onnx.external_data_helper import load_external_data_for_model

onnx_model = onnx.load("path/to/the/model.onnx", load_external_data=False)
load_external_data_for_model(onnx_model, "data/directory/path/")
# onnx_model now holds the external data loaded from the specified directory

Converting an ONNX Model to External Data#

import onnx
from onnx.external_data_helper import convert_model_to_external_data

onnx_model = ... # Your model in memory as ModelProto
convert_model_to_external_data(onnx_model, all_tensors_to_one_file=True, location="filename", size_threshold=1024, convert_attribute=False)
# Must be followed by save_model to write the converted model to a specific path
onnx.save_model(onnx_model, "path/to/save/the/model.onnx")
# The raw data has now been converted to external data and saved to the specified directory

Converting and Saving an ONNX Model to External Data#

import onnx

onnx_model = ... # Your model in memory as ModelProto
onnx.save_model(onnx_model, "path/to/save/the/model.onnx", save_as_external_data=True, all_tensors_to_one_file=True, location="filename", size_threshold=1024, convert_attribute=False)
# The raw data has now been converted to external data and saved to the specified directory

onnx.checker for Models with External Data#

Models with External Data (<2GB)#

The checker supports models with external data. Pass either a loaded ONNX model or a model path to the checker.

Large models >2GB#

For models larger than 2GB, however, pass the model path to onnx.checker, and keep the external data in the same directory as the model.

import onnx

onnx.checker.check_model("path/to/the/model.onnx")
# onnx.checker.check_model(loaded_onnx_model) will fail if given a >2GB model

TensorProto: data_location and external_data fields#

The TensorProto message type has two fields related to external data.

data_location field#

The data_location field stores the location of data for this tensor. Value MUST be one of:

  • MESSAGE - data stored in type-specific fields inside the protobuf message.

  • RAW - data stored in raw_data field.

  • EXTERNAL - data stored in an external location as described by external_data field.

  • value not set - legacy value. Assume data is stored in raw_data (if set), otherwise in the message.

external_data field#

The external_data field stores key-value pairs of strings describing the data location.

Recognized keys are:

  • "location" (required) - file path relative to the filesystem directory where the ONNX protobuf model was stored. Up-directory path components such as .. are disallowed and should be stripped when parsing.

  • "offset" (optional) - position of the byte at which the stored data begins. Integer stored as string. Offset values SHOULD be multiples of 4096 (page size) to enable mmap support.

  • "length" (optional) - number of bytes containing data. Integer stored as string.

  • "checksum" (optional) - SHA1 digest of the file specified under the "location" key.

After an ONNX file is loaded, all external_data fields may be updated with an additional key ("basepath"), which stores the path to the directory from which the ONNX model file was loaded.
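
The keys above can be combined into a small reader. The following is a minimal sketch using only the Python standard library, not the ONNX implementation: the function name read_external_bytes is hypothetical, the checksum is assumed to be a hex-encoded SHA1 digest of the whole file, a missing "offset" is assumed to mean 0, and a missing "length" is assumed to mean "read to the end of the file". It rejects paths containing ".." rather than stripping them, which is a stricter choice than the text describes.

```python
import hashlib
import os
import tempfile
from pathlib import PurePosixPath


def read_external_bytes(entries, basepath):
    """Return the bytes described by a dict of external_data key-value pairs.

    Hypothetical helper for illustration; all values in `entries` are strings.
    """
    location = entries["location"]  # the only required key
    # Refuse up-directory components (this sketch rejects instead of stripping).
    if ".." in PurePosixPath(location.replace("\\", "/")).parts:
        raise ValueError(f"up-directory component in location: {location!r}")
    path = os.path.join(basepath, location)

    # Assumption: "checksum" is the hex SHA1 digest of the entire file.
    if "checksum" in entries:
        with open(path, "rb") as data_file:
            digest = hashlib.sha1(data_file.read()).hexdigest()
        if digest != entries["checksum"]:
            raise ValueError(f"checksum mismatch for {location!r}")

    # Integers are stored as strings, so convert before use.
    offset = int(entries.get("offset", "0"))
    with open(path, "rb") as data_file:
        data_file.seek(offset)
        if "length" in entries:
            return data_file.read(int(entries["length"]))
        return data_file.read()


# Usage: write a 16-byte file and read bytes 4..11 back out of it.
with tempfile.TemporaryDirectory() as tmp_dir:
    payload = bytes(range(16))
    with open(os.path.join(tmp_dir, "tensors.bin"), "wb") as out_file:
        out_file.write(payload)
    demo_entries = {
        "location": "tensors.bin",
        "offset": "4",
        "length": "8",
        "checksum": hashlib.sha1(payload).hexdigest(),
    }
    demo_chunk = read_external_bytes(demo_entries, basepath=tmp_dir)
```

Here basepath plays the role of the "basepath" key: the directory from which the model file was loaded, against which "location" is resolved.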

External data files#

Data stored in external data files is in the same binary byte-string format used by the raw_data field in current ONNX implementations.
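
As an illustration of how such a file might be laid out, the sketch below packs two float32 tensors into one in-memory buffer and pads each start offset to a multiple of 4096, following the offset recommendation above. The helper names (align_up, pack_float32, append_tensor) and the file name "tensors.bin" are hypothetical, and the little-endian, fixed-width float32 packing is an assumption about the raw_data layout that should be checked against the onnx.proto comments for the data type in question.

```python
import struct

PAGE_SIZE = 4096  # recommended offset alignment for mmap support


def align_up(position, alignment=PAGE_SIZE):
    """Round position up to the next multiple of alignment."""
    return -(-position // alignment) * alignment


def pack_float32(values):
    """Pack floats as fixed-width little-endian float32 (assumed raw_data layout)."""
    return struct.pack(f"<{len(values)}f", *values)


def append_tensor(buffer, payload, location):
    """Append payload at a page-aligned offset and return its external_data entries."""
    offset = align_up(len(buffer))
    buffer.extend(b"\x00" * (offset - len(buffer)))  # zero padding up to the boundary
    buffer.extend(payload)
    # Integers are stored as strings, as the key descriptions require.
    return {"location": location, "offset": str(offset), "length": str(len(payload))}


blob = bytearray()
first = append_tensor(blob, pack_float32([1.0, 2.0, 3.0]), "tensors.bin")
second = append_tensor(blob, pack_float32([4.0, 5.0]), "tensors.bin")

# Recover the second tensor from the buffer using only its entries.
start = int(second["offset"])
stop = start + int(second["length"])
recovered = struct.unpack("<2f", bytes(blob[start:stop]))
```

The padding costs some file size for small tensors, which is one reason a size threshold is useful when deciding which tensors to move out of the model file.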

Reference https://github.com/onnx/onnx/pull/678