{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Serialization with protobuf\n", "\n", "[protobuf](https://fr.wikipedia.org/wiki/Protocol_Buffers) optimizes serialization in two ways. It speeds up writing and reading data, and it also gives fast access to one specific piece of information without deserializing the rest. It achieves this by imposing a strict data schema."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Schema\n", "\n", "We reuse the example from the [tutorial](https://developers.google.com/protocol-buffers/docs/pythontutorial)."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["schema = \"\"\"\n", "syntax = \"proto2\";\n", "\n", "package tutorial;\n", "\n", "message Person {\n", "  required string name = 1;\n", "  required int32 id = 2;\n", "  optional string email = 3;\n", "\n", "  enum PhoneType {\n", "    MOBILE = 0;\n", "    HOME = 1;\n", "    WORK = 2;\n", "  }\n", "\n", "  message PhoneNumber {\n", "    required string number = 1;\n", "    optional PhoneType type = 2 [default = HOME];\n", "  }\n", "\n", "  repeated PhoneNumber phones = 4;\n", "}\n", "\n", "message AddressBook {\n", "  repeated Person people = 1;\n", "}\n", "\"\"\""]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Compilation\n", "\n", "First, get the compiler. 
It can be downloaded from the [protobuf](https://github.com/google/protobuf/releases) releases page or, on Linux (Ubuntu/Debian), installed with ``apt-get install protobuf-compiler``, which provides the ``protoc`` program."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/plain": ["'3.5.1'"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["import google.protobuf as gp\n", "version = gp.__version__\n", "if version == \"3.5.2.post1\":\n", "    version = \"3.5.1\"\n", "version"]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["protoc-3.5.1-win32.zip\n"]}], "source": ["import sys, os\n", "\n", "if sys.platform.startswith(\"win\"):\n", "    url = \"https://github.com/google/protobuf/releases/download/v{0}/protoc-{0}-win32.zip\".format(version)\n", "    name = \"protoc-{0}-win32.zip\".format(version)\n", "    exe = 'protoc.exe'\n", "else:\n", "    url = \"https://github.com/google/protobuf/releases/download/v{0}/protoc-{0}-linux-x86_64.zip\".format(version)\n", "    exe = 'protoc'\n", "    name = \"protoc-{0}-linux-x86_64.zip\".format(version)\n", "\n", "protoc = os.path.join(\"bin\", exe)\n", "if not os.path.exists(name):\n", "    from pyquickhelper.filehelper import download\n", "    try:\n", "        download(url)\n", "    except Exception as e:\n", "        raise Exception(\"Unable to download '{0}'\\nERROR\\n{1}\".format(url, e)) from e\n", "else:\n", "    print(name)"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": ["if not os.path.exists(protoc):\n", "    from pyquickhelper.filehelper import unzip_files\n", "    unzip_files(name, where_to='.')"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": ["if not os.path.exists(protoc):\n", "    raise FileNotFoundError(protoc)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We write the schema to disk."]}, {"cell_type": "code", "execution_count": 7, "metadata": 
{}, "outputs": [], "source": ["with open('schema.proto', 'w') as f:\n", "    f.write(schema)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Now we can compile it."]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["\n", "----\n", "\n"]}], "source": ["from pyquickhelper.loghelper import run_cmd\n", "cmd = '{0} --python_out=. schema.proto'.format(protoc)\n", "try:\n", "    out, err = run_cmd(cmd=cmd, wait=True)\n", "except PermissionError as e:\n", "    # On Linux, if bin/protoc does not work, fall back to\n", "    # protoc directly, assuming the protobuf-compiler\n", "    # package has been installed.\n", "    if not sys.platform.startswith(\"win\"):\n", "        protoc = \"protoc\"\n", "        cmd = '{0} --python_out=. schema.proto'.format(protoc)\n", "        try:\n", "            out, err = run_cmd(cmd=cmd, wait=True)\n", "        except Exception as e:\n", "            mes = \"CMD: {0}\".format(cmd)\n", "            raise Exception(\"Unable to use {0}\\n{1}\".format(protoc, mes)) from e\n", "    else:\n", "        mes = \"CMD: {0}\".format(cmd)\n", "        raise Exception(\"Unable to use {0}\\n{1}\".format(protoc, mes)) from e\n", "print(\"\\n----\\n\".join([out, err]))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["A file has been generated."]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/plain": ["['schema_pb2.py']"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["[_ for _ in os.listdir(\".\") if '.py' in _]"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["# Generated by the protocol buffer compiler. 
DO NOT EDIT!\n", "# source: schema.proto\n", "\n", "import sys\n", "_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))\n", "from google.protobuf import descriptor as _descriptor\n", "from google.protobuf import message as _message\n", "from google.protobuf import reflection as _reflection\n", "from google.protobuf import symbol_database as _symbol_database\n", "from google.protobuf import descriptor_pb2\n", "# @@protoc_insertion_point(imports)\n", "\n", "_sym_db = _symbol_database.Default()\n", "\n", "\n", "\n", "\n", "DESCRIPTOR = _descriptor.FileDescriptor(\n", "  name='schema.proto',\n", "  package='tutorial',\n", "  syntax='proto2',\n", "  serialized_pb=_b('\\n\\x0cschema.proto\\x12\\x08tutorial\\\"\\xdb\\x01\\n\\x06Person\\x12\\x0c\\n\\x04name\\x18\\x01 \\x02(\\t\\x12\\n\\n\\x02id\\x18\\x02 \\x02(\\x05\\x12\\r\\n\\x05\\x65mail\\x18\\x03 \\x01(\\t\\x12,\\n\\x06phones\\x18\\x04 \\x03(\\x0b\\x32\\x1c.tutorial.Person.PhoneNumber\\x1aM\\n\\x0bPhoneNumber\\x12\\x0e\\n\\x06number\\x18\\x01 \\x02(\\t\\x12.\\n\\x04type\\x18\\x02 \\x01(\\x0e\\x32\\x1a.tutorial.Person.PhoneType:\\x04HOME\\\"\n"]}], "source": ["with open('schema_pb2.py', 'r') as f:\n", "    content = f.read()\n", "print(content[:1000])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Importing the generated module\n", "\n", "To use *protobuf*, we need to import the generated module."]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": ["import schema_pb2"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We create a record."]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": ["person = schema_pb2.Person()\n", "person.id = 1234\n", "person.name = \"John Doe\"\n", "person.email = \"jdoe@example.com\"\n", "phone = person.phones.add()\n", "phone.number = \"555-4321\"\n", "phone.type = schema_pb2.Person.HOME"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, 
"outputs": [{"data": {"text/plain": ["name: \"John Doe\"\n", "id: 1234\n", "email: \"jdoe@example.com\"\n", "phones {\n", " number: \"555-4321\"\n", " type: HOME\n", "}"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["person"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## S\u00e9rialisation en cha\u00eene de caract\u00e8res"]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/plain": ["(bytes,\n", " b'\\n\\x08John Doe\\x10\\xd2\\t\\x1a\\x10jdoe@example.com\"\\x0c\\n\\x08555-4321\\x10\\x01')"]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["res = person.SerializeToString()\n", "type(res), res"]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["4.56 \u00b5s \u00b1 447 ns per loop (mean \u00b1 std. dev. of 7 runs, 100000 loops each)\n"]}], "source": ["%timeit person.SerializeToString()"]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [{"data": {"text/plain": ["name: \"John Doe\"\n", "id: 1234\n", "email: \"jdoe@example.com\"\n", "phones {\n", " number: \"555-4321\"\n", " type: HOME\n", "}"]}, "execution_count": 17, "metadata": {}, "output_type": "execute_result"}], "source": ["pers = schema_pb2.Person.FromString(res)\n", "pers"]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [{"data": {"text/plain": ["name: \"John Doe\"\n", "id: 1234\n", "email: \"jdoe@example.com\"\n", "phones {\n", " number: \"555-4321\"\n", " type: HOME\n", "}"]}, "execution_count": 18, "metadata": {}, "output_type": "execute_result"}], "source": ["pers = schema_pb2.Person()\n", "pers.ParseFromString(res)\n", "pers"]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["3.44 \u00b5s \u00b1 696 ns per loop (mean \u00b1 std. dev. 
of 7 runs, 100000 loops each)\n"]}], "source": ["%timeit schema_pb2.Person.FromString(res)"]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["3.13 \u00b5s \u00b1 633 ns per loop (mean \u00b1 std. dev. of 7 runs, 100000 loops each)\n"]}], "source": ["%timeit pers.ParseFromString(res)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Several byte strings"]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": ["db = []\n", "\n", "person = schema_pb2.Person()\n", "person.id = 1234\n", "person.name = \"John Doe\"\n", "person.email = \"jdoe@example.com\"\n", "phone = person.phones.add()\n", "phone.number = \"555-4321\"\n", "phone.type = schema_pb2.Person.HOME\n", "db.append(person)\n", "\n", "person = schema_pb2.Person()\n", "person.id = 5678\n", "person.name = \"Johnette Doette\"\n", "person.email = \"jtdoet@example2.com\"\n", "phone = person.phones.add()\n", "phone.number = \"777-1234\"\n", "phone.type = schema_pb2.Person.MOBILE\n", "db.append(person)"]}, {"cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [{"data": {"text/plain": ["b'-\\x00\\x00\\x00\\n\\x08John Doe\\x10\\xd2\\t\\x1a\\x10jdoe@example.com\"\\x0c\\n\\x08555-4321\\x10\\x017\\x00\\x00\\x00\\n\\x0fJohnette Doette\\x10\\xae,\\x1a\\x13jtdoet@example2.com\"\\x0c\\n\\x08777-1234\\x10\\x00'"]}, "execution_count": 22, "metadata": {}, "output_type": "execute_result"}], "source": ["import struct\n", "from io import BytesIO\n", "buffer = BytesIO()\n", "for p in db:\n", "    # Prefix each message with its size as a 4-byte little-endian\n", "    # integer so that the stream can be split again when reading.\n", "    size = p.ByteSize()\n", "    buffer.write(struct.pack('<i', size))\n", "    buffer.write(p.SerializeToString())\n", "res = buffer.getvalue()\n", "res"]}, {"cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": ["db2 = []\n", "buffer = BytesIO(res)\n", "while True:\n", "    # Read the 4-byte size prefix, then the message itself.\n", "    bsize = buffer.read(4)\n", "    if len(bsize) == 0:\n", "        # End of the stream.\n", "        break\n", "    size = struct.unpack('<i', bsize)[0]\n", "    data = buffer.read(size)\n", "    p = schema_pb2.Person.FromString(data)\n", "    db2.append(p)"]}, {"cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [{"data": {"text/plain": ["(name: \"John Doe\"\n", " id: 1234\n", " email: \"jdoe@example.com\"\n", " phones {\n", "   number: \"555-4321\"\n", "   type: HOME\n", " }, name: \"Johnette Doette\"\n", " id: 5678\n", " email: \"jtdoet@example2.com\"\n", " phones {\n", "   number: \"777-1234\"\n", "   type: MOBILE\n", " })"]}, "execution_count": 24, "metadata": {}, "output_type": "execute_result"}], "source": ["db2[0], db2[1]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## JSON serialization"]}, {"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": ["from google.protobuf.json_format import MessageToJson"]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["{\n", "  \"name\": \"John Doe\",\n", "  \"id\": 1234,\n", "  \"email\": \"jdoe@example.com\",\n", "  \"phones\": [\n", "    {\n", "      \"number\": \"555-4321\",\n", "      \"type\": \"HOME\"\n", "    }\n", "  ]\n", "}\n"]}], "source": ["print(MessageToJson(pers))"]}, {"cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["76.4 \u00b5s \u00b1 7.48 \u00b5s per loop (mean \u00b1 std. dev. 
of 7 runs, 10000 loops each)\n"]}], "source": ["%timeit MessageToJson(pers)"]}, {"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [{"data": {"text/plain": ["name: \"John Doe\"\n", "id: 1234\n", "email: \"jdoe@example.com\"\n", "phones {\n", "  number: \"555-4321\"\n", "  type: HOME\n", "}"]}, "execution_count": 28, "metadata": {}, "output_type": "execute_result"}], "source": ["from google.protobuf.json_format import Parse as ParseJson\n", "js = MessageToJson(pers)\n", "res = ParseJson(js, message=schema_pb2.Person())\n", "res"]}, {"cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["75 \u00b5s \u00b1 7.77 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10000 loops each)\n"]}], "source": ["%timeit ParseJson(js, message=schema_pb2.Person())"]}, {"cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0"}}, "nbformat": 4, "nbformat_minor": 2}