module pandashelper.tblformat

Short summary

module pyquickhelper.pandashelper.tblformat

To format a pandas dataframe

source on GitHub

Functions

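+--------------------+--------------------------------------------------------------+
| function           | truncated documentation                                      |
+====================+==============================================================+
| df2html            | Converts the table into an HTML string.                      |
+--------------------+--------------------------------------------------------------+
| df2rst             | Builds a string in RST format from a dataframe.              |
+--------------------+--------------------------------------------------------------+
| enumerate_split_df | Splits a dataframe by columns to display shorter dataframes. |
+--------------------+--------------------------------------------------------------+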

Documentation


pyquickhelper.pandashelper.tblformat.df2html(self, class_table=None, class_td=None, class_tr=None, class_th=None)[source]

Converts the table into an HTML string.

Parameters:
  • self – dataframe (to be added as a class method)

  • class_table – adds a class to the tag table (None for none)

  • class_td – adds a class to the tag td (None for none)

  • class_tr – adds a class to the tag tr (None for none)

  • class_th – adds a class to the tag th (None for none)

Returns:

HTML
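
The effect of the four class parameters can be illustrated with a minimal sketch. It is not the implementation of df2html: `rows_to_html` is a hypothetical helper that takes plain Python rows instead of a dataframe, and it only assumes that each `class_*` argument ends up as a `class` attribute on the matching tag, and that None leaves the tag bare.

```python
from html import escape


def rows_to_html(columns, rows, class_table=None, class_td=None,
                 class_tr=None, class_th=None):
    """Hypothetical helper: renders rows as an HTML table with optional CSS classes."""

    def tag(name, css):
        # Emit the class attribute only when a class name was given.
        return f'<{name} class="{escape(css)}">' if css else f"<{name}>"

    lines = [tag("table", class_table)]
    head = "".join(f"{tag('th', class_th)}{escape(str(c))}</th>" for c in columns)
    lines.append(f"{tag('tr', class_tr)}{head}</tr>")
    for row in rows:
        cells = "".join(
            f"{tag('td', class_td)}{escape(str(row.get(c, '')))}</td>"
            for c in columns)
        lines.append(f"{tag('tr', class_tr)}{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html = rows_to_html(["A", "B"], [{"A": 0, "B": "text"}],
                    class_table="mytable", class_td="cell")
print(html)
```

The exact markup produced by df2html may differ (whitespace, thead/tbody tags, index column); the sketch only shows where the class names go.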


pyquickhelper.pandashelper.tblformat.df2rst(df, add_line=True, align='l', column_size=None, index=False, list_table=False, title=None, header=True, sep=',', number_format=None, replacements=None, split_row=None, split_row_level='+', split_col_common=None, split_col_subsets=None, filter_rows=None, label_pattern=None)[source]

Builds a string in RST format from a dataframe.

Parameters:
  • df – dataframe

  • add_line – (bool) add a line separator between each row

  • align – r or l or c

  • column_size – a list like [1, 2, 5] to multiply the column sizes, or a dictionary (if list_table is False) to overwrite a column size, like {'col_name1': 20} or {3: 20}

  • index – add the index

  • list_table – use the list_table

  • title – used only if list_table is True

  • header – add one header

  • sep – separator if df is a string and is a filename to load

  • number_format – formats numbers in a specific way; if number_format is an integer, it is replaced by a dictionary such as {numpy.float64: '{:.2g}'} (when number_format is 2), see also pyformat.info

  • replacements – replacements just before converting into RST (dictionary)

  • split_row – displays several tables, one column is used as the name of each section

  • split_row_level – title level if option split_row is used

  • split_col_common – splits the dataframe by columns, see enumerate_split_df

  • split_col_subsets – splits the dataframe by columns, see enumerate_split_df

  • filter_rows – None or a function to remove rows, signature def filter_rows(df: DataFrame) -> DataFrame

  • label_pattern – if split_row is used, the function may insert a label in front of every section, example: ".. _lpy-{section}:"

Returns:

string
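
How split_row, split_row_level and label_pattern might work together can be sketched as follows. This is an illustration under assumptions, not the code of df2rst: `split_sections` is a hypothetical helper working on plain dictionaries, it assumes sections appear in first-seen order, and it assumes the section title is underlined with the split_row_level character.

```python
def split_sections(rows, split_row, level="+", label_pattern=None):
    """Hypothetical helper: groups rows by one column and yields RST section headers."""
    groups = {}
    for row in rows:
        # dict preserves insertion order, so sections appear in first-seen order.
        groups.setdefault(row[split_row], []).append(row)
    for section, members in groups.items():
        lines = []
        if label_pattern:
            # The label goes in front of the section title.
            lines += [label_pattern.format(section=section), ""]
        title = str(section)
        lines += [title, level * len(title), ""]
        yield "\n".join(lines), members


rows = [{"Op": "ReduceMax", "SpeedUp": 2.96},
        {"Op": "ReduceSum", "SpeedUp": 1.79},
        {"Op": "ReduceMax", "SpeedUp": 2.57}]
sections = list(split_sections(rows, "Op", label_pattern=".. _lpy-{section}:"))
for header, members in sections:
    print(header)
    print(len(members), "row(s)")
```

Each group of rows would then be rendered as its own table below its header.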

If list_table is False, the function produces a grid table like the one below. None values are replaced by an empty string (4 spaces).

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
| (header rows optional) |            |          |          |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | ...        | ...      |          |
+------------------------+------------+----------+----------+
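
The grid layout can be reproduced with a few lines of plain Python. The sketch below is not the code of df2rst: `grid_table` is a hypothetical helper that takes a header and a list of rows, pads every cell to the widest value of its column, and draws a rule after each row (which corresponds to add_line=True). It ignores align, column_size and multi-line cells.

```python
def grid_table(header, rows):
    """Hypothetical helper: formats a header and rows as an RST grid table."""
    # None becomes an empty cell, everything else goes through str().
    table = [["" if v is None else str(v) for v in line]
             for line in [header] + rows]
    widths = [max(len(line[i]) for line in table) for i in range(len(header))]

    def rule(char):
        # Horizontal rule: '-' between rows, '=' under the header.
        return "+" + "+".join(char * (w + 2) for w in widths) + "+"

    def content(line):
        # Left-aligned cells padded to the column width.
        return "| " + " | ".join(v.ljust(w) for v, w in zip(line, widths)) + " |"

    out = [rule("-"), content(table[0]), rule("=")]
    for line in table[1:]:
        out.append(content(line))
        out.append(rule("-"))
    return "\n".join(out)


text = grid_table(["A", "B", "C"],
                  [[0.0, "text", None], [1e-05, None, "longer text"]])
print(text)
```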

If list_table is True, the format is the following:

.. list-table:: title
    :widths: 15 10 30
    :header-rows: 1

    * - Treat
      - Quantity
      - Description
    * - Albatross
      - 2.99
      - anythings
    ...
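
The list-table layout is simpler to generate, since no padding is needed. The following sketch is a hypothetical helper, `list_table`, not the code of df2rst; it assumes one header row and writes one `* -` item per row followed by one `-` item per remaining cell.

```python
def list_table(header, rows, title="", widths=None):
    """Hypothetical helper: formats rows as an RST list-table directive."""
    widths = widths or [1] * len(header)
    out = [f".. list-table:: {title}".rstrip(),
           "    :widths: " + " ".join(str(w) for w in widths),
           "    :header-rows: 1",
           ""]
    for line in [header] + rows:
        for i, value in enumerate(line):
            # The first cell of a row opens a new list item with "* -".
            prefix = "    * - " if i == 0 else "      - "
            out.append(prefix + ("" if value is None else str(value)))
    return "\n".join(out)


text = list_table(["Treat", "Quantity", "Description"],
                  [["Albatross", 2.99, "anythings"]],
                  title="title", widths=[15, 10, 30])
print(text)
```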

Convert a dataframe into RST

<<<

from pandas import DataFrame
from pyquickhelper.pandashelper import df2rst

df = DataFrame([{'A': 0, 'B': 'text'},
                {'A': 1e-5, 'C': 'longer text'}])
print(df2rst(df))

>>>

    +-------+------+-------------+
    | A     | B    | C           |
    +=======+======+=============+
    | 0.0   | text |             |
    +-------+------+-------------+
    | 1e-05 |      | longer text |
    +-------+------+-------------+

Convert a dataframe into markdown

<<<

from io import StringIO
from textwrap import dedent
import pandas

from_excel = dedent('''
Op;axes;shape;SpeedUp
ReduceMax;(3,);(8, 24, 48, 8);2.96
ReduceMax;(3,);(8, 24, 48, 16);2.57
ReduceMax;(3,);(8, 24, 48, 32);2.95
ReduceMax;(3,);(8, 24, 48, 64);3.28
ReduceMax;(3,);(8, 24, 48, 100);3.05
ReduceMax;(3,);(8, 24, 48, 128);3.11
ReduceMax;(3,);(8, 24, 48, 200);2.86
ReduceMax;(3,);(8, 24, 48, 256);2.50
ReduceMax;(3,);(8, 24, 48, 400);2.48
ReduceMax;(3,);(8, 24, 48, 512);2.90
ReduceMax;(3,);(8, 24, 48, 1024);2.76
ReduceMax;(0,);(8, 24, 48, 8);19.29
ReduceMax;(0,);(8, 24, 48, 16);11.83
ReduceMax;(0,);(8, 24, 48, 32);5.69
ReduceMax;(0,);(8, 24, 48, 64);5.49
ReduceMax;(0,);(8, 24, 48, 100);6.13
ReduceMax;(0,);(8, 24, 48, 128);6.27
ReduceMax;(0,);(8, 24, 48, 200);5.46
ReduceMax;(0,);(8, 24, 48, 256);4.76
ReduceMax;(0,);(8, 24, 48, 400);2.21
ReduceMax;(0,);(8, 24, 48, 512);4.52
ReduceMax;(0,);(8, 24, 48, 1024);4.38
ReduceSum;(3,);(8, 24, 48, 8);1.79
ReduceSum;(3,);(8, 24, 48, 16);0.79
ReduceSum;(3,);(8, 24, 48, 32);1.67
ReduceSum;(3,);(8, 24, 48, 64);1.19
ReduceSum;(3,);(8, 24, 48, 100);2.08
ReduceSum;(3,);(8, 24, 48, 128);2.96
ReduceSum;(3,);(8, 24, 48, 200);1.66
ReduceSum;(3,);(8, 24, 48, 256);2.26
ReduceSum;(3,);(8, 24, 48, 400);1.76
ReduceSum;(3,);(8, 24, 48, 512);2.61
ReduceSum;(3,);(8, 24, 48, 1024);2.21
ReduceSum;(0,);(8, 24, 48, 8);2.56
ReduceSum;(0,);(8, 24, 48, 16);2.05
ReduceSum;(0,);(8, 24, 48, 32);3.04
ReduceSum;(0,);(8, 24, 48, 64);2.57
ReduceSum;(0,);(8, 24, 48, 100);2.41
ReduceSum;(0,);(8, 24, 48, 128);2.77
ReduceSum;(0,);(8, 24, 48, 200);2.02
ReduceSum;(0,);(8, 24, 48, 256);1.61
ReduceSum;(0,);(8, 24, 48, 400);1.59
ReduceSum;(0,);(8, 24, 48, 512);1.48
ReduceSum;(0,);(8, 24, 48, 1024);1.50
''')

df = pandas.read_csv(StringIO(from_excel), sep=";")
print(df.columns)

sub = df[["Op", "axes", "shape", "SpeedUp"]]
piv = df.pivot_table(values="SpeedUp", index=['axes', "shape"], columns="Op")
piv = piv.reset_index(drop=False)

print(piv.to_markdown(index=False))

>>>

    Index(['Op', 'axes', 'shape', 'SpeedUp'], dtype='object')
    | axes   | shape             |   ReduceMax |   ReduceSum |
    |:-------|:------------------|------------:|------------:|
    | (0,)   | (8, 24, 48, 100)  |        6.13 |        2.41 |
    | (0,)   | (8, 24, 48, 1024) |        4.38 |        1.5  |
    | (0,)   | (8, 24, 48, 128)  |        6.27 |        2.77 |
    | (0,)   | (8, 24, 48, 16)   |       11.83 |        2.05 |
    | (0,)   | (8, 24, 48, 200)  |        5.46 |        2.02 |
    | (0,)   | (8, 24, 48, 256)  |        4.76 |        1.61 |
    | (0,)   | (8, 24, 48, 32)   |        5.69 |        3.04 |
    | (0,)   | (8, 24, 48, 400)  |        2.21 |        1.59 |
    | (0,)   | (8, 24, 48, 512)  |        4.52 |        1.48 |
    | (0,)   | (8, 24, 48, 64)   |        5.49 |        2.57 |
    | (0,)   | (8, 24, 48, 8)    |       19.29 |        2.56 |
    | (3,)   | (8, 24, 48, 100)  |        3.05 |        2.08 |
    | (3,)   | (8, 24, 48, 1024) |        2.76 |        2.21 |
    | (3,)   | (8, 24, 48, 128)  |        3.11 |        2.96 |
    | (3,)   | (8, 24, 48, 16)   |        2.57 |        0.79 |
    | (3,)   | (8, 24, 48, 200)  |        2.86 |        1.66 |
    | (3,)   | (8, 24, 48, 256)  |        2.5  |        2.26 |
    | (3,)   | (8, 24, 48, 32)   |        2.95 |        1.67 |
    | (3,)   | (8, 24, 48, 400)  |        2.48 |        1.76 |
    | (3,)   | (8, 24, 48, 512)  |        2.9  |        2.61 |
    | (3,)   | (8, 24, 48, 64)   |        3.28 |        1.19 |
    | (3,)   | (8, 24, 48, 8)    |        2.96 |        1.79 |

NaN values are replaced by an empty string even if number_format is not None.
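
The interaction between number_format and missing values can be sketched with the standard library alone. `format_cell` is a hypothetical helper, not part of pyquickhelper; it uses the builtin float as a stand-in for numpy.float64, and it assumes that an integer n expands to a pattern with n significant digits, as the {numpy.float64: '{:.2g}'} example for n = 2 suggests.

```python
import math


def format_cell(value, number_format=None):
    """Hypothetical helper: renders one cell as a string."""
    # Missing values become an empty string regardless of number_format.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    if isinstance(number_format, int):
        # Assumption: an integer n stands for n significant digits.
        number_format = {float: "{:.%dg}" % number_format}
    if number_format:
        for typ, pattern in number_format.items():
            if isinstance(value, typ):
                return pattern.format(value)
    return str(value)


print(repr(format_cell(3.14159, 2)))       # '3.1'
print(repr(format_cell(1e-5, 2)))          # '1e-05'
print(repr(format_cell(float("nan"), 2)))  # ''
print(repr(format_cell("text", 2)))        # 'text'
```

Values whose type has no pattern, such as strings, fall back to str().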


pyquickhelper.pandashelper.tblformat.enumerate_split_df(df, common, subsets)[source]

Splits a dataframe by columns to display shorter dataframes.

Parameters:
  • df – dataframe

  • common – common columns

  • subsets – subsets of columns

Returns:

split dataframes

<<<

from pandas import DataFrame
from pyquickhelper.pandashelper.tblformat import enumerate_split_df

df = DataFrame([{'A': 0, 'B': 'text'},
                {'A': 1e-5, 'C': 'longer text'}])
res = list(enumerate_split_df(df, ['A'], [['B'], ['C']]))
print(res[0])
print('-----')
print(res[1])

>>>

             A     B
    0  0.00000  text
    1  0.00001   NaN
    -----
             A            C
    0  0.00000          NaN
    1  0.00001  longer text
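
The splitting logic itself is short. The sketch below mirrors the example above on plain dictionaries; `split_columns` is a hypothetical helper, not the code of enumerate_split_df, and it assumes each piece holds the common columns followed by one subset, with None standing in for NaN.

```python
def split_columns(rows, common, subsets):
    """Hypothetical helper: yields one narrower table per subset of columns."""
    for subset in subsets:
        columns = list(common) + list(subset)
        # Keep the common columns in every piece; missing values become None.
        yield [{c: row.get(c) for c in columns} for row in rows]


rows = [{"A": 0, "B": "text"}, {"A": 1e-5, "C": "longer text"}]
pieces = list(split_columns(rows, ["A"], [["B"], ["C"]]))
for piece in pieces:
    print(piece)
```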
