module mlhelper.table_formula

Inheritance diagram of pyensae.mlhelper.table_formula

Short summary

module pyensae.mlhelper.table_formula

Adds functionality to a DataFrame.

source on GitHub

Classes

class – truncated documentation

  • TableFormula – Extends the pandas DataFrame class or proposes extensions to existing functions using lambda functions. See …

Properties

property – truncated documentation

  • _can_fast_transpose – Can we transpose this DataFrame without creating any new array objects.

  • _constructor

  • _data

  • _info_axis

  • _is_homogeneous_type – Whether all the columns in a DataFrame have the same type. Returns: bool. See also …

  • _is_mixed_type

  • _is_view – Return boolean indicating if self is view of another array

  • _series

  • _stat_axis

  • _values – Analogue to ._values that may return a 2D ExtensionArray.

  • at – Access a single value for a row/column label pair. Similar to loc, in that both provide label-based lookups. …

  • attrs – Dictionary of global attributes of this dataset.

  • axes – Return a list representing the axes of the DataFrame. It has the row axis labels and column axis labels as the …

  • dtypes – Return the dtypes in the DataFrame. This returns a Series with the data type of each column. The result’s …

  • empty – Indicator whether Series/DataFrame is empty. True if Series/DataFrame is entirely empty (no items), meaning any …

  • flags – Get the properties associated with this pandas object. The available flags are …

  • iat – Access a single value for a row/column pair by integer position. Similar to iloc, in that both provide integer-based …

  • iloc – Purely integer-location based indexing for selection by position. .iloc[] is primarily integer position based …

  • loc – Access a group of rows and columns by label(s) or a boolean array. .loc[] is primarily label based, but may …

  • ndim – Return an int representing the number of axes / array dimensions. Return 1 if Series. Otherwise return 2 if DataFrame. …

  • shape – Return a tuple representing the dimensionality of the DataFrame. See also: ndarray.shape …

  • size – Return an int representing the number of elements in this object. Return the number of rows if Series. Otherwise …

  • style – Returns a Styler object. Contains methods for building a styled HTML representation of the DataFrame. …

  • T – The transpose of the DataFrame. Returns: DataFrame, the transposed DataFrame. …

  • values – Return a NumPy representation of the DataFrame.

Methods

method – truncated documentation

  • add_column_index – Changes the index.

  • add_column_vector – Adds a column knowing its name and a vector of values.

  • addc – Adds a column knowing its name and a lambda function.

  • fgroupby – Groups information based on columns defined by lambda functions.

  • graph_XY

  • sort – Sorts rows based on the values returned by function_sort.

Documentation

class pyensae.mlhelper.table_formula.TableFormula(data=None, index: Axes | None = None, columns: Axes | None = None, dtype: Dtype | None = None, copy: bool | None = None)

Bases: DataFrame

Extends the pandas DataFrame class or proposes extensions to existing functions using lambda functions. See “Extending pandas” in the pandas documentation.

property _constructor

Used when a manipulation result has the same dimensions as the original.
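
As a rough illustration of why this property matters, the sketch below uses a hypothetical subclass `MyTable` (not part of pyensae) that overrides `_constructor`, following the subclassing pattern described in the pandas documentation. With the override in place, operations that return a new frame keep the subclass type.

```python
import pandas as pd


class MyTable(pd.DataFrame):
    """Hypothetical DataFrame subclass, used only for illustration."""

    @property
    def _constructor(self):
        # pandas uses this to build the result of operations that
        # return a new two-dimensional object (filtering, copying, ...).
        return MyTable


table = MyTable({"a": [1, 2, 3], "b": [4, 5, 6]})
subset = table[table["a"] > 1]  # boolean filtering returns a new frame
print(type(subset).__name__)
```

Without the override, the same filtering step would return a plain `DataFrame` and any extra methods defined on the subclass would be lost.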

_mgr: BlockManager | ArrayManager

add_column_index(index, name=None)

Changes the index.

Parameters:
  • index – the new index

  • name – name of the index

The change happens in place.

add_column_vector(name, values)

Adds a column knowing its name and a vector of values.

Parameters:
  • name – name of the column

  • values – values

The change happens in place.
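
For comparison, the closest plain pandas operation is column assignment. The snippet below is only a sketch on an ordinary `DataFrame`, not a call to `add_column_vector`, and it assumes the vector holds one value per row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
values = [0.5, 1.5, 2.5]  # one value per row (assumption)
df["b"] = values          # column assignment modifies df in place
print(df.columns.tolist())
```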

addc(name, function_value)

Adds a column knowing its name and a lambda function.

Parameters:
  • name – name of the column

  • function_value – function

The change happens in place.
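
The idea can be sketched with plain pandas. The helper `add_computed_column` below is hypothetical and is not the pyensae implementation. It assumes the lambda function receives one row at a time and reads values by column name, as the `v["name"]` lambdas in the other examples on this page suggest.

```python
import pandas as pd


def add_computed_column(df, name, function_value):
    """Hypothetical helper: add column `name` computed row by row."""
    # axis=1 passes each row as a Series, so function_value
    # can read values with row["column_name"].
    df[name] = df.apply(function_value, axis=1)
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
add_computed_column(df, "a_plus_b", lambda row: row["a"] + row["b"])
print(df["a_plus_b"].tolist())  # [11, 22, 33]
```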

fgroupby(function_key, function_values, columns=None, function_agg=None, function_weight=None)

Groups information based on columns defined by lambda functions.

Parameters:
  • function_key – defines the key

  • function_values – defines the values

  • columns – names of the output columns; if None, new ones are created

  • function_agg – how to aggregate the data; if None, the default is the pandas DataFrame.sum method

  • function_weight – defines the weights, can be None

The function uses the temporary columns __key__ and __weight__, so you should not use these names in your table. Other temporary columns, named __value_{0}__ and __weight_{0}__, are also created. All of them are removed before the result is returned.

Example:

group = table.fgroupby(lambda v: v["name"],
                       [lambda v: v["d_a"]],
                       ["sum_d_a"],
                       [lambda vec, w: sum(vec) / w],
                       lambda v: v["d_b"])
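
To make the mechanics more concrete, here is a plain pandas sketch of the unweighted case with the default sum aggregation. The helper `group_by_lambdas` is hypothetical and is not the pyensae implementation. Weights and custom aggregation functions are left out because their exact handling is not described here, and the lambdas are assumed to receive one row at a time.

```python
import pandas as pd


def group_by_lambdas(df, function_key, function_values, columns):
    """Hypothetical helper: group rows by a lambda-defined key and
    sum lambda-defined values (no weights, default aggregation)."""
    work = df.copy()
    # Temporary columns, mirroring the reserved names mentioned above.
    work["__key__"] = work.apply(function_key, axis=1)
    temp_names = []
    for i, fct in enumerate(function_values):
        temp = "__value_{0}__".format(i)
        work[temp] = work.apply(fct, axis=1)
        temp_names.append(temp)
    # Sum every temporary value column within each key.
    grouped = work.groupby("__key__")[temp_names].sum()
    # Swap the temporary names for the requested output names.
    grouped.columns = columns
    grouped.index.name = None
    return grouped


df = pd.DataFrame({"name": ["x", "y", "x"], "d_a": [1.0, 2.0, 3.0]})
result = group_by_lambdas(df, lambda v: v["name"],
                          [lambda v: v["d_a"]], ["sum_d_a"])
print(result)
```

The temporary columns exist only on the working copy, so the input frame is left untouched and none of the reserved names appear in the result.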

graph_XY(curves, xlabel=None, ylabel=None, marker=True, link_point=False, title=None, format_date='%Y-%m-%d', legend_loc=0, figsize=None, ax=None)

Parameters:
  • curves – list of 3-tuples (generator for X, generator for Y, label); for some layouts, it can also be a list of 4-tuples (generator for X, generator for Y, generator for labels, label)

  • xlabel – label for X axis

  • ylabel – label for Y axis

  • marker – add a marker for each point

  • link_point – link points between them

  • title – graph title

  • format_date – if the X axis holds datetime objects, the function uses this format to print the dates

  • legend_loc – location of the legend

  • figsize – size of the figure

  • ax – matplotlib Axis, or None to create a new one

Returns:

matplotlib Axis

For the legend position, see matplotlib.

Example:

table.graph_XY([[lambda v: v["sum_a"], lambda v: v["sum_b"], "xy label 1"],
                [lambda v: v["sum_b"], lambda v: v["sum_c"], "xy label 2"]])

sort(function_sort, reverse=False)

Sorts rows based on the values returned by function_sort.

Parameters:
  • function_sort – lambda function

  • reverse – reverse order

The function creates a temporary column __key__ and removes it afterwards. The change happens in place.
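
The temporary-key approach described above can be sketched with plain pandas. The helper `sort_by_lambda` is hypothetical and is not the pyensae implementation. It assumes function_sort receives one row at a time and returns a sortable value.

```python
import pandas as pd


def sort_by_lambda(df, function_sort, reverse=False):
    """Hypothetical helper: sort rows in place by a per-row key."""
    # Temporary column holding the sort key of every row.
    df["__key__"] = df.apply(function_sort, axis=1)
    df.sort_values("__key__", ascending=not reverse, inplace=True)
    # Remove the temporary column once the rows are ordered.
    df.drop(columns="__key__", inplace=True)
    return df


df = pd.DataFrame({"name": ["b", "c", "a"], "score": [2, 3, 1]})
sort_by_lambda(df, lambda row: -row["score"])
print(df["name"].tolist())  # ['c', 'b', 'a']
```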