glide.extensions.pandas module

https://pandas.pydata.org/

class glide.extensions.pandas.DataFrameApplyMap(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

Apply a transform to a Pandas DataFrame

run(df, func, **kwargs)[source]

Use applymap() on a DataFrame

Parameters
  • df (pandas.DataFrame) – The pandas DataFrame to apply func to

  • func (callable) – A callable that will be passed to df.applymap

  • **kwargs – Keyword arguments passed to applymap
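
The element-wise call described above can be sketched with pandas alone, without glide. The frame and the lambda below are hypothetical. Recent pandas releases deprecate DataFrame.applymap in favor of DataFrame.map, so the sketch uses whichever is available.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Prefer DataFrame.map where it exists; fall back to applymap on older pandas.
elementwise = getattr(df, "map", None) or df.applymap

# The callable is applied to every cell of the frame.
doubled = elementwise(lambda x: x * 2)
```

The result is a new DataFrame with the same shape and labels; the original frame is left unchanged.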

class glide.extensions.pandas.DataFrameBollingerBands(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.extensions.pandas.DataFrameRollingNode

Compute Bollinger Bands for the specified columns in a DataFrame

compute_stats(df, rolling, column_name)[source]

Override this to implement logic to manipulate the DataFrame
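
For readers unfamiliar with Bollinger Bands, here is a minimal pandas-only sketch of the usual calculation: a rolling mean plus and minus a multiple of the rolling standard deviation. The window size, the band width of two standard deviations, and the output column names are assumptions for illustration; the values and naming glide actually uses are not documented here.

```python
import pandas as pd

df = pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.0, 10.0, 12.0]})

window = 3   # hypothetical window size
num_std = 2  # conventional band width; an assumption, not taken from glide

rolling = df["close"].rolling(window)
middle = rolling.mean()  # middle band: rolling mean
spread = rolling.std()   # rolling sample standard deviation

# Hypothetical column names for the three bands.
bands = pd.DataFrame(
    {
        "close_lower": middle - num_std * spread,
        "close_middle": middle,
        "close_upper": middle + num_std * spread,
    }
)
```

The first window - 1 rows are NaN because the window is not yet full.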

class glide.extensions.pandas.DataFrameCSVExtract(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.extensions.pandas.DataFramePush

Extract data from a CSV using Pandas

run(f, **kwargs)[source]

Extract data from the input file and push it as a DataFrame

Parameters
  • f – file or buffer to be passed to pandas.read_csv

  • **kwargs – kwargs to be passed to pandas.read_csv
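
As an illustration of the underlying call, the sketch below passes an in-memory buffer and one keyword argument to pandas.read_csv, the same kind of inputs that f and **kwargs would carry. The sample data is hypothetical, and no glide objects are involved.

```python
import io

import pandas as pd

# Any file path or file-like object accepted by read_csv can play the role of f.
buffer = io.StringIO("id,name\n1,ada\n2,grace\n")

# dtype is an example of a keyword argument forwarded to read_csv.
df = pd.read_csv(buffer, dtype={"id": "int64"})
```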

class glide.extensions.pandas.DataFrameCSVLoad(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

Load data into a CSV from a Pandas DataFrame

begin()[source]

Initialize state for CSV writing

end()[source]

Reset state in case the node gets reused

run(df, f, push_file=False, dry_run=False, **kwargs)[source]

Use Pandas to_csv to output a DataFrame

Parameters
  • df (pandas.DataFrame) – DataFrame to load to a CSV

  • f (file or buffer) – File to write the DataFrame to

  • push_file (bool, optional) – If true, push the file forward instead of the data

  • dry_run (bool, optional) – If true, skip actually loading the data

  • **kwargs – Keyword arguments passed to DataFrame.to_csv
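
The write side can be sketched the same way with DataFrame.to_csv and an in-memory buffer standing in for f. This is a pandas-only example with hypothetical data; it does not show the node's push_file or dry_run handling.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["ada", "grace"]})

# A StringIO buffer plays the role of the f argument.
buffer = io.StringIO()

# index=False is an example of a keyword argument forwarded to to_csv.
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```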

class glide.extensions.pandas.DataFrameExcelExtract(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.extensions.pandas.DataFramePush

Extract data from an Excel file using Pandas

run(f, **kwargs)[source]

Extract data from the input file and push it as a DataFrame. When multiple sheets are read from an Excel file, a dict of DataFrames is pushed instead of a single DataFrame.

Parameters
  • f – file or buffer to be passed to pandas.read_excel

  • **kwargs – kwargs to be passed to pandas.read_excel

class glide.extensions.pandas.DataFrameExcelLoad(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

Load data into an Excel file from a Pandas DataFrame

run(df_or_dict, f, push_file=False, dry_run=False, **kwargs)[source]

Use Pandas to_excel to output a DataFrame

Parameters
  • df_or_dict – DataFrame or dict of DataFrames to load to an Excel file. In the case of a dict the keys will be the sheet names.

  • f (file or buffer) – File to write the DataFrame to

  • push_file (bool, optional) – If true, push the file forward instead of the data

  • dry_run (bool, optional) – If true, skip actually loading the data

  • **kwargs – Keyword arguments passed to DataFrame.to_excel

class glide.extensions.pandas.DataFrameHTMLExtract(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

Extract data from HTML tables using Pandas

run(f, **kwargs)[source]

Extract data from the input file and push it as a DataFrame

Parameters
  • f – file or buffer to be passed to pandas.read_html

  • **kwargs – kwargs to be passed to pandas.read_html

class glide.extensions.pandas.DataFrameHTMLLoad(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

run(df, f, push_file=False, dry_run=False, **kwargs)[source]

Use Pandas to_html to output a DataFrame

Parameters
  • df (pandas.DataFrame) – DataFrame to load to an HTML file

  • f (file or buffer) – File to write the DataFrame to

  • push_file (bool, optional) – If true, push the file forward instead of the data

  • dry_run (bool, optional) – If true, skip actually loading the data

  • **kwargs – Keyword arguments passed to DataFrame.to_html

class glide.extensions.pandas.DataFrameMethod(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

Helper to execute any pandas DataFrame method

run(df, method, **kwargs)[source]

Helper to execute any pandas DataFrame method

Parameters
  • df (pandas.DataFrame) – DataFrame object used to run the method

  • method (str) – A name of a valid DataFrame method

  • **kwargs – Arguments to pass to the DataFrame method
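
Dispatching on a method name is typically done with getattr. The helper below, call_method, is a hypothetical stand-in written for this example and is not glide's implementation.

```python
import pandas as pd


def call_method(df, method, **kwargs):
    """Look up a DataFrame method by name and call it with kwargs."""
    return getattr(df, method)(**kwargs)


df = pd.DataFrame({"a": [3, 1, 2]})

# Equivalent to df.sort_values(by="a").
result = call_method(df, "sort_values", by="a")
```

An invalid method name raises AttributeError from getattr in this sketch.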

class glide.extensions.pandas.DataFrameMovingAverage(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.extensions.pandas.DataFrameRollingNode

Compute a moving average on a DataFrame

compute_stats(df, rolling, column_name)[source]

Override this to implement logic to manipulate the DataFrame

class glide.extensions.pandas.DataFramePush(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node, glide.extensions.pandas.DataFramePushMixin

Base class for DataFrame-based nodes

class glide.extensions.pandas.DataFramePushMixin[source]

Bases: object

Shared logic for DataFrame-based nodes

do_push(df, chunksize=None)[source]

Push the DataFrame to the next node, obeying chunksize if passed

Parameters
  • df (pandas.DataFrame) – DataFrame to push, or chunks of a DataFrame if the chunksize argument is passed and truthy.

  • chunksize (int, optional) – If truthy the df argument is expected to be chunks of a DataFrame that will be pushed individually.
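
The chunked behavior described above can be sketched as follows. The function push_frames and its push callback are hypothetical names for this example; the sketch only mirrors the documented contract that a truthy chunksize means df is an iterable of chunks.

```python
import io

import pandas as pd


def push_frames(df, push, chunksize=None):
    """Hypothetical stand-in for do_push: push chunks one by one, or df whole."""
    if chunksize:
        # df is expected to be an iterable of DataFrame chunks.
        for chunk in df:
            push(chunk)
    else:
        push(df)


pushed = []
csv_data = "x\n1\n2\n3\n4\n5\n"

# With chunksize set, read_csv returns an iterator of DataFrames.
reader = pd.read_csv(io.StringIO(csv_data), chunksize=2)
push_frames(reader, pushed.append, chunksize=2)
```

Five rows read in chunks of two yield three pushes, the last one holding a single row.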

class glide.extensions.pandas.DataFrameRollingNode(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

Apply df.rolling to a DataFrame

compute_stats(df, rolling, column_name)[source]

Override this to implement logic to manipulate the DataFrame

run(df, windows, columns=None, suffix=None, **kwargs)[source]

Use df.rolling to apply a rolling window calculation on a dataframe

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html

Parameters
  • df (pandas.DataFrame) – The pandas DataFrame to process

  • windows (int or list of ints) – Size(s) of the moving window(s). If a list, all windows will be calculated and the window size will be appended as a suffix.

  • columns (list, optional) – A list of columns to calculate values for

  • suffix (str, optional) – A suffix to add to the column names of calculated values

  • **kwargs – Keyword arguments passed to df.rolling
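
To make the windows parameter concrete, the pandas-only sketch below computes a rolling mean for two window sizes and stores each result in its own column. The column naming scheme is an assumption for illustration; the exact names glide generates are not specified here.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
windows = [2, 3]

for window in windows:
    # Hypothetical naming: column, statistic, then the window size as a suffix.
    df[f"x_mean_{window}"] = df["x"].rolling(window).mean()
```

Each new column starts with window - 1 NaN values, since a full window is required by default.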

class glide.extensions.pandas.DataFrameRollingStd(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.extensions.pandas.DataFrameRollingNode

Compute a rolling standard deviation on a DataFrame

compute_stats(df, rolling, column_name)[source]

Override this to implement logic to manipulate the DataFrame

class glide.extensions.pandas.DataFrameRollingSum(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.extensions.pandas.DataFrameRollingNode

Compute a rolling window sum on a DataFrame

compute_stats(df, rolling, column_name)[source]

Override this to implement logic to manipulate the DataFrame

class glide.extensions.pandas.DataFrameSQLExtract(*args, **kwargs)[source]

Bases: glide.extensions.pandas.PandasSQLNode

Extract data from a SQL db using Pandas

run(sql, conn, **kwargs)[source]

Run the input query and push the result as a DataFrame

Parameters
  • sql – SQL query to pass to pandas.read_sql

  • conn – A SQL database connection

  • **kwargs – kwargs to be passed to pandas.read_sql
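
A minimal sketch of the underlying pandas.read_sql call, using an in-memory SQLite database so it is self-contained. The table and rows are hypothetical, and the example bypasses glide entirely.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")]
)

# sql and conn correspond to the run() arguments of the same names.
df = pd.read_sql("SELECT id, name FROM users ORDER BY id", conn)
conn.close()
```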

class glide.extensions.pandas.DataFrameSQLLoad(*args, **kwargs)[source]

Bases: glide.extensions.pandas.PandasSQLNode

Load data into a SQL db from a Pandas DataFrame

run(df, conn, table, push_table=False, dry_run=False, **kwargs)[source]

Use Pandas to_sql to output a DataFrame

Parameters
  • df (pandas.DataFrame) – DataFrame to load to a SQL table

  • conn – Database connection

  • table (str) – Name of a table to write the data to

  • push_table (bool, optional) – If true, push the table forward instead of the data

  • dry_run (bool, optional) – If true, skip actually loading the data

  • **kwargs – Keyword arguments passed to DataFrame.to_sql
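
The load direction can be sketched with DataFrame.to_sql against an in-memory SQLite database. The table name and data are hypothetical; push_table and dry_run are node-level options and are not modeled here.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2], "name": ["ada", "grace"]})

# index and if_exists are examples of keyword arguments forwarded to to_sql.
df.to_sql("users", conn, index=False, if_exists="replace")

rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
conn.close()
```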

class glide.extensions.pandas.DataFrameSQLTableExtract(*args, **kwargs)[source]

Bases: glide.extensions.pandas.PandasSQLNode

Extract data from a SQL table using Pandas

run(table, conn, where=None, limit=None, **kwargs)[source]

Extract data from the input table and push it as a DataFrame

Parameters
  • table (str) – SQL table to query

  • conn – A SQL database connection

  • where (str, optional) – A SQL where clause

  • limit (int, optional) – Limit to put in SQL limit clause

  • **kwargs – kwargs to be passed to pandas.read_sql
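
One plausible way to turn table, where, and limit into a query is sketched below. The helper build_table_query is hypothetical, and the sketch assumes that where already includes the WHERE keyword; how glide actually assembles the statement is not documented here.

```python
def build_table_query(table, where=None, limit=None):
    """Hypothetical helper: assemble a SELECT from table, where, and limit."""
    # The table name is interpolated directly, so it must come from a trusted source.
    sql = f"SELECT * FROM {table}"
    if where:
        # Assumption: the clause is passed complete, e.g. "WHERE id > 1".
        sql += f" {where}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


query = build_table_query("users", where="WHERE id > 1", limit=10)
```

The resulting string could then be handed to pandas.read_sql together with the connection.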

class glide.extensions.pandas.DataFrameSQLTempLoad(*args, **kwargs)[source]

Bases: glide.extensions.pandas.PandasSQLNode

Load data into a SQL temp table from a Pandas DataFrame

run(df, conn, schema=None, dry_run=False, **kwargs)[source]

Use Pandas to_sql to output a DataFrame to a temporary table. Push a reference to the temp table forward.

Parameters
  • df (pandas.DataFrame) – DataFrame to load to a SQL table

  • conn – Database connection

  • schema (str, optional) – schema to create the temp table in

  • dry_run (bool, optional) – If true, skip actually loading the data

  • **kwargs – Keyword arguments passed to DataFrame.to_sql

class glide.extensions.pandas.FromDataFrame(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

run(df, orient='records', **kwargs)[source]

Convert the DataFrame to an iterable of records and push it to the next node

Parameters
  • df – A DataFrame to convert to an iterable of records

  • orient – The orient arg passed to df.to_dict()

  • **kwargs – Keyword arguments passed to df.to_dict()
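
With the default orient of 'records', DataFrame.to_dict returns one dict per row. A pandas-only sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["ada", "grace"]})

# orient="records" yields a list with one {column: value} dict per row.
records = df.to_dict(orient="records")
```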

class glide.extensions.pandas.PandasSQLNode(*args, **kwargs)[source]

Bases: glide.sql.BaseSQLNode, glide.extensions.pandas.DataFramePushMixin

Captures the connection types allowed to work with Pandas to_sql/read_sql

allowed_conn_types = [<class 'sqlalchemy.engine.base.Connection'>, <class 'sqlalchemy.engine.interfaces.Connectable'>, <class 'sqlite3.Connection'>]

class glide.extensions.pandas.ToDataFrame(name, _log=False, _debug=False, **default_context)[source]

Bases: glide.core.Node

run(rows, **kwargs)[source]

Convert the rows to a DataFrame

Parameters
  • rows – An iterable of rows to convert to a DataFrame

  • **kwargs – Keyword arguments passed to pandas.DataFrame.from_records()
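
The reverse conversion can be sketched with pandas.DataFrame.from_records. The rows below are hypothetical, and the columns keyword is shown only as an example of an argument that could be forwarded.

```python
import pandas as pd

rows = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]

# columns fixes the column order; it is one example of a forwarded keyword argument.
df = pd.DataFrame.from_records(rows, columns=["id", "name"])
```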