glide.extensions.dask module

Dask documentation: https://docs.dask.org/en/latest/

class glide.extensions.dask.DaskClientMap(name, _log=False, _debug=False, **default_context)

Bases: glide.core.PoolSubmit

Apply a transform to a Pandas DataFrame using a Dask Client

check_data(data)

Optional input data check

get_executor(**executor_kwargs)

Override this to return the parallel executor

get_results(futures, timeout=None)

Override this to fetch results from an asynchronous task

get_worker_count(executor)

Override this to return a count of workers active in the executor

shutdown_executor(executor)

Override this to shutdown the executor

submit(executor, func, splits, **kwargs)

Override this to submit work to the executor
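The hooks above follow a template pattern: create an executor, count its workers, split the data, submit one task per split, collect the results, and shut the executor down. The sketch below illustrates that flow with the standard library's ThreadPoolExecutor as a stand-in for a Dask Client. ThreadPoolMapSketch, its run() method, and the interleaved splitting strategy are hypothetical and are not taken from glide's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


class ThreadPoolMapSketch:
    """Hypothetical stand-in (not a glide class) that reuses the hook names above."""

    def get_executor(self, max_workers=2):
        # A Dask-backed node would presumably create or connect to a Client here.
        self._worker_count = max_workers
        return ThreadPoolExecutor(max_workers=max_workers)

    def get_worker_count(self, executor):
        # Report the worker count recorded when the executor was created.
        return self._worker_count

    def submit(self, executor, func, splits, **kwargs):
        # One future per split of the input data.
        return [executor.submit(func, split, **kwargs) for split in splits]

    def get_results(self, futures, timeout=None):
        # Block until each future resolves, preserving submission order.
        return [future.result(timeout=timeout) for future in futures]

    def shutdown_executor(self, executor):
        executor.shutdown(wait=True)

    def run(self, data, func, **executor_kwargs):
        # Assumed driver logic: the real node's run() may differ.
        executor = self.get_executor(**executor_kwargs)
        try:
            count = self.get_worker_count(executor)
            # Interleaved slices, one per worker (an illustrative choice).
            splits = [data[i::count] for i in range(count)]
            futures = self.submit(executor, func, splits)
            return self.get_results(futures)
        finally:
            self.shutdown_executor(executor)


def double_all(chunk):
    return [value * 2 for value in chunk]


results = ThreadPoolMapSketch().run([1, 2, 3, 4, 5], double_all, max_workers=2)
print(results)
```

Keeping each step in its own method is what lets a subclass swap in a different executor without touching the driver logic.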

class glide.extensions.dask.DaskClientPush(name, _log=False, _debug=False, **default_context)

Bases: glide.flow.FuturesPush

Use a dask Client to do a parallel push

as_completed_func = None

executor_class = None

run(*args, **kwargs)

Subclasses will override this method to implement core node logic

class glide.extensions.dask.DaskDataFrameApply(name, _log=False, _debug=False, **default_context)

Bases: glide.core.Node

Apply a transform to a Pandas DataFrame using a Dask DataFrame

run(df, func, from_pandas_kwargs=None, **kwargs)

Convert df to a Dask DataFrame and call apply() on it

NOTE: Depending on the pipeline, it may be more efficient to avoid converting to and from a Dask DataFrame in this manner.

Parameters
  • df (pandas.DataFrame) – The pandas DataFrame to apply func to

  • func (callable) – A callable that will be passed to Dask DataFrame.apply

  • from_pandas_kwargs (dict, optional) – Keyword arguments to pass to dask.dataframe.from_pandas

  • **kwargs – Keyword arguments passed to Dask DataFrame.apply

class glide.extensions.dask.DaskDelayedPush(name, _log=False, _debug=False, **default_context)

Bases: glide.core.PushNode

Use dask delayed to do a parallel push
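As a rough illustration of a delayed fan-out, the sketch below wraps two hypothetical downstream functions in dask.delayed and evaluates them together with dask.compute. It assumes that a push hands the same item to every downstream node. The function names and the item are invented for the example and do not come from glide.

```python
from dask import compute, delayed


def to_upper(item):
    # Hypothetical downstream step 1.
    return item.upper()


def length(item):
    # Hypothetical downstream step 2.
    return len(item)


downstream = [to_upper, length]
item = "glide"

# Build one lazy task per downstream function; nothing runs yet.
tasks = [delayed(func)(item) for func in downstream]

# Evaluate all tasks in one call so the scheduler can run them in parallel.
results = compute(*tasks)
print(results)
```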

class glide.extensions.dask.DaskFuturesReduce(name, _log=False, _debug=False, **default_context)

Bases: glide.flow.Reduce

Collect the asynchronous results before pushing

end()

Do the push once all Futures results are in.

Warning

Dask futures will not work if you have closed your client connection!
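The ordering implied by this warning is to resolve every future while the connection that produced it is still open. The sketch below shows that ordering with the standard library's ThreadPoolExecutor standing in for a Dask Client. Note the limits of the analogy: standard library futures remain readable after shutdown, so this demonstrates only the recommended sequence, not the failure itself. All names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(value):
    return value * value


# Futures accumulated by a hypothetical reduce step as items arrive.
collected = []

with ThreadPoolExecutor(max_workers=2) as executor:
    for value in [1, 2, 3]:
        collected.append(executor.submit(square, value))

    # Resolve every future before leaving the block, that is, before the
    # executor (the stand-in for the client connection) is closed.
    results = [future.result() for future in collected]

print(results)
```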

class glide.extensions.dask.DaskParaGlider(*args, executor_kwargs=None, **kwargs)

Bases: glide.core.ParaGlider

A ParaGlider that uses a dask Client to execute parallel calls to consume()

get_executor()

Override this method to create the parallel executor

get_results(futures, timeout=None)

Override this method to get the asynchronous results

get_worker_count(executor)

Override this method to get the active worker count from the executor
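One plausible use of the worker count is to decide how many chunks the input to consume() is divided into. The helper below is a hypothetical sketch of such a split using contiguous, near-equal chunks. The function name, the file names, and the chunking strategy are assumptions and may not match how glide actually divides its data.

```python
def split_evenly(items, worker_count):
    """Split items into worker_count contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), worker_count)
    chunks = []
    start = 0
    for index in range(worker_count):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical inputs; a worker count of 2 is assumed for the example.
inputs = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"]
chunks = split_evenly(inputs, 2)
print(chunks)
```

Each chunk could then be submitted as one parallel call, with the results gathered in submission order.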