glide.extensions.dask module¶
https://docs.dask.org/en/latest/
-
class
glide.extensions.dask.
DaskClientMap
(name, _log=False, _debug=False, **default_context)[source]¶ Bases:
glide.core.PoolSubmit
Apply a transform to a Pandas DataFrame using dask Client
-
get_results
(futures, timeout=None)[source]¶ Override this to fetch results from an asynchronous task
-
-
class
glide.extensions.dask.
DaskClientPush
(name, _log=False, _debug=False, **default_context)[source]¶ Bases:
glide.flow.FuturesPush
Use a dask Client to do a parallel push
-
as_completed_func
= None¶
-
executor_class
= None¶
-
-
class
glide.extensions.dask.
DaskDataFrameApply
(name, _log=False, _debug=False, **default_context)[source]¶ Bases:
glide.core.Node
Apply a transform to a Pandas DataFrame using dask dataframe
-
run
(df, func, from_pandas_kwargs=None, **kwargs)[source]¶ Convert to dask dataframe and use apply()
NOTE: it may be more efficient to not convert to/from Dask Dataframe in this manner depending on the pipeline
- Parameters
df (pandas.DataFrame) – The pandas DataFrame to apply func to
func (callable) – A callable that will be passed to Dask DataFrame.apply
from_pandas_kwargs (optional) – Keyword arguments to pass to dask.dataframe.from_pandas
**kwargs – Keyword arguments passed to Dask DataFrame.apply
-
-
class
glide.extensions.dask.
DaskDelayedPush
(name, _log=False, _debug=False, **default_context)[source]¶ Bases:
glide.core.PushNode
Use dask delayed to do a parallel push
-
class
glide.extensions.dask.
DaskFuturesReduce
(name, _log=False, _debug=False, **default_context)[source]¶ Bases:
glide.flow.Reduce
Collect the asynchronous results before pushing
-
class
glide.extensions.dask.
DaskParaGlider
(*args, executor_kwargs=None, **kwargs)[source]¶ Bases:
glide.core.ParaGlider
A ParaGlider that uses a dask Client to execute parallel calls to consume()