abstract_dataloader.ext.graph
Composing transforms for data processing pipelines.
Programming Model
- Data is represented as a dictionary with string keys and arbitrary values which are atomic from the perspective of transform composition.
- Transforms are created from a directed acyclic graph (DAG) of nodes, where each node (`Node`) is a callable which takes a set of inputs and produces a set of outputs.
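As a rough illustration of this model, the sketch below wires two plain functions together by hand over a dictionary of data. It does not use the library's API; the function names (`scale`, `summarize`) and data keys (`raw`, `gain`, `scaled`, `lo`, `hi`) are hypothetical and only show what "atomic values" and "nodes as callables" mean in practice.

```python
from typing import Any


def scale(x: list[float], factor: float) -> list[float]:
    """Hypothetical node callable: multiply every element by a factor."""
    return [v * factor for v in x]


def summarize(x: list[float]) -> tuple[float, float]:
    """Hypothetical node callable with two outputs: (minimum, maximum)."""
    return min(x), max(x)


# Data is a flat dict with string keys; each value is opaque to the
# composition logic and is only interpreted by the callables themselves.
data: dict[str, Any] = {"raw": [1.0, 2.0, 3.0], "gain": 2.0}

# "scaled" depends on "raw" and "gain"; "lo" and "hi" depend on "scaled".
# These dependencies are the edges of the DAG.
data["scaled"] = scale(x=data["raw"], factor=data["gain"])
data["lo"], data["hi"] = summarize(x=data["scaled"])
```

A graph-based transform automates this wiring: each node declares which data keys it reads and which it writes, and the order of calls follows from those declarations.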
abstract_dataloader.ext.graph.Node
dataclass
Node specification for a graph-based data processing transform.
Example Hydra Config
Attributes:
Name | Type | Description |
---|---|---|
transform | `Callable` | callable to apply to the inputs. |
output | `str \| Sequence[str]` | output data key (or output data keys for a node which returns multiple outputs). |
inputs | `Mapping[str, str]` | mapping of data keys to input argument names. |
optional | `Mapping[str, str]` | mapping of optional data keys to input argument names (i.e., they are only passed if present). |
Source code in src/abstract_dataloader/ext/graph.py
apply
Apply the node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `dict[str, Any]` | input data to process. | required |
name | `str` | node name (for error messages). | `''` |
Returns:
Type | Description |
---|---|
`dict[str, Any]` | Updated data, with any new keys added to the input data. |
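The attributes and `apply` behavior described above can be sketched with a small stand-in class. `SketchNode` is hypothetical and is not the library's implementation. It assumes, taking the attribute description literally, that the keys of `inputs` and `optional` are data keys and the values are argument names (the library may use the opposite orientation), and that `apply` returns a new dictionary rather than mutating its input.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Mapping, Sequence


@dataclass
class SketchNode:
    """Hypothetical stand-in mirroring the documented Node attributes."""

    transform: Callable[..., Any]
    output: str | Sequence[str]
    inputs: Mapping[str, str] = field(default_factory=dict)
    optional: Mapping[str, str] = field(default_factory=dict)

    def apply(self, data: dict[str, Any], name: str = "") -> dict[str, Any]:
        missing = [key for key in self.inputs if key not in data]
        if missing:
            raise KeyError(f"node {name!r} is missing inputs: {missing}")
        # Assumption: keys are data keys, values are argument names.
        kwargs = {arg: data[key] for key, arg in self.inputs.items()}
        # Optional inputs are only passed if they are present in the data.
        kwargs.update(
            {arg: data[key] for key, arg in self.optional.items() if key in data}
        )
        result = self.transform(**kwargs)
        if isinstance(self.output, str):
            new = {self.output: result}
        else:
            # Multiple outputs: pair each output key with one returned value.
            new = dict(zip(self.output, result))
        # Assumption: return a new dict instead of mutating the input.
        return {**data, **new}


def shift(x: list[float], offset: float = 0.0) -> list[float]:
    """Hypothetical callable with one required and one optional argument."""
    return [v + offset for v in x]


node = SketchNode(
    transform=shift, output="shifted",
    inputs={"raw": "x"}, optional={"bias": "offset"})

without_bias = node.apply({"raw": [1.0, 2.0]}, name="shift")
with_bias = node.apply({"raw": [1.0, 2.0], "bias": 1.0}, name="shift")
```

In this sketch the `bias` key is forwarded as `offset` only in the second call, so the first call falls back to the default value of the callable.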
Source code in src/abstract_dataloader/ext/graph.py
abstract_dataloader.ext.graph.Transform
Bases: Transform[dict[str, Any], dict[str, Any]]
Compose multiple callables forming a DAG into a transform.
Warning
Since the input data specifications are not provided at initialization, the graph execution order (and graph validity) cannot be determined statically; an invalid graph will result in runtime errors.
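To make the warning concrete, here is a minimal sketch of resolving a graph at call time. It is not the library's scheduling algorithm; `run_graph` and the `NodeSpec` tuple layout are hypothetical. The point is that the set of runnable nodes depends on which keys the input data actually contains, so an unsatisfiable graph can only be detected once data is supplied.

```python
from typing import Any, Callable

# Hypothetical node spec: (callable, input data keys, output data key).
NodeSpec = tuple[Callable[..., Any], tuple[str, ...], str]


def run_graph(nodes: dict[str, NodeSpec], data: dict[str, Any]) -> dict[str, Any]:
    """Run nodes as their inputs become available; fail if none can run."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    pending = dict(nodes)
    while pending:
        # A node is ready once all of its input keys exist in the data.
        ready = [
            name for name, (_, ins, _) in pending.items()
            if all(key in data for key in ins)
        ]
        if not ready:
            # Only discoverable at runtime: it depends on the input keys.
            raise ValueError(f"cannot resolve nodes: {sorted(pending)}")
        for name in ready:
            fn, ins, out = pending.pop(name)
            data[out] = fn(*(data[key] for key in ins))
    return data


# Declaration order does not matter; "total" waits for "doubled".
nodes: dict[str, NodeSpec] = {
    "total": (lambda xs: sum(xs), ("doubled",), "total"),
    "double": (lambda xs: [2 * x for x in xs], ("raw",), "doubled"),
}

result = run_graph(nodes, {"raw": [1, 2, 3]})

# The same graph fails when the input data lacks the key it needs.
try:
    run_graph(nodes, {"unrelated": 0})
    error = None
except ValueError as exc:
    error = str(exc)
```

The same node definitions succeed with one input dictionary and fail with another, which is why validity cannot be checked when the graph is constructed.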
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | `Mapping[str, str] \| None` | output data keys to produce as a mapping of output keys to graph data keys. If | `None` |
keep_all | `bool` | keep references to all intermediate values and return them instead of decref-ing values which are no longer needed. | `False` |
nodes | `Node \| dict[str, Any]` | nodes in the graph, as keyword arguments where the key indicates a reference name for the node; any | `{}` |
Source code in src/abstract_dataloader/ext/graph.py
__call__
Execute the transform graph on the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `dict[str, Any]` | input data to process. | required |
Returns:
Type | Description |
---|---|
`dict[str, Any]` | Processed data. |