abstract_dataloader.ext.sample
Dataset sampling, including a low discrepancy subset sampler.
Dataset sampling is implemented using a SampledDataset,
which transparently wraps an existing Dataset.
abstract_dataloader.ext.sample.SampledDataset
Bases: Dataset[TSample], Generic[TSample]
Dataset wrapper which only exposes a subset of values.
The sampling mode can be one of:
- random: Uniform random sampling, with np.random.default_rng and the supplied seed; if seed is a float, it is converted into an integer by multiplying by len(dataset) and rounding.
- ld: Low discrepancy sampling; see sample_ld.
- uniform: Uniformly spaced sampling, with linspace(0, n, samples).
- Callable: A callable which takes the total number of samples, and returns an array of indices to sample from the dataset.
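As a rough illustration of how the modes above could map to index arrays, here is a minimal sketch. The helper `sketch_indices` is hypothetical and is not part of the library. It assumes an integer `samples`, sampling without replacement for `random`, and an excluded endpoint for `uniform` so that all indices stay in range. The `ld` mode is left out.

```python
import numpy as np


def sketch_indices(total, samples, seed=0, mode="uniform"):
    """Hypothetical helper: pick dataset indices for a given sampling mode."""
    samples = min(int(samples), total)  # cap at the dataset size
    if callable(mode):
        # A callable receives the dataset size and returns the indices itself.
        return np.asarray(mode(total), dtype=np.int64)
    if mode == "random":
        if isinstance(seed, float):
            # Float seeds are scaled by the dataset size and rounded.
            seed = round(seed * total)
        rng = np.random.default_rng(seed)
        # Assumption: sampling is without replacement.
        return rng.choice(total, size=samples, replace=False)
    if mode == "uniform":
        # Assumption: the endpoint is excluded so every index is < total.
        return np.linspace(0, total, samples, endpoint=False).astype(np.int64)
    raise ValueError(f"unsupported mode in this sketch: {mode!r}")


print(sketch_indices(10, 5, mode="uniform").tolist())  # [0, 2, 4, 6, 8]
print(sketch_indices(10, 5, mode=lambda n: np.arange(0, n, 3)).tolist())  # [0, 3, 6, 9]
```

The actual SampledDataset may resolve these details differently; the sketch only mirrors the behaviour described in the list.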
Info
This SampledDataset is fully ADL-compliant, and acts as a passthrough
to an ADL-compliant Dataset: if the
input dataset is a Dataset[Sample], then the wrapped dataset is also
a Dataset[Sample].
Type Parameters
Sample: dataset sample type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset[TSample] | underlying dataset. | required |
| samples | int \| float | target number of samples; if greater than the dataset size, it will be capped at the dataset size. If a | required |
| seed | int \| float | sampler seed. | 0 |
| mode | Literal['ld', 'uniform', 'random'] \| Callable[[int], Integer[ndarray, N]] | sampling mode. | 'ld' |
Source code in src/abstract_dataloader/ext/sample.py
abstract_dataloader.ext.sample.sample_ld
sample_ld(
    total: int,
    samples: float | int,
    seed: float | int = 0,
    alpha: float | int = 2 / (sqrt(5) + 1),
) -> Int64[ndarray, samples]
Compute deterministic low-discrepancy subset mask.
Uses a simple alpha * n % 1 formulation, described here, with a modification to work with integer samples:
- For a given total, find the integer closest to total * alpha which is co-prime with the total. Use this as the step size.
- Then, 1...total * alpha (mod total) is guaranteed to visit each index up to total exactly once.
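The two steps above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's source: the names `nearest_coprime` and `sketch_sample_ld` are hypothetical, ties between equally close candidates are broken toward the smaller value, and the sequence starts at the `seed` offset.

```python
from math import gcd, sqrt

import numpy as np


def nearest_coprime(target, total):
    """Hypothetical helper: integer >= 1 closest to target, co-prime with total."""
    for offset in range(total + 1):
        # Assumption: on a tie, prefer the candidate below the target.
        for cand in (target - offset, target + offset):
            if cand >= 1 and gcd(cand, total) == 1:
                return cand
    raise ValueError("no co-prime step found")


def sketch_sample_ld(total, samples, seed=0, alpha=2 / (sqrt(5) + 1)):
    """Sketch of low-discrepancy index sampling with an integer step size."""
    step = nearest_coprime(round(total * alpha), total)
    # Because step and total are co-prime, the first `total` terms of
    # (seed + i * step) mod total visit every index exactly once.
    i = np.arange(samples, dtype=np.int64)
    return (seed + step * i) % total


print(sketch_sample_ld(10, 5).tolist())  # [0, 7, 4, 1, 8]
```

For `total = 10`, the closest integer to `10 * alpha` is 6, which shares a factor with 10; 5 does as well, so the sketch settles on a step of 7.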
Note
The default alpha = 1 / phi, where phi is the golden ratio
(1 + sqrt(5)) / 2, has strong low-discrepancy sampling properties;
due to the quantized nature of this function, the discrepancy may be
larger when total is small.
Tip
Each of the parameters (samples, seed, alpha) can be
specified as a float in [0, 1], in which case the corresponding
proportion of the total is used instead. For example, if seed = 0.7
and total = 100, then seed = 70 will be used.
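A minimal sketch of the float-to-count conversion described in this tip. The helper name `as_count` is hypothetical, and rounding to the nearest integer is an assumption based on the seed handling described for SampledDataset.

```python
def as_count(value, total):
    """Hypothetical helper: interpret a float as a proportion of total."""
    if isinstance(value, float):
        # Assumption: proportions are scaled by total and rounded.
        return round(value * total)
    return int(value)


print(as_count(0.7, 100))  # 70
print(as_count(70, 100))   # 70
```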
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| total | int | total number of samples to sample from, i.e. maximum index. | required |
| samples | float \| int | number of samples to generate. Should be less than | required |
| seed | float \| int | initial offset for the sampling sequence. Can leave this at | 0 |
| alpha | float \| int | step size in the sequence; the default value is the inverse golden ratio | 2 / (sqrt(5) + 1) |
Returns:
| Type | Description |
|---|---|
| Int64[ndarray, samples] | Array, in mixed order, of |