API reference

tinybaker

class tinybaker.Transform(input_paths, output_paths, overwrite=False)

Abstract base class for all transformations in TinyBaker

Parameters
  • input_paths (Dict[str, Union[str, Iterable[str]]]) – Dictionary of input tags to files.

  • output_paths (Dict[str, Union[str, Iterable[str]]]) – Dictionary of output tags to files.

  • context (optional) – The BakerDriverContext to use for this transformation

  • overwrite (optional) – Whether or not to configure the transformation to overwrite output files on execution
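
As a rough illustration of the shape these dictionaries take, the sketch below builds values matching the documented type Dict[str, Union[str, Iterable[str]]]. The tag names, file paths, and the normalize_paths helper are hypothetical and not part of TinyBaker; the helper only shows how a single path and an iterable of paths could be treated uniformly.

```python
from typing import Dict, Iterable, List, Union

PathSpec = Union[str, Iterable[str]]


def normalize_paths(paths: Dict[str, PathSpec]) -> Dict[str, List[str]]:
    """Hypothetical helper: turn every tag's value into a list of paths."""
    normalized = {}
    for tag, value in paths.items():
        # A bare string is one file; anything else is treated as many files.
        normalized[tag] = [value] if isinstance(value, str) else list(value)
    return normalized


# Hypothetical tags and paths, shaped like the constructor arguments.
input_paths = {
    "raw_data": "data/raw.csv",
    "shards": ["data/shard_0.csv", "data/shard_1.csv"],
}
output_paths = {"report": "out/report.txt"}

normalized_inputs = normalize_paths(input_paths)
normalized_outputs = normalize_paths(output_paths)
```

Dictionaries shaped like input_paths and output_paths above are what the constructor's first two parameters are documented to accept.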

static from_dict(dic)

Convert a dictionary to a transform. This isn’t intended as a standard developer path, but rather a helper for interop’s sake

Parameters

dic – The dictionary to convert into a transform

Return type

TransformMeta

static from_namespace(ns)

Convert a namespace to a transform. This path is only partially supported: it is undocumented and somewhat second-class compared to the standard definition path.

Parameters

ns – The namespace to convert into a transform

Return type

TransformMeta

run()

Run the transform instance in the default context

abstract script()

The script to be run on execution. This is, in essence, where the transform’s actual behavior is specified.

classmethod structure()

Returns a JSON-serializable dictionary describing the nested structure of the transform. This is mainly intended for developing new tools that analyze tinybaker transforms; you shouldn’t need it for tinybaker to be useful. Open an issue if you do.

Returns

Dict

tinybaker.sequence(seq_steps, exposed_intermediates={}, name=None)

Sequence several transforms together, hooking inputs and outputs together via tagname

Parameters
  • seq_steps (Iterable[Any]) – An iterable of the transforms to be sequenced.

  • exposed_intermediates (Set[str]) – Which intermediate tags generated within the sequence to expose to the top-level sequence of transformations

  • name (optional) – The name of the resulting transform

Return type

TransformMeta

Returns

Transform class representing a sequence of all the transforms
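
To make the tag-hooking idea concrete, here is a standalone sketch that works only on sets of tag names. It assumes, as one plausible reading of the description above, that a tag produced by one step feeds any later step consuming the same tag, that intermediates stay hidden unless listed in exposed_intermediates, and that the final step's outputs are always exposed. The Step tuple and sequence_interface function are hypothetical and do not reflect TinyBaker's internals.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Step(NamedTuple):
    """Hypothetical stand-in for a transform: just its tag names."""
    inputs: frozenset
    outputs: frozenset


def sequence_interface(
    steps: Iterable[Step], exposed_intermediates: Iterable[str] = ()
) -> Tuple[Set[str], Set[str]]:
    """Compute the (inputs, outputs) a sequenced transform might expose."""
    steps = list(steps)
    if not steps:
        raise ValueError("need at least one step")
    produced: Set[str] = set()  # tags written by the steps seen so far
    required: Set[str] = set()  # tags the caller must supply
    for step in steps:
        # Anything not produced by an earlier step must come from outside.
        required |= step.inputs - produced
        produced |= step.outputs
    final_outputs = set(steps[-1].outputs)
    intermediates = produced - final_outputs
    unknown = set(exposed_intermediates) - intermediates
    if unknown:
        raise ValueError(f"not intermediates: {sorted(unknown)}")
    return required, final_outputs | set(exposed_intermediates)


clean = Step(frozenset({"raw"}), frozenset({"cleaned"}))
summarize = Step(frozenset({"cleaned", "config"}), frozenset({"summary"}))
inputs, outputs = sequence_interface([clean, summarize], {"cleaned"})
```

In this sketch the "cleaned" tag links the two steps by name, so the caller only supplies "raw" and "config"; listing "cleaned" as an exposed intermediate adds it to the outputs alongside "summary".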

tinybaker.merge(merge_steps, name=None)

Merge several transformations together. Base transformations must not conflict in output.

Parameters
  • merge_steps (Iterable[Any]) – Iterable of Transforms to merge together

  • name (optional) – The name of the resulting transform

Return type

TransformMeta

Returns

Transform class consisting of a merge between the input transforms
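
The no-conflict rule for outputs can be sketched on plain sets of tag names. The merge_interface function below is hypothetical, not TinyBaker's implementation, and it assumes that merged steps may share input tags, since the description only forbids conflicting outputs.

```python
from typing import Iterable, Set, Tuple


def merge_interface(
    steps: Iterable[Tuple[Set[str], Set[str]]]
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical: combine (inputs, outputs) tag sets, rejecting output clashes."""
    merged_inputs: Set[str] = set()
    merged_outputs: Set[str] = set()
    for inputs, outputs in steps:
        clash = merged_outputs & outputs
        if clash:
            raise ValueError(f"conflicting output tags: {sorted(clash)}")
        merged_inputs |= inputs  # shared inputs are assumed to be fine
        merged_outputs |= outputs
    return merged_inputs, merged_outputs


# Two hypothetical steps that read the same input but write different outputs.
merged_in, merged_out = merge_interface([({"raw"}, {"stats"}), ({"raw"}, {"plot"})])
```

Two steps that both write the same output tag would raise ValueError in this sketch, which is the kind of situation the "must not conflict in output" requirement rules out.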

tinybaker.map_tags(base_step, input_mapping={}, output_mapping={}, name=None)

Take a transform and create a new, identical transform with the tags renamed.

Parameters
  • base_step (Any) – Base step for the transform.

  • input_mapping (optional) – Mapping of old input tag names to new input tag names

  • output_mapping (optional) – Mapping of old output tag names to new output tag names

  • name (optional) – The name of the resulting transform

Return type

TransformMeta

Returns

Transform class with renamed inputs / outputs
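
The renaming itself amounts to applying an old-name to new-name mapping over a set of tags. The rename_tags function below is a hypothetical standalone sketch; in particular, rejecting mappings that mention unknown tags is an assumption, not documented TinyBaker behavior.

```python
from typing import Dict, Set


def rename_tags(tags: Set[str], mapping: Dict[str, str]) -> Set[str]:
    """Hypothetical: apply an old-name -> new-name mapping to a set of tags."""
    unknown = set(mapping) - tags
    if unknown:
        raise KeyError(f"mapping refers to unknown tags: {sorted(unknown)}")
    # Tags without an entry in the mapping keep their original name.
    return {mapping.get(tag, tag) for tag in tags}


# Hypothetical tag names for an input mapping and an output mapping.
new_inputs = rename_tags({"raw", "config"}, {"raw": "source"})
new_outputs = rename_tags({"summary"}, {"summary": "report"})
```

Here the input tag "raw" becomes "source" while "config" is left alone, and the output tag "summary" becomes "report".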

tinybaker.cli(source)

Runs a CLI for the specified transform. Can take any transform-coercible data structure.

Parameters

source (Union[Transform, Dict, Any]) – The transform-coercible object to build a CLI around

class tinybaker.BakerDriverContext(fs_for_intermediates='file', max_threads=8, max_processes=8, parallel_mode='multithreading')

Driver Context for running TinyBaker transforms

Parameters
  • fs_for_intermediates (optional) – Which filesystem to use to store intermediates. You probably want this to be “file” or “memory”

  • max_threads (optional) – The max number of threads that TinyBaker can spawn.

  • parallel_mode (optional) – What parallelism mode to run TinyBaker in. Options are None and “multithreading”. These will probably expand over time. Experimental “multiprocessing” value can also be used.
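
For readers unfamiliar with what a thread cap and a parallel-mode switch usually imply, the following sketch uses only the Python standard library. The run_all function is hypothetical and says nothing about how BakerDriverContext is implemented; it borrows the parameter names max_threads and parallel_mode purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence


def run_all(
    tasks: Sequence[Callable[[], int]],
    max_threads: int = 8,
    parallel_mode: Optional[str] = "multithreading",
) -> List[int]:
    """Hypothetical runner: sequential for None, a bounded pool otherwise."""
    if parallel_mode is None:
        return [task() for task in tasks]
    if parallel_mode != "multithreading":
        raise ValueError(f"unsupported parallel_mode: {parallel_mode!r}")
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() returns results in task order even if tasks finish out of order.
        return list(pool.map(lambda task: task(), tasks))


# Five small tasks; at most two run at the same time in the threaded case.
tasks = [lambda n=n: n * n for n in range(5)]
threaded = run_all(tasks, max_threads=2)
sequential = run_all(tasks, parallel_mode=None)
```

Both calls produce the same results; the cap only limits how many tasks are in flight at once.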

tinybaker.fileref

class tinybaker.fileref.FileRef(path, read_bit, write_bit, worker_context)

Represents a reference to a file. TinyBaker generates these for use in the script() function

exists()

Determine whether the file specified by the FileRef exists

Return type

bool

Returns

Whether the file exists

open()

Open the FileRef for use with textual data

Return type

TextIOWrapper

Returns

The stream object for interacting with the FileRef

openbin()

Open the FileRef for use with binary data

Return type

Union[BufferedWriter, BufferedReader]

Returns

The stream for interacting with the FileRef

touch()

Mark the FileRef as being opened, without actually opening it.

This is useful if you want to perform some operation on a fileref outside of TinyBaker’s file abstractions, e.g. tag.touch(), followed by make_some_app_specific_mutation_on(tag.path)
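
The touch-then-mutate pattern, together with the text and binary open methods, can be sketched with a small stand-in class. SketchFileRef is hypothetical and built only on the standard library; it assumes read_bit and write_bit are booleans that select read or write mode, which the reference above does not state, and it omits worker_context entirely.

```python
import os
import tempfile


class SketchFileRef:
    """Hypothetical stand-in mirroring the documented FileRef methods."""

    def __init__(self, path: str, read_bit: bool, write_bit: bool):
        self.path = path
        self.read_bit = read_bit
        self.write_bit = write_bit
        self.opened = False  # set once the ref is opened or touched

    def exists(self) -> bool:
        return os.path.exists(self.path)

    def _mode(self) -> str:
        return "w" if self.write_bit else "r"

    def open(self):
        self.opened = True
        return open(self.path, self._mode())

    def openbin(self):
        self.opened = True
        return open(self.path, self._mode() + "b")

    def touch(self) -> None:
        # Record the access without creating a stream.
        self.opened = True


with tempfile.TemporaryDirectory() as tmp:
    out = SketchFileRef(os.path.join(tmp, "out.txt"), False, True)
    existed_before = out.exists()
    out.touch()
    # Write through the path directly, outside the ref's own streams.
    with open(out.path, "w") as f:
        f.write("written externally")
    src = SketchFileRef(out.path, True, False)
    with src.open() as f:
        text = f.read()
    with src.openbin() as f:
        raw = f.read()
```

The write goes through out.path rather than through a stream from the ref, yet the ref is still marked as opened because touch() was called first.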