Dataset

class spark.spark.SPARK

Bases: object

A class for reading and manipulating the instruments of the SPARK dataset.

  1. Begin by initialising the class with the path to the SPARK dataset directory and the instruments you would like to manipulate.

  2. Then you can construct dataframes of features indexed by the primary key using the join() method.

from spark import SPARK, Inst, Feat

ds = SPARK(
    spark_pathname=spark_pathname,
    instruments=[Inst.RBSR],
)

df = ds.join(features=[Feat.RBSR_TOTAL_FINAL_SCORE])

static init_and_join(spark_pathname: str, features: list[Feat], how: Literal['left', 'right', 'inner', 'outer', 'cross'] = 'outer') -> tuple[SPARK, DataFrame, list[Inst]]

Initializes the SPARK dataset and joins the specified features into a dataframe.

This is a convenience method that combines the initialization of the SPARK dataset with the joining of features, preventing mismatches between requested features and the instruments loaded.

Parameters:
  • spark_pathname – The SPARK data release directory pathname.

  • features – The features to join.

  • how – The type of join to perform. Refer to pandas documentation for more details.

Returns:

A tuple containing the SPARK dataset instance, the joined dataframe, and a list of instruments used in the join.
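A minimal usage sketch, based only on the signature above. The wrapper name `load_rbsr_total` is hypothetical, and `Feat.RBSR_TOTAL_FINAL_SCORE` is borrowed from the class example. The import is deferred so the helper can be defined even where the package is not installed.

```python
def load_rbsr_total(spark_pathname: str):
    """Hypothetical helper: load one RBSR feature via init_and_join."""
    # Deferred import: the spark package is only needed when the helper runs.
    from spark import SPARK, Feat

    # init_and_join returns the dataset instance, the joined dataframe, and
    # the instruments that were loaded to satisfy the requested features.
    ds, df, instruments = SPARK.init_and_join(
        spark_pathname=spark_pathname,
        features=[Feat.RBSR_TOTAL_FINAL_SCORE],
        how="outer",  # the documented default
    )
    return ds, df, instruments
```

Because the instruments are derived from the requested features, there is no separate `instruments` argument to keep in sync.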

__init__(spark_pathname: str, instruments: list[Inst] = None)

Initializes the SPARK dataset with the specified instruments.

Parameters:
  • spark_pathname – The SPARK data release directory, which should end with a date delimited by an underscore.

  • instruments – A list of instrument names to include. If None, all instruments will be loaded.
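The requirement that the directory name ends with an underscore-delimited date can be checked before constructing the class. The sketch below is an illustration, not part of the library: the function `release_date` is hypothetical, and the ISO `YYYY-MM-DD` date format is an assumption that the documentation does not confirm.

```python
import re
from datetime import date
from pathlib import Path


def release_date(spark_pathname: str) -> date:
    """Parse the trailing date from a release directory name.

    Assumes (not confirmed by the docs) a suffix of the form _YYYY-MM-DD.
    """
    name = Path(spark_pathname).name
    match = re.search(r"_(\d{4})-(\d{2})-(\d{2})$", name)
    if match is None:
        raise ValueError(f"no trailing _YYYY-MM-DD date in {name!r}")
    year, month, day = map(int, match.groups())
    return date(year, month, day)


# Hypothetical directory name, used only for illustration.
print(release_date("/data/example_release_2022-12-12"))
```

Failing early with a `ValueError` gives a clearer message than an error raised later while the instruments are being read.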

join(features: list[Feat], how: Literal['left', 'right', 'inner', 'outer', 'cross'] = 'outer', rename: bool = True) -> DataFrame

Joins the specified features from the SPARK dataset into a single dataframe.

Parameters:
  • features – A list of features to join.

  • how – The type of join to perform. Refer to pandas documentation for more details.

Returns:

A dataframe containing the joined features.
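The effect of `how` can be seen with plain pandas. This sketch does not call the SPARK class; it assumes only that `how` carries the usual pandas join semantics. The two toy tables and the index name `key` are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for two instrument tables sharing a hypothetical key.
a = pd.DataFrame(
    {"score_a": [1, 2, 3]},
    index=pd.Index(["p1", "p2", "p3"], name="key"),
)
b = pd.DataFrame(
    {"score_b": [10, 30, 40]},
    index=pd.Index(["p1", "p3", "p4"], name="key"),
)

outer = a.join(b, how="outer")  # union of keys; missing values become NaN
inner = a.join(b, how="inner")  # only keys present in both tables

print(sorted(outer.index))  # ['p1', 'p2', 'p3', 'p4']
print(sorted(inner.index))  # ['p1', 'p3']
```

With the default `'outer'`, rows that lack a value in one table are kept with missing values, so downstream code should expect NaNs.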

instruments: dict[str, DataFrame]

A dictionary mapping instrument codes to their corresponding dataframes.
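Since this attribute is an ordinary dictionary of dataframes, it can be inspected with standard dict and pandas operations. The sketch below uses a stand-in dictionary rather than a loaded dataset; the codes `inst_a` and `inst_b` and their columns are invented.

```python
import pandas as pd

# Stand-in for ds.instruments; the codes and columns here are invented.
instruments = {
    "inst_a": pd.DataFrame({"x": [1, 2]}),
    "inst_b": pd.DataFrame({"y": [3, 4, 5], "z": [6, 7, 8]}),
}

# Map each instrument code to its (rows, columns) shape.
shapes = {code: frame.shape for code, frame in instruments.items()}
print(shapes)  # {'inst_a': (2, 1), 'inst_b': (3, 2)}
```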