ilovebandits package

Submodules

ilovebandits.agents module

Core and main classes for the MAB problem.

class ilovebandits.agents.BaseContextualAgent(arms: int, n_rounds_random=200, rng_seed=None)

Bases: object

Base class for Contextual Bandit Agents.

Parameters

arms : int

Number of arms (actions) available to the agent.

n_rounds_random : int, optional

Number of rounds to take random actions.

rng : np.random.Generator, optional

Random number generator for reproducibility. Default is a random generator with seed 42.

reset_agent()

Reset agent parameters such as arm counters.

take_action(context) → Tuple[int, float]

Takes one step for the agent.

Returns the action the agent chooses at this time step, based on the current context, self.n_rounds_random and self.update_agent_counts.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.
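The interplay between the random warm-up and the agent's own policy can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes that the first n_rounds_random rounds are played uniformly at random, and the names take_action_sketch, rounds_seen and agent_policy are hypothetical. A standard-library random.Random stands in for the NumPy generator.

```python
import random
from typing import Callable, Tuple


def take_action_sketch(
    rounds_seen: int,
    n_rounds_random: int,
    n_arms: int,
    agent_policy: Callable[[], Tuple[int, float]],
    rng: random.Random,
) -> Tuple[int, float]:
    """Explore uniformly during warm-up, then defer to the agent's policy."""
    if rounds_seen < n_rounds_random:
        # Uniform exploration: every arm is equally likely (assumption).
        return rng.randrange(n_arms), 1.0 / n_arms
    # After warm-up, the agent-specific policy decides.
    return agent_policy()
```

During warm-up the returned probability is 1 / n_arms; afterwards it is whatever the agent-specific policy reports.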

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent according to the current context.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

take_random_action() → Tuple[int, float]

Returns a random action from the available arms.

Returns

int - the index of the current action. float - the probability of selecting the action.

update_agent(contexts: ndarray, actions: ndarray, rewards: ndarray) → None

Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.

Parameters

c_train : np.ndarray (2D). Shape (n_samples, n_context_features).

The context features for the training data samples.

a_train : np.ndarray (1D). Shape (n_samples,).

The arms selected for each training data sample.

r_train : np.ndarray. Shape (n_samples,).

The obtained reward for each training data sample.
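To make the expected array shapes concrete, here is a small sketch that builds inputs of the documented shapes with NumPy. The sizes and the binary rewards are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_context_features, n_arms = 5, 3, 2  # illustrative sizes

# 2D context matrix, one row per sample.
contexts = rng.normal(size=(n_samples, n_context_features))
# 1D vector of arm indices in [0, n_arms).
actions = rng.integers(0, n_arms, size=n_samples)
# 1D vector of rewards (binary here, purely as an example).
rewards = rng.integers(0, 2, size=n_samples).astype(float)

# A single context passed to take_action stays 2D: (1, n_context_features).
single_context = contexts[:1]
```

Slicing with contexts[:1] rather than contexts[0] keeps the leading dimension, which matches the (1, n_context_features) shape described for take_action.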

class ilovebandits.agents.BaseTreeEnsembleContextualAgent(arms: int, n_rounds_random: int = 200, base_model: RandomForestClassifier | RandomForestRegressor = RandomForestClassifier(criterion='log_loss', max_depth=3, min_samples_leaf=20, random_state=42), vpar: float = 1.0, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: GreedyConAgent

Epsilon-greedy agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.

Parameters

arms : int

Number of arms (actions) available to the agent.

n_rounds_random : int, optional

Number of rounds to take random actions.

rng : np.random.Generator, optional

Random number generator for reproducibility. Default is a random generator with seed 42.

epsilon : float

Probability of taking a random action. Default is 0.1.

one_model_per_arm : bool

If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms.

base_estimator : Union[RandomForestRegressor, RandomForestClassifier]

The base estimator to be used for fitting the reward model.

estimate_means_vars(context: ndarray) → List[ndarray]

Estimate Q-values for each arm given the context.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

List[np.ndarray] - A list of Q-values/rewards for each arm.

Each element is a NumPy array of shape (1,). Such single-element arrays behave like floats while keeping the advantages of a NumPy type, so the output is left in this form.
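The name estimate_means_vars suggests that tree-ensemble agents summarise the per-tree predictions for an arm by their mean and variance. A minimal sketch of that idea follows, assuming the per-tree predictions are already available as a plain list. The helper names and the exploration_weight parameter are hypothetical and are not part of the package.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def mean_and_variance(tree_predictions: List[float]) -> Tuple[float, float]:
    """Summarise one arm's per-tree predictions (population variance)."""
    return fmean(tree_predictions), pvariance(tree_predictions)


def optimistic_score(mean: float, variance: float, exploration_weight: float = 1.0) -> float:
    """UCB-style score: mean plus a multiple of the standard deviation.

    exploration_weight is a hypothetical knob, used here only for illustration.
    """
    return mean + exploration_weight * variance ** 0.5
```

An upper-confidence-bound agent would pick the arm with the largest optimistic score, while a Thompson-sampling agent would instead draw a value from a distribution with this mean and variance.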

partial_fast_update(c_train: ndarray, a_train: ndarray, r_train: ndarray) → None

Fast update of the agent’s parameters based on the new data.

Parameters

c_train : np.ndarray (2D). Shape (n_samples, n_context_features).

The context features for the training data samples. Just the new data to be added.

a_train : np.ndarray (1D). Shape (n_samples,).

The arms selected for each training data sample. Just the new data to be added.

r_train : np.ndarray. Shape (n_samples,).

The obtained reward for each training data sample. Just the new data to be added.

reset_agent()

Reset agent parameters such as arm counters.

sample_qvals()

Sample Q-values from current distribution.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent according to the current context.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

class ilovebandits.agents.BootStrapConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, rng_seed=None, divisor_bootstrap: int = 1)

Bases: BaseContextualAgent

Bootstrap agent. Uses disjoint models (one per arm), each updated on a bootstrapped sample of the training data associated with its arm. It approximates Thompson Sampling through bootstrapping.

Parameters

arms : int

Number of arms (actions) available to the agent.

n_rounds_random : int, optional

Number of rounds to take random actions.

rng : np.random.Generator, optional

Random number generator for reproducibility. Default is a random generator with seed 42.

base_estimator : Union[RandomForestRegressor, RandomForestClassifier]

The base estimator to be used for fitting the reward model.

divisor_bootstrap : int

If 1, a bootstrap is performed on every call to take_agent_action(). If 2, on half of the calls; if 3, on a third of the calls, and so on.

See the paper for more details: https://www.auai.org/uai2017/proceedings/papers/171.pdf
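The two ingredients described above, resampling with replacement and bootstrapping only on every divisor_bootstrap-th call, can be sketched as follows. This is an illustration under assumptions: the function names and the call_count counter are hypothetical, and the package may schedule its bootstraps differently.

```python
import random
from typing import List


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def should_bootstrap(call_count: int, divisor_bootstrap: int) -> bool:
    """True on every divisor_bootstrap-th call (always when the divisor is 1)."""
    return call_count % divisor_bootstrap == 0
```

Refitting each arm's model on rows selected by bootstrap_indices yields a different model on each resample, which is the source of the randomness that stands in for posterior sampling.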

estimate_qvals(context: ndarray) → List[ndarray]

Estimate Q-values for each arm given the context.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

List[np.ndarray] - A list of Q-values/rewards for each arm.

Each element is a NumPy array of shape (1,). Such single-element arrays behave like floats while keeping the advantages of a NumPy type, so the output is left in this form.

reset_agent()

Reset agent parameters such as arm counters.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

update_agent(c_train: ndarray, a_train: ndarray, r_train: ndarray) → None

Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.

Parameters

c_train : np.ndarray (2D). Shape (n_samples, n_context_features).

The context features for the training data samples.

a_train : np.ndarray (1D). Shape (n_samples,).

The arms selected for each training data sample.

r_train : np.ndarray. Shape (n_samples,).

The obtained reward for each training data sample.

class ilovebandits.agents.EpsGreedyConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, epsilon: float = 0.1, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: GreedyConAgent

Epsilon-greedy agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.

Parameters

arms : int

Number of arms (actions) available to the agent.

n_rounds_random : int, optional

Number of rounds to take random actions.

rng : np.random.Generator, optional

Random number generator for reproducibility. Default is a random generator with seed 42.

epsilon : float

Probability of taking a random action. Default is 0.1.

one_model_per_arm : bool

If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms.

base_estimator : Union[RandomForestRegressor, RandomForestClassifier]

The base estimator to be used for fitting the reward model.
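The epsilon-greedy rule and the selection probability it implies can be sketched in a few lines. This is a generic illustration rather than the package's code: eps_greedy_action is a hypothetical name, ties are resolved by taking the first maximum for brevity, and the probability formula assumes that the random branch draws uniformly over all arms, including the greedy one.

```python
import random
from typing import List, Tuple


def eps_greedy_action(
    q_values: List[float], epsilon: float, rng: random.Random
) -> Tuple[int, float]:
    """Return (action, probability of having selected that action)."""
    n_arms = len(q_values)
    greedy = max(range(n_arms), key=q_values.__getitem__)
    if rng.random() < epsilon:
        action = rng.randrange(n_arms)  # explore uniformly
    else:
        action = greedy                 # exploit
    # Each arm gets epsilon / n_arms from exploration;
    # the greedy arm additionally gets 1 - epsilon.
    prob = epsilon / n_arms + (1.0 - epsilon if action == greedy else 0.0)
    return action, prob
```

With epsilon = 0.1 and four arms, the greedy arm is selected with probability 0.925 and each other arm with probability 0.025.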

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

class ilovebandits.agents.GreedyConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: BaseContextualAgent

Epsilon-greedy agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.

Parameters

arms : int

Number of arms (actions) available to the agent.

n_rounds_random : int, optional

Number of rounds to take random actions.

rng : np.random.Generator, optional

Random number generator for reproducibility. Default is a random generator with seed 42.

epsilon : float

Probability of taking a random action. Default is 0.1.

one_model_per_arm : bool

If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms.

base_estimator : Union[RandomForestRegressor, RandomForestClassifier]

The base estimator to be used for fitting the reward model.

estimate_qvals(context: ndarray) → List[ndarray]

Estimate Q-values for each arm given the context.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

List[np.ndarray] - A list of Q-values/rewards for each arm.

Each element is a NumPy array of shape (1,). Such single-element arrays behave like floats while keeping the advantages of a NumPy type, so the output is left in this form.

reset_agent()

Reset agent parameters such as arm counters.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

update_agent(c_train: ndarray, a_train: ndarray, r_train: ndarray) → None

Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.

Parameters

c_train : np.ndarray (2D). Shape (n_samples, n_context_features).

The context features for the training data samples.

a_train : np.ndarray (1D). Shape (n_samples,).

The arms selected for each training data sample.

r_train : np.ndarray. Shape (n_samples,).

The obtained reward for each training data sample.

class ilovebandits.agents.RandomForestTsAgent(samples_for_freq_est: int = 100, **kwargs)

Bases: BaseTreeEnsembleContextualAgent

sample_qvals()

Sample Q-values from current distribution.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

class ilovebandits.agents.RandomForestUcbAgent(arms: int, n_rounds_random: int = 200, base_model: RandomForestClassifier | RandomForestRegressor = RandomForestClassifier(criterion='log_loss', max_depth=3, min_samples_leaf=20, random_state=42), vpar: float = 1.0, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: BaseTreeEnsembleContextualAgent

reset_agent()

Reset agent parameters such as arm counters.

sample_qvals()

Sample Q-values from current distribution.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context : np.ndarray (2D)

The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

ilovebandits.sim module

Classes to perform simulations.

exception ilovebandits.sim.NoRewardsReceivedError

Bases: Exception

Exception raised when no rewards were received during the simulation.

exception ilovebandits.sim.NotAbleToUpdateBanditError(info_last_ite_failed: Tuple)

Bases: Exception

Exception raised when it was not possible to update the bandit during the whole simulation.

class ilovebandits.sim.SimContBandit(min_ites_to_train: int, update_factor: int, agent, model_env)

Bases: object

Perform a simulation with delays for contextual bandits.

reset_agent_and_env()

Reset agent and environment.

simulate(iterations: int = 1000)

Perform the simulation for the given number of iterations.

Parameters

iterations : int

Number of iterations for the simulation.

class ilovebandits.sim.SimMabBandit(agent, model_env)

Bases: object

Perform a simulation with delays for multi-armed bandits (MAB). The environment should return binary rewards (0 or 1).

reset_agent_and_env()

Reset agent and environment.

simulate(iterations: int = 1000)

Perform the simulation for the given number of iterations.

Parameters

iterations : int

Number of iterations for the simulation.

ilovebandits.utils module

Utils functions for the package.

ilovebandits.utils.argmax(q_values: List, rng: Generator) → Tuple[int, float, List[int]]

Takes in a list of q_values and returns the index of the item with the highest value. Breaks ties randomly.

Returns

int - the index of the highest value in q_values. float - the probability of selecting the action. list[int] - the list of indices that are tied for the highest value.
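An argmax with random tie-breaking that returns the three documented values can be sketched like this. It is not the package's implementation: argmax_with_ties is a hypothetical name, a standard-library random.Random replaces the NumPy generator, and the returned probability is assumed to be 1 divided by the number of tied indices.

```python
import random
from typing import List, Sequence, Tuple


def argmax_with_ties(
    q_values: Sequence[float], rng: random.Random
) -> Tuple[int, float, List[int]]:
    """Return (chosen index, selection probability, all tied indices)."""
    top = max(q_values)
    ties = [i for i, q in enumerate(q_values) if q == top]
    # Uniform choice among the tied indices (assumption).
    return rng.choice(ties), 1.0 / len(ties), ties
```

For q_values = [1.0, 3.0, 3.0, 2.0] the tied indices are [1, 2], so either is returned with probability 0.5.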

ilovebandits.utils.find_max_indices(numbers: List) → List

Returns a list with the index of the maximum number. In case of a tie, the indices of all tied numbers are returned.

ilovebandits.utils.find_max_numbers(numbers: List) → List

Returns a list with the maximum number. In case of a tie, the tied numbers are returned.
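The difference between the two helpers is easiest to see side by side. The following is a sketch of the described behaviour with hypothetical names, not the package's source.

```python
from typing import List


def max_indices(numbers: List[float]) -> List[int]:
    """Positions of every element equal to the maximum."""
    top = max(numbers)
    return [i for i, n in enumerate(numbers) if n == top]


def max_numbers(numbers: List[float]) -> List[float]:
    """The maximum itself, repeated once per tied element."""
    top = max(numbers)
    return [n for n in numbers if n == top]
```

For [1, 4, 4, 2], the first helper gives [1, 2] and the second gives [4, 4].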

ilovebandits.utils.is_fitted(model) → bool

Module contents

Configures the submodules: tells the library what to do when “ilovebandits.submodule” is called.