ilovebandits package

Subpackages

Submodules

ilovebandits.agents module

Core and main classes for the MAB problem.

class ilovebandits.agents.BaseContextualAgent(arms: int, n_rounds_random=200, rng_seed=None)

Bases: object

Base class for Contextual Bandit Agents.

Parameters

arms (int): Number of arms (actions) available to the agent.
n_rounds_random (int, optional): Number of rounds to take random actions.
rng (np.random.Generator, optional): Random number generator for reproducibility. Default is a random generator with seed 42.

reset_agent(): Reset agent parameters such as arm counters.

take_action(context) → Tuple[int, float]

Takes one step for the agent.

Returns the action the agent chooses at the current time step, based on the current context, self.n_rounds_random, and self.update_agent_counts.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.
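The dispatch described above can be sketched as follows. This is a minimal illustration, not the library's code: the class WarmupAgent, its rounds_seen counter, and the placeholder policy in take_agent_action are assumptions. Only the names n_rounds_random, take_random_action, take_agent_action, and take_action come from the reference above.

```python
import numpy as np


class WarmupAgent:
    """Hypothetical stand-in for a contextual agent with a random warm-up phase."""

    def __init__(self, arms, n_rounds_random=200, rng_seed=None):
        self.arms = arms
        self.n_rounds_random = n_rounds_random
        self.rng = np.random.default_rng(rng_seed)
        self.rounds_seen = 0  # assumed counter; the real attribute name may differ

    def take_random_action(self):
        # Uniform choice, so every arm has probability 1 / arms.
        action = int(self.rng.integers(self.arms))
        return action, 1.0 / self.arms

    def take_agent_action(self, context):
        # Placeholder policy: always pick arm 0 with probability 1.
        return 0, 1.0

    def take_action(self, context):
        # Act at random during the first n_rounds_random rounds, then delegate.
        if self.rounds_seen < self.n_rounds_random:
            result = self.take_random_action()
        else:
            result = self.take_agent_action(context)
        self.rounds_seen += 1
        return result


agent = WarmupAgent(arms=3, n_rounds_random=5, rng_seed=0)
context = np.zeros((1, 4))  # shape (1, n_context_features)
history = [agent.take_action(context) for _ in range(8)]
```

In this sketch the first five entries of history come from the random phase and the last three from the placeholder policy.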

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent according to the current context.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

take_random_action() → Tuple[int, float]: Returns a random action from the available arms.

Returns

int - the index of the current action. float - the probability of selecting the action.

update_agent(contexts: ndarray, actions: ndarray, rewards: ndarray) → None

Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.

Parameters

c_train (np.ndarray, 2D, shape (n_samples, n_context_features)): The context features for the training data samples.
a_train (np.ndarray, 1D, shape (n_samples,)): The arms selected for each training data sample.
r_train (np.ndarray, 1D, shape (n_samples,)): The obtained reward for each training data sample.
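As an illustration of the array shapes listed above, the following sketch builds a small training batch with NumPy. The sizes, the binary rewards, and the per-arm split at the end are assumptions chosen for the example, not something the library prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_context_features, arms = 6, 4, 3

c_train = rng.normal(size=(n_samples, n_context_features))  # contexts, 2D
a_train = rng.integers(arms, size=n_samples)                # chosen arms, 1D
r_train = rng.integers(2, size=n_samples).astype(float)     # assumed binary rewards, 1D

# Rows belonging to a single arm, as a per-arm model would see them.
mask = a_train == 0
c_arm0, r_arm0 = c_train[mask], r_train[mask]
```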

class ilovebandits.agents.BaseTreeEnsembleContextualAgent(arms: int, n_rounds_random: int = 200, base_model: RandomForestClassifier | RandomForestRegressor = RandomForestClassifier(criterion='log_loss', max_depth=3, min_samples_leaf=20, random_state=42), vpar: float = 1.0, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: GreedyConAgent

Epsilon-greedy agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.

Parameters

arms (int): Number of arms (actions) available to the agent.
n_rounds_random (int, optional): Number of rounds to take random actions.
rng (np.random.Generator, optional): Random number generator for reproducibility. Default is a random generator with seed 42.
epsilon (float): Probability of taking a random action. Default is 0.1.
one_model_per_arm (bool): If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms.
base_estimator (Union[RandomForestRegressor, RandomForestClassifier]): The base estimator to be used for fitting the reward model.

estimate_means_vars(context: ndarray) → List[ndarray]

Estimate Q-values for each arm given the context.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

List[np.ndarray] - A list of Q-values/rewards, one per arm. Each element is a NumPy array of shape (1,). These single-element arrays behave like floats while keeping the advantages of a NumPy type, so they are left in this form.
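The method name suggests that a mean and a variance are estimated for each arm. One common way to obtain both from a tree ensemble is to treat each tree's prediction as a sample and take the mean and variance across trees. The sketch below works under that assumption. The array tree_preds, the scale factor, and the upper-confidence-bound score are illustrative and are not taken from the library.

```python
import numpy as np

# Hypothetical per-tree reward predictions for one context:
# rows are trees in the ensemble, columns are arms.
tree_preds = np.array([
    [0.10, 0.60, 0.30],
    [0.20, 0.50, 0.30],
    [0.30, 0.70, 0.30],
    [0.20, 0.60, 0.30],
])

means = tree_preds.mean(axis=0)     # estimated reward per arm
variances = tree_preds.var(axis=0)  # disagreement between trees per arm

scale = 1.0  # assumed exploration scale; how the library uses vpar is not documented here
ucb_scores = means + scale * np.sqrt(variances)
best_arm = int(np.argmax(ucb_scores))
```

Arms on which the trees disagree get a larger bonus, so they are tried more often until the ensemble becomes more certain.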

partial_fast_update(c_train: ndarray, a_train: ndarray, r_train: ndarray) → None

Fast Update of the agent’s parameters based on the new data.

Parameters

c_train (np.ndarray, 2D, shape (n_samples, n_context_features)): The context features for the training data samples. Only the new data to be added.
a_train (np.ndarray, 1D, shape (n_samples,)): The arms selected for each training data sample. Only the new data to be added.
r_train (np.ndarray, 1D, shape (n_samples,)): The obtained reward for each training data sample. Only the new data to be added.

reset_agent(): Reset agent parameters such as arm counters.

sample_qvals(): Sample Q-values from current distribution.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent according to the current context.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

class ilovebandits.agents.BootStrapConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, rng_seed=None, divisor_bootstrap: int = 1)

Bases: BaseContextualAgent

Bootstrap agent. Uses disjoint models, one per arm, each updated on a bootstrapped sample of the training data associated with that arm. It approximates Thompson Sampling through bootstrapping.

Parameters

arms (int): Number of arms (actions) available to the agent.
n_rounds_random (int, optional): Number of rounds to take random actions.
rng (np.random.Generator, optional): Random number generator for reproducibility. Default is a random generator with seed 42.
base_estimator (Union[RandomForestRegressor, RandomForestClassifier]): The base estimator to be used for fitting the reward model.
divisor_bootstrap (int): Controls how often the models are bootstrapped when take_agent_action() is called. If 1, every call bootstraps; if 2, half of the calls; if 3, a third of the calls; and so on.

See the paper for more details: https://www.auai.org/uai2017/proceedings/papers/171.pdf
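The two ideas in the parameter description, resampling an arm's data with replacement and doing so only on some calls, can be sketched with NumPy alone. The helper names bootstrap_rows and should_bootstrap are hypothetical, and the counter-based schedule is an assumption: the library may decide when to bootstrap in a different way (for example at random).

```python
import numpy as np


def bootstrap_rows(c_arm, r_arm, rng):
    """Resample one arm's training rows with replacement (same size as the input)."""
    idx = rng.integers(len(r_arm), size=len(r_arm))
    return c_arm[idx], r_arm[idx]


def should_bootstrap(step, divisor_bootstrap):
    # Assumed schedule: resample on every divisor_bootstrap-th call.
    return step % divisor_bootstrap == 0


rng = np.random.default_rng(0)
c_arm = np.arange(10, dtype=float).reshape(5, 2)  # 5 samples, 2 context features
r_arm = np.array([0.0, 1.0, 0.0, 1.0, 1.0])

c_boot, r_boot = bootstrap_rows(c_arm, r_arm, rng)
calls_with_resample = [s for s in range(6) if should_bootstrap(s, divisor_bootstrap=2)]
```

A model refitted on (c_boot, r_boot) sees a slightly different dataset each time, which is what gives the agent its Thompson-Sampling-like randomness.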

estimate_qvals(context: ndarray) → List[ndarray]

Estimate Q-values for each arm given the context.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

List[np.ndarray] - A list of Q-values/rewards, one per arm. Each element is a NumPy array of shape (1,). These single-element arrays behave like floats while keeping the advantages of a NumPy type, so they are left in this form.

reset_agent(): Reset agent parameters such as arm counters.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

update_agent(c_train: ndarray, a_train: ndarray, r_train: ndarray) → None

Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.

Parameters

c_train (np.ndarray, 2D, shape (n_samples, n_context_features)): The context features for the training data samples.
a_train (np.ndarray, 1D, shape (n_samples,)): The arms selected for each training data sample.
r_train (np.ndarray, 1D, shape (n_samples,)): The obtained reward for each training data sample.

class ilovebandits.agents.EpsGreedyConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, epsilon: float = 0.1, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: GreedyConAgent

Epsilon-greedy agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.

Parameters

arms (int): Number of arms (actions) available to the agent.
n_rounds_random (int, optional): Number of rounds to take random actions.
rng (np.random.Generator, optional): Random number generator for reproducibility. Default is a random generator with seed 42.
epsilon (float): Probability of taking a random action. Default is 0.1.
one_model_per_arm (bool): If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms.
base_estimator (Union[RandomForestRegressor, RandomForestClassifier]): The base estimator to be used for fitting the reward model.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.
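The epsilon-greedy rule and its selection probability can be sketched as follows. The function eps_greedy_action is a hypothetical helper, not part of the package. It breaks ties by taking the first maximum, and it reports the overall probability of the chosen arm under the policy, which is an assumption about what the library returns.

```python
import numpy as np


def eps_greedy_action(q_values, epsilon, rng):
    """Pick an arm epsilon-greedily and return (action, selection probability)."""
    n_arms = len(q_values)
    greedy = int(np.argmax(q_values))  # first maximum; ties are not randomised here
    if rng.random() < epsilon:
        action = int(rng.integers(n_arms))  # explore uniformly
    else:
        action = greedy  # exploit
    # Overall probability of the chosen arm under this policy.
    prob = epsilon / n_arms + (1.0 - epsilon if action == greedy else 0.0)
    return action, prob


rng = np.random.default_rng(0)
q_values = [0.2, 0.7, 0.1]
picks = [eps_greedy_action(q_values, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = sum(a == 1 for a, _ in picks) / len(picks)
```

With epsilon = 0.1 and three arms, the greedy arm is chosen with probability 0.9 + 0.1 / 3, so greedy_share should land close to 0.93.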

class ilovebandits.agents.GreedyConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: BaseContextualAgent

Epsilon-greedy agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.

Parameters

arms (int): Number of arms (actions) available to the agent.
n_rounds_random (int, optional): Number of rounds to take random actions.
rng (np.random.Generator, optional): Random number generator for reproducibility. Default is a random generator with seed 42.
epsilon (float): Probability of taking a random action. Default is 0.1.
one_model_per_arm (bool): If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms.
base_estimator (Union[RandomForestRegressor, RandomForestClassifier]): The base estimator to be used for fitting the reward model.

estimate_qvals(context: ndarray) → List[ndarray]

Estimate Q-values for each arm given the context.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

List[np.ndarray] - A list of Q-values/rewards, one per arm. Each element is a NumPy array of shape (1,). These single-element arrays behave like floats while keeping the advantages of a NumPy type, so they are left in this form.

reset_agent(): Reset agent parameters such as arm counters.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

update_agent(c_train: ndarray, a_train: ndarray, r_train: ndarray) → None

Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.

Parameters

c_train (np.ndarray, 2D, shape (n_samples, n_context_features)): The context features for the training data samples.
a_train (np.ndarray, 1D, shape (n_samples,)): The arms selected for each training data sample.
r_train (np.ndarray, 1D, shape (n_samples,)): The obtained reward for each training data sample.

class ilovebandits.agents.RandomForestTsAgent(samples_for_freq_est: int = 100, **kwargs)

Bases: BaseTreeEnsembleContextualAgent

sample_qvals(): Sample Q-values from current distribution.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

class ilovebandits.agents.RandomForestUcbAgent(arms: int, n_rounds_random: int = 200, base_model: RandomForestClassifier | RandomForestRegressor = RandomForestClassifier(criterion='log_loss', max_depth=3, min_samples_leaf=20, random_state=42), vpar: float = 1.0, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)

Bases: BaseTreeEnsembleContextualAgent

reset_agent(): Reset agent parameters such as arm counters.

sample_qvals(): Sample Q-values from current distribution.

take_agent_action(context: ndarray) → Tuple[int, float]

Takes one step for the agent.

According to the current context, returns the action the agent chooses at that time step.

Parameters

context (np.ndarray, 2D): The context features for which to estimate rewards of each arm. Shape (1, n_context_features).

Returns

int - the index of the current action. float - the probability of selecting the action.

ilovebandits.sim module

Classes to perform simulations.

exception ilovebandits.sim.NoRewardsReceivedError

Bases: Exception

Exception raised when no rewards were received during the simulation.

exception ilovebandits.sim.NotAbleToUpdateBanditError(info_last_ite_failed: Tuple)

Bases: Exception

Exception raised when it was not possible to update the bandit during the whole simulation.

class ilovebandits.sim.SimContBandit(min_ites_to_train: int, update_factor: int, agent, model_env)

Bases: object

Perform a simulation with delays for contextual bandits.

reset_agent_and_env(): Reset agent and environment.

simulate(iterations: int = 1000)

Perform the simulation for the given number of iterations.

Parameters

iterations (int): Number of iterations for the simulation.

class ilovebandits.sim.SimMabBandit(agent, model_env)

Bases: object

Perform a simulation with delays for multi-armed bandits (MAB). The environment should have binary rewards (0 or 1).

reset_agent_and_env(): Reset agent and environment.

simulate(iterations: int = 1000)

Perform the simulation for the given number of iterations.

Parameters

iterations (int): Number of iterations for the simulation.

ilovebandits.utils module

Utils functions for the package.

ilovebandits.utils.argmax(q_values: List, rng: Generator) → Tuple[int, float, List[int]]: Takes in a list of q_values and returns the index of the item with the highest value. Breaks ties randomly.

Returns

int - the index of the highest value in q_values. float - the probability of selecting the action. list[int] - the list of indices that are tied for the highest value.
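An argmax with random tie-breaking and the three return values described above might look like the sketch below. The function name argmax_random_ties is hypothetical, and reporting the probability as 1 / (number of tied indices) is an assumption about how the library defines it.

```python
import numpy as np


def argmax_random_ties(q_values, rng):
    """Return (index, probability, tied indices), breaking ties uniformly at random."""
    # ravel() also accepts a list of single-element arrays of shape (1,).
    values = np.asarray(q_values, dtype=float).ravel()
    ties = np.flatnonzero(values == values.max()).tolist()
    index = int(rng.choice(ties))
    return index, 1.0 / len(ties), ties


rng = np.random.default_rng(0)
index, prob, ties = argmax_random_ties([0.3, 0.9, 0.9, 0.1], rng)
```

Here ties is [1, 2] and prob is 0.5; which of the two indices is returned depends on the generator state.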

ilovebandits.utils.find_max_indices(numbers: List) → List: Returns a list with the index of the maximum number. In case of a tie, the indices of all tied numbers are returned.

ilovebandits.utils.find_max_numbers(numbers: List) → List: Returns a list with the maximum number. In case of a tie, all tied numbers are returned.
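The behaviour of the two helpers above can be illustrated in plain Python. The functions max_indices and max_numbers are hypothetical re-implementations written for this example, reading "the tied numbers are returned" as one entry per tied element.

```python
def max_indices(numbers):
    """Indices of every element equal to the maximum."""
    top = max(numbers)
    return [i for i, x in enumerate(numbers) if x == top]


def max_numbers(numbers):
    """The maximum value, repeated once per tied element."""
    top = max(numbers)
    return [x for x in numbers if x == top]


indices = max_indices([3, 7, 7, 1])
values = max_numbers([3, 7, 7, 1])
```

For this input, indices is [1, 2] and values is [7, 7].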

ilovebandits.utils.is_fitted(model) → bool

Module contents

Configures the submodules: tells the library what to do when calling “ilovebandits.submodule”.