ilovebandits package
Subpackages
- ilovebandits.data_bandits package
- ilovebandits.mab package
Submodules
ilovebandits.agents module
Core and main classes for the MAB problem.
- class ilovebandits.agents.BaseContextualAgent(arms: int, n_rounds_random=200, rng_seed=None)
Bases: object
Base class for Contextual Bandit Agents.
Parameters
- arms: int
Number of arms (actions) available to the agent.
- n_rounds_random: int, optional
Number of rounds to take random actions. Default is 200.
- rng: np.random.Generator, optional
Random number generator for reproducibility. Default is a random generator with seed 42. The constructor signature exposes the seed as rng_seed.
- reset_agent()
Reset agent parameters such as arm counters.
- take_action(context) -> Tuple[int, float]
Takes one step for the agent.
Returns the action the agent chooses at the current time step, based on the current context, self.n_rounds_random and self.update_agent_counts.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
- take_agent_action(context: ndarray) -> Tuple[int, float]
Takes one step for the agent according to the current context.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
- take_random_action() -> Tuple[int, float]
Returns a random action from the available arms.
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
- update_agent(contexts: ndarray, actions: ndarray, rewards: ndarray) -> None
Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.
Parameters
- contexts: np.ndarray (2D). Shape (n_samples, n_context_features).
The context features for the training data samples. Named c_train in the subclasses.
- actions: np.ndarray (1D). Shape (n_samples,).
The arms selected for each training data sample. Named a_train in the subclasses.
- rewards: np.ndarray. Shape (n_samples,).
The obtained reward for each training data sample. Named r_train in the subclasses.
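The interplay between take_action, take_random_action and take_agent_action described above can be pictured with a minimal sketch. It does not import ilovebandits: the class WarmupAgentSketch, its rounds_seen counter and the placeholder policy are hypothetical, and the assumption that the random rounds are the first n_rounds_random calls is ours, not confirmed by the reference.

```python
import random
from typing import List, Tuple


class WarmupAgentSketch:
    """Hypothetical stand-in for a contextual agent; not the library's class."""

    def __init__(self, arms: int, n_rounds_random: int = 200, seed: int = 42):
        self.arms = arms
        self.n_rounds_random = n_rounds_random
        self.rng = random.Random(seed)
        self.rounds_seen = 0  # assumed counter; the real agent may track rounds differently

    def take_random_action(self) -> Tuple[int, float]:
        # Uniform choice over the arms, so each arm has probability 1 / arms.
        return self.rng.randrange(self.arms), 1.0 / self.arms

    def take_agent_action(self, context: List[List[float]]) -> Tuple[int, float]:
        # Placeholder policy: always pick arm 0 with probability 1.
        return 0, 1.0

    def take_action(self, context: List[List[float]]) -> Tuple[int, float]:
        # Assumed dispatch: act at random during the warm-up, then use the policy.
        if self.rounds_seen < self.n_rounds_random:
            result = self.take_random_action()
        else:
            result = self.take_agent_action(context)
        self.rounds_seen += 1
        return result


agent = WarmupAgentSketch(arms=3, n_rounds_random=5)
context = [[0.1, 0.2]]  # shape (1, n_context_features)
history = [agent.take_action(context) for _ in range(8)]
```

In this sketch the first five entries of history come from uniform random choices and the remaining three from the placeholder policy.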
- class ilovebandits.agents.BaseTreeEnsembleContextualAgent(arms: int, n_rounds_random: int = 200, base_model: RandomForestClassifier | RandomForestRegressor = RandomForestClassifier(criterion='log_loss', max_depth=3, min_samples_leaf=20, random_state=42), vpar: float = 1.0, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)
Bases: GreedyConAgent
Base class for tree-ensemble contextual agents, extended by RandomForestTsAgent and RandomForestUcbAgent.
Parameters
- arms: int
Number of arms (actions) available to the agent.
- n_rounds_random: int, optional
Number of rounds to take random actions. Default is 200.
- rng: np.random.Generator, optional
Random number generator for reproducibility. Default is a random generator with seed 42. The constructor signature exposes the seed as rng_seed.
- one_model_per_arm: bool
If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms. Default is True.
- base_model: Union[RandomForestRegressor, RandomForestClassifier]
The base estimator to be used for fitting the reward model. Default is RandomForestClassifier(criterion='log_loss', max_depth=3, min_samples_leaf=20, random_state=42).
- estimate_means_vars(context: ndarray) -> List[ndarray]
Estimate Q-values for each arm given the context.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
- List[np.ndarray]: a list of Q-values/rewards, one per arm.
Each element is a numpy array of shape (1,). These single-element arrays behave like floats while keeping the advantages of a numpy type, so the output is left in this form.
- partial_fast_update(c_train: ndarray, a_train: ndarray, r_train: ndarray) -> None
Fast update of the agent’s parameters based on the new data.
Parameters
- c_train: np.ndarray (2D). Shape (n_samples, n_context_features).
The context features for the training data samples. Just the new data to be added.
- a_train: np.ndarray (1D). Shape (n_samples,).
The arms selected for each training data sample. Just the new data to be added.
- r_train: np.ndarray. Shape (n_samples,).
The obtained reward for each training data sample. Just the new data to be added.
- reset_agent()
Reset agent parameters such as arm counters.
- sample_qvals()
Sample Q-values from current distribution.
- take_agent_action(context: ndarray) -> Tuple[int, float]
Takes one step for the agent according to the current context.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
- class ilovebandits.agents.BootStrapConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, rng_seed=None, divisor_bootstrap: int = 1)
Bases: BaseContextualAgent
Bootstrap Agent. Maintains disjoint models, each updated on a bootstrapped sample of the training data associated with its arm. It approximates Thompson Sampling by using bootstrapping.
Parameters
- arms: int
Number of arms (actions) available to the agent.
- n_rounds_random: int, optional
Number of rounds to take random actions. Default is 200.
- rng: np.random.Generator, optional
Random number generator for reproducibility. Default is a random generator with seed 42. The constructor signature exposes the seed as rng_seed.
- base_estimator: Union[RandomForestRegressor, RandomForestClassifier]
The base estimator to be used for fitting the reward model.
- divisor_bootstrap: int
Controls how often take_agent_action() bootstraps. If 1, it bootstraps on every call. If 2, on half of the calls. If 3, on a third of the calls, and so on. Default is 1.
See the paper for more details: https://www.auai.org/uai2017/proceedings/papers/171.pdf
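As a rough illustration of the two ideas in the parameter list above, the sketch below resamples one arm's data with replacement and decides when to resample. Both helpers, bootstrap_sample and should_bootstrap, are hypothetical names. The counter-based schedule is only one possible reading of divisor_bootstrap; the library may, for example, decide at random instead.

```python
import random
from typing import List, Sequence


def bootstrap_sample(rows: Sequence, rng: random.Random) -> List:
    """Draw len(rows) items from rows with replacement."""
    return [rows[rng.randrange(len(rows))] for _ in rows]


def should_bootstrap(call_index: int, divisor_bootstrap: int) -> bool:
    """Assumed schedule: resample on every divisor_bootstrap-th call."""
    return call_index % divisor_bootstrap == 0


rng = random.Random(0)
# Hypothetical (context, reward) pairs observed for a single arm.
arm_data = [([0.1], 1.0), ([0.4], 0.0), ([0.7], 1.0), ([0.9], 0.0)]

resampled = bootstrap_sample(arm_data, rng)
schedule = [should_bootstrap(i, divisor_bootstrap=2) for i in range(6)]
```

Refitting each arm's model on a fresh resample makes its predictions vary from call to call, which is what lets the bootstrap stand in for posterior sampling.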
- estimate_qvals(context: ndarray) -> List[ndarray]
Estimate Q-values for each arm given the context.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
- List[np.ndarray]: a list of Q-values/rewards, one per arm.
Each element is a numpy array of shape (1,). These single-element arrays behave like floats while keeping the advantages of a numpy type, so the output is left in this form.
- reset_agent()
Reset agent parameters such as arm counters.
- take_agent_action(context: ndarray) -> Tuple[int, float]
Takes one step for the agent.
According to the current context, returns the action the agent chooses at that time step.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
- update_agent(c_train: ndarray, a_train: ndarray, r_train: ndarray) -> None
Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.
Parameters
- c_train: np.ndarray (2D). Shape (n_samples, n_context_features).
The context features for the training data samples.
- a_train: np.ndarray (1D). Shape (n_samples,).
The arms selected for each training data sample.
- r_train: np.ndarray. Shape (n_samples,).
The obtained reward for each training data sample.
- class ilovebandits.agents.EpsGreedyConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, epsilon: float = 0.1, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)
Bases: GreedyConAgent
Epsilon Greedy Agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.
Parameters
- arms: int
Number of arms (actions) available to the agent.
- n_rounds_random: int, optional
Number of rounds to take random actions. Default is 200.
- rng: np.random.Generator, optional
Random number generator for reproducibility. Default is a random generator with seed 42. The constructor signature exposes the seed as rng_seed.
- epsilon: float
Probability of taking a random action. Default is 0.1.
- one_model_per_arm: bool
If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms. Default is True.
- base_estimator: Union[RandomForestRegressor, RandomForestClassifier]
The base estimator to be used for fitting the reward model.
- take_agent_action(context: ndarray) -> Tuple[int, float]
Takes one step for the agent.
According to the current context, returns the action the agent chooses at that time step.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
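The epsilon-greedy rule and the selection probability it implies can be sketched in a few lines. The function eps_greedy_action is hypothetical and takes precomputed Q-values instead of a fitted model. The probability formula assumes a single greedy arm (the first maximum on ties), which may differ from how the library handles ties.

```python
import random
from typing import List, Tuple


def eps_greedy_action(
    q_values: List[float], epsilon: float, rng: random.Random
) -> Tuple[int, float]:
    """Return (action, probability of selecting that action)."""
    n_arms = len(q_values)
    # Greedy arm: first index holding the maximum (assumed tie rule).
    greedy = max(range(n_arms), key=lambda i: q_values[i])
    if rng.random() < epsilon:
        action = rng.randrange(n_arms)  # explore uniformly
    else:
        action = greedy  # exploit
    # Every arm can be drawn by exploration; the greedy arm also by exploitation.
    prob = epsilon / n_arms
    if action == greedy:
        prob += 1.0 - epsilon
    return action, prob


rng = random.Random(0)
action, prob = eps_greedy_action([0.2, 0.9, 0.5], epsilon=0.1, rng=rng)
```

Summed over all arms, these probabilities add up to 1: the greedy arm gets 1 - epsilon + epsilon / n_arms and every other arm gets epsilon / n_arms.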
- class ilovebandits.agents.GreedyConAgent(base_estimator: RandomForestRegressor | RandomForestClassifier, arms: int, n_rounds_random: int = 200, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)
Bases: BaseContextualAgent
Greedy Agent. Takes the greedy action, i.e. the arm with the highest estimated Q-value for the current context.
Parameters
- arms: int
Number of arms (actions) available to the agent.
- n_rounds_random: int, optional
Number of rounds to take random actions. Default is 200.
- rng: np.random.Generator, optional
Random number generator for reproducibility. Default is a random generator with seed 42. The constructor signature exposes the seed as rng_seed.
- one_model_per_arm: bool
If True, the agent will maintain a separate model for each arm. If False, a single model will be used for all arms. Default is True.
- base_estimator: Union[RandomForestRegressor, RandomForestClassifier]
The base estimator to be used for fitting the reward model.
- estimate_qvals(context: ndarray) -> List[ndarray]
Estimate Q-values for each arm given the context.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
- List[np.ndarray]: a list of Q-values/rewards, one per arm.
Each element is a numpy array of shape (1,). These single-element arrays behave like floats while keeping the advantages of a numpy type, so the output is left in this form.
- reset_agent()
Reset agent parameters such as arm counters.
- take_agent_action(context: ndarray) -> Tuple[int, float]
Takes one step for the agent.
According to the current context, returns the action the agent chooses at that time step.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
- update_agent(c_train: ndarray, a_train: ndarray, r_train: ndarray) -> None
Update the agent’s parameters based on the historical data. This method should be implemented by subclasses.
Parameters
- c_train: np.ndarray (2D). Shape (n_samples, n_context_features).
The context features for the training data samples.
- a_train: np.ndarray (1D). Shape (n_samples,).
The arms selected for each training data sample.
- r_train: np.ndarray. Shape (n_samples,).
The obtained reward for each training data sample.
- class ilovebandits.agents.RandomForestTsAgent(samples_for_freq_est: int = 100, **kwargs)
Bases: BaseTreeEnsembleContextualAgent
- sample_qvals()
Sample Q-values from current distribution.
- take_agent_action(context: ndarray) -> Tuple[int, float]
Takes one step for the agent.
According to the current context, returns the action the agent chooses at that time step.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
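The reference does not say how RandomForestTsAgent obtains the probability it returns. One plausible reading of the samples_for_freq_est parameter name is a Monte Carlo frequency estimate, sketched below. The function ts_action_with_freq, the per-arm Gaussian distributions and the reuse of the parameter name are all assumptions made for illustration, not the library's implementation.

```python
import random
from typing import List, Tuple


def ts_action_with_freq(
    means: List[float],
    stds: List[float],
    samples_for_freq_est: int,
    rng: random.Random,
) -> Tuple[int, float]:
    """Pick an arm by sampling Q-values, then estimate how often that arm wins."""

    def sample_winner() -> int:
        # Draw one Q-value per arm (assumed Gaussian) and return the best arm.
        draws = [rng.gauss(m, s) for m, s in zip(means, stds)]
        return max(range(len(draws)), key=lambda i: draws[i])

    action = sample_winner()  # the Thompson Sampling decision itself
    # Repeat the draw to estimate the selection frequency of the chosen arm.
    wins = sum(sample_winner() == action for _ in range(samples_for_freq_est))
    return action, wins / samples_for_freq_est


rng = random.Random(0)
action, prob = ts_action_with_freq(
    means=[0.0, 10.0], stds=[0.1, 0.1], samples_for_freq_est=100, rng=rng
)
```

A larger sample count gives a smoother probability estimate at the cost of more draws per decision.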
- class ilovebandits.agents.RandomForestUcbAgent(arms: int, n_rounds_random: int = 200, base_model: RandomForestClassifier | RandomForestRegressor = RandomForestClassifier(criterion='log_loss', max_depth=3, min_samples_leaf=20, random_state=42), vpar: float = 1.0, one_model_per_arm: bool = True, rng_seed=None, min_rewards_per_arm: int = 2, min_samples_to_ignore_arm: int = 100)
Bases: BaseTreeEnsembleContextualAgent
- reset_agent()
Reset agent parameters such as arm counters.
- sample_qvals()
Sample Q-values from current distribution.
- take_agent_action(context: ndarray) -> Tuple[int, float]
Takes one step for the agent.
According to the current context, returns the action the agent chooses at that time step.
Parameters
- context: np.ndarray (2D)
The context features for which to estimate rewards of each arm. Shape (1, n_context_features).
Returns
Tuple[int, float]: the index of the selected action and the probability of selecting it.
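The reference leaves the role of vpar undocumented. A common form of upper-confidence-bound scoring for an ensemble is the mean prediction plus a multiple of the spread across ensemble members, sketched here. Treating vpar as that multiplier, and the helper name ucb_scores, are assumptions for illustration only.

```python
import statistics
from typing import List


def ucb_scores(per_tree_preds: List[List[float]], vpar: float = 1.0) -> List[float]:
    """Score each arm as mean + vpar * standard deviation of its per-tree predictions."""
    scores = []
    for preds in per_tree_preds:
        mean = statistics.fmean(preds)
        spread = statistics.pstdev(preds)  # disagreement between trees
        scores.append(mean + vpar * spread)
    return scores


# Hypothetical per-tree reward predictions for two arms and one context.
per_tree_preds = [
    [0.5, 0.5, 0.5, 0.5],  # arm 0: trees agree
    [0.2, 0.6, 0.2, 0.6],  # arm 1: lower mean, more disagreement
]
scores = ucb_scores(per_tree_preds, vpar=1.0)
best_arm = max(range(len(scores)), key=lambda i: scores[i])
```

With vpar=1.0 the uncertain arm 1 scores about 0.6 against 0.5 for arm 0 and is selected, while with vpar=0.0 the rule reduces to the greedy choice of arm 0.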
ilovebandits.sim module
Classes to perform simulations.
- exception ilovebandits.sim.NoRewardsReceivedError
Bases: Exception
Exception raised when no rewards were received during the simulation.
- exception ilovebandits.sim.NotAbleToUpdateBanditError(info_last_ite_failed: Tuple)
Bases: Exception
Exception raised when it was not possible to update the bandit during the whole simulation.
- class ilovebandits.sim.SimContBandit(min_ites_to_train: int, update_factor: int, agent, model_env)
Bases: object
Perform a simulation with delays for contextual bandits.
- reset_agent_and_env()
Reset agent and environment.
ilovebandits.utils module
Utils functions for the package.
- ilovebandits.utils.argmax(q_values: List, rng: Generator) -> Tuple[int, float, List[int]]
Takes in a list of q_values and returns the index of the item with the highest value. Breaks ties randomly.
Returns
Tuple[int, float, List[int]]: the index of the highest value in q_values, the probability of selecting that action, and the list of indices that are tied for the highest value.
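A tie-breaking argmax with the three documented return values can be sketched as follows. The function argmax_random_ties is hypothetical. It uses the standard library's random.Random in place of a numpy Generator, and it assumes the reported probability is 1 divided by the number of tied indices.

```python
import random
from typing import List, Tuple


def argmax_random_ties(
    q_values: List[float], rng: random.Random
) -> Tuple[int, float, List[int]]:
    """Return (chosen index, selection probability, indices tied for the maximum)."""
    top = max(q_values)
    ties = [i for i, q in enumerate(q_values) if q == top]
    # A uniform pick among the tied indices gives each one probability 1 / len(ties).
    return rng.choice(ties), 1.0 / len(ties), ties


rng = random.Random(0)
index, prob, ties = argmax_random_ties([1.0, 3.0, 3.0], rng)
```

Here ties is [1, 2], prob is 0.5, and index is one of the two tied positions.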
- ilovebandits.utils.find_max_indices(numbers: List) -> List
Returns a list with the index of the maximum number. In case of a tie, the indices of all tied numbers are returned.
- ilovebandits.utils.find_max_numbers(numbers: List) -> List
Returns a list with the maximum number. In case of a tie, all tied numbers are returned.
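The behaviour described for the two helpers above can be mirrored in plain Python. The names find_max_indices_sketch and find_max_numbers_sketch are hypothetical, and the bodies are an illustration of the documented behaviour, not the library's source.

```python
from typing import List


def find_max_indices_sketch(numbers: List[float]) -> List[int]:
    """Indices of every element equal to the maximum."""
    top = max(numbers)
    return [i for i, n in enumerate(numbers) if n == top]


def find_max_numbers_sketch(numbers: List[float]) -> List[float]:
    """The maximum value, repeated once per tied element."""
    top = max(numbers)
    return [n for n in numbers if n == top]


values = [1, 3, 3, 2]
indices = find_max_indices_sketch(values)  # [1, 2]
maxima = find_max_numbers_sketch(values)   # [3, 3]
```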
- ilovebandits.utils.is_fitted(model) -> bool
Module contents
Configures the submodules, telling the library what to do when calling “ilovebandits.submodule”.