ilovebandits.mab package
Submodules
ilovebandits.mab.agents module
Core and main classes for the MAB problem.
- class ilovebandits.mab.agents.BaseAgent(q_estimator: QEstMean | QEstFixedStep)
Bases: object
Base class for the agents. This base class is just a blueprint.
- reset_agent()
Reset agent parameters such as arm counters and q estimations.
- reset_only_estimator()
Reset q_estimator of the agent.
- take_action()
Takes one step for the agent.
It takes in a reward and observation and returns the action the agent chooses at that time step.
- class ilovebandits.mab.agents.EpsilonGreedyAgent(q_estimator: QEstMean | QEstFixedStep, epsilon: float)
Bases: BaseAgent
Epsilon-greedy agent. Takes the greedy action with probability 1 - epsilon and a random action with probability epsilon.
- take_action() -> Tuple[int, int, float]
Takes one step for the agent.
It takes in a reward and observation and returns the action the agent chooses at that time step.
Returns
int - the index of the current action.
int - the number of times the current action has been chosen.
float - the probability of selecting the action.
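The epsilon-greedy selection rule can be sketched as follows. This is a minimal standalone illustration, not the package's implementation: the function name `epsilon_greedy_action`, its arguments, and the tie handling are assumptions. It returns the chosen arm together with the probability with which that arm was selected, similar in spirit to the float in the return tuple above.

```python
import random
from typing import List, Optional, Tuple


def epsilon_greedy_action(
    q_values: List[float], epsilon: float, rng: Optional[random.Random] = None
) -> Tuple[int, float]:
    """Hypothetical helper: pick an arm and report its selection probability."""
    rng = rng or random.Random()
    n_arms = len(q_values)
    best = max(q_values)
    greedy_arms = [i for i, q in enumerate(q_values) if q == best]

    if rng.random() < epsilon:
        action = rng.randrange(n_arms)  # explore: uniform over all arms
    else:
        action = rng.choice(greedy_arms)  # exploit: uniform over tied best arms

    # Probability of the chosen arm under this policy: every arm can be hit
    # by exploration, and greedy arms can also be hit by exploitation.
    prob = epsilon / n_arms
    if action in greedy_arms:
        prob += (1.0 - epsilon) / len(greedy_arms)
    return action, prob
```

With `epsilon=0.0` this reduces to a pure greedy policy, and with `epsilon=1.0` to a uniformly random one.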
- class ilovebandits.mab.agents.GreedyAgent(q_estimator: QEstMean | QEstFixedStep)
Bases: BaseAgent
Pure greedy agent. Always takes the greedy action.
- take_action() -> Tuple[int, int, float]
Takes one step for the agent.
It takes in a reward and observation and returns the action the agent chooses at that time step.
Returns
int - the index of the current action.
int - the number of times the current action has been chosen.
float - the probability of selecting the action.
- class ilovebandits.mab.agents.RandomAgent(q_estimator: QEstMean | QEstFixedStep)
Bases: BaseAgent
Random agent. It can be used as a baseline.
- take_action() -> Tuple[int, int, float]
Takes one step for the agent.
It takes in a reward and observation and returns the action the agent chooses at that time step.
Returns
int - the index of the current action.
int - the number of times the current action has been chosen.
float - the probability of selecting the action.
- class ilovebandits.mab.agents.TSAgent(arms: int, a_init: float = 1, b_init: float = 1, samples_for_freq_est: int = 100000)
Bases: BaseAgent
Implements a Thompson Sampling agent for binary (0/1) rewards. The defaults a_init=1 and b_init=1 correspond to a uniform distribution.
- reset_agent()
Reset agent parameters such as arm counters and q estimations.
- take_action() -> Tuple[int, int, float]
Takes one step for the agent.
It takes in a reward and observation and returns the action the agent chooses at that time step.
Returns
int - the index of the current action.
int - the number of times the current action has been chosen.
float - the probability of selecting the action.
- class ilovebandits.mab.agents.UCBAgent(q_estimator: QEstMean | QEstFixedStep, c: float)
Bases: BaseAgent
Implements the UCB1 agent.
- reset_agent()
Reset agent parameters such as arm counters and q estimations.
- take_action() -> Tuple[int, int, float]
Takes one step for the agent.
It takes in a reward and observation and returns the action the agent chooses at that time step.
Returns
int - the index of the current action.
int - the number of times the current action has been chosen.
float - the probability of selecting the action.
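UCB1 picks the arm that maximizes the value estimate plus an exploration bonus that shrinks as the arm is played more often. The sketch below assumes the common form Q(a) + c * sqrt(ln t / N(a)). The function name `ucb_action` and the rule of playing untried arms first are illustrative assumptions, and the package's exact bonus term may differ.

```python
import math
from typing import List


def ucb_action(q_values: List[float], counts: List[int], c: float) -> int:
    """Hypothetical helper: index of the arm with the largest upper confidence bound."""
    # Play every arm once first; the bonus is undefined for a count of zero.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    t = sum(counts)  # total number of pulls so far
    scores = [
        q + c * math.sqrt(math.log(t) / n)  # estimate plus exploration bonus
        for q, n in zip(q_values, counts)
    ]
    # Ties are resolved in favor of the lowest index in this sketch.
    return max(range(len(scores)), key=scores.__getitem__)
```

A larger `c` favors exploration of rarely played arms, while `c=0` makes the rule purely greedy.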
ilovebandits.mab.q_estimators module
Core and main classes for the MAB problem.
- class ilovebandits.mab.q_estimators.BaseScaler
Bases: object
Base class for the reward scalers. This base class leaves the rewards unchanged.
- scale(reward)
Scale reward.
- class ilovebandits.mab.q_estimators.BernoulliBinarizationScaler(seed=42)
Bases: BaseScaler
Class to convert rewards from the [0,1] domain to the {0,1} domain.
- scale(reward)
Scale reward from [0,1] to {0,1}.
- with_proba(epsilon)
Bernoulli test: returns True with probability \(\varepsilon\) and False with probability \(1 - \varepsilon\).
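One way to binarize a reward in [0,1] is to treat it as the success probability of a Bernoulli draw, so that the expected scaled reward equals the original reward. The sketch below assumes that this is what `scale` does. The class name `BinarizingScaler` is hypothetical and the code is not taken from the package.

```python
import random


class BinarizingScaler:
    """Hypothetical stand-in: maps a reward in [0, 1] to 0 or 1."""

    def __init__(self, seed: int = 42):
        self.rng = random.Random(seed)

    def with_proba(self, epsilon: float) -> bool:
        # True with probability epsilon, False otherwise.
        return self.rng.random() < epsilon

    def scale(self, reward: float) -> int:
        # Bernoulli draw whose success probability is the reward itself.
        return int(self.with_proba(reward))
```

A binarized reward of this kind is what a Beta-Bernoulli Thompson Sampling update expects as input.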
- class ilovebandits.mab.q_estimators.QEstBase(arms: int, qvals_init=None, reward_scaler: BaseScaler | BernoulliBinarizationScaler | None = None)
Bases: object
Base class for the Q estimators. This base class is just a blueprint.
- estimate(action)
Base estimate: updates the update count of the given arm/action.
- reset_arm_counts()
Reset arm count updates.
- class ilovebandits.mab.q_estimators.QEstFixedStep(arms: int, step_size: float, qvals_init=None, reward_scaler: BaseScaler | BernoulliBinarizationScaler | None = None)
Bases: QEstBase
Q estimator to estimate reward with constant step size.
- estimate(reward, action)
Estimate reward for the given arm/action.
- class ilovebandits.mab.q_estimators.QEstMean(arms: int, qvals_init=None, reward_scaler: BaseScaler | BernoulliBinarizationScaler | None = None)
Bases: QEstBase
Q estimator to estimate reward with sample average estimates.
- estimate(reward, action)
Estimate reward for the given arm/action.
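Sample-average and constant-step estimates can both be written as the incremental update Q <- Q + step * (reward - Q); they differ only in the step. The sketch below illustrates that textbook rule with hypothetical function names (`update_mean`, `update_fixed_step`). It assumes, without confirmation from the source, that the two estimators above follow it.

```python
from typing import List


def update_mean(q: List[float], counts: List[int], reward: float, action: int) -> None:
    """Sample-average update: the step size shrinks as 1 / N(action)."""
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]


def update_fixed_step(q: List[float], step_size: float, reward: float, action: int) -> None:
    """Constant step size: recent rewards weigh more than old ones."""
    q[action] += step_size * (reward - q[action])
```

The sample average converges to the true mean for stationary rewards, while a constant step keeps tracking the reward when its distribution drifts over time.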
- class ilovebandits.mab.q_estimators.QThompSamp(alphas: array, betas: array, reward_scaler: BaseScaler | BernoulliBinarizationScaler | None = None)
Bases: QEstBase
Q estimator for the Thompson Sampling agent, assuming Bernoulli-distributed rewards. In this case, the prior is a Beta distribution.
- estimate(reward, action)
Estimate reward for the given arm/action.
- get_expected_values()
Get expected values of the assumed distributions.
- sample_thetas()
Sample thetas from current distribution.
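With a Beta prior and Bernoulli rewards, the posterior of each arm stays a Beta distribution, so the estimator only needs to track two parameters per arm. The following is a minimal sketch of that conjugate scheme using the standard library. The class name `BetaBernoulliTS` and its method bodies are assumptions for illustration, not the package's code.

```python
import random
from typing import List


class BetaBernoulliTS:
    """Hypothetical minimal Thompson Sampling estimator for 0/1 rewards."""

    def __init__(self, arms: int, a_init: float = 1.0, b_init: float = 1.0, seed: int = 0):
        self.alphas: List[float] = [a_init] * arms
        self.betas: List[float] = [b_init] * arms
        self.rng = random.Random(seed)

    def sample_thetas(self) -> List[float]:
        # One draw per arm from its current Beta(alpha, beta) posterior.
        return [self.rng.betavariate(a, b) for a, b in zip(self.alphas, self.betas)]

    def take_action(self) -> int:
        # Play the arm whose sampled theta is largest.
        thetas = self.sample_thetas()
        return max(range(len(thetas)), key=thetas.__getitem__)

    def estimate(self, reward: int, action: int) -> None:
        # Conjugate update: a success increments alpha, a failure increments beta.
        self.alphas[action] += reward
        self.betas[action] += 1 - reward

    def get_expected_values(self) -> List[float]:
        # The mean of Beta(a, b) is a / (a + b).
        return [a / (a + b) for a, b in zip(self.alphas, self.betas)]
```

Arms with few observations have wide posteriors and are therefore sampled optimistically from time to time, which is how the method explores.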
ilovebandits.mab.utils module
Utils functions for the package.
- ilovebandits.mab.utils.argmax(q_values: List) -> Tuple[int, float, List[int]]
Takes in a list of q_values and returns the index of the item with the highest value. Breaks ties randomly.
Returns
int - the index of the highest value in q_values.
float - the probability of selecting the action.
list[int] - the list of indices that are tied for the highest value.
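Random tie-breaking can be illustrated with a short standalone function. This is a sketch under assumptions: the name `argmax_random_ties` and the optional `rng` argument are hypothetical, and the returned probability is taken to be uniform over the tied indices.

```python
import random
from typing import List, Optional, Tuple


def argmax_random_ties(
    q_values: List[float], rng: Optional[random.Random] = None
) -> Tuple[int, float, List[int]]:
    """Hypothetical helper: argmax that breaks ties uniformly at random."""
    rng = rng or random.Random()
    top = max(q_values)
    ties = [i for i, q in enumerate(q_values) if q == top]
    # Each tied index is equally likely, so its probability is 1 / len(ties).
    return rng.choice(ties), 1.0 / len(ties), ties
```

Breaking ties at random avoids a systematic bias toward low-index arms, which matters early on when all estimates are still equal.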
- ilovebandits.mab.utils.find_max_indices(numbers: List) -> List
Returns a list with the index of the max number. In case of a tie, the indices of the tied numbers are returned.
- ilovebandits.mab.utils.find_max_numbers(numbers: List) -> List
Returns a list with the max number. In case of a tie, the tied numbers are returned.