AutoRegressive Bandits (ARBs) is a recently proposed model that casts a sequential decision-making problem as an autoregressive (AR) process. In this online learning setting, the observed reward follows an AR dynamic whose parameters are unknown to the agent and depend on the actions the agent chooses. This study empirically demonstrates how assigning extreme values to the systemic stability index and other reward-governing parameters severely impairs ARB learning in the corresponding environment. We show that the algorithm incurs numerically larger, higher-order regret in a weakly stable environment and strictly exponential regret in an unstable environment over the considered optimization horizon. We also test ARB against other bandit baselines in both weakly stable and unstable systems to investigate how the loss of systemic stability degrades their performance, and we demonstrate the potential advantage of competing algorithms when stability is weakened. Finally, we evaluate the ARB algorithm under various values of its key input parameters to study how its performance under these extreme environmental conditions might be improved.
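For concreteness, the sketch below illustrates the kind of environment studied here, assuming an AR(k) reward model of the form x_t = gamma_0(a_t) + sum_i gamma_i(a_t) x_{t-i} + noise, with stability governed by the sum of the AR coefficients (below 1: stable; close to 1: weakly stable; above 1: unstable). The class, parameter names, and exact parameterization are illustrative assumptions, not the paper's implementation.

    # Minimal sketch (not the paper's code): an AR(k) bandit environment where the
    # reward follows x_t = gamma_0(a_t) + sum_i gamma_i(a_t) * x_{t-i} + noise.
    # The stability index is taken here to be the largest sum of AR coefficients
    # over actions; < 1 is stable, close to 1 is weakly stable, > 1 is unstable.
    import numpy as np

    class ARBanditEnv:
        def __init__(self, gammas, noise_std=0.1, seed=0):
            # gammas: array of shape (n_actions, k + 1); column 0 is the bias term,
            # columns 1..k are the AR coefficients of each action.
            self.gammas = np.asarray(gammas, dtype=float)
            self.k = self.gammas.shape[1] - 1
            self.noise_std = noise_std
            self.rng = np.random.default_rng(seed)
            self.history = np.zeros(self.k)  # x_{t-1}, ..., x_{t-k}

        @property
        def stability_index(self):
            # Largest sum of AR coefficients across actions (illustrative definition).
            return self.gammas[:, 1:].sum(axis=1).max()

        def step(self, action):
            g = self.gammas[action]
            reward = g[0] + g[1:] @ self.history + self.rng.normal(0.0, self.noise_std)
            # Shift the AR history: the newest reward becomes x_{t-1}.
            self.history = np.concatenate(([reward], self.history[:-1]))
            return reward

    # Example: two actions, AR order k = 2. The second environment is unstable
    # (stability index > 1), so rewards, and hence regret, can grow exponentially.
    weakly_stable = ARBanditEnv(gammas=[[1.0, 0.5, 0.45], [0.5, 0.6, 0.38]])
    unstable      = ARBanditEnv(gammas=[[1.0, 0.7, 0.45], [0.5, 0.8, 0.40]])
    print(weakly_stable.stability_index, unstable.stability_index)  # 0.98, 1.2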