XCS Algorithms for a Linear Combination of Discounted and Undiscounted Reward Markovian Decision Processes

Maryam Moghimi

Master's thesis (2018)

Open access document in PolyPublie

Open access to the full text of this document
Terms of use: All rights reserved

Résumé (English translation)

Several studies have shown that combining predictors can improve prediction accuracy in fields such as psychology, statistics, and management science. However, none of these studies has tested the combination of reinforcement learning techniques. Our study aims to develop an algorithm based on two algorithms that are iterative approximate forms of reinforcement learning in XCS. This algorithm, MIXCS, combines the Q-learning and R-learning techniques to compute the linear combination of the payoff resulting from the agent's actions, as well as the correspondence between the system-level prediction and the actual value of the agent's actions. MIXCS predicts the expected payoff for each of the actions available to the agent. We tested MIXCS in two two-dimensional environments, Environment1 and Environment2, which reproduce the actions possible in a financial market (buy, sell, hold), in order to evaluate the performance of an agent seeking an expected profit. We computed the average optimal payoff in both environments and compared it with the results obtained by MIXCS. We obtained two results. First, the results of MIXCS are close to the average optimal payoff in Environment1, but not in Environment2. Second, the agent obtains the average optimal payoff when it takes the "sell" action in both environments.

Abstract

Many studies have shown that combining individual predictors improves prediction accuracy in domains such as psychology, statistics, and management science. However, these studies have not tested combinations of reinforcement learning techniques. This study develops an algorithm based on two iterative approximate forms of reinforcement learning in XCS. The algorithm, named MIXCS, combines Q-learning and R-learning to compute a linear-combination payoff and the correspondence between the system prediction and the action value. As such, MIXCS predicts the payoff to be expected for each possible action. We test MIXCS in two two-dimensional grids, Environment1 and Environment2, which represent the financial-market actions of buying, selling, and holding, to evaluate the performance of an agent acting as a trader seeking a desired profit. We calculate the optimum average payoff to predict the value of the next movement in both Environment1 and Environment2 and compare it with the results obtained with MIXCS. The results show that the performance of MIXCS is close to the optimum average reward in Environment1, but not in Environment2. In addition, the agent reaches the maximum reward by taking selling actions in both environments.
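
To make the idea of a linear combination of discounted and undiscounted payoffs more concrete, the following is a minimal tabular sketch. It is not the MIXCS algorithm from the thesis, which operates inside the XCS classifier system. It only pairs a textbook Q-learning update (discounted) with a textbook R-learning update (average-reward) and mixes the two estimates. The class name `MixedLearner`, the mixing coefficient `weight`, the step sizes `alpha` and `beta`, and the discount `gamma` are all assumptions introduced for illustration; the abstract does not specify them.

```python
from collections import defaultdict

# The three trading actions named in the abstract.
ACTIONS = ("buy", "sell", "hold")


class MixedLearner:
    """Hypothetical tabular learner mixing Q-learning and R-learning estimates."""

    def __init__(self, weight=0.5, alpha=0.1, beta=0.01, gamma=0.9):
        self.weight = weight  # assumed mixing coefficient in [0, 1]
        self.alpha = alpha    # step size for both value tables (assumption)
        self.beta = beta      # step size for the average-reward estimate
        self.gamma = gamma    # discount factor for the Q-learning part
        self.q = defaultdict(float)  # discounted action values
        self.r = defaultdict(float)  # average-adjusted action values
        self.rho = 0.0               # running estimate of average reward

    def update(self, state, action, reward, next_state):
        key = (state, action)
        max_q_next = max(self.q[(next_state, a)] for a in ACTIONS)
        max_r_next = max(self.r[(next_state, a)] for a in ACTIONS)

        # Q-learning: discounted temporal-difference update.
        self.q[key] += self.alpha * (
            reward + self.gamma * max_q_next - self.q[key]
        )

        # R-learning: undiscounted update relative to the average reward.
        self.r[key] += self.alpha * (
            reward - self.rho + max_r_next - self.r[key]
        )

        # Adjust the average-reward estimate only when the action is greedy.
        max_r_here = max(self.r[(state, a)] for a in ACTIONS)
        if self.r[key] == max_r_here:
            self.rho += self.beta * (
                reward - self.rho + max_r_next - max_r_here
            )

    def payoff(self, state, action):
        """Linear combination of the discounted and undiscounted estimates."""
        key = (state, action)
        return self.weight * self.q[key] + (1.0 - self.weight) * self.r[key]

    def best_action(self, state):
        return max(ACTIONS, key=lambda a: self.payoff(state, a))
```

A caller would invoke `update(state, action, reward, next_state)` after each step in the grid and read `payoff(state, action)` as the mixed prediction for that action. Setting `weight` to 1 or 0 recovers the purely discounted or purely average-reward estimate, respectively.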

Department:	Department of Mathematics and Industrial Engineering
Program:	Research master's in industrial engineering
Supervisors:	Samuel Bassetto and Jean-Marc Frayret
PolyPublie URL:	https://publications.polymtl.ca/3694/
University/School:	École Polytechnique de Montréal
Date deposited:	22 Feb. 2019 11:41
Last modified:	1 Oct. 2024 00:46

Cite in APA 7:	Moghimi, M. (2018). XCS Algorithms for a Linear Combination of Discounted and Undiscounted Reward Markovian Decision Processes [Master's thesis, École Polytechnique de Montréal]. PolyPublie. https://publications.polymtl.ca/3694/
