Blackjack Trained Agent
Today I am highlighting a Python project that can be found at https://github.com/DrewGoldsberry/PythonBlackjackWithTransformerAgent
This project is a simple blackjack game that uses a transformer model to train an agent to play the game.
The agent is trained using reinforcement learning, where the agent learns to maximize its rewards by playing the game multiple times.
Seeing The End Result
Here you see a UI where an agent plays Blackjack by itself.
You can also play the game yourself. The UI isn't perfect: split has an issue where the player only plays the first split hand. A human player is also able to double after hitting; the agent can't do that.
Running the Scripts Locally
To run the scripts locally, you will need Python 3.10 or higher installed on your machine. Open a shell in the project folder and install the dependencies:
py -m pip install -r requirements.txt
This will install all the required packages to run the scripts.
Once you have the packages installed, you can launch the game with the trained agent.
py main.py
This will start the game and you will see the agent playing the game.
How to manually play the game:
You will need to edit main.py and uncomment the two lines at lines 16 and 17:

player = Player("Human")
ui = BlackjackUI(agent=player, is_agent=False)
This will allow you to play the game manually.
How to train the agent:
To train the agent, you will need to run the train_reinforce.py script.
py train_reinforce.py
This will train the agent and save the model, by default to ./Models/blackjack_agent_ep.pt
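The script name suggests the REINFORCE policy-gradient method, though I have not confirmed that from the script itself. In REINFORCE, each decision is weighted by the return that follows it, and since the reward only arrives on the last step of a round, earlier decisions only see it through discounting. Here is a minimal sketch of that return calculation; the function name and the `gamma` value are illustrative assumptions, not taken from the repo.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# One hypothetical round: two decisions with no reward, then the final reward.
episode_rewards = [0.0, 0.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.5))  # [0.25, 0.5, 1.0]
```

A lower `gamma` gives early decisions less credit for the final result; a value near 1 treats every decision in the round almost equally.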
How to run the agent outside the environment:
To run the agent outside the environment, you will need to run the run_agent_example.py script.
py run_agent_example.py
You should see the following output:

How Do Rewards Work?
The agent is trained using reinforcement learning, where the agent learns to maximize its rewards by playing the game multiple times.
You can look at PythonBlackjackWithTransformerAgent/action_rewards.py to see how each action is rewarded.
That file defines the rewards the agent receives after a round, based on some basic strategy rules. The blackjack environment class adds the reward only on the last step, and only when the player is of type AgentPlayer.
The rewards are built around playing by the book. There are probably better rewards to choose; I am new at this, so I am probably making some mistakes.
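To make "playing by the book" concrete, here is a rough sketch of a reward that scores a decision by whether it matches a basic strategy chart. It is not the code from action_rewards.py: the function names, the reward values, and the simplified chart (hard totals only, no soft hands or pairs) are all assumptions for illustration. This version ignores whether the hand was won or lost, which is one possible design and not necessarily what the repo does.

```python
def basic_strategy_hard(player_total, dealer_upcard):
    """Recommended action for a hard total; dealer_upcard is 2-11 (11 = ace).

    Simplified chart: hard totals only, no soft hands, pairs, or surrender.
    """
    if player_total >= 17:
        return "stand"
    if player_total >= 13:
        return "stand" if dealer_upcard <= 6 else "hit"
    if player_total == 12:
        return "stand" if 4 <= dealer_upcard <= 6 else "hit"
    if player_total == 11:
        return "double" if dealer_upcard <= 10 else "hit"
    if player_total == 10:
        return "double" if dealer_upcard <= 9 else "hit"
    if player_total == 9:
        return "double" if 3 <= dealer_upcard <= 6 else "hit"
    return "hit"


def action_reward(action, player_total, dealer_upcard, match=1.0, mismatch=-1.0):
    """Hypothetical reward: positive when the action agrees with the chart."""
    expected = basic_strategy_hard(player_total, dealer_upcard)
    return match if action == expected else mismatch


print(action_reward("stand", 16, 6))  # 1.0, the chart says stand
print(action_reward("hit", 16, 6))    # -1.0, the chart says stand
```

Scoring the decision rather than the outcome means a correct play that happens to lose is not punished, at the cost of the agent never being able to learn anything better than the chart.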
The Transformer Agent
Here is the code for the TransformerAgent class. This is what ultimately takes in a series of tokenized inputs and produces an output.
import torch.nn as nn


class TransformerAgent(nn.Module):
    def __init__(self,
                 vocab_size=256,
                 d_model=128,
                 nhead=4,
                 num_layers=2,
                 max_seq_len=20,
                 num_actions=4):
        super().__init__()
        self.max_seq_len = max_seq_len
        self.vocab_size = vocab_size
        self.num_actions = num_actions
        # Learned embeddings for token IDs and for sequence positions.
        self.token_embedding = nn.Embedding(vocab_size, d_model)
        self.pos_embedding = nn.Embedding(max_seq_len, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Maps the encoder output to one score per action.
        self.policy_head = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, num_actions)
        )

    ...
If you want a description of the file, I suggest pasting it into GPT; you will then be able to ask it questions.
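The policy head produces one score (logit) per action, and those scores still have to be turned into a move. Here is a minimal sketch of one way to do that in plain Python: a softmax that gives illegal actions zero probability, followed by a greedy pick. The action order, the function names, and the masking itself are assumptions for illustration; I have not checked how the repo selects actions or whether it masks at all.

```python
import math

# Hypothetical action order; the repo may index its four actions differently.
ACTIONS = ["hit", "stand", "double", "split"]


def masked_softmax(logits, legal):
    """Softmax over logits, giving zero probability to illegal actions.

    Assumes at least one action is legal.
    """
    # Subtract the largest legal logit for numerical stability.
    top = max(score for score, ok in zip(logits, legal) if ok)
    exps = [math.exp(score - top) if ok else 0.0
            for score, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(logits, legal):
    """Pick the legal action with the highest probability."""
    probs = masked_softmax(logits, legal)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ACTIONS[best], probs


logits = [0.2, 1.5, 2.0, -1.0]      # made-up policy head output
legal = [True, True, False, False]  # e.g. after hitting: no double, no split
action, probs = greedy_action(logits, legal)
print(action)  # stand
```

During training an agent would normally sample from these probabilities instead of always taking the top one, so that it keeps exploring.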
What I Might Update in the Future
- Fix the split workflow. Right now the agent only plays one hand and is rewarded for that single hand.
- Explore increasing the size of the agent's neural nets to see if results get better.
- Rework how the rewards are calculated. Sometimes the agent loses even though it made all the right decisions.
- Write another post showing the agent playing live blackjack, if I can train a good enough one.
Feel free to make a pull request if you believe there are better approaches to achieving the same goal.