MDP framework in Python to take an optimal decision

  keras, markov-decision-process, python, python-2.7

I am trying to model the following problem as a Markov decision process.

In a steel melting shop of a steel plant, iron pipes are used. These pipes develop rust over time. Adding an anti-rusting solution can delay the rusting process. If there is too much rust, the pipe has to be cleaned mechanically.

I have categorized the rusting states as StateA, StateB, StateC, StateD and StateE, with rusting increasing from A to E. StateA is the absolutely clean state with almost no rust, and StateE is the most rusted/degraded state.


       StateA      -----> StateB ---> StateC ---> StateD --->  StateE
 ∆    ∆   ∆   ∆              |           |          |             | 
 |    |   |   |              |           |          |             | 
Mnt  Mnt Mnt  Mnt            |           |          |             | 
 |    |   |    |____clean____|           |          |             | 
 |    |   |_______clean__________________|          |             | 
 |    |_____________________________________________|             | 
 |                          clean                                 | 
 |________________________________________________________________|
                              clean

We can take any one of the following transitions in the above diagram:
StateB -> clean -> StateA
StateC -> clean -> StateA
StateD -> clean -> StateA
StateE -> clean -> StateA

We can take 3 possible actions:

  • No Maintenance
  • Clean
  • Adding Anti Rusting Agent

The transition probabilities are given below. The pipe degrades from StateA towards StateE, with the amount of rusting in each step captured by the transition probabilities. Adding the anti-rusting agent decreases the probability of the state degrading.

The transition probability from StateA to StateB is 0.6 with No Maintenance and 0.5 with the anti-rusting agent.

The transition probability from StateB to StateC is 0.7 with No Maintenance and 0.6 with the anti-rusting agent.

The transition probability from StateC to StateD is 0.8 with No Maintenance and 0.7 with the anti-rusting agent.

The Clean action moves any state to the Mnt state with probability 1.
The Mnt state moves to StateA (the absolutely clean state) with probability 1.

The rewards are 0.6 for StateA, 0.5 for StateB, 0.4 for StateC, 0.3 for StateD and 0.2 for StateE.

The Clean action leads to the Maintenance (Mnt) state, which has a reward of 0.1. Cleaning increases productivity afterwards, which is good, but the plant is shut down during maintenance, so there is a loss of production; hence the lower reward.
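
Below is a minimal sketch of how this model could be laid out in Python. The names (STATES, ACTIONS, DEGRADE_PROB, NEXT_STATE, build_transitions, R) and the dictionary layout are my own choices, and anything not stated above is an assumption: the StateD -> StateE probabilities, the remaining probability mass staying in the current state, and StateE staying degraded until it is cleaned.

    # A minimal sketch of the model data; names and layout are illustrative.
    # Transition tables have the shape P[action][state] -> {next_state: probability}.

    STATES = ["StateA", "StateB", "StateC", "StateD", "StateE", "Mnt"]
    ACTIONS = ["NoMaintenance", "AntiRust", "Clean"]

    # Probability of degrading to the next state under each action; the remaining
    # probability mass is assumed to stay in the current state.
    DEGRADE_PROB = {
        "NoMaintenance": {"StateA": 0.6, "StateB": 0.7, "StateC": 0.8,
                          "StateD": 0.8},   # StateD -> StateE value is an assumption
        "AntiRust":      {"StateA": 0.5, "StateB": 0.6, "StateC": 0.7,
                          "StateD": 0.7},   # StateD -> StateE value is an assumption
    }
    NEXT_STATE = {"StateA": "StateB", "StateB": "StateC",
                  "StateC": "StateD", "StateD": "StateE"}

    # State rewards as given above.
    R = {"StateA": 0.6, "StateB": 0.5, "StateC": 0.4,
         "StateD": 0.3, "StateE": 0.2, "Mnt": 0.1}

    def build_transitions():
        """Build P[action][state] = {next_state: probability}."""
        P = {a: {} for a in ACTIONS}
        for action in ("NoMaintenance", "AntiRust"):
            for s in STATES:
                if s in NEXT_STATE:
                    p = DEGRADE_PROB[action][s]
                    P[action][s] = {NEXT_STATE[s]: p, s: 1.0 - p}
                elif s == "StateE":
                    # Assumption: StateE stays fully degraded until it is cleaned.
                    P[action][s] = {s: 1.0}
                else:  # Mnt always returns to StateA with probability 1.
                    P[action][s] = {"StateA": 1.0}
        for s in STATES:
            # Clean moves every state to Mnt with probability 1; from Mnt the pipe
            # returns to StateA regardless of the action chosen.
            P["Clean"][s] = {"StateA": 1.0} if s == "Mnt" else {"Mnt": 1.0}
        return P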

It would be helpful if anyone could help me develop such a framework in Python.

The framework should produce output that helps decide whether to take the Clean action at StateB, StateC, StateD or StateE. We can use value iteration to compute the optimal policy.
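
A value-iteration sketch over that model could look like the one below. It reuses P (from build_transitions()), R, STATES and ACTIONS from the sketch above; the discount factor gamma, the convergence threshold theta, and the convention that the reward depends only on the current state (V(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) * V(s')) are my assumptions, not part of the question.

    def value_iteration(P, R, states, actions, gamma=0.9, theta=1e-6):
        """Return optimal state values and the greedy policy for the MDP."""
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                # Bellman optimality backup with state-based rewards.
                best = R[s] + gamma * max(
                    sum(p * V[s2] for s2, p in P[a][s].items()) for a in actions
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < theta:
                break
        # Extract the greedy policy from the converged value function.
        policy = {}
        for s in states:
            policy[s] = max(
                actions,
                key=lambda a: sum(p * V[s2] for s2, p in P[a][s].items()),
            )
        return V, policy

    # Using the model data sketched earlier:
    P = build_transitions()
    V, policy = value_iteration(P, R, STATES, ACTIONS)
    for s in STATES:
        print("%s: best action = %s, value = %.3f" % (s, policy[s], V[s]))

The printed policy then says, for each of StateB to StateE, whether cleaning or continuing (with or without the anti-rusting agent) has the higher expected long-run value.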

Source: Python Questions
