SMERL classes for DIAYN and DADS
Bases: DIAYN
DIAYNSMERL refers to a family of methods that combine DIAYN's diversity
reward with an extrinsic environment reward using the SMERL method; see
https://arxiv.org/abs/2010.14484.
Most methods are inherited from the DIAYN algorithm; the only change is the
way the reward is computed (a combination of the DIAYN reward and
the extrinsic reward).
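The reward combination described above can be sketched in a few lines. This is a minimal illustration of the SMERL rule from the linked paper (extrinsic reward plus a diversity bonus that is only granted when the episode return is close enough to a target return), not the code in `qdax/baselines/diayn_smerl.py`. The function name `smerl_reward` and the parameters `target_return`, `margin` and `diversity_scale` are hypothetical names chosen for this example, and plain Python lists stand in for the arrays a real implementation would use.

```python
from typing import List, Sequence


def smerl_reward(
    extrinsic_rewards: Sequence[float],
    diversity_rewards: Sequence[float],
    episode_returns: Sequence[float],
    target_return: float,
    margin: float,
    diversity_scale: float = 1.0,
) -> List[float]:
    """Combine extrinsic and diversity rewards with a SMERL-style gate.

    All names here are illustrative assumptions, not the QDax API.
    The diversity bonus is added only for transitions whose episode
    return is at least `target_return - margin`.
    """
    combined = []
    for r_ext, r_div, ret in zip(
        extrinsic_rewards, diversity_rewards, episode_returns
    ):
        # Gate: is this episode near-optimal with respect to the target?
        near_optimal = ret >= target_return - margin
        bonus = diversity_scale * r_div if near_optimal else 0.0
        combined.append(r_ext + bonus)
    return combined


rewards = smerl_reward(
    extrinsic_rewards=[1.0, 1.0],
    diversity_rewards=[0.5, 0.5],
    episode_returns=[95.0, 60.0],  # only the first is within the margin
    target_return=100.0,
    margin=10.0,
)
print(rewards)  # [1.5, 1.0]
```

The first transition comes from an episode within the margin of the target, so it receives the diversity bonus; the second does not and keeps only its extrinsic reward.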
Source code in qdax/baselines/diayn_smerl.py
update(training_state, replay_buffer)
Performs a training step to update the policy, the critic and the discriminator parameters.
Source code in qdax/baselines/diayn_smerl.py
Bases: DADS
DADSSMERL refers to a family of methods that combine DADS's diversity
reward with an extrinsic environment reward using the SMERL method;
see https://arxiv.org/abs/2010.14484.
Most methods are inherited from the DADS algorithm; the only change is
the way the reward is computed (a combination of the DADS reward and the extrinsic
reward).
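The inheritance pattern described above, where a subclass keeps every method of its parent and overrides only the reward computation, can be illustrated with a toy sketch. The classes `ToySkillLearner` and `ToySkillLearnerSMERL`, their method names, and the dictionary-based transition are all hypothetical stand-ins invented for this example; they do not mirror the actual class layout in `qdax/baselines/dads_smerl.py`.

```python
class ToySkillLearner:
    """Hypothetical stand-in for a skill-discovery base class."""

    def compute_diversity_reward(self, transition: dict) -> float:
        # A real algorithm would derive this from a learned model.
        return transition["diversity_reward"]

    def compute_reward(self, transition: dict) -> float:
        # Base behaviour: train on the diversity reward alone.
        return self.compute_diversity_reward(transition)

    def update(self, transition: dict) -> float:
        # Stand-in for a training step: returns the reward it would use.
        return self.compute_reward(transition)


class ToySkillLearnerSMERL(ToySkillLearner):
    """Inherits everything; only the reward computation changes."""

    def __init__(self, target_return: float, margin: float) -> None:
        self.target_return = target_return
        self.margin = margin

    def compute_reward(self, transition: dict) -> float:
        # SMERL-style gate: add the diversity reward only when the
        # episode return is within `margin` of the target return.
        near_optimal = (
            transition["episode_return"] >= self.target_return - self.margin
        )
        bonus = self.compute_diversity_reward(transition) if near_optimal else 0.0
        return transition["extrinsic_reward"] + bonus


transition = {
    "extrinsic_reward": 2.0,
    "diversity_reward": 0.25,
    "episode_return": 98.0,
}
base_reward = ToySkillLearner().update(transition)
smerl_reward_value = ToySkillLearnerSMERL(100.0, 5.0).update(transition)
print(base_reward, smerl_reward_value)  # 0.25 2.25
```

Because `update` is defined once in the parent and calls `compute_reward`, the subclass changes training behaviour without duplicating the update logic.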
Source code in qdax/baselines/dads_smerl.py
update(training_state, replay_buffer)
Performs a training step to update the policy, the critic and the dynamics network parameters.
Source code in qdax/baselines/dads_smerl.py