| Hindsight Experience Replay | OpenAI | 2017-07-05 07:00 | 2026-02-28 05:56 | — |
| Teacher–student curriculum learning | OpenAI | 2017-07-01 07:00 | 2026-02-28 05:56 | — |
| Faster physics in Python | OpenAI | 2017-06-28 07:00 | 2026-02-28 05:56 | — |
| Report from the self-organizing conference | OpenAI | 2016-10-13 07:00 | 2026-02-28 05:56 | — |
| Learning to cooperate, compete, and communicate | OpenAI | 2017-06-08 07:00 | 2026-02-28 05:56 | — |
| UCB exploration via Q-ensembles | OpenAI | 2017-06-05 07:00 | 2026-02-28 05:56 | — |
| Unsupervised sentiment neuron | OpenAI | 2017-04-06 07:00 | 2026-02-28 05:56 | — |
| Prediction and control with temporal segment models | OpenAI | 2017-03-12 08:00 | 2026-02-28 05:56 | — |
| OpenAI Baselines: DQN | OpenAI | 2017-05-24 07:00 | 2026-02-28 05:56 | — |
| Robots that learn | OpenAI | 2017-05-16 07:00 | 2026-02-28 05:56 | — |
| Equivalence between policy gradients and soft Q-learning | OpenAI | 2017-04-21 07:00 | 2026-02-28 05:56 | — |
| Stochastic Neural Networks for hierarchical reinforcement learning | OpenAI | 2017-04-10 07:00 | 2026-02-28 05:56 | — |
| Spam detection in the physical world | OpenAI | 2017-04-01 07:00 | 2026-02-28 05:56 | — |
| Evolution strategies as a scalable alternative to reinforcement learning | OpenAI | 2017-03-24 07:00 | 2026-02-28 05:56 | — |
| Distill | OpenAI | 2017-03-20 07:00 | 2026-02-28 05:56 | — |
| Learning to communicate | OpenAI | 2017-03-16 07:00 | 2026-02-28 05:56 | — |
| Third-person imitation learning | OpenAI | 2017-03-06 08:00 | 2026-02-28 05:56 | — |
| Adversarial attacks on neural network policies | OpenAI | 2017-02-08 08:00 | 2026-02-28 05:56 | — |
| Team update | OpenAI | 2017-01-30 08:00 | 2026-02-28 05:56 | — |
| Faulty reward functions in the wild | OpenAI | 2016-12-21 08:00 | 2026-02-28 05:56 | — |