Job description:
- Design and deploy RL environments for large-scale agent evaluation and reinforcement learning experiments.
- Create pipelines for task generation, dynamic datasets, and scripted environments with controlled complexity and stochasticity.
- Develop validators and reward models to automatically evaluate trajectories and assess model reasoning.
- Collaborate with infrastructure and systems engineers to ensure scalability and reproducibility, and to equip environments with tools for detailed telemetry.
- Design APIs and orchestration structures for running, resetting, and evaluating agents in various environments.
- Optimize environment performance, reward logging, and reproducibility in distributed configurations.
Desired skills & experience:
- Over 5 years of software engineering experience in Python.
- At least 3 years of experience as a Data Scientist, Machine Learning Engineer, or Environment Engineer.
- Availability to work from 2:00 PM to 10:00 PM.
- Practical knowledge of AI frameworks (LangChain, LangGraph, MCP servers).
- Extensive practical experience working with AI, including prompt engineering and vibe coding.
We offer:
- Attractive salaries
- Possibility of full remote work
- Participation in interesting projects
Good to have:
- Knowledge of Codex or Claude Code.
- Experience integrating AI into software systems.
- Understanding of RL concepts: reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops.
- Knowledge of tools, metrics, and data pipelines for evaluating RL.
- Ability to plan and organize your own work.