TinyCodeTest
Python, RL evaluation, sandboxed execution, benchmarking, Vercel
Deterministic code-evaluation environment with sandboxed verifiers, pass@k scoring, and a browser-based eval runner.
Python, Parquet, Arrow, SQL, data validation
Versioned Parquet data pipeline with provenance tracking, deterministic leakage audits, and drift gating.
C++, SIMD, multithreading, CMake, benchmarking
Compression benchmark suite implementing classic algorithms from scratch with SIMD acceleration and reproducible reporting.
C++, pthreads, AVX2, performance profiling
Thread-safe allocator with multiple allocation strategies and benchmark coverage against production-grade allocators.
Python, RL environments, deterministic verification, evaluation design
Deterministic race-strategy environment with verifiers, baselines, ablations, and stress tests that probe deeper strategic reasoning.
Python, Flask, telemetry simulation, offline RL
F1 telemetry simulator and replay tooling for mini-season strategy analysis with offline RL baselines.
Deterministic verifiers, difficulty buckets, and multi-model pass@k benchmarking
Notes on how the TinyCodeTest evaluation stack turns model runs into reproducible, inspectable benchmark results.
Scenario design, deterministic grading, ablations, and stress tests for strategic reasoning
Experiment notes from building and evaluating an RL-style environment for race-strategy reasoning.
Repeatable measurement, SIMD-aware implementation work, and fair algorithm comparisons
A closer look at the benchmarking discipline behind Compression Bench.
Lead Developer, Tonfans
Led end-to-end delivery of a production Telegram mini app integrated with the TON blockchain.
Systems Engineer, Lifestores Pharmacy
Built real-time C++ systems and observability tooling for high-throughput order processing.
Backend Developer, Ultra Cloud Technologies
Built Python and Flask APIs, optimized PostgreSQL queries, and improved deployment reliability with Docker.