TinyCodeTest
Python, RL evaluation, sandboxed execution, benchmarking, Vercel
Deterministic code-evaluation environment with sandboxed verifiers, pass@k scoring, and a browser-based eval runner.
Python, Parquet, Arrow, SQL, data validation
Versioned Parquet data pipeline with provenance tracking, deterministic leakage audits, and drift gating.
C++, SIMD, multithreading, CMake, benchmarking
Compression benchmark suite implementing classic algorithms from scratch with SIMD acceleration and reproducible reporting.
C++, pthreads, AVX2, performance profiling
Thread-safe allocator with multiple allocation strategies, benchmarked against production-grade allocators.
Python, RL environments, deterministic verification, evaluation design
Deterministic race-strategy environment with verifiers, baselines, ablations, and stress tests that probe deeper reasoning.