Blog

Learning AI

Recent LLM releases such as Opus 4.5 and GPT-5.1 Codex (used via Codex CLI) have once again pushed the boundaries of what we thought possible. These models generate remarkably high-quality code and demonstrate capabilities that would have seemed out of...

Agentic Testing - The New Testing Approach

Over the last year or so, agentic coding has gone from novelty to serious contender for “default way of working” in many engineering teams. Tools like Claude Code, Cursor, Copilot or Codex are no longer just autocomplete toys – they’re...

Playwright MCP - Security Best Practices

Agentic tools like Playwright MCP or Chrome DevTools MCP are the kind of thing that makes testers’ eyes light up: you point an AI at a browser, give it some tools, and it starts exploring your app, filing bugs, and...

Building RAG with Gemini File Search

Over the past months I’ve been sharing how I code with LLMs day-to-day and, crucially, how I test the systems based on them. Today I want to take that a step further and tie those strands together around a topic...

Testing LLM-based Systems

Large Language Models (and AI in general) broke the rules traditional QA was built on. Even at temperature=0, you can’t expect the same output twice. Under the hood, GPUs run thousands of floating-point operations in parallel; tiny rounding differences or...