Hotpatching ClickHouse in production with XRay
UD6.215 | Day 1 | 15:25 - 15:45 | Speakers: Pablo Marcos
Abstract
Ever been debugging a production issue and wished you'd added just one more log statement? Now you have to rebuild, wait for CI, deploy... all that time wasted. We've all been there, cursing our past selves.
We've integrated LLVM's XRay into ClickHouse to solve this. It lets us hot-patch running production systems to inject logging, profiling, and even deliberate delays into any function. No rebuild required.
XRay reserves space at function entry/exit that can be atomically patched with custom handlers at runtime. We built three handler types: LOG to add the trace points you forgot, SLEEP to reproduce (or prevent) timing-sensitive bugs, and PROFILE for deterministic profiling to complement our existing sampling profiler. The performance overhead when inactive is negligible.
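To make the patching idea concrete, here is a minimal, portable sketch of handlers that can be swapped atomically at function entry and exit. It is a model only: real XRay rewrites the reserved instruction sleds in place, whereas this sketch checks an atomic function pointer. Every name in it (`g_handler`, `instrumentationPoint`, `Instrumented`, `logHandler`, `sleepHandler`, `demo`, and the function ID 1) is hypothetical and not taken from ClickHouse or LLVM.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Hypothetical model of runtime-swappable entry/exit handlers.
// This is NOT XRay's API or ClickHouse's implementation.
enum class EventType { Entry, Exit };
using Handler = void (*)(int32_t func_id, EventType type);

// One global handler slot; nullptr stands for "unpatched".
std::atomic<Handler> g_handler{nullptr};

// Stand-in for a sled: one atomic load, then an indirect call if a handler is set.
inline void instrumentationPoint(int32_t func_id, EventType type)
{
    if (Handler h = g_handler.load(std::memory_order_acquire))
        h(func_id, type);
}

// RAII guard: fires the entry event on construction and the exit event on destruction.
struct Instrumented
{
    int32_t id;
    explicit Instrumented(int32_t id_) : id(id_) { instrumentationPoint(id, EventType::Entry); }
    ~Instrumented() { instrumentationPoint(id, EventType::Exit); }
};

// Event sink for the LOG-style handler (not thread-safe; enough for a single-threaded demo).
std::vector<std::string> g_events;

// LOG-style handler: records which function was entered or exited.
void logHandler(int32_t func_id, EventType type)
{
    g_events.push_back((type == EventType::Entry ? "enter " : "exit ") + std::to_string(func_id));
}

// SLEEP-style handler: delays function entry to widen a timing window.
void sleepHandler(int32_t, EventType type)
{
    if (type == EventType::Entry)
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
}

// An "instrumented" function with the hypothetical ID 1.
int startQuery(int x)
{
    Instrumented guard(1);
    return x * 2;
}

// Runs the function unpatched, patched with the LOG handler, then unpatched again.
std::vector<std::string> demo()
{
    g_events.clear();
    startQuery(21);                                           // unpatched: nothing recorded
    g_handler.store(&logHandler, std::memory_order_release);  // "patch"
    startQuery(21);                                           // records entry and exit
    g_handler.store(nullptr, std::memory_order_release);      // "unpatch"
    startQuery(21);                                           // nothing recorded again
    assert(g_events.size() == 2);
    return g_events;
}
```

Calling `demo()` leaves only the two events from the patched call in the returned vector. Storing `&sleepHandler` instead of `&logHandler` would delay each entry by 5 ms in the same way. The inactive path in this model still costs a load and a branch, which is more than the reserved no-op space XRay uses.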
Control is simple. Send a SQL query such as SYSTEM INSTRUMENT ADD LOG 'QueryMetricLog::startQuery' 'This message will be logged at the start of the function' to patch the function instantly. Results show up in system.trace_log. Remove it just as easily when you're done.
I'll cover the integration challenges (ELF parsing, thread-safety, atomic patching), performance numbers (a 4-7% increase in binary size, near-zero runtime cost), and real production war stories.
