SpyWeb injects several global helper functions into the Lua environment for networking, notifications, logging, and storage.
Unlike languages where async requires await or .then(), the async/sync difference here is purely technical and usually only matters inside defer. Async bindings yield the Lua VM and won't block the thread while waiting for I/O, while sync bindings run to completion without yielding.
async - frees the thread to run other tasks while waiting for I/O or a network response. Your hook appears to block, but the runtime stays responsive.
sync - holds the thread until the binding finishes; only used for instant operations like computation or simple reads where holding the thread is not a concern.
Storage & Persistence
SpyWeb supports state management through both Runtime Memory (transient) and an Embedded Database (persistent). Memory is fast but resets on reload; the database survives restarts.
💡 Race Condition Note: Both Runtime Memory and Job-Local Storage are safe by default. Hooks for a single job are execution-locked (sequential), so race conditions are impossible within a single job. Use global_store_incr strictly when you need to mutate state shared across multiple concurrent jobs.
| Function | Description |
store_set(key, value) async | Save a string (prefixed with job name) |
store_get(key) async | Retrieve a string or nil |
store_delete(key) async | Remove a key |
Example: Job-Scoped Counter
local c = tonumber(store_get("count") or "0")
store_set("count", tostring(c + 1))
| Function | Description |
global_store_set(key, value) async | Save a string (shared across all jobs) |
global_store_get(key) async | Retrieve a shared string or nil |
global_store_delete(key) async | Remove a shared key |
global_store_incr(k, def, delta) async | Atomic shared increment across all jobs |
Example: Atomic Shared Logic
-- WRONG APPROACH (race condition prone)
local v = tonumber(global_store_get("visits") or "0")
global_store_set("visits", tostring(v + 1))
-- CORRECT APPROACH (race condition safe)
global_store_incr("visits", 0, 1)
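To see why the get-then-set pattern loses updates, the following sketch simulates three concurrent jobs with plain Lua coroutines. Everything in it is a stand-in for illustration: sim_get, sim_set, sim_incr, and the shared table are hypothetical and say nothing about how SpyWeb implements its store. The only assumption carried over from the text is that async bindings yield, so another job can run between a read and the following write.

```lua
-- Hypothetical in-memory stand-in for the shared store (not SpyWeb's code).
local shared = {}

-- Simulated async read: yields once, like a binding waiting on I/O.
local function sim_get(key)
  coroutine.yield()
  return shared[key]
end

-- Simulated async write: also yields before touching the store.
local function sim_set(key, value)
  coroutine.yield()
  shared[key] = value
end

-- Simulated atomic increment: read and write happen with no yield between.
local function sim_incr(key, default, delta)
  local v = (tonumber(shared[key]) or default) + delta
  shared[key] = tostring(v)
  coroutine.yield()
end

-- Round-robin scheduler: resumes every job until all have finished.
local function run_interleaved(job_fn, jobs)
  local threads = {}
  for i = 1, jobs do threads[i] = coroutine.create(job_fn) end
  local alive = true
  while alive do
    alive = false
    for _, co in ipairs(threads) do
      if coroutine.status(co) ~= "dead" then
        assert(coroutine.resume(co))
        alive = true
      end
    end
  end
end

local function racy_job()
  local v = tonumber(sim_get("visits") or "0")
  sim_set("visits", tostring(v + 1))
end

local function atomic_job()
  sim_incr("visits", 0, 1)
end

shared = {}
run_interleaved(racy_job, 3)
local racy_total = tonumber(shared.visits)     -- 1: two updates were lost

shared = {}
run_interleaved(atomic_job, 3)
local atomic_total = tonumber(shared.visits)   -- 3: every increment counted
```

In the racy run all three jobs read the empty value before any of them writes, so each one stores "1". The atomic version never exposes an intermediate state to the other jobs.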
⚠️ Runtime Memory resets on hot-reload or restart. Use it for transient state only.
Standard Lua variables persist in memory as long as the job is active. Because each job runs in its own isolated VM, these variables are naturally job-scoped and cannot be accessed by other jobs.
SpyWeb also maintains ctx.last_fetch on the per-cycle context, which is set on every fetch attempt and persists for the duration of the cycle. It holds a snapshot of the most recent fetch in that cycle, whether it succeeded or failed.
Example: Simple Memory Counter
-- This resets to 1 if you edit hooks.lua or restart SpyWeb
visit_count = (visit_count or 0) + 1
log("Session visit: " .. visit_count)
Example: Inspect the last fetch in a later hook
function before_webhook(payload, ctx)
  if ctx.last_fetch and ctx.last_fetch.ok and ctx.last_fetch.response and ctx.last_fetch.response.body then
    local count = 0
    for _ in ctx.last_fetch.response.body:gmatch("promo") do
      count = count + 1
    end
    payload.last_fetch_promo_hits = count
  end
  return payload
end
http_get(url, [headers]) async
Performs an HTTP GET request. Returns (response, nil) on success
or (nil, error_table) on failure. The headers
argument is an optional Lua table containing key-value pairs for request headers.
Success response fields:
| Field | Type | Description |
status | number | HTTP status code |
body | string | Response body (binary-safe) |
headers | table | Response headers as key-value pairs |
url | string | Final URL (after redirects) |
time_ms | number | Request duration in milliseconds |
size | number | Response body size in bytes |
Error table fields:
| Field | Type | Description |
error | string | Human-readable error message |
kind | string | Classified error kind: dns, timeout, proxy, tls, connect, size, http, or unknown |
proxy | string? | Proxy URL that failed (only present on proxy errors) |
local res, err = http_get("https://api.example.com", {
  ["Authorization"] = "Bearer token123",
  ["X-Custom-Header"] = "my-value"
})
if not res then
  log("request failed: " .. err.error .. " (" .. err.kind .. ")")
  return
end
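The classified kind field makes it possible to retry only errors that are likely to be transient. The sketch below is one way to do that. The helper name get_with_retry and the choice of which kinds count as retryable are assumptions for illustration, not documented SpyWeb behavior. The request function is passed in as a parameter so the example runs outside SpyWeb; inside a hook you would pass http_get.

```lua
-- Error kinds treated as transient. This grouping is an assumption made for
-- illustration; adjust it to what you observe in practice.
local RETRYABLE = { dns = true, timeout = true, connect = true, proxy = true }

-- get_fn is any function with http_get's contract:
-- (response, nil) on success or (nil, error_table) on failure.
local function get_with_retry(get_fn, url, headers, max_attempts)
  local last_err
  for attempt = 1, max_attempts do
    local res, err = get_fn(url, headers)
    if res then
      return res, nil, attempt
    end
    last_err = err
    if not RETRYABLE[err.kind] then
      break -- e.g. a tls or size error is unlikely to fix itself
    end
  end
  return nil, last_err
end

-- Stand-in for http_get: fails twice with a timeout, then succeeds.
local calls = 0
local function fake_get(url, headers)
  calls = calls + 1
  if calls < 3 then
    return nil, { error = "timed out", kind = "timeout" }
  end
  return { status = 200, body = "ok", url = url }, nil
end

local res, err, attempts = get_with_retry(fake_get, "https://api.example.com", nil, 5)
```

A real hook would probably add a delay between attempts; that is left out here because this section does not describe a sleep binding.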
http_post(url, body, [headers]) async
Performs an HTTP POST request. Returns (response, nil) on success
or (nil, error_table) on failure. The response and error
tables follow the same schema as http_get.
By default, it sets the Content-Type to application/x-www-form-urlencoded unless overridden in the headers argument.
local res, err = http_post("https://api.example.com", '{"foo":"bar"}', {
  ["Content-Type"] = "application/json",
  ["Accept"] = "application/json"
})
if not res then
  log("post failed: " .. err.error)
  return
end
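Because the default Content-Type is application/x-www-form-urlencoded, the body should be form-encoded when no header override is given. This section does not mention a built-in encoder, so the sketch below builds one in plain Lua. The helper names url_encode and form_encode are hypothetical, and the helper assumes a flat table with string keys.

```lua
-- Percent-encode everything except RFC 3986 unreserved characters.
local function url_encode(s)
  return (tostring(s):gsub("[^%w%-%._~]", function(c)
    return string.format("%%%02X", string.byte(c))
  end))
end

-- Build a form body from a flat table with string keys. Keys are sorted only
-- to make the output deterministic.
local function form_encode(fields)
  local keys = {}
  for k in pairs(fields) do keys[#keys + 1] = k end
  table.sort(keys)
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = url_encode(k) .. "=" .. url_encode(fields[k])
  end
  return table.concat(parts, "&")
end

local body = form_encode({ q = "lua hooks", page = 2, tag = "a&b" })
-- body is "page=2&q=lua%20hooks&tag=a%26b"
-- Inside a hook: local res, err = http_post("https://api.example.com", body)
```

Spaces are written as %20 rather than +; both forms are decoded the same way by form parsers.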
http_request(options) async
Performs an arbitrary HTTP request with full control over method, URL, body,
headers, proxy, timeout, and response size limit. Returns (response, nil) on success
or (nil, error_table) on failure. The response and error
tables follow the same schema as http_get.
The options argument is a Lua table with these fields:
| Field | Type | Required | Description |
method | string | Yes | HTTP method (GET, HEAD, POST, PUT, DELETE, etc.) |
url | string | Yes | Target URL |
body | string or nil | No | Request body (binary-safe, not used for GET/HEAD/DELETE) |
headers | table or nil | No | Optional key-value header pairs |
proxy | string or nil | No | Proxy URL (e.g. "http://user:pass@proxy:8080") |
timeout | number or nil | No | Timeout in seconds (default: 30) |
max_body_size | number or nil | No | Max response body in MB (integer, default: 10) |
-- HEAD request (no body)
local res, err = http_request({ method = "HEAD", url = "https://example.com" })
if not res then
  log("head failed: " .. err.error)
  return
end
-- POST with JSON body through a proxy
local res, err = http_request({
  method = "POST",
  url = "https://api.example.com/data",
  body = '{"key":"value"}',
  headers = { ["Content-Type"] = "application/json" },
  proxy = "http://user:pass@residential-proxy:8080",
  timeout = 15
})
if not res then
  log("request failed: " .. err.kind)
  return
end
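When the options table is assembled from config or user input, a quick shape check before the call can produce clearer messages than a failed request. The sketch below mirrors the field table above. check_request_options is a hypothetical helper and is not part of SpyWeb, which may apply its own, different validation.

```lua
-- Expected Lua types for the optional fields, in a fixed reporting order.
local OPTIONAL = {
  { "body", "string" }, { "headers", "table" }, { "proxy", "string" },
  { "timeout", "number" }, { "max_body_size", "number" },
}

-- Returns true, or false plus a message naming the first problem found.
local function check_request_options(opts)
  if type(opts) ~= "table" then
    return false, "options must be a table"
  end
  if type(opts.method) ~= "string" or opts.method == "" then
    return false, "method is required"
  end
  if type(opts.url) ~= "string" or opts.url == "" then
    return false, "url is required"
  end
  for _, spec in ipairs(OPTIONAL) do
    local name, expected = spec[1], spec[2]
    local value = opts[name]
    if value ~= nil and type(value) ~= expected then
      return false, name .. " must be a " .. expected
    end
  end
  return true
end

local ok_head = check_request_options({ method = "HEAD", url = "https://example.com" })
local ok_bad, why = check_request_options({
  method = "POST", url = "https://api.example.com/data", timeout = "15",
})
-- why is "timeout must be a number"
```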
http_multipart(url, fields, [headers]) async
Performs a multipart/form-data POST request for file uploads.
Returns (response, nil) on success or (nil, error_table)
on failure. The response and error tables follow the same schema as
http_get.
The fields argument is a Lua table where each value is either:
- A string: sent as a text form field.
- A table: sent as a file attachment, with these keys:
  - content (required): file bytes (binary-safe Lua string).
  - filename (optional): the server-side file name.
  - type (optional): MIME type (e.g. "image/png").
Use together with page:screenshot() and fs_read_binary() for screenshot upload workflows.
-- Text fields only
local res, err = http_multipart("https://api.example.com/form", {
  name = "SpyWeb",
  version = "1.0"
})
if not res then
  log("upload failed: " .. err.error)
  return
end
-- File upload (screenshot)
local page = cdp.launch({ headless = true }):attach()
page:open("https://example.com")
page:screenshot("page.png", { full_page = true })
local image_data = fs_read_binary("page.png")
local res, err = http_multipart("https://hooks.example.com/upload", {
  screenshot = { content = image_data, filename = "page.png", type = "image/png" },
  caption = "my screenshot"
})
if not res then
  log("screenshot upload failed: " .. err.kind)
  return
end
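If you upload several kinds of files, a small helper can build the attachment table and pick a MIME type from the file extension. This is a sketch only: file_field and the MIME_BY_EXT map are hypothetical, and the map covers just a handful of common types.

```lua
-- Small extension-to-MIME map; extend as needed.
local MIME_BY_EXT = {
  png = "image/png", jpg = "image/jpeg", jpeg = "image/jpeg",
  pdf = "application/pdf", txt = "text/plain", json = "application/json",
}

-- Builds the { content, filename, type } table that http_multipart expects
-- for a file attachment. Unknown extensions fall back to a generic type.
local function file_field(content, filename)
  local ext = filename:match("%.(%w+)$")
  local mime = ext and MIME_BY_EXT[ext:lower()] or "application/octet-stream"
  return { content = content, filename = filename, type = mime }
end

local shot = file_field("raw png bytes", "page.PNG")
local blob = file_field("raw bytes", "dump.bin")
-- Inside a hook:
-- http_multipart(url, { screenshot = file_field(fs_read_binary("page.png"), "page.png") })
```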
db_query(sql, [params]) async
Executes a SELECT query on the SQLite database. Available only on the SQL (SQLite) build variant. Returns an array of row-tables with column-value pairs. Integer and Real columns map to Lua numbers, Text to strings, and Null to nil. On the KV (redb) variant, calling this returns a clear error directing you to download the SQLite build.
local rows = db_query("SELECT json FROM records WHERE job_id = ? LIMIT 5", { "my-job" })
for _, row in ipairs(rows) do
  local data = json_decode(row.json)
  log("Title: " .. data.title)
end
db_exec(sql, [params]) async
Executes an INSERT, UPDATE, DELETE, or DDL statement on the SQLite database. Available only on the SQL (SQLite) build variant. Returns the number of rows affected. On the KV (redb) variant, calling this returns a clear error directing you to download the SQLite build.
db_exec("INSERT OR REPLACE INTO markers (id, val) VALUES (?, ?)", { "page", 5 })
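Both functions bind values through ? placeholders, so a query with a variable number of values needs a matching number of placeholders. The sketch below generates them. The placeholders helper is hypothetical; the records table and job_id column are taken from the db_query example above, and the job names are made up.

```lua
-- Returns "?, ?, ?" for n = 3, so values are always bound, never concatenated.
local function placeholders(n)
  local marks = {}
  for i = 1, n do marks[i] = "?" end
  return table.concat(marks, ", ")
end

local job_ids = { "news", "prices", "jobs" }
local sql = "SELECT json FROM records WHERE job_id IN ("
  .. placeholders(#job_ids) .. ")"
-- sql is "SELECT json FROM records WHERE job_id IN (?, ?, ?)"
-- Inside a hook on the SQLite build: local rows = db_query(sql, job_ids)
```

Keeping the values in the params array rather than splicing them into the SQL string avoids quoting mistakes and injection.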
json_encode(table) sync
Converts a Lua table or value into a JSON string. Handles nested tables, arrays,
booleans, and nulls automatically using the built-in high-performance JSON
engine.
local payload = {
  url = "https://example.com",
  options = { waitUntil = "networkidle2" }
}
local body = json_encode(payload)
local resp = http_post("https://api.rendering-service.com/render", body, { ["Content-Type"] = "application/json" })
json_decode(string) sync
Parses a JSON string into a native Lua table. Returns nil and logs an error if the JSON is malformed.
local data = json_decode('{"status": "ok", "count": 42}')
log("Status is: " .. data.status)
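Since json_decode returns nil for malformed input, indexing the result directly raises an error when the payload is bad. A guarded read might look like the sketch below. read_status is a hypothetical helper, and the decoder is passed in as a parameter (with a toy stand-in) so the example runs outside SpyWeb; inside a hook you would pass json_decode.

```lua
-- decode_fn stands in for json_decode: returns a table, or nil on bad input.
local function read_status(decode_fn, text)
  local data = decode_fn(text)
  if type(data) ~= "table" then
    return nil, "malformed JSON"
  end
  if type(data.status) ~= "string" then
    return nil, "missing status field"
  end
  return data.status
end

-- Toy decoder for the demo only: recognises one payload, nil otherwise.
local function toy_decode(text)
  if text == '{"status": "ok", "count": 42}' then
    return { status = "ok", count = 42 }
  end
  return nil
end

local status = read_status(toy_decode, '{"status": "ok", "count": 42}')
local missing, reason = read_status(toy_decode, "{not json")
-- status is "ok"; missing is nil and reason is "malformed JSON"
```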
defer(fn) sync
Registers a cleanup function to run immediately after the top-level hook exits. Essential for preventing zombie processes and memory leaks.
💡 Hook-Scoped: Cleanup runs when the main stage (e.g., before_fetch) finishes, even if defer was called inside a helper function.
⚠️ Synchronous Only: You cannot use async functions (like http_post or cdp.launch) inside defer. Resource close methods (such as browser:close() or page:close()) are synchronous and fully safe to use. For async orchestration and post-cycle tasks, use defer.lua.
function override_fetch(request, ctx)
  local browser = cdp.launch({ headless = true })
  defer(function() browser:close() end) -- Always closes
  local page = browser:attach()
  page:open(request.url)
  return { status = 200, body = page:content() }
end
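As a mental model, the hook-scoped cleanup described above can be pictured with the plain-Lua sketch below. It is not SpyWeb's implementation. run_hook is hypothetical, defer is passed in as an argument instead of being a global, and two behaviors are assumptions for illustration: that cleanups run in reverse registration order, and that they also run when the hook raises an error.

```lua
-- Hypothetical runner: collects deferred functions, runs the hook under
-- pcall, then runs every cleanup regardless of how the hook ended.
local function run_hook(hook, ...)
  local cleanups = {}
  local function defer(fn)
    cleanups[#cleanups + 1] = fn
  end
  local ok, result = pcall(hook, defer, ...)
  -- Assumption: last registered, first run (like closing nested resources).
  for i = #cleanups, 1, -1 do
    pcall(cleanups[i])
  end
  return ok, result
end

local events = {}
local ok, result = run_hook(function(defer)
  defer(function() events[#events + 1] = "close browser" end)
  events[#events + 1] = "hook body"
  error("page failed to load")
end)
-- ok is false, and events is { "hook body", "close browser" }
```

The point of the model is that cleanup is tied to the hook's exit, not to the helper function that happened to call defer.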
dump(value) sync
Formats any Lua value into a readable string for debugging. This is especially
useful for nested hook payloads like fetch_result or
ctx.last_fetch. It handles nested tables, quotes strings, and marks cycles
as <cycle> instead of recursing forever.
function after_fetch(fetch_result, ctx)
  print(dump(fetch_result))
  log(dump(ctx.last_fetch))
  return fetch_result
end
copy(table) sync
Returns a shallow copy of a table. Only the first level of keys and values is copied; nested tables are still shared by reference.
local original = { a = 1, b = { c = 2 } }
local shallow = copy(original)
shallow.a = 99 -- original.a is still 1
shallow.b.c = 99 -- original.b.c IS now 99
deep_copy(table) sync
Returns a deep, recursive copy of a table. Every nested table is cloned, ensuring
total isolation from the original. It safely handles circular references (cycles).
local original = { a = 1, b = { c = 2 } }
local deep = deep_copy(original)
deep.b.c = 99 -- original.b.c remains 2
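The difference between the two helpers can be sketched in plain Lua as follows. These are illustrative re-implementations under different names (shallow_copy, deep_clone), not SpyWeb's source. The deep version tracks tables it has already visited, which is one common way to handle cycles; it does not copy metatables.

```lua
-- First level only: nested tables are shared with the original.
local function shallow_copy(t)
  local out = {}
  for k, v in pairs(t) do out[k] = v end
  return out
end

-- Recursive clone. `seen` maps each visited table to its clone so that
-- cycles and repeated references resolve to the same cloned table.
local function deep_clone(value, seen)
  if type(value) ~= "table" then return value end
  seen = seen or {}
  if seen[value] then return seen[value] end
  local out = {}
  seen[value] = out
  for k, v in pairs(value) do
    out[deep_clone(k, seen)] = deep_clone(v, seen)
  end
  return out
end

local original = { a = 1, b = { c = 2 } }
original.self = original            -- circular reference
local shallow = shallow_copy(original)
local clone = deep_clone(original)
-- shallow.b is the same table as original.b; clone.b is a separate table,
-- and clone.self points at clone rather than at original.
```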
notify(title, body, [timeout_ms]) async
Sends a desktop notification immediately from Lua. This is a thin wrapper around
SpyWeb's native notification sender and can be used from any hook, including error
paths that never reach the normal notify stage.
Headless environments: desktop notifications are typically unavailable on servers, cloud hosts, containers, and other headless environments. In those cases notify(...) silently skips the notification and SpyWeb logs the underlying OS/backend error.
function after_fetch(fetch_result, ctx)
  if not fetch_result.ok then
    notify("Network error", fetch_result.error.message, 10000)
    return nil
  end
  return fetch_result
end
log(message) async
Appends a message to hooks.log inside the job's directory. This is the
recommended way to debug hooks without using external libraries.
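function after_extract(items, ctx)
  -- `request` is not a parameter of this hook; read the URL from ctx.last_fetch instead.
  local url = ctx.last_fetch and ctx.last_fetch.request.url or "unknown URL"
  log("Extracted " .. #items .. " items from " .. url)
  return items
end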
fs_append(filename, content) async
Appends raw content to a file in the job's directory. All writes are non-blocking and
safe.
Files rotate automatically at 10MB, using timestamped filenames (e.g., data.20260514-143005.csv) and keeping a history of 5 files.
function after_extract(items, ctx)
  for _, item in ipairs(items) do
    local row = string.format("%s,%s\n", item.fields.title, item.fields.price)
    fs_append("data.csv", row)
  end
  return items
end
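The example above writes fields as-is, which produces a broken row when a title contains a comma, a quote, or a newline. A minimal escaping sketch following the usual CSV quoting rules is shown below. csv_escape and csv_row are hypothetical helpers, not SpyWeb bindings.

```lua
-- Quote a cell if it contains a comma, quote, or line break; double any
-- embedded quotes.
local function csv_escape(value)
  local s = tostring(value)
  if s:find('[",\r\n]') then
    s = '"' .. (s:gsub('"', '""')) .. '"'
  end
  return s
end

-- Join an array of cells into one CSV line ending in a newline.
-- Note: ipairs stops at the first nil, so pass "" for empty cells.
local function csv_row(fields)
  local cells = {}
  for i, v in ipairs(fields) do
    cells[i] = csv_escape(v)
  end
  return table.concat(cells, ",") .. "\n"
end

local row = csv_row({ 'Lamp, "Retro"', "19.99" })
-- Inside a hook: fs_append("data.csv", csv_row({ item.fields.title, item.fields.price }))
```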
fs_overwrite(filename, content) async
Replaces the entire content of a file in the job's directory. Ideal for keeping a "latest" snapshot of your scraping results. Use the shared/ prefix to write to the project root's shared folder.
function after_fetch(fetch_result, ctx)
  if fetch_result.response then
    local etag = fetch_result.response.headers["ETag"]
    if etag then
      fs_overwrite("last_response.json", json_encode({ etag = etag }))
    end
  end
  return fetch_result
end
fs_read(filename) async
Reads a file from the job's directory and returns its content as a string. Returns
nil if the file does not exist. Falls back to shared/ if
not found locally.
function before_fetch(request, ctx)
  local cached = fs_read("last_response.json")
  if cached then
    local data = json_decode(cached) -- nil if the cached file is not valid JSON
    if data and data.etag then
      request.headers["If-None-Match"] = data.etag
    end
  end
  return request
end
fs_read_binary(filename) async
Reads a file from the job's directory and returns its content as a binary-safe
string (preserving all bytes). Returns nil if the file does not exist.
Falls back to shared/ if not found locally.
Unlike fs_read, this function handles non-UTF-8 files such as images
and other binary formats.
local png_data = fs_read_binary("screenshot.png")
local resp = http_multipart("https://api.example.com/upload", {
  file = { content = png_data, filename = "page.png", type = "image/png" }
})
env_get(key) sync
Retrieves a read-only environment variable from the host system. For security, SpyWeb automatically prepends the SPYWEB_ prefix to any key provided. This allows you to safely store secrets like API tokens outside of your configuration files.
System Setup: To access a variable in Lua, you must prefix it with SPYWEB_ on your host system. For example:
export SPYWEB_API_KEY="secret-123"
Then in Lua, call env_get("API_KEY").
function before_fetch(request, ctx)
  local api_key = env_get("API_KEY") -- Reads SPYWEB_API_KEY
  if api_key then
    request.headers["Authorization"] = "Bearer " .. api_key
  end
  return request
end
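The prefixing rule also means that unprefixed host variables are out of reach from Lua. The sketch below models the rule with a hypothetical prefixed_env helper and a fake environment table. It only illustrates the name mapping described above and is not how SpyWeb reads the environment internally.

```lua
-- Looks up "SPYWEB_" .. key. `getenv` defaults to os.getenv and can be
-- swapped for a stub, as done below.
local function prefixed_env(key, getenv)
  getenv = getenv or os.getenv
  return getenv("SPYWEB_" .. key)
end

-- Fake host environment for the demo.
local fake_env = { SPYWEB_API_KEY = "secret-123", HOME = "/root" }
local function lookup(name)
  return fake_env[name]
end

local api_key = prefixed_env("API_KEY", lookup)  -- reads SPYWEB_API_KEY
local home = prefixed_env("HOME", lookup)        -- looks for SPYWEB_HOME
-- api_key is "secret-123"; home is nil because only SPYWEB_HOME would match
```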
ctx.selector_matches (Context)
A read-only number representing how many elements on the page matched the main
CSS selector defined in your config. This is set automatically
after the extraction phase and is available in after_extract and
all subsequent hooks via ctx.selector_matches.
function after_extract(items, ctx)
  print("Found " .. #items .. " items (out of " .. ctx.selector_matches .. " selector hits)")
  return items
end
ctx.last_fetch (Context)
A read-only FetchAttempt envelope representing the last network fetch, accessible via
ctx.last_fetch. Useful for inspecting fetch results in later hook stages.
Fields:
| Field | Type | Description |
ok | boolean | Whether the fetch succeeded |
request | table | url, method, headers, timeout, proxy, max_body_size |
response | table? | status, url, body, headers, time_ms, size, proxy (only on success) |
error | table? | message, kind (only on failure) |
function after_extract(items, ctx)
  if ctx.last_fetch and ctx.last_fetch.ok then
    log("Fetch was successful: " .. ctx.last_fetch.response.status)
  end
  return items
end
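Because response is only present on success and error only on failure, code that reads the envelope has to branch on ok. The sketch below turns an envelope into a one-line summary. describe_fetch is a hypothetical helper, and the sample tables are made-up data that follow the field list above.

```lua
-- Summarise a FetchAttempt-shaped table in one line.
local function describe_fetch(lf)
  if not lf then
    return "no fetch attempted yet"
  end
  local target = lf.request.method .. " " .. lf.request.url
  if lf.ok and lf.response then
    return string.format("%s -> %s in %sms", target,
      tostring(lf.response.status), tostring(lf.response.time_ms))
  end
  local e = lf.error or {}
  return string.format("%s failed: %s (%s)", target,
    tostring(e.message), tostring(e.kind))
end

-- Sample envelopes (invented values).
local success = {
  ok = true,
  request = { method = "GET", url = "https://example.com" },
  response = { status = 200, time_ms = 120 },
}
local failure = {
  ok = false,
  request = { method = "GET", url = "https://example.com" },
  error = { message = "timed out", kind = "timeout" },
}

local ok_line = describe_fetch(success)
local err_line = describe_fetch(failure)
-- Inside a hook: log(describe_fetch(ctx.last_fetch))
```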
ctx.telemetry (Context)
A read-only table containing performance telemetry for the current scraping cycle, accessible via
ctx.telemetry. It is
initialized at cycle start and updated dynamically as each stage executes, allowing
scripts to inspect stage-by-stage resource usage and execution status.
Fields:
| Field | Type | Description |
job_name | string | Name of the job |
start_time | number | Epoch timestamp (seconds) when the cycle started |
total_duration_ms | number | Total duration of the cycle in milliseconds (set at end of run) |
stages | array | Sequence of stage records (see below) |
Stage record fields (entry in stages array):
| Field | Type | Description |
name | string | Stage name (e.g. before_fetch, fetch, override_extract) |
status | string | success, error, or inactive |
type | string | hook for Lua hooks, internal for engine-native stages |
duration_ms | number | Stage execution time in milliseconds |
lua_mem_bytes | number | Total Lua memory usage after the stage |
mem_delta_bytes | number | Change in Lua memory during this stage |
browsers | number | Number of active browser processes |
error | string? | Error message if the stage failed, nil otherwise |
function on_success(ctx)
  if ctx.telemetry then
    log("Job completed in " .. ctx.telemetry.total_duration_ms .. "ms")
    for _, stage in ipairs(ctx.telemetry.stages) do
      if stage.status == "error" then
        log("Stage " .. stage.name .. " failed: " .. tostring(stage.error))
      end
    end
  end
end
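The stage records lend themselves to small summaries, for example finding the slowest stage or the time spent in Lua hooks. The sketch below shows one way to do that. summarize_stages is a hypothetical helper, and the sample telemetry table is invented data shaped like the field tables above.

```lua
-- Walks telemetry.stages and returns the slowest active stage, the names of
-- failed stages, and the total time spent in Lua hooks.
local function summarize_stages(telemetry)
  local summary = { slowest = nil, failed = {}, hook_ms = 0 }
  for _, stage in ipairs(telemetry.stages or {}) do
    local ms = stage.duration_ms or 0
    if stage.status ~= "inactive" then
      if not summary.slowest or ms > (summary.slowest.duration_ms or 0) then
        summary.slowest = stage
      end
      if stage.type == "hook" then
        summary.hook_ms = summary.hook_ms + ms
      end
    end
    if stage.status == "error" then
      summary.failed[#summary.failed + 1] = stage.name
    end
  end
  return summary
end

-- Sample data (invented values).
local sample = {
  job_name = "demo",
  stages = {
    { name = "before_fetch", status = "success", type = "hook", duration_ms = 4 },
    { name = "fetch", status = "success", type = "internal", duration_ms = 310 },
    { name = "after_extract", status = "error", type = "hook", duration_ms = 12, error = "boom" },
    { name = "override_extract", status = "inactive", type = "hook", duration_ms = 0 },
  },
}

local summary = summarize_stages(sample)
-- summary.slowest.name is "fetch", summary.hook_ms is 16,
-- and summary.failed is { "after_extract" }
```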