13 agentic behavior runner v1 (#14)

* baseline setup of agent abstract

* feat: new implementation of simple AI agent that can follow a goal and return

* refactored import structure and created full tests

* pytest setup a github workflow to run tests + more ignores

* singularity for pushing

* fixing builds of PDFs

* initial structure of docs

* init styles and docs

* basic style implementation

* 13 create outline for research paper draft (#18)

* updated outline for paper from issue

* extra paper sections and some formalization of series data

* algorithms and acknowledgements

* updated outline for paper from issue

* Refactor docker-compose services to use individual Dockerfiles (#20)

* Initial plan

* Refactor services into individual Dockerfiles

Co-authored-by: velocitatem <60182044+velocitatem@users.noreply.github.com>

* Add EXPOSE directives to all Dockerfiles with port documentation

Co-authored-by: velocitatem <60182044+velocitatem@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: velocitatem <60182044+velocitatem@users.noreply.github.com>

* 2 nextjs scaffold with store mode shop and admin session experiment wiring event emission v1 (#17)

* chore: cleaning gitignore

* formatting and env documentation

* feat: context switching of hotel/airline dependent on env var via middleware

* fixed alignment and building

* wrong file

* prods

* fixed applying style

* better session cookie management

* tentative session storage with maybe using airtable

* migrated api of ingestion

* events and products page

* fixing build

* 13 create outline for research paper draft (#18)

* updated outline for paper from issue

* extra paper sections and some formalization of series data

* algorithms and acknowledgements

* updated outline for paper from issue

* updated text formatting

* event unification

* refactor tracking to use callbacks instead of refs

* implement a pricing display api with session passing

* moved middleware to proxy according to new changes in Nextjs

* refactored kafka ingestion to go via backend not web-db

* Refactor docker-compose services to use individual Dockerfiles (#20)

* Initial plan

* Refactor services into individual Dockerfiles

Co-authored-by: velocitatem <60182044+velocitatem@users.noreply.github.com>

* Add EXPOSE directives to all Dockerfiles with port documentation

Co-authored-by: velocitatem <60182044+velocitatem@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: velocitatem <60182044+velocitatem@users.noreply.github.com>

* fixing small bugs and adding experiments to tracking

* added some doc

* fixing prod

* prod kafka server logging

* topic auto create

* pytest setup a github workflow to run tests + more ignores

* getting data from agents properly

* proper pipeline to handle data and build matrices

* fixing backend dumping

* fixing agents and ignore

* fixing import for tests

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Daniel Alves Rösel
2025-11-15 16:16:01 +01:00
committed by GitHub
parent 9bb6f842f4
commit ab8b8787a8
16 changed files with 955 additions and 705 deletions

30
.github/workflows/pytest.yml vendored Normal file

@@ -0,0 +1,30 @@
name: Run Tests
on:
  push:
    paths:
      - 'experiments/**'
      - 'backend/**'
      - 'requirements.txt'
      - '.github/workflows/pytest.yml'
  pull_request:
    paths:
      - 'experiments/**'
      - 'backend/**'
      - 'requirements.txt'
      - '.github/workflows/pytest.yml'
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m venv .venv
          .venv/bin/pip install --upgrade pip
          .venv/bin/pip install -r requirements.txt
      - name: Run tests
        run: .venv/bin/pytest -v

8
.gitignore vendored

@@ -1,6 +1,8 @@
 **/.env
 **/.venv
 PHANTOM.wiki/
-**/__pycache__
-**/.ipynb_checkpoints/
-**/.virtual_documents/
+**/__pycache__/
+**/.ipynb_checkpoints/
+**/session_*.svg
+**/*graph.svg
+paper/src/bib/auto

View File

@@ -4,6 +4,10 @@ BUILDDIR := build
 TEX := main.tex
 JOBNAME := main
 PDF := paper/$(BUILDDIR)/$(JOBNAME).pdf
+VENV := .venv
+PYTHON := $(VENV)/bin/python
+PIP := $(VENV)/bin/pip
+PYTEST := $(VENV)/bin/pytest
 .DEFAULT_GOAL := help
@@ -35,5 +39,14 @@ clean:
 	$(LATEXMK) -C -jobname=$(JOBNAME) -outdir=../$(BUILDDIR) || true
 	rm -rf paper/$(BUILDDIR)/*
+$(VENV):
+	python3 -m venv $(VENV)
+	$(PIP) install --upgrade pip
-.PHONY: all pdf clean watch run.webapp
+install: $(VENV)
+	$(PIP) install -r requirements.txt
+test: $(VENV)
+	$(PYTEST) -v
+.PHONY: all pdf clean watch run.webapp install test

View File

@@ -7,7 +7,7 @@ import uvicorn
 import os
 import json
 from datetime import datetime
-from kafka import KafkaProducer, KafkaAdminClient
+from kafka import KafkaProducer, KafkaAdminClient, KafkaConsumer
 from kafka.admin import NewTopic
 from kafka.errors import TopicAlreadyExistsError
 from dotenv import load_dotenv
@@ -22,7 +22,7 @@ def get_producer() -> KafkaProducer:
     global _producer
     if _producer is None:
         host = os.getenv('KAFKA_HOST', 'localhost')
-        port = os.getenv('KAFKA_PORT', '29092')  # use internal broker port
+        port = os.getenv('KAFKA_PORT', '9092')
         broker = f'{host}:{port}' if port else host
         print(f"[KAFKA_INIT] Connecting to broker: {broker}")
         _producer = KafkaProducer(
@@ -61,7 +61,7 @@ app.add_middleware(
 async def startup_event():
     """create kafka topics on startup"""
     host = os.getenv('KAFKA_HOST', 'localhost')
-    port = os.getenv('KAFKA_PORT', '29092')
+    port = os.getenv('KAFKA_PORT', '9092')
     broker = f'{host}:{port}'
     try:
@@ -125,10 +125,62 @@ async def ingest_logs(event: EventPayload):
         raise HTTPException(status_code=500, detail=str(e))
 @app.get("/api/kafka/dump")
-def dump_logs():
-    # TODO: implement a dump of logs of time period t_start to t_end (params of get)
-    # OR: allow for params of last_n logs as a param - creating two modes of the dumping
-    pass
+def dump_logs(
+    last_n: Optional[int] = None,
+    t_start: Optional[str] = None,
+    t_end: Optional[str] = None
+):
+    """dump all messages from user-interactions topic
+    params:
+        last_n: return only last n messages (default: all)
+        t_start: filter by start timestamp iso format (future use)
+        t_end: filter by end timestamp iso format (future use)
+    """
+    host = os.getenv('KAFKA_HOST', 'localhost')
+    port = os.getenv('KAFKA_PORT', '9092')
+    broker = f'{host}:{port}'
+    try:
+        consumer = KafkaConsumer(
+            'user-interactions',
+            bootstrap_servers=[broker],
+            auto_offset_reset='earliest',
+            enable_auto_commit=False,
+            value_deserializer=lambda x: json.loads(x.decode('utf-8')),
+            consumer_timeout_ms=5000
+        )
+        events = []
+        for msg in consumer:
+            events.append(msg.value)
+        consumer.close()
+        # apply filters
+        if t_start or t_end:
+            # filter by timestamp range if provided
+            filtered = []
+            for e in events:
+                ts = e.get('ts')
+                if ts:
+                    if t_start and ts < t_start:
+                        continue
+                    if t_end and ts > t_end:
+                        continue
+                filtered.append(e)
+            events = filtered
+        if last_n and last_n > 0:
+            events = events[-last_n:]
+        return {"success": True, "count": len(events), "data": events}
+    except Exception as e:
+        import traceback
+        print(f"[DUMP_ERROR] {e}")
+        print(traceback.format_exc())
+        raise HTTPException(status_code=500, detail=str(e))
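
The post-processing that `dump_logs` applies after draining the topic can be exercised without a broker. The sketch below restates that step as a pure function. `filter_events` and the sample events are hypothetical, and it assumes `ts` values are ISO-8601 strings in one consistent format (so plain string comparison orders them chronologically) and that events without a `ts` pass through unfiltered.

```python
def filter_events(events, last_n=None, t_start=None, t_end=None):
    """Apply the optional timestamp range first, then keep the last n events."""
    if t_start or t_end:
        kept = []
        for e in events:
            ts = e.get("ts")
            if ts:
                if t_start and ts < t_start:
                    continue  # before the requested window
                if t_end and ts > t_end:
                    continue  # after the requested window
            kept.append(e)
        events = kept
    if last_n and last_n > 0:
        events = events[-last_n:]
    return events


# Hypothetical sample data, not taken from the real topic.
sample = [
    {"ts": "2025-11-15T10:00:00", "eventName": "view_page"},
    {"ts": "2025-11-15T11:00:00", "eventName": "view_item_page"},
    {"ts": "2025-11-15T12:00:00", "eventName": "add_to_cart"},
]

in_window = filter_events(sample, t_start="2025-11-15T10:30:00")
latest = filter_events(sample, last_n=1)
print([e["eventName"] for e in in_window])
print([e["eventName"] for e in latest])
```

Because the range filter runs before `last_n`, combining both parameters returns the last n events inside the window rather than the last n events overall.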

0
experiments/__init__.py Normal file

View File

@@ -0,0 +1 @@
"""Agentic behavior runner for PHANTOM research platform."""

View File

@@ -0,0 +1,44 @@
from .base import Agent as BaseAgent
from browser_use import Browser, Agent, ChatOpenAI
from enum import Enum


class AgentTypes(str, Enum):
    GENERIC_BROWSER_USE_AGENT = "generic_browser_use_agent"


def _build_prompt(goal: str, environment_url: str) -> str:
    # TODO: Improve prompt engineering here and experiment with
    return f"""You are an autonomous agent tasked with achieving the following goal: {goal}
You have access to a web browser to interact with the environment at {environment_url}.
Use the browser to navigate, gather information, and perform actions necessary to accomplish your goal.
Be thorough and ensure you complete the task fully."""


class GenericBrowserUseAgent(BaseAgent):
    def __init__(self,
                 goal: str,
                 url: str = "http://localhost:3000",
                 timeout: int = 300,
                 llm_model: str = "gpt-5-mini",
                 headless: bool = True):
        super().__init__(goal, url, timeout)
        self.llm_model = ChatOpenAI(model=llm_model)
        self.browser = Browser(headless=headless)
        self.agent = Agent(task=_build_prompt(goal, url),
                           llm=self.llm_model,
                           browser=self.browser)

    async def act(self) -> str:
        self.result = await self.agent.run()
        # https://github.com/browser-use/browser-use/blob/main/browser_use/agent/views.py#L301
        return self.result.final_result()


# Returns our own BaseAgent subclass, not browser_use's Agent
def get_agent(agent_type: AgentTypes, **kwargs) -> BaseAgent:
    if agent_type == AgentTypes.GENERIC_BROWSER_USE_AGENT:
        return GenericBrowserUseAgent(**kwargs)
    else:
        raise ValueError(f"Unknown agent type: {agent_type}")


if __name__ == "__main__":
    import asyncio
    JTBD = "Name all the products on this site and try to find out more about each product by clicking into them (they might not open)"
    agent = get_agent(AgentTypes.GENERIC_BROWSER_USE_AGENT, goal=JTBD, url="http://localhost:3000/products", timeout=300)
    R = asyncio.run(agent.act())
    print(R)

View File

@@ -0,0 +1,19 @@
from abc import ABC, abstractmethod
from typing import Optional


class Agent(ABC):
    """Base interface for browser automation agents"""

    def __init__(self, goal: str, url: str = "http://localhost:3000", timeout: int = 300):
        self.goal = goal
        self.url = url
        self.timeout = timeout
        self.result: Optional[str] = None

    @abstractmethod
    async def act(self) -> str:
        """Execute goal and return result text"""
        pass

    def final_result(self) -> Optional[str]:
        return self.result
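
To show how the abstract `Agent` interface and a `get_agent`-style factory fit together without `browser_use` or an API key, here is a minimal sketch with a stand-in agent. `EchoAgent`, `AgentKind` and `make_agent` are hypothetical names introduced only for illustration, and the base class is repeated so the block runs on its own.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum
from typing import Optional


class Agent(ABC):
    """Same shape as the base interface above, repeated for self-containment."""

    def __init__(self, goal: str, url: str = "http://localhost:3000", timeout: int = 300):
        self.goal = goal
        self.url = url
        self.timeout = timeout
        self.result: Optional[str] = None

    @abstractmethod
    async def act(self) -> str:
        """Execute goal and return result text"""


class EchoAgent(Agent):
    """Hypothetical stand-in: 'achieves' the goal by describing it."""

    async def act(self) -> str:
        await asyncio.sleep(0)  # placeholder for real browser work
        self.result = f"visited {self.url} to: {self.goal}"
        return self.result


class AgentKind(str, Enum):
    ECHO = "echo"


def make_agent(kind: AgentKind, **kwargs) -> Agent:
    """Factory in the style of get_agent: map an enum member to a class."""
    if kind == AgentKind.ECHO:
        return EchoAgent(**kwargs)
    raise ValueError(f"Unknown agent type: {kind}")


agent = make_agent(AgentKind.ECHO, goal="list products", url="http://localhost:3000/products")
summary = asyncio.run(agent.act())
print(summary)
```

A new agent type then only needs a subclass that implements `act` and one extra branch in the factory; callers and tests keep using the same constructor arguments.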

View File

@@ -0,0 +1,30 @@
import pytest
import asyncio
from experiments.agents.agent import get_agent, AgentTypes
import os


def test_agent_init():
    agent = get_agent(AgentTypes.GENERIC_BROWSER_USE_AGENT, goal="test", url="http://example.com", timeout=100)
    assert agent.goal == "test"
    assert agent.url == "http://example.com"
    assert agent.timeout == 100


def test_invalid_agent():
    with pytest.raises(ValueError):
        get_agent("invalid", goal="test")


@pytest.mark.asyncio
@pytest.mark.skipif("OPENAI_API_KEY" not in os.environ, reason="OPENAI_API_KEY not set")
async def test_agent_execution():
    agent = get_agent(AgentTypes.GENERIC_BROWSER_USE_AGENT, goal="get page title", url="https://example.com", timeout=60)
    result = await agent.act()
    assert result
    assert agent.final_result()
    # the last recorded step must be marked as done
    assert agent.final_result().history[-1].result[-1].is_done is True
    assert isinstance(result, str)
    assert "example" in result.lower()
    assert len(result) > 0

File diff suppressed because it is too large

View File

@@ -0,0 +1,84 @@
import pandas as pd
import json
import numpy as np
import os
import requests
from dotenv import load_dotenv
from sklearn.base import BaseEstimator, TransformerMixin

load_dotenv()

BACKEND_URL = os.getenv("BACKEND_URL", "http://localhost:5000")
N_PRICE_BUCKETS = 5


def get_data_from_kafka() -> pd.DataFrame:
    """fetch all events from backend dump endpoint"""
    resp = requests.get(f"{BACKEND_URL}/api/kafka/dump")
    resp.raise_for_status()
    data = resp.json()
    if not data.get('success') or not data.get('data'):
        return pd.DataFrame()
    df = pd.DataFrame(data['data'])
    # explode metadata col json
    if 'metadata' in df.columns:
        df = df.join(pd.json_normalize(df.pop("metadata"), sep=".").add_prefix("metadata_"))
    df = df.dropna(subset=['eventName'])
    return df


def join_with_experiments(df: pd.DataFrame) -> pd.DataFrame:
    # TODO: Get experiments db from supabase and join on session_id
    return df


def augment_event_titles(df: pd.DataFrame) -> pd.DataFrame:
    # from taking standard view_item_page in eventName to view_item_page_{metadata_schema}
    # we want metadata schema to create product specific event names
    # nothing to augment when the product/price columns are absent (e.g. an empty dump)
    if "productId" not in df.columns or "metadata_price" not in df.columns:
        return df
    # only create price buckets if we have enough unique prices
    if df["metadata_price"].notnull().sum() > 0:
        try:
            price_buckets = pd.qcut(
                df["metadata_price"],
                q=N_PRICE_BUCKETS,
                labels=[f"PB_{i+1}" for i in range(N_PRICE_BUCKETS)],
                duplicates='drop'  # handle duplicate bin edges
            )
        except ValueError:
            # fallback: if still not enough unique values, use cut with fixed ranges or just use raw price
            price_buckets = df["metadata_price"].apply(lambda x: f"P_{int(x)}" if pd.notnull(x) else "")
    else:
        price_buckets = pd.Series([""] * len(df), index=df.index)
    # metadata_schema: _product_id@price_bucket_{i} only if we have product metadata, otherwise keep original event name
    # TODO: make this adaptive, if we have hover_over_title we append the title, if its view_page we say which page
    df["metadata_schema"] = np.where(
        df["productId"].notnull() & df["metadata_price"].notnull(),
        "_" + df["productId"].astype(str) + "@" + price_buckets.astype(str),
        ""
    )
    df["eventName"] = df["eventName"] + df["metadata_schema"].astype(str)
    return df


def extract() -> pd.DataFrame:
    df = get_data_from_kafka()
    df = join_with_experiments(df)
    df = augment_event_titles(df)
    return df


class DataExtractor(BaseEstimator, TransformerMixin):
    def fit(self, X=None, y=None):
        return self

    def transform(self, X=None):
        return extract()


if __name__ == "__main__":
    df = extract()
    print(df.head())
    print(df.tail())
    print(df.info())
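
`augment_event_titles` turns a generic event name into a product-specific one of the form `eventName_productId@PB_k`. The sketch below illustrates only that naming scheme on plain dicts. It deliberately swaps `pd.qcut` for a simple rank-based bucketing (`rank_bucket`, a hypothetical helper), so bucket boundaries will not match pandas in general, and the `price` key stands in for the `metadata_price` column.

```python
import bisect


def rank_bucket(price, sorted_prices, n_buckets=5):
    """Return a 1-based bucket from the price's rank among all known prices."""
    rank = bisect.bisect_left(sorted_prices, price)  # prices strictly below
    return min(n_buckets, rank * n_buckets // len(sorted_prices) + 1)


def augment_names(events, n_buckets=5):
    """Append _<productId>@PB_<k> to events that carry both product and price."""
    prices = sorted(e["price"] for e in events if e.get("price") is not None)
    names = []
    for e in events:
        name = e["eventName"]
        if e.get("productId") is not None and e.get("price") is not None:
            bucket = rank_bucket(e["price"], prices, n_buckets)
            name += f"_{e['productId']}@PB_{bucket}"
        names.append(name)
    return names


# Hypothetical events: five priced product views and one plain page view.
events = [
    {"eventName": "view_item_page", "productId": f"p{i}", "price": 10 * i}
    for i in range(1, 6)
]
events.append({"eventName": "view_page"})

names = augment_names(events)
print(names)
```

Events without product metadata keep their original name, which mirrors the `np.where(..., "")` branch in the pipeline code.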

View File

@@ -0,0 +1,158 @@
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


def build_transition_prob_matrix(df: pd.DataFrame):
    df = df.dropna(subset=['eventName'])
    events = df['eventName'].tolist()
    labels = pd.Index(events).unique().tolist()
    idx = {e: i for i, e in enumerate(labels)}
    M = np.zeros((len(labels), len(labels)), dtype=float)
    for a, b in zip(events, events[1:]):
        M[idx[a], idx[b]] += 1
    row_sums = M.sum(axis=1, keepdims=True)
    with np.errstate(divide='ignore', invalid='ignore'):
        # out= is needed with where=: otherwise rows whose sum is 0 are left uninitialized
        P = np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)  # row-normalized
    return P, labels
# https://medium.com/data-science/time-series-data-markov-transition-matrices-7060771e362b

from graphviz import Digraph
import numpy as np
import pandas as pd


def _as_prob_df(matrix, labels=None):
    """Return a square DataFrame with index=columns=labels."""
    if isinstance(matrix, pd.DataFrame):
        # Ensure square and aligned
        assert (matrix.index == matrix.columns).all(), "Index/columns must match."
        return matrix
    matrix = np.asarray(matrix, dtype=float)
    assert matrix.shape[0] == matrix.shape[1], "Matrix must be square."
    if labels is None:
        raise ValueError("labels are required when matrix is not a DataFrame")
    assert len(labels) == matrix.shape[0], "labels length must match matrix size."
    return pd.DataFrame(matrix, index=list(labels), columns=list(labels))


def _df_to_edgelist(P: pd.DataFrame, threshold=0.0, round_digits=2):
    """Build weighted edges > threshold."""
    edges = []
    for src in P.index:
        for dst in P.columns:
            w = float(P.loc[src, dst])
            if w > threshold:
                edges.append((str(src), str(dst), f"{w:.{round_digits}f}"))
    return edges


def render_graph(fname, matrix, ls_index=None, threshold=0.0, fmt="svg", view=False):
    """
    fname: output file stem (no extension)
    matrix: NumPy array or pandas DataFrame of transition PROBABILITIES
    ls_index: ordered labels (required if matrix is not a DataFrame)
    threshold: hide edges with weight <= threshold
    fmt: 'svg'|'png'|'pdf' etc.
    view: open after rendering
    """
    P = _as_prob_df(matrix, labels=ls_index)
    edges = _df_to_edgelist(P, threshold=threshold)
    g = Digraph(format=fmt)
    g.attr(rankdir="LR", size="30")
    g.attr("node", shape="circle")
    # ensure isolated nodes appear
    for node in P.index:
        g.node(str(node), width="1", height="1")
    for src, dst, label in edges:
        g.edge(src, dst, label=label)
    g.render(fname, view=view, cleanup=True)
    return g
class TransitionProbMatrixTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.P_ = None
        self.labels_ = None

    def fit(self, X: pd.DataFrame, y=None):
        P, labels = build_transition_prob_matrix(X)
        self.P_ = P
        self.labels_ = labels
        return self

    def transform(self, X: pd.DataFrame = None):
        return self.P_, self.labels_

    def render(self, fname: str, fmt="svg", view=False):
        if self.P_ is None or self.labels_ is None:
            raise ValueError("Transformer has not been fitted yet.")
        return render_graph(
            fname,
            self.P_,
            ls_index=self.labels_,
            threshold=self.threshold,
            fmt=fmt,
            view=view
        )


class SessionTransitionProbMatrixTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, threshold=0.0, session_col='sessionId'):
        self.threshold = threshold
        self.session_col = session_col
        self.session_matrices_ = None

    def fit(self, X: pd.DataFrame, y=None):
        if self.session_col not in X.columns:
            raise ValueError(f"Column '{self.session_col}' not found in DataFrame")
        session_matrices = {}
        for session_id, grp in X.groupby(self.session_col):
            if len(grp) > 1:  # need at least 2 events for transitions
                P, labels = build_transition_prob_matrix(grp)
                session_matrices[session_id] = {'matrix': P, 'labels': labels}
        self.session_matrices_ = session_matrices
        return self

    def transform(self, X: pd.DataFrame = None):
        if self.session_matrices_ is None:
            raise ValueError("Transformer has not been fitted yet.")
        return pd.Series(self.session_matrices_)

    def render_session(self, session_id: str, fname: str, fmt="svg", view=False):
        if self.session_matrices_ is None:
            raise ValueError("Transformer has not been fitted yet.")
        if session_id not in self.session_matrices_:
            raise ValueError(f"Session '{session_id}' not found in fitted data.")
        sess_data = self.session_matrices_[session_id]
        return render_graph(
            fname,
            sess_data['matrix'],
            ls_index=sess_data['labels'],
            threshold=self.threshold,
            fmt=fmt,
            view=view
        )
if __name__ == "__main__":
    # Example usage
    data = {
        'eventName': [
            'A', 'B', 'A', 'C', 'B', 'A', 'A', 'C', 'B', 'C',
            'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A'
        ]
    }
    df = pd.DataFrame(data)
    transformer = TransitionProbMatrixTransformer(threshold=0.1)
    transformer.fit(df)
    P, labels = transformer.transform(None)
    print("Transition Probability Matrix:")
    print(pd.DataFrame(P, index=labels, columns=labels))
    # Render the graph
    transformer.render("transition_graph", fmt="svg", view=False)
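
A small worked example makes it easier to see what `build_transition_prob_matrix` computes: count every consecutive pair of events, then divide each row by its total. The sketch below does this with plain lists so it runs without NumPy or pandas. `transition_matrix` is a hypothetical name, and rows with no outgoing transitions are assumed to stay all zeros.

```python
def transition_matrix(events):
    """Row-normalized first-order transition probabilities between event names."""
    labels = list(dict.fromkeys(events))  # unique names, first-seen order
    idx = {name: i for i, name in enumerate(labels)}
    n = len(labels)

    # counts[i][j] = how many times labels[i] was immediately followed by labels[j]
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(events, events[1:]):
        counts[idx[a]][idx[b]] += 1

    probs = []
    for row in counts:
        total = sum(row)
        # a state that never has a successor keeps an all-zero row
        probs.append([c / total if total else 0.0 for c in row])
    return probs, labels


P, labels = transition_matrix(["A", "B", "A", "C"])
for name, row in zip(labels, P):
    print(name, row)
```

Here `A` is followed once by `B` and once by `C`, so its row is split evenly; `C` only appears as the final event, so its row has no outgoing probability mass.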

View File

@@ -0,0 +1,19 @@
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from extract import DataExtractor
from mapping import SessionTransitionProbMatrixTransformer, render_graph

if __name__ == "__main__":
    steps = [
        ('data_extraction', DataExtractor()),
        ('transition_matrix', SessionTransitionProbMatrixTransformer(threshold=0.05)),
    ]
    pipeline = Pipeline(steps)
    result = pipeline.fit_transform(None)
    print(f"Number of sessions: {len(result)}\n")
    for session_id, sess_data in result.items():
        fname = f"session_{session_id}"
        render_graph(fname, sess_data['matrix'], ls_index=sess_data['labels'], threshold=0.05, fmt="svg", view=False)
        print(f"Rendered {fname}.svg")
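
The pipeline builds one matrix per session rather than one global matrix, which matters when events from several sessions are interleaved in the topic: a global pass would count a jump from one user's event to another user's event as a transition. The sketch below shows just the grouping step on plain dicts. `transitions_by_session` and the sample events are hypothetical, and the `sessionId` key is assumed to match the transformer's default `session_col`.

```python
from collections import defaultdict


def transitions_by_session(events, session_key="sessionId"):
    """Group event names by session, then list consecutive pairs per session."""
    by_session = defaultdict(list)
    for e in events:
        by_session[e[session_key]].append(e["eventName"])

    result = {}
    for session_id, names in by_session.items():
        if len(names) > 1:  # a single event has no transitions
            result[session_id] = list(zip(names, names[1:]))
    return result


# Hypothetical interleaved stream from two sessions.
stream = [
    {"sessionId": "s1", "eventName": "view_page"},
    {"sessionId": "s2", "eventName": "view_page"},
    {"sessionId": "s1", "eventName": "view_item_page"},
    {"sessionId": "s1", "eventName": "add_to_cart"},
]

pairs = transitions_by_session(stream)
print(pairs)
```

Session `s2` contributes nothing because it has only one event, matching the `len(grp) > 1` check in `SessionTransitionProbMatrixTransformer.fit`.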

View File

@@ -16,10 +16,11 @@ mkdir -p "$(dirname "$OUTPUT_FILE")"
 add_file() {
     local filepath="$1"
     local relpath="${filepath#$PROJECT_ROOT/}"
+    local escaped_path="${relpath//_/\\_}"
     # Add section header and code listing (no language-specific highlighting)
-    echo "\\subsection{${relpath}}" >> "$OUTPUT_FILE"
-    echo "\\begin{lstlisting}[caption={${relpath}}]" >> "$OUTPUT_FILE"
+    echo "\\subsection{${escaped_path}}" >> "$OUTPUT_FILE"
+    echo "\\begin{lstlisting}[caption={${escaped_path}}]" >> "$OUTPUT_FILE"
     cat "$filepath" >> "$OUTPUT_FILE"
     echo "" >> "$OUTPUT_FILE"
     echo "\\end{lstlisting}" >> "$OUTPUT_FILE"
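
The added `escaped_path` line uses Bash pattern substitution to replace every `_` in the path with `\_`, because a bare underscore in a `\subsection{...}` title or a listing caption is a LaTeX error outside math mode. For comparison, the same transformation in Python might look like this (the function name is hypothetical, and only underscores are handled, as in the script):

```python
def escape_underscores(relpath: str) -> str:
    """Escape underscores so the path is safe in LaTeX text mode."""
    return relpath.replace("_", "\\_")


escaped = escape_underscores("experiments/__init__.py")
print(escaped)
```

Other LaTeX special characters such as `#`, `%` or `&` would need the same treatment if they ever appear in a file path.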

7
pytest.ini Normal file

@@ -0,0 +1,7 @@
[pytest]
testpaths = experiments
python_files = test*.py
python_classes = Test*
python_functions = test_*
asyncio_mode = auto
asyncio_default_fixture_loop_scope = function
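
With `asyncio_mode = auto`, pytest-asyncio treats every `async def test_*` function as an asyncio test, so the explicit `@pytest.mark.asyncio` marker becomes optional. A minimal sketch of such a test follows; `fetch_title` is a hypothetical coroutine standing in for real agent work, and the final line only drives the test by hand so the block also runs outside pytest.

```python
import asyncio


async def fetch_title() -> str:
    """Hypothetical coroutine; a real test would await agent.act() instead."""
    await asyncio.sleep(0)
    return "Example Domain"


async def test_fetch_title():
    # No @pytest.mark.asyncio needed when asyncio_mode = auto is configured.
    title = await fetch_title()
    assert "example" in title.lower()


# Outside pytest, the coroutine can still be run directly.
asyncio.run(test_fetch_title())
```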

View File

@@ -5,3 +5,8 @@ jupyter
 ipykernel
 matplotlib
 graphviz
+browser-use
+pytest
+pytest-asyncio
+uv
+scikit-learn