mirror of
https://github.com/velocitatem/PHANTOM.git
synced 2026-05-31 08:33:36 +00:00
chore: updating dataset card with relevant updates and data
@@ -17,64 +17,107 @@ size_categories:
|
|||||||
- 1K<n<10K
|
- 1K<n<10K
|
||||||
---
|
---
|
||||||
|
|
||||||
# Dataset Card for whoclickedit
|
<img align="right" width="280" src="https://raw.githubusercontent.com/velocitatem/PHANTOM/main/docs/static/images/banner.svg" alt="PHANTOM research banner" />
|
||||||
|
|
||||||
## Dataset Summary
|
# [whoclickedit](https://huggingface.co/datasets/velocitatem/whoclickedit)
|
||||||
whoclickedit is an event-level behavioral dataset for human versus agent interaction analysis in dynamic pricing experiments.
|
|
||||||
It merges interaction logs and price quote logs into one flat CSV (`whoclicked.csv`) with explicit labels for actor type.
|
|
||||||
|
|
||||||
## Dataset Snapshot
|
[](https://huggingface.co/datasets/velocitatem/whoclickedit)
|
||||||
- Rows: `3838`
|

|
||||||
- Columns: `42`
|

|
||||||
- Time range (UTC): `2025-12-05T09:43:31.301000+00:00` to `2026-02-28T19:32:06.444000+00:00`
|

|
||||||
- Unique sessions by actor:
|

|
||||||
- `agent`: 7
|

|
||||||
- `human`: 25
|

|
||||||
- Rows by actor:
|
|
||||||
- `agent`: 3076
|
> **Event-level behavior data for dynamic pricing research.**
|
||||||
- `human`: 762
|
> This dataset captures how humans and automated agents browse, query prices, and move through the PHANTOM storefronts during controlled experiments.
|
||||||
- Rows by record type:
|
|
||||||
- `price_log`: 3331
|
## What this dataset gives you
|
||||||
- `interaction`: 507
|
|
||||||
- Rows by actor x record type:
|
- A single flat file (`whoclicked.csv`) with both interaction and price-log events.
|
||||||
- `agent` / `interaction`: 197
|
- Explicit labels for actor origin: `actor_type` and `is_agent`.
|
||||||
- `agent` / `price_log`: 2879
|
- Provenance fields from Kafka envelopes when available.
|
||||||
- `human` / `interaction`: 310
|
- Metadata flattened into feature-ready `metadata_*` columns.
|
||||||
- `human` / `price_log`: 452
|
|
||||||
- Store modes:
|
## Snapshot
|
||||||
- `hotel`: 3592
|
|
||||||
- `airline`: 196
|
| Metric | Value |
|
||||||
- `shop`: 50
|
| --- | --- |
|
||||||
|
| Rows | `3874` |
|
||||||
|
| Columns | `42` |
|
||||||
|
| Time range (UTC) | `2025-12-05T09:43:31.301000+00:00` -> `2026-03-23T12:08:30.151000+00:00` |
|
||||||
|
| Unique sessions | `36` |
|
||||||
|
|
||||||
|
## Composition
|
||||||
|
|
||||||
|
### Rows by actor
|
||||||
|
| Actor | Rows | Share |
|
||||||
|
| --- | --- | --- |
|
||||||
|
| `human` | 798 | 20.6% |
|
||||||
|
| `agent` | 3076 | 79.4% |
|
||||||
|
|
||||||
|
### Rows by actor and record type
|
||||||
|
| Actor | Record type | Rows |
|
||||||
|
| --- | --- | --- |
|
||||||
|
| `agent` | `interaction` | 197 |
|
||||||
|
| `agent` | `price_log` | 2879 |
|
||||||
|
| `human` | `interaction` | 328 |
|
||||||
|
| `human` | `price_log` | 470 |
|
||||||
|
|
||||||
|
### Store mode coverage
|
||||||
|
| Store mode | Rows |
|
||||||
|
| --- | --- |
|
||||||
|
| `hotel` | 3628 |
|
||||||
|
| `airline` | 196 |
|
||||||
|
| `shop` | 50 |
|
||||||
|
|
||||||
|
### Top interaction events
|
||||||
|
| Interaction event | Count |
|
||||||
|
| --- | --- |
|
||||||
|
| `page_view` | 246 |
|
||||||
|
| `learn_more_about_item` | 91 |
|
||||||
|
| `view_item_page` | 88 |
|
||||||
|
| `add_item_to_cart` | 47 |
|
||||||
|
| `hover_over_title` | 23 |
|
||||||
|
| `checkout_start` | 20 |
|
||||||
|
| `hover_over_paragraph` | 6 |
|
||||||
|
| `remove_item` | 4 |
|
||||||
|
|
||||||
|
## Collection pipeline
|
||||||
|
|
||||||
|
Data is sourced from two roots inside PHANTOM:
|
||||||
|
|
||||||
## Source and Processing
|
|
||||||
Data is collected from two local roots in the PHANTOM project:
|
|
||||||
- `experiments/collected_data` (human sessions)
|
- `experiments/collected_data` (human sessions)
|
||||||
- `experiments/agents/collected_data` (agent sessions)
|
- `experiments/agents/collected_data` (agent sessions)
|
||||||
|
|
||||||
Each session folder contains:
|
Each session directory contains:
|
||||||
- `int.json` (interaction events)
|
|
||||||
- `price.json` (price quote logs)
|
|
||||||
|
|
||||||
The ETL does the following:
|
- `int.json`: user interaction events
|
||||||
- Normalizes both Kafka-envelope and flat payload formats
|
- `price.json`: price quote observations
|
||||||
- Flattens nested metadata fields into `metadata_*` columns
|
|
||||||
- Preserves all raw rows (no deduplication)
|
ETL behavior:
|
||||||
- Adds labels:
|
|
||||||
- `actor_type` in `{human, agent}`
|
1. Accepts both Kafka-envelope records and flat payload records.
|
||||||
- `is_agent` in `{0, 1}`
|
2. Flattens nested JSON to a tabular schema.
|
||||||
- `record_type` in `{interaction, price_log}`
|
3. Preserves row-level provenance (`source_session_dir`, `source_row_index`, topic fields).
|
||||||
|
4. Adds modeling labels (`actor_type`, `is_agent`, `record_type`).
|
||||||
|
|
||||||
|
## Schema highlights
|
||||||
|
|
||||||
|
Core modeling fields:
|
||||||
|
|
||||||
## Data Fields
|
|
||||||
Core fields used for modeling:
|
|
||||||
- `actor_type`, `is_agent`, `record_type`
|
- `actor_type`, `is_agent`, `record_type`
|
||||||
- `sessionId`, `experimentId`, `storeMode`, `ts`
|
- `sessionId`, `experimentId`, `storeMode`, `ts`
|
||||||
- `eventName`, `page`, `productId`, `price`, `userAgent`
|
- `eventName`, `page`, `productId`, `price`, `userAgent`
|
||||||
|
|
||||||
Kafka provenance fields:
|
Kafka provenance fields:
|
||||||
|
|
||||||
- `kafka_partition_id`, `kafka_offset`, `kafka_timestamp_ms`, `kafka_compression`
|
- `kafka_partition_id`, `kafka_offset`, `kafka_timestamp_ms`, `kafka_compression`
|
||||||
- `kafka_is_transactional`, `kafka_headers`, `kafka_key_*`, `kafka_value_*`
|
- `kafka_is_transactional`, `kafka_headers`, `kafka_key_*`, `kafka_value_*`
|
||||||
|
|
||||||
Flattened metadata fields currently present:
|
<details>
|
||||||
|
<summary>Metadata columns in this release</summary>
|
||||||
|
|
||||||
- `metadata_cabinClass`
|
- `metadata_cabinClass`
|
||||||
- `metadata_dateIndex`
|
- `metadata_dateIndex`
|
||||||
- `metadata_dwellTime`
|
- `metadata_dwellTime`
|
||||||
@@ -89,37 +132,34 @@ Flattened metadata fields currently present:
|
|||||||
- `metadata_total`
|
- `metadata_total`
|
||||||
- `metadata_type`
|
- `metadata_type`
|
||||||
|
|
||||||
Top interaction events:
|
</details>
|
||||||
- `page_view`: 236
|
|
||||||
- `learn_more_about_item`: 88
|
|
||||||
- `view_item_page`: 85
|
|
||||||
- `add_item_to_cart`: 46
|
|
||||||
- `hover_over_title`: 23
|
|
||||||
- `checkout_start`: 19
|
|
||||||
- `hover_over_paragraph`: 6
|
|
||||||
- `remove_item`: 4
|
|
||||||
|
|
||||||
## Intended Uses
|
## Quick start
|
||||||
- Human-vs-agent traffic classification
|
|
||||||
- Session-level behavioral modeling
|
|
||||||
- Dynamic pricing robustness analysis under agent-mediated reconnaissance
|
|
||||||
|
|
||||||
## Out-of-Scope Uses
|
```python
|
||||||
- Identity inference or user-level profiling
|
from datasets import load_dataset
|
||||||
- Credit, employment, insurance, or legal decision making
|
|
||||||
|
|
||||||
## Data Splits
|
ds = load_dataset("velocitatem/whoclickedit")
|
||||||
No official train/validation/test split is provided in the current release.
|
```
|
||||||
Users should create time-aware or session-aware splits to avoid leakage.
|
|
||||||
|
|
||||||
## Privacy and Sensitive Content
|
Recommended split strategy:
|
||||||
- `userAgent` and referrer metadata can be quasi-identifying in small samples.
|
|
||||||
- Use care before publishing derived artifacts that can re-identify participants.
|
|
||||||
|
|
||||||
## Limitations
|
- Prefer session-aware or time-aware splits.
|
||||||
- Data is generated in a controlled experiment platform, not a full production marketplace.
|
- Do not split rows from the same `sessionId` across train and test.
|
||||||
- Agent traffic currently reflects the configured tasking and browser automation setup.
|
|
||||||
- Coverage is stronger for `hotel` than `airline` in the current release.
|
## Intended use
|
||||||
|
|
||||||
|
- Human-vs-agent behavior classification.
|
||||||
|
- Session-level telemetry modeling for dynamic pricing defenses.
|
||||||
|
- Robustness experiments under agent-mediated reconnaissance.
|
||||||
|
|
||||||
|
## Safety and limitations
|
||||||
|
|
||||||
|
- `userAgent` and referrer metadata can be quasi-identifying in very small samples.
|
||||||
|
- Data comes from a controlled research platform, not a full production marketplace.
|
||||||
|
- Current release has stronger coverage for `hotel` flows than `airline` flows.
|
||||||
|
|
||||||
## Citation
|
## Citation
|
||||||
If you use this dataset, cite the PHANTOM thesis project and link this dataset page.
|
|
||||||
|
If you use this dataset, cite the PHANTOM thesis project and link this page:
|
||||||
|
`https://huggingface.co/datasets/velocitatem/whoclickedit`
|
||||||
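The card says nested metadata is flattened into `metadata_*` columns. A minimal sketch of that step follows, assuming each raw record is a dict carrying a nested `metadata` dict. The `metadata` key, the `flatten_metadata` name, and the sample values are assumptions for illustration; only the resulting column names (`metadata_dwellTime`, `metadata_cabinClass`) come from the card.

```python
from typing import Any


def flatten_metadata(record: dict[str, Any], prefix: str = "metadata") -> dict[str, Any]:
    """Copy a record, replacing a nested `metadata` dict with `metadata_*` keys."""
    # Keep every top-level field except the nested container itself.
    flat = {k: v for k, v in record.items() if k != prefix}
    nested = record.get(prefix)
    # A missing or non-dict `metadata` value is simply dropped in this sketch.
    if isinstance(nested, dict):
        for key, value in nested.items():
            flat[f"{prefix}_{key}"] = value
    return flat


# Made-up interaction event; the values are not taken from the dataset.
event = {
    "sessionId": "s1",
    "eventName": "page_view",
    "metadata": {"dwellTime": 1200, "cabinClass": "economy"},
}
row = flatten_metadata(event)
```

This handles one level of nesting only. Deeper structures would need a recursive variant or something like `pandas.json_normalize`.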
|
|||||||
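The card recommends session-aware splits and warns against putting rows from the same `sessionId` in both train and test. One way to do that is to assign whole sessions to a side and then filter rows by membership. The sketch below uses only the standard library; the `split_sessions` name, the default fraction, and the toy session IDs are assumptions, not part of the PHANTOM code.

```python
import random
from collections.abc import Iterable


def split_sessions(
    session_ids: Iterable[str], test_fraction: float = 0.2, seed: int = 0
) -> tuple[set[str], set[str]]:
    """Assign whole sessions to train or test so no session spans both."""
    # Sort before shuffling so the result depends only on the seed.
    unique = sorted(set(session_ids))
    random.Random(seed).shuffle(unique)
    # Hold out at least one session whenever any session exists.
    n_test = max(1, round(len(unique) * test_fraction)) if unique else 0
    return set(unique[n_test:]), set(unique[:n_test])


# Toy values standing in for the `sessionId` column.
rows = ["s1", "s1", "s2", "s3", "s3", "s3", "s4", "s5"]
train_ids, test_ids = split_sessions(rows, test_fraction=0.4, seed=42)
train_rows = [s for s in rows if s in train_ids]
test_rows = [s for s in rows if s in test_ids]
```

With a pandas frame, the same sets can be used as `df[df["sessionId"].isin(train_ids)]`. Because agent rows make up most of the file, stratifying sessions by `actor_type` before shuffling may also be worth considering.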
@@ -8,6 +8,7 @@ import os
|
|||||||
import sys
|
import sys
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
from urllib.parse import quote
|
||||||
|
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
from huggingface_hub import HfApi
|
from huggingface_hub import HfApi
|
||||||
@@ -93,6 +94,28 @@ def _time_range(df: pd.DataFrame) -> tuple[str, str]:
|
|||||||
return ts.min().isoformat(), ts.max().isoformat()
|
return ts.min().isoformat(), ts.max().isoformat()
|
||||||
|
|
||||||
|
|
||||||
|
def _badge(label: str, value: str, color: str, logo: str | None = None) -> str:
|
||||||
|
encoded_label = quote(label, safe="")
|
||||||
|
encoded_value = quote(value, safe="")
|
||||||
|
base = (
|
||||||
|
"https://img.shields.io/badge/"
|
||||||
|
f"{encoded_label}-{encoded_value}-{color}?style=flat-square"
|
||||||
|
)
|
||||||
|
if logo:
|
||||||
|
base = f"{base}&logo={quote(logo, safe='')}&logoColor=white"
|
||||||
|
return f""
|
||||||
|
|
||||||
|
|
||||||
|
def _md_table(headers: list[str], rows: list[list[str]]) -> str:
|
||||||
|
header = f"| {' | '.join(headers)} |"
|
||||||
|
separator = f"| {' | '.join('---' for _ in headers)} |"
|
||||||
|
if not rows:
|
||||||
|
empty = f"| {' | '.join('n/a' for _ in headers)} |"
|
||||||
|
return "\n".join([header, separator, empty])
|
||||||
|
body = "\n".join(f"| {' | '.join(row)} |" for row in rows)
|
||||||
|
return "\n".join([header, separator, body])
|
||||||
|
|
||||||
|
|
||||||
def _render_card(df: pd.DataFrame) -> str:
|
def _render_card(df: pd.DataFrame) -> str:
|
||||||
total_rows = len(df)
|
total_rows = len(df)
|
||||||
total_cols = len(df.columns)
|
total_cols = len(df.columns)
|
||||||
@@ -112,31 +135,76 @@ def _render_card(df: pd.DataFrame) -> str:
|
|||||||
|
|
||||||
metadata_cols = sorted(c for c in df.columns if c.startswith("metadata_"))
|
metadata_cols = sorted(c for c in df.columns if c.startswith("metadata_"))
|
||||||
|
|
||||||
actor_lines = (
|
total_sessions = sum(session_counts.values())
|
||||||
"\n".join(f"- `{k}`: {v}" for k, v in actor_counts.items()) or "- none"
|
human_rows = actor_counts.get("human", 0)
|
||||||
|
agent_rows = actor_counts.get("agent", 0)
|
||||||
|
|
||||||
|
top_events = list(event_counts.items())[:10]
|
||||||
|
|
||||||
|
snapshot_table = _md_table(
|
||||||
|
["Metric", "Value"],
|
||||||
|
[
|
||||||
|
["Rows", f"`{total_rows}`"],
|
||||||
|
["Columns", f"`{total_cols}`"],
|
||||||
|
["Time range (UTC)", f"`{t_min}` -> `{t_max}`"],
|
||||||
|
["Unique sessions", f"`{total_sessions}`"],
|
||||||
|
],
|
||||||
)
|
)
|
||||||
record_lines = (
|
|
||||||
"\n".join(f"- `{k}`: {v}" for k, v in record_counts.items()) or "- none"
|
actor_table = _md_table(
|
||||||
|
["Actor", "Rows", "Share"],
|
||||||
|
[
|
||||||
|
[
|
||||||
|
"`human`",
|
||||||
|
str(human_rows),
|
||||||
|
f"{(human_rows / total_rows * 100):.1f}%" if total_rows else "0.0%",
|
||||||
|
],
|
||||||
|
[
|
||||||
|
"`agent`",
|
||||||
|
str(agent_rows),
|
||||||
|
f"{(agent_rows / total_rows * 100):.1f}%" if total_rows else "0.0%",
|
||||||
|
],
|
||||||
|
],
|
||||||
)
|
)
|
||||||
pair_lines = (
|
|
||||||
"\n".join(
|
pair_table = _md_table(
|
||||||
f"- `{a}` / `{r}`: {n}"
|
["Actor", "Record type", "Rows"],
|
||||||
for (a, r), n in sorted(
|
[
|
||||||
|
[f"`{actor}`", f"`{record}`", str(n)]
|
||||||
|
for (actor, record), n in sorted(
|
||||||
by_actor_record.items(), key=lambda x: (x[0][0], x[0][1])
|
by_actor_record.items(), key=lambda x: (x[0][0], x[0][1])
|
||||||
)
|
)
|
||||||
|
],
|
||||||
)
|
)
|
||||||
or "- none"
|
|
||||||
|
store_table = _md_table(
|
||||||
|
["Store mode", "Rows"],
|
||||||
|
[
|
||||||
|
[f"`{mode}`", str(n)]
|
||||||
|
for mode, n in sorted(
|
||||||
|
store_counts.items(), key=lambda x: x[1], reverse=True
|
||||||
)
|
)
|
||||||
store_lines = (
|
],
|
||||||
"\n".join(f"- `{k}`: {v}" for k, v in store_counts.items()) or "- none"
|
|
||||||
)
|
)
|
||||||
session_lines = (
|
|
||||||
"\n".join(f"- `{k}`: {v}" for k, v in session_counts.items()) or "- none"
|
event_table = _md_table(
|
||||||
|
["Interaction event", "Count"],
|
||||||
|
[[f"`{name}`", str(n)] for name, n in top_events],
|
||||||
)
|
)
|
||||||
top_events = list(event_counts.items())[:10]
|
|
||||||
event_lines = "\n".join(f"- `{k}`: {v}" for k, v in top_events) or "- none"
|
|
||||||
metadata_lines = "\n".join(f"- `{c}`" for c in metadata_cols) or "- none"
|
metadata_lines = "\n".join(f"- `{c}`" for c in metadata_cols) or "- none"
|
||||||
|
|
||||||
|
dataset_badge = (
|
||||||
|
"[](https://huggingface.co/datasets/velocitatem/whoclickedit)"
|
||||||
|
)
|
||||||
|
rows_badge = _badge("Rows", str(total_rows), "0A9396")
|
||||||
|
cols_badge = _badge("Columns", str(total_cols), "005F73")
|
||||||
|
sessions_badge = _badge("Sessions", str(total_sessions), "1D3557")
|
||||||
|
human_badge = _badge("Human rows", str(human_rows), "2A9D8F")
|
||||||
|
agent_badge = _badge("Agent rows", str(agent_rows), "E76F51")
|
||||||
|
license_badge = _badge("License", "MIT", "111827")
|
||||||
|
|
||||||
return f"""---
|
return f"""---
|
||||||
pretty_name: whoclickedit
|
pretty_name: whoclickedit
|
||||||
license: mit
|
license: mit
|
||||||
@@ -156,85 +224,114 @@ size_categories:
|
|||||||
- {size_cat}
|
- {size_cat}
|
||||||
---
|
---
|
||||||
|
|
||||||
# Dataset Card for whoclickedit
|
<img align="right" width="280" src="https://raw.githubusercontent.com/velocitatem/PHANTOM/main/docs/static/images/banner.svg" alt="PHANTOM research banner" />
|
||||||
|
|
||||||
## Dataset Summary
|
# [whoclickedit](https://huggingface.co/datasets/velocitatem/whoclickedit)
|
||||||
whoclickedit is an event-level behavioral dataset for human versus agent interaction analysis in dynamic pricing experiments.
|
|
||||||
It merges interaction logs and price quote logs into one flat CSV (`whoclicked.csv`) with explicit labels for actor type.
|
|
||||||
|
|
||||||
## Dataset Snapshot
|
{dataset_badge}
|
||||||
- Rows: `{total_rows}`
|
{rows_badge}
|
||||||
- Columns: `{total_cols}`
|
{cols_badge}
|
||||||
- Time range (UTC): `{t_min}` to `{t_max}`
|
{sessions_badge}
|
||||||
- Unique sessions by actor:
|
{human_badge}
|
||||||
{session_lines}
|
{agent_badge}
|
||||||
- Rows by actor:
|
{license_badge}
|
||||||
{actor_lines}
|
|
||||||
- Rows by record type:
|
> **Event-level behavior data for dynamic pricing research.**
|
||||||
{record_lines}
|
> This dataset captures how humans and automated agents browse, query prices, and move through the PHANTOM storefronts during controlled experiments.
|
||||||
- Rows by actor x record type:
|
|
||||||
{pair_lines}
|
## What this dataset gives you
|
||||||
- Store modes:
|
|
||||||
{store_lines}
|
- A single flat file (`whoclicked.csv`) with both interaction and price-log events.
|
||||||
|
- Explicit labels for actor origin: `actor_type` and `is_agent`.
|
||||||
|
- Provenance fields from Kafka envelopes when available.
|
||||||
|
- Metadata flattened into feature-ready `metadata_*` columns.
|
||||||
|
|
||||||
|
## Snapshot
|
||||||
|
|
||||||
|
{snapshot_table}
|
||||||
|
|
||||||
|
## Composition
|
||||||
|
|
||||||
|
### Rows by actor
|
||||||
|
{actor_table}
|
||||||
|
|
||||||
|
### Rows by actor and record type
|
||||||
|
{pair_table}
|
||||||
|
|
||||||
|
### Store mode coverage
|
||||||
|
{store_table}
|
||||||
|
|
||||||
|
### Top interaction events
|
||||||
|
{event_table}
|
||||||
|
|
||||||
|
## Collection pipeline
|
||||||
|
|
||||||
|
Data is sourced from two roots inside PHANTOM:
|
||||||
|
|
||||||
## Source and Processing
|
|
||||||
Data is collected from two local roots in the PHANTOM project:
|
|
||||||
- `experiments/collected_data` (human sessions)
|
- `experiments/collected_data` (human sessions)
|
||||||
- `experiments/agents/collected_data` (agent sessions)
|
- `experiments/agents/collected_data` (agent sessions)
|
||||||
|
|
||||||
Each session folder contains:
|
Each session directory contains:
|
||||||
- `int.json` (interaction events)
|
|
||||||
- `price.json` (price quote logs)
|
|
||||||
|
|
||||||
The ETL does the following:
|
- `int.json`: user interaction events
|
||||||
- Normalizes both Kafka-envelope and flat payload formats
|
- `price.json`: price quote observations
|
||||||
- Flattens nested metadata fields into `metadata_*` columns
|
|
||||||
- Preserves all raw rows (no deduplication)
|
ETL behavior:
|
||||||
- Adds labels:
|
|
||||||
- `actor_type` in `{{human, agent}}`
|
1. Accepts both Kafka-envelope records and flat payload records.
|
||||||
- `is_agent` in `{{0, 1}}`
|
2. Flattens nested JSON to a tabular schema.
|
||||||
- `record_type` in `{{interaction, price_log}}`
|
3. Preserves row-level provenance (`source_session_dir`, `source_row_index`, topic fields).
|
||||||
|
4. Adds modeling labels (`actor_type`, `is_agent`, `record_type`).
|
||||||
|
|
||||||
|
## Schema highlights
|
||||||
|
|
||||||
|
Core modeling fields:
|
||||||
|
|
||||||
## Data Fields
|
|
||||||
Core fields used for modeling:
|
|
||||||
- `actor_type`, `is_agent`, `record_type`
|
- `actor_type`, `is_agent`, `record_type`
|
||||||
- `sessionId`, `experimentId`, `storeMode`, `ts`
|
- `sessionId`, `experimentId`, `storeMode`, `ts`
|
||||||
- `eventName`, `page`, `productId`, `price`, `userAgent`
|
- `eventName`, `page`, `productId`, `price`, `userAgent`
|
||||||
|
|
||||||
Kafka provenance fields:
|
Kafka provenance fields:
|
||||||
|
|
||||||
- `kafka_partition_id`, `kafka_offset`, `kafka_timestamp_ms`, `kafka_compression`
|
- `kafka_partition_id`, `kafka_offset`, `kafka_timestamp_ms`, `kafka_compression`
|
||||||
- `kafka_is_transactional`, `kafka_headers`, `kafka_key_*`, `kafka_value_*`
|
- `kafka_is_transactional`, `kafka_headers`, `kafka_key_*`, `kafka_value_*`
|
||||||
|
|
||||||
Flattened metadata fields currently present:
|
<details>
|
||||||
|
<summary>Metadata columns in this release</summary>
|
||||||
|
|
||||||
{metadata_lines}
|
{metadata_lines}
|
||||||
|
|
||||||
Top interaction events:
|
</details>
|
||||||
{event_lines}
|
|
||||||
|
|
||||||
## Intended Uses
|
## Quick start
|
||||||
- Human-vs-agent traffic classification
|
|
||||||
- Session-level behavioral modeling
|
|
||||||
- Dynamic pricing robustness analysis under agent-mediated reconnaissance
|
|
||||||
|
|
||||||
## Out-of-Scope Uses
|
```python
|
||||||
- Identity inference or user-level profiling
|
from datasets import load_dataset
|
||||||
- Credit, employment, insurance, or legal decision making
|
|
||||||
|
|
||||||
## Data Splits
|
ds = load_dataset("velocitatem/whoclickedit")
|
||||||
No official train/validation/test split is provided in the current release.
|
```
|
||||||
Users should create time-aware or session-aware splits to avoid leakage.
|
|
||||||
|
|
||||||
## Privacy and Sensitive Content
|
Recommended split strategy:
|
||||||
- `userAgent` and referrer metadata can be quasi-identifying in small samples.
|
|
||||||
- Use care before publishing derived artifacts that can re-identify participants.
|
|
||||||
|
|
||||||
## Limitations
|
- Prefer session-aware or time-aware splits.
|
||||||
- Data is generated in a controlled experiment platform, not a full production marketplace.
|
- Do not split rows from the same `sessionId` across train and test.
|
||||||
- Agent traffic currently reflects the configured tasking and browser automation setup.
|
|
||||||
- Coverage is stronger for `hotel` than `airline` in the current release.
|
## Intended use
|
||||||
|
|
||||||
|
- Human-vs-agent behavior classification.
|
||||||
|
- Session-level telemetry modeling for dynamic pricing defenses.
|
||||||
|
- Robustness experiments under agent-mediated reconnaissance.
|
||||||
|
|
||||||
|
## Safety and limitations
|
||||||
|
|
||||||
|
- `userAgent` and referrer metadata can be quasi-identifying in very small samples.
|
||||||
|
- Data comes from a controlled research platform, not a full production marketplace.
|
||||||
|
- Current release has stronger coverage for `hotel` flows than `airline` flows.
|
||||||
|
|
||||||
## Citation
|
## Citation
|
||||||
If you use this dataset, cite the PHANTOM thesis project and link this dataset page.
|
|
||||||
|
If you use this dataset, cite the PHANTOM thesis project and link this page:
|
||||||
|
`https://huggingface.co/datasets/velocitatem/whoclickedit`
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
|
||||||
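The `_badge` helper in this commit builds a shields.io static-badge URL from a percent-encoded label and value. A minimal sketch of the same idea follows. The function names, the returned Markdown image form, and the alt text are assumptions. It also doubles literal dashes and underscores, which the helper above does not do: shields.io uses `-` as the field separator and `_` as a space, and `urllib.parse.quote` leaves both characters untouched.

```python
from urllib.parse import quote


def shields_escape(text: str) -> str:
    """Escape a label or value for one shields.io static-badge path field."""
    # Double "-" and "_" first, then percent-encode everything else.
    return quote(text.replace("-", "--").replace("_", "__"), safe="")


def badge_markdown(label: str, value: str, color: str) -> str:
    """Return a Markdown image for a flat-square static badge."""
    url = (
        "https://img.shields.io/badge/"
        f"{shields_escape(label)}-{shields_escape(value)}-{color}?style=flat-square"
    )
    # The alt text is a hypothetical choice, not taken from the commit.
    return f"![{label}: {value}]({url})"


rows_badge = badge_markdown("Rows", "3874", "0A9396")
range_badge = badge_markdown("Size", "1K-10K", "005F73")
```

The values used in this commit (row counts, `MIT`, labels with spaces) contain no dashes or underscores, so the extra escaping only matters if a value such as a size range or a snake_case name is added later.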
|
|||||||