32 refine data pipeline training data construction (#37)

* feature: modularized feature engineering for ml setup (new pipeline)

* chore: updating imports properly

* test: updating fixtures with ua and meta

* chore: migrating code ignore groups

* chore: syntax cleaning and code quality

* chore: fixing pipeline data compatibility

* Update experiments/procesing/steps/session.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* chore: refactoring and fixing path joining

* chore: refactoring function definition to avoid reinit

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Daniel Alves Rösel
2025-12-12 12:15:15 +01:00
committed by GitHub
parent a2a443c027
commit a1916c966c
6 changed files with 316 additions and 159 deletions

@@ -269,3 +269,13 @@ def empty_context(empty_provider):
        store_mode='hotel',
        window_size='30s'
    )


@pytest.fixture
def session_interactions(mock_interactions):
    """Enriched interaction data for session feature extraction tests"""
    df = mock_interactions.copy()
    df['userAgent'] = ['Mozilla/5.0 Chrome/120', 'Mozilla/5.0 Chrome/120',
                       'HeadlessChrome/120', 'HeadlessChrome/120', 'HeadlessChrome/120']
    df['metadata_base_price'] = [None, None, 150.0, 150.0, 200.0]
    return df
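The `session_interactions` fixture adds a `userAgent` column (two regular Chrome agents, three `HeadlessChrome` agents) and a `metadata_base_price` column with two missing values. The sketch below shows the kind of session-level signals those columns make testable. It uses plain Python rather than pandas or pytest so it runs on its own. The function `session_signals`, the `'Headless'` substring check, and the three output keys are hypothetical illustrations, not the feature code in `experiments/procesing/steps/session.py`. Note that in the real DataFrame, pandas typically stores the `None` prices as `NaN`.

```python
from statistics import mean
from typing import Optional, Sequence


def session_signals(
    user_agents: Sequence[str],
    base_prices: Sequence[Optional[float]],
) -> dict:
    """Hypothetical summary features over one batch of interactions.

    Assumes a user agent containing 'Headless' marks automated traffic and
    that a missing base price is represented as None, as in the fixture.
    """
    if len(user_agents) != len(base_prices):
        raise ValueError("columns must have the same length")
    n = len(user_agents)
    if n == 0:
        # No interactions: report zero ratios and no price estimate.
        return {"headless_ratio": 0.0, "price_coverage": 0.0, "mean_base_price": None}
    # Count rows whose user agent looks like a headless browser.
    headless = sum("Headless" in ua for ua in user_agents)
    # Keep only rows where the base price metadata is present.
    known = [p for p in base_prices if p is not None]
    return {
        "headless_ratio": headless / n,
        "price_coverage": len(known) / n,
        "mean_base_price": mean(known) if known else None,
    }


# Same values the session_interactions fixture assigns to its two new columns.
user_agents = ['Mozilla/5.0 Chrome/120', 'Mozilla/5.0 Chrome/120',
               'HeadlessChrome/120', 'HeadlessChrome/120', 'HeadlessChrome/120']
base_prices = [None, None, 150.0, 150.0, 200.0]
print(session_signals(user_agents, base_prices))
```

With the fixture values, three of five agents are headless and three of five rows carry a base price, so both ratios come out to 0.6 and the mean is taken over the three known prices only.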