Daniel Alves Rösel ad9423bf59 Airflow addition (#28)
* introducing airflow to run pipeline

* chore: updating dag with upload to registry

* introducing complete provider (non refactored and noisy)

* chore: removing old shit

* generic pricing baselines

* feature: super simple model registry (to be updated maybe third party OS software)

* chore: refactoring the providers docker config and requirements

* chore: refactored and broke down components (breaking

* exporting all

* local pipeline execution working

* fix: fixing import structures from nonrelativistic

* chore: enables cross comm pickling with fully e2e pipeline compilation

* docs: what the pipeline is like now

* pipelines local running and pipeline high level definition

* cleaning old pipeline and vectorization

* leaked but fixing, not so important

* test: started with pipeline step testing

* chore: cleaning up provider of prices

* test: extra tests with semantic meaning checks

* migrating pricers

* feature: introducing pricing predictors (pricers)

* chore: e2e is done with new pipeline

* extra session feature extraction

* feature: experimental session pricer and metrics(vibe)

* chore: redefined and connected pricers (#29)
2025-11-29 17:50:16 +01:00


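import os
from typing import Optional

import pandas as pd
import requests

from procesing.providers.base import DataProvider


class BackendAPIProvider(DataProvider):
    """Concrete DataProvider implementation backed by the backend HTTP API."""

    def __init__(self, backend_url: Optional[str] = None):
        # Fall back to the BACKEND_URL environment variable, then to localhost.
        self.backend_url = backend_url or os.getenv("BACKEND_URL", "http://localhost:5000")

    def fetch_kafka_topic(self, topic: str) -> pd.DataFrame:
        """Return the dump of a Kafka topic as a DataFrame (empty on failure or no data)."""
        # Pass the topic through params so requests URL-encodes it.
        resp = requests.get(f"{self.backend_url}/api/kafka/dump", params={"topic": topic})
        resp.raise_for_status()
        data = resp.json()
        # The backend signals failure or an empty topic through the JSON envelope.
        if not data.get("success") or not data.get("data"):
            return pd.DataFrame()
        return pd.DataFrame(data["data"])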
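
The envelope handling in `fetch_kafka_topic` can be checked without a running backend. The following is a minimal sketch, assuming the endpoint returns a JSON object with `success` and `data` keys, as the checks in the method imply. The helpers `build_dump_url` and `extract_records`, the topic name `user events`, and the `price` field are hypothetical and not part of the original file. They stand in for the pandas-based method so the sketch runs on the standard library alone.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_dump_url(backend_url: str, topic: str) -> str:
    """Build the dump URL, percent-encoding the topic name (hypothetical helper)."""
    return f"{backend_url}/api/kafka/dump?{urlencode({'topic': topic})}"


def extract_records(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Mirror the provider's check: no records unless success is truthy and data is non-empty."""
    if not payload.get("success") or not payload.get("data"):
        return []
    return list(payload["data"])


# Illustrative payloads; the field names inside "data" are assumptions.
ok = {"success": True, "data": [{"price": 10}, {"price": 12}]}
failed = {"success": False, "data": [{"price": 10}]}
empty = {"success": True, "data": []}

print(extract_records(ok))      # both records are returned
print(extract_records(failed))  # failure flag wins even though data is present
print(extract_records(empty))   # an empty data list is treated like a failure
print(build_dump_url("http://localhost:5000", "user events"))
```

Note that a failed request and an empty topic are indistinguishable to the caller, since both yield an empty result. If that distinction matters downstream, raising on `success == False` would be one alternative.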