implementation

Develop Fairness Evaluation with Alibi Detect

Inspect the original prompt language first, then copy or adapt it once you know how it fits your workflow.

Linked challenge: Voice-Activated Dynamic Playlist Generator

Format: Code-aware
Lines: 16
Sections: 5

Prompt source

Original prompt text with formatting preserved for inspection.

16 lines · 5 sections · No variables · 1 code block
Create a module that uses Alibi Detect to assess the fairness of the generated playlists. This module should take a set of simulated user profiles (with demographic data and past listening habits) and the agent's playlist recommendations as input. Implement a specific fairness metric, such as the Disparate Impact Ratio, to identify potential biases in genre or artist recommendations. Explain how you would integrate this evaluation into a continuous integration pipeline for the agent.

```python
import pandas as pd

# Simulated inputs: user profiles with demographic data, plus the agent's
# playlist recommendations, joined on user_id. Values are illustrative.
user_profiles = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'demographic_group': ['A', 'A', 'B', 'B'],
    'preferred_genre': ['pop', 'rock', 'pop', 'jazz'],
})
generated_playlists = pd.DataFrame({
    'user_id': [1, 2, 3, 4],
    'recommended_genre': ['pop', 'pop', 'jazz', 'jazz'],
})

# Note: alibi-detect itself targets drift, outlier, and adversarial detection
# and does not expose a disparate-impact class, so the Disparate Impact Ratio
# is computed directly with pandas here.
df = user_profiles.merge(generated_playlists, on='user_id')

# Favourable outcome: the recommendation matches the user's preferred genre.
df['favourable'] = (df['preferred_genre'] == df['recommended_genre']).astype(int)

# Disparate Impact Ratio: lowest group favourable rate / highest group rate.
rates = df.groupby('demographic_group')['favourable'].mean()
di_ratio = rates.min() / rates.max()
print(f'Disparate Impact Ratio: {di_ratio:.2f}')
```
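For the prompt's continuous-integration question, one way to wire the evaluation into a pipeline is a threshold test that fails the build when the ratio drops below the four-fifths rule. This is a minimal sketch: the 0.8 cutoff, the helper name `disparate_impact_ratio`, and the fixture data are assumptions for illustration, not part of the original prompt.

```python
import pandas as pd

def disparate_impact_ratio(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Ratio of the lowest to the highest favourable-outcome rate across groups."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return rates.min() / rates.max()

def test_recommendations_meet_four_fifths_rule():
    # In CI, this frame would come from running the agent on fixture profiles;
    # the hard-coded outcomes below stand in for that step.
    results = pd.DataFrame({
        'demographic_group': ['A', 'A', 'A', 'A', 'A', 'B', 'B'],
        'favourable':        [1,   1,   1,   1,   0,   1,   1],
    })
    ratio = disparate_impact_ratio(results, 'demographic_group', 'favourable')
    assert ratio >= 0.8, f'Disparate impact detected: ratio={ratio:.2f}'
```

Run as part of the test suite (e.g. via `pytest`) on every commit, so a biased change to the recommendation logic blocks the merge rather than reaching users.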

Adaptation plan

Keep the source stable, then change the prompt in a predictable order so the next run is easier to evaluate.

Keep stable

Hold the task contract and output shape stable so generated implementations remain comparable.

Tune next

Update libraries, interfaces, and environment assumptions to match the stack you actually run.

Verify after

Test failure handling, edge cases, and any code paths that depend on hidden context or secrets.