Criteo Data Preprocessing & Log-Binning Strategy

Prompt Content

Describe your strategy for efficiently handling the large Criteo dataset. How will you load and preprocess the raw data (categorical and numerical features)? Detail the log-binning strategy for numerical features, explaining how you will convert dense numerical values into categorical embeddings. Also, specify how sparse categorical features will be handled (e.g., hashing trick, embedding layers, handling OOV values). Provide a high-level data pipeline design, including batching and efficient loading.
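For orientation, here is a minimal sketch of the kind of answer the prompt is asking for. It is one possible reading, not a reference solution. It assumes the tab-separated Criteo Display Advertising layout (a label, 13 integer features, 26 categorical features, with empty strings for missing values). The names `log_bin`, `hash_bucket`, `encode_line`, and `iter_batches`, the bin formula, the bucket count, and the batch size are all illustrative choices, not part of the prompt.

```python
import zlib
from itertools import islice

NUM_DENSE = 13   # integer feature columns in the assumed Criteo layout
NUM_SPARSE = 26  # categorical feature columns in the assumed Criteo layout


def log_bin(raw, max_bin=32):
    """Map a raw integer string to a log-scale bin index (assumed scheme).

    Bin 0 is reserved for missing values. Negative values are clamped to 0.
    (v + 1).bit_length() equals floor(log2(v + 1)) + 1 and avoids float error.
    """
    if raw == "":
        return 0
    value = max(int(raw), 0)
    return min((value + 1).bit_length(), max_bin)


def hash_bucket(field, raw, num_buckets=1_000_003):
    """Hashing trick: map (field, value) to an index in [1, num_buckets).

    Index 0 is reserved for missing values. Unseen (OOV) values need no
    special case because every string hashes to some bucket. crc32 is used
    because, unlike the built-in hash(), it is stable across processes.
    """
    if raw == "":
        return 0
    key = f"{field}={raw}".encode("utf-8")
    return 1 + zlib.crc32(key) % (num_buckets - 1)


def encode_line(line):
    """Turn one raw TSV line into (label, dense_bins, sparse_ids)."""
    cols = line.rstrip("\n").split("\t")
    label = int(cols[0])
    dense = [log_bin(c) for c in cols[1:1 + NUM_DENSE]]
    sparse_cols = cols[1 + NUM_DENSE:1 + NUM_DENSE + NUM_SPARSE]
    sparse = [hash_bucket(i, c) for i, c in enumerate(sparse_cols)]
    return label, dense, sparse


def iter_batches(lines, batch_size=4096):
    """Stream encoded examples in fixed-size batches without loading the file."""
    it = iter(lines)
    while True:
        batch = [encode_line(line) for line in islice(it, batch_size)]
        if not batch:
            return
        yield batch
```

Because both the bin indices and the hashed ids are small non-negative integers, each of the 39 fields can feed its own embedding table. Passing an open file handle to `iter_batches` keeps memory use bounded by the batch size.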

Usage Tips

Copy the prompt and paste it into your preferred AI tool, such as Claude, ChatGPT, or Gemini.

Customize any placeholder values to match your specific requirements and context.

For best results, provide clear examples and test different variations.