Constraints and Heuristics: Leveraging Dataset Metadata for Efficient AutoML

2025 · AutoML / Pipeline Optimization

First Author

MetaFlow, a metadata-driven AutoML framework that selects models using dataset properties combined with heuristic and constraint rules to eliminate unpromising algorithms before training.

Research Focus

AutoML · Pipeline selection · Meta learning · Algorithm selection · Data profiling

Publication Type

Research Paper

Area

Machine Learning

Summary

Developing effective machine learning pipelines requires algorithm selection and hyperparameter tuning tailored to dataset characteristics. Traditional AutoML systems rely on exhaustive search or computationally expensive optimization, which is infeasible in many real-world settings. This paper introduces MetaFlow, a metadata-driven AutoML framework that selects models using dataset properties combined with heuristic and constraint rules. Given metadata such as sample size, feature count, and class imbalance, MetaFlow eliminates unpromising algorithms before training and concentrates computation on the most suitable candidates. Experiments on four real-world datasets show that metadata-conditioned constraints prune the candidate search space by 67% while keeping predictive performance competitive.
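The summary describes metadata-conditioned pruning only at a high level. The Python sketch below illustrates the general idea under stated assumptions. It profiles a dataset into the three properties the summary names, drops any candidate whose constraint checks fail, orders the survivors with a simple heuristic score, and reports the fraction of the candidate space that was pruned. The candidate names, numeric thresholds, priority values, and the majority-to-minority ratio used as the imbalance measure are all hypothetical illustrations, not MetaFlow's published rules.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class DatasetMeta:
    """Metadata named in the summary: sample size, feature count, class imbalance."""
    n_samples: int
    n_features: int
    imbalance_ratio: float  # assumed measure: majority count / minority count (1.0 = balanced)


def profile(X: Sequence[Sequence[float]], y: Sequence[object]) -> DatasetMeta:
    """Compute the metadata from a feature matrix X and a label vector y."""
    counts = Counter(y)
    return DatasetMeta(
        n_samples=len(X),
        n_features=len(X[0]) if len(X) > 0 else 0,
        imbalance_ratio=max(counts.values()) / min(counts.values()),
    )


# Hypothetical constraint rules: a candidate survives only if every check passes.
Rule = Callable[[DatasetMeta], bool]
CONSTRAINTS: Dict[str, List[Rule]] = {
    "logistic_regression": [],
    "random_forest": [],
    "gradient_boosting": [lambda m: m.n_samples >= 100],
    "svm_rbf": [lambda m: m.n_samples <= 10_000],
    "knn": [lambda m: m.n_features <= 50, lambda m: m.imbalance_ratio <= 5.0],
    "mlp": [lambda m: m.n_samples >= 1_000],
}

# Hypothetical heuristic priors, used only to order the surviving candidates.
BASE_PRIORITY: Dict[str, float] = {
    "logistic_regression": 0.5, "random_forest": 0.8, "gradient_boosting": 0.9,
    "svm_rbf": 0.6, "knn": 0.3, "mlp": 0.7,
}


def prune(meta: DatasetMeta) -> List[str]:
    """Keep only candidates whose constraints all hold for this metadata."""
    return [name for name, checks in CONSTRAINTS.items() if all(c(meta) for c in checks)]


def heuristic_score(name: str, meta: DatasetMeta) -> float:
    """Prior plus a hypothetical bonus for tree ensembles on imbalanced data."""
    score = BASE_PRIORITY[name]
    if meta.imbalance_ratio > 3.0 and name in {"random_forest", "gradient_boosting"}:
        score += 0.2
    return score


def rank(meta: DatasetMeta) -> List[str]:
    """Prune first, then sort survivors by descending heuristic score."""
    return sorted(prune(meta), key=lambda name: -heuristic_score(name, meta))


def pruning_rate(meta: DatasetMeta) -> float:
    """Fraction of the candidate space eliminated before any training happens."""
    return 1 - len(prune(meta)) / len(CONSTRAINTS)


if __name__ == "__main__":
    meta = DatasetMeta(n_samples=500, n_features=120, imbalance_ratio=8.0)
    print(rank(meta))  # ['gradient_boosting', 'random_forest', 'svm_rbf', 'logistic_regression']
    print(round(pruning_rate(meta), 2))  # 0.33
```

In this toy configuration, `knn` fails its feature-count check and `mlp` fails its sample-size check, so two of six candidates are removed before any model is fit. Separating hard constraints (which eliminate candidates) from soft heuristics (which only reorder them) keeps the pruning step cheap and easy to audit, since each rule is a single predicate on the metadata.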
