As mentioned in the previous article on Decision trees, model parameters can be tuned automatically. Regression and classification share the same core tuning functionality: select an evaluator, pick the metric to optimize for, and then train a pipeline that performs the parameter search.
Each model selected this way can be evaluated against the standard metrics offered by the RegressionMetrics objects in the Spark Scala and Python APIs.
RegressionEvaluator is used to build the pipeline, as in the sample below:
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val glr = new GeneralizedLinearRegression()
  .setFamily("gaussian")
  .setLink("identity")
val pipeline = new Pipeline().setStages(Array(glr))
val params = new ParamGridBuilder()
  .addGrid(glr.regParam, Array(0.0, 0.5, 1.0))
  .build()
val evaluator = new RegressionEvaluator()
  .setMetricName("rmse")
  .setPredictionCol("prediction")
  .setLabelCol("label")
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(params)
  .setNumFolds(2) // should always be 3 or more, but this dataset is small
val model = cv.fit(df)
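Once cross-validation has finished, it can be useful to see which grid entry won. A minimal sketch, assuming model is the CrossValidatorModel fit above and that the pipeline's only stage is the GeneralizedLinearRegression estimator:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.regression.GeneralizedLinearRegressionModel

// avgMetrics holds the average metric (here RMSE) for each ParamMap in the grid,
// in the same order as the grid itself
model.avgMetrics.zip(params).foreach { case (rmse, paramMap) =>
  println(s"$paramMap -> average RMSE $rmse")
}

// bestModel is the pipeline refit on the full dataset with the winning parameters
val bestGlr = model.bestModel
  .asInstanceOf[PipelineModel]
  .stages(0)
  .asInstanceOf[GeneralizedLinearRegressionModel]
println(s"Best regParam = ${bestGlr.getRegParam}")
```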

import org.apache.spark.mllib.evaluation.RegressionMetrics

val out = model.transform(df)
  .select("prediction", "label")
  .rdd
  .map(x => (x(0).asInstanceOf[Double], x(1).asInstanceOf[Double]))
val metrics = new RegressionMetrics(out)
println(s"MSE = ${metrics.meanSquaredError}")
println(s"RMSE = ${metrics.rootMeanSquaredError}")
println(s"R-squared = ${metrics.r2}")
println(s"MAE = ${metrics.meanAbsoluteError}")
println(s"Explained variance = ${metrics.explainedVariance}")
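As an aside, most of the same figures can be obtained without dropping to the RDD-based mllib API: the DataFrame-based RegressionEvaluator also accepts the metric names "mse", "r2" and "mae" in addition to the default "rmse". A sketch reusing the evaluator and model defined above:

```scala
// Reuse the DataFrame-based evaluator for each supported metric name.
// Note that setMetricName mutates the evaluator in place, so reset it
// afterwards if it is still needed for RMSE elsewhere.
val predictions = model.transform(df)
Seq("mse", "rmse", "r2", "mae").foreach { name =>
  val value = evaluator.setMetricName(name).evaluate(predictions)
  println(s"$name = $value")
}
```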
