PMML model export - RDD-based API

2020-01-21

spark.mllib supported models

spark.mllib supports model export to Predictive Model Markup Language (PMML).

The table below outlines the spark.mllib models that can be exported to PMML and their equivalent PMML model.

| spark.mllib model | PMML model |
|---|---|
| `KMeansModel` | `ClusteringModel` |
| `LinearRegressionModel` | `RegressionModel` (`functionName="regression"`) |
| `RidgeRegressionModel` | `RegressionModel` (`functionName="regression"`) |
| `LassoModel` | `RegressionModel` (`functionName="regression"`) |
| `SVMModel` | `RegressionModel` (`functionName="classification" normalizationMethod="none"`) |
| Binary `LogisticRegressionModel` | `RegressionModel` (`functionName="classification" normalizationMethod="logit"`) |

Examples

To export a supported `model` (see the table above) to PMML, call `model.toPMML`, which returns the PMML document as a String.

Besides exporting to a String, you can write the PMML model to other destinations: a local file, a directory on a distributed file system, or an output stream.

Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$) for details on the API.

Here is a complete example of building a `KMeansModel` and printing it in PMML format:

{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}

For unsupported models, either there is no `.toPMML` method or calling it throws an `IllegalArgumentException`.
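As a complement to the KMeans example, the following minimal sketch exports a regression model without a `SparkContext`. It assumes `spark-mllib` is on the classpath. The object name `PmmlExportSketch`, its helper methods, and the hand-picked weights are hypothetical and exist only for illustration; a real model would normally come from training. The sketch also shows one possible way to guard against the `IllegalArgumentException` mentioned above by wrapping the call in `Try`.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import scala.util.{Failure, Success, Try}

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

object PmmlExportSketch {
  // Hypothetical hand-built model (weights and intercept are made up).
  val model = new LinearRegressionModel(Vectors.dense(0.5, -1.25), 2.0)

  // Export to an in-memory String.
  def asString: String = model.toPMML()

  // Export to a java.io.OutputStream; an in-memory buffer is used here.
  def viaStream: String = {
    val out = new ByteArrayOutputStream()
    model.toPMML(out)
    new String(out.toByteArray, StandardCharsets.UTF_8)
  }

  // Export to a local file (a temporary file in this sketch) and read it back.
  def viaLocalFile: String = {
    val path = Files.createTempFile("model", ".pmml")
    model.toPMML(path.toString)
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
  }

  // Turn an unsupported-model failure into a value instead of an exception.
  def safeExport: Either[String, String] =
    Try(model.toPMML()) match {
      case Success(xml)                         => Right(xml)
      case Failure(e: IllegalArgumentException) => Left(e.getMessage)
      case Failure(other)                       => throw other
    }

  def main(args: Array[String]): Unit = {
    safeExport match {
      case Right(xml)   => println(xml)
      case Left(reason) => println(s"PMML export not available: $reason")
    }
  }
}
```

Exporting to a directory on a distributed file system additionally needs a `SparkContext`, via `model.toPMML(sc, path)`, so it is left out of this sketch.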