• MLeap Serialization
    • Create a Simple MLeap Pipeline
    • Serialize to a Zip File
      • JSON Format
      • Protobuf Format
    • Serialize to a Directory
      • JSON Format
      • Protobuf Format
    • Deserializing
      • Deserialize a Zip Bundle
      • Deserialize a Directory Bundle

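    MLeap Serialization

    Serializing and deserializing with MLeap is straightforward. You can serialize an MLeap Bundle either to a directory on the file system or to a zip archive that is convenient for later distribution.

    Create a Simple MLeap Pipeline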

    import ml.combust.bundle.BundleFile
    import ml.combust.bundle.serializer.SerializationFormat
    import ml.combust.mleap.core.feature.{OneHotEncoderModel, StringIndexerModel, VectorAssemblerModel}
    import ml.combust.mleap.core.regression.LinearRegressionModel
    // The shape types below are assumed to live in core.types; the package may differ between MLeap versions
    import ml.combust.mleap.core.types.{NodeShape, ScalarShape, TensorShape}
    import ml.combust.mleap.runtime.transformer.{Pipeline, PipelineModel}
    import ml.combust.mleap.runtime.transformer.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
    import ml.combust.mleap.runtime.transformer.regression.LinearRegression
    import org.apache.spark.ml.linalg.Vectors
    import ml.combust.mleap.runtime.MleapSupport._
    import resource._

    // Create a sample pipeline that we will serialize
    // and then deserialize using various formats

    // Map the string column "a_string" to a numeric index
    val stringIndexer = StringIndexer(
      shape = NodeShape.scalar(inputCol = "a_string", outputCol = "a_string_index"),
      model = StringIndexerModel(Seq("Hello, MLeap!", "Another row")))

    // One-hot encode the index into a vector of size 2
    val oneHotEncoder = OneHotEncoder(
      shape = NodeShape.vector(1, 2, inputCol = "a_string_index", outputCol = "a_string_oh"),
      model = OneHotEncoderModel(2, dropLast = false))

    // Assemble the one-hot vector and the "a_double" column into "features"
    val featureAssembler = VectorAssembler(
      shape = NodeShape().withInput("input0", "a_string_oh").
        withInput("input1", "a_double").withStandardOutput("features"),
      model = VectorAssemblerModel(Seq(TensorShape(2), ScalarShape())))

    // Linear regression over the 3 assembled features
    val linearRegression = LinearRegression(
      shape = NodeShape.regression(3),
      model = LinearRegressionModel(Vectors.dense(2.0, 3.0, 6.0), 23.5))

    // Chain the four stages into a single pipeline
    val pipeline = Pipeline(
      shape = NodeShape(),
      model = PipelineModel(Seq(stringIndexer, oneHotEncoder, featureAssembler, linearRegression)))

    Serialize to a Zip File

    To serialize to a zip file, make sure the URI begins with jar:file and ends with .zip.

    For example: jar:file:/tmp/mleap-bundle.zip

    JSON Format

    // Write the pipeline as a JSON bundle inside a zip archive
    for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
    }

    Protobuf Format

    // Write the pipeline as a Protobuf bundle inside a zip archive
    for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-protobuf.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
    }

    Serialize to a Directory

    To serialize to a directory, make sure the URI begins with file.

    For example: file:/tmp/mleap-bundle-dir
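
The two URI rules stated in this document (zip bundles begin with jar:file and end with .zip, directory bundles begin with file) can be checked before a URI is handed to BundleFile. The following is only a minimal sketch of such a check using plain string tests. The names BundleTarget, ZipBundle, DirectoryBundle and BundleUriCheck are hypothetical and are not part of the MLeap API; MLeap itself may accept or reject URIs by different criteria.

```scala
// Hypothetical helper, not part of MLeap: classifies a bundle URI
// using only the two rules described in this document.
sealed trait BundleTarget
case object ZipBundle extends BundleTarget
case object DirectoryBundle extends BundleTarget

object BundleUriCheck {
  def classify(uri: String): Option[BundleTarget] =
    // Zip rule: begins with jar:file and ends with .zip
    if (uri.startsWith("jar:file:") && uri.endsWith(".zip")) Some(ZipBundle)
    // Directory rule: begins with file
    else if (uri.startsWith("file:")) Some(DirectoryBundle)
    // Anything else matches neither rule
    else None
}
```

A caller could, for instance, reject a bare path such as /tmp/mleap-bundle.zip early, since it matches neither rule and classify returns None for it.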

    JSON Format

    // Write the pipeline as a JSON bundle into a directory
    for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
    }

    Protobuf Format

    // Write the pipeline as a Protobuf bundle into a directory
    for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-protobuf-dir"))) {
      pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
    }

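    Deserializing

    Deserializing is just as easy as serializing. You do not need to know in advance which format the MLeap Bundle was serialized in; you only need to know where the bundle is located.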

    Deserialize a Zip Bundle

    // Deserialize a zip bundle
    // Use Scala ARM to make sure resources are managed properly
    val zipBundle = (for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) yield {
      bundle.loadMleapBundle().get
    }).opt.get

    Deserialize a Directory Bundle

    // Deserialize a directory bundle
    // Use Scala ARM to make sure resources are managed properly
    val dirBundle = (for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) yield {
      bundle.loadMleapBundle().get
    }).opt.get