Enterprise Analytics Platform FAQs

FAQs for MI Lab

Q: What preparation is required for accessing data in HDFS/HIVE through a notebook?

A: Complete the following steps before accessing data in HDFS/HIVE through a notebook:

  1. When requesting container resource through Resource Management, select Enable read access for HDFS and Data Warehouse.
  2. When adding a PVC through Resource Configuration > Storage Configuration, select the requested resource that has read access for HDFS and Data Warehouse (and is used by MI Lab).
  3. When creating the Notebook instance, select the spark or pyspark image, and also select the Mount Hadoop PVC option.
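
Once the instance starts, you can verify access with a quick query. The following is a minimal sketch, assuming the pyspark image and a hypothetical table demo_db.demo_table:

from pyspark.sql import SparkSession

# Hive support is available when the notebook runs the spark/pyspark image
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("SELECT * FROM demo_db.demo_table LIMIT 10").show()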

Q: How can I use Python to retrieve a large amount of data from HIVE for model training?

A: If the data amount is small, use pyhive. For large data amounts, consider downloading files from HDFS to local storage: process the data with HIVE SQL, compress it into an ORC file, and then process the ORC file with the pyarrow package.
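
Both paths can be sketched as follows. This is a minimal example; the connection parameters, table name, and ORC file path are hypothetical:

from pyhive import hive
from pyarrow import orc
import pandas as pd

# Small data: query HIVE directly through pyhive
conn = hive.Connection(host="hive.example.com", port=10000, username="analyst")
df = pd.read_sql("SELECT * FROM demo_db.small_table LIMIT 1000", conn)

# Large data: read an ORC file already exported from HIVE/HDFS to local storage
table = orc.ORCFile("/data/training_set.orc").read()
df = table.to_pandas()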

Q: How can I collaborate with others in the Notebook?

A: Mount the same PVC storage in different Notebook instances to enable collaboration and sharing.

Q: The Kernel fails to start (Error Starting Kernel) after switching the environment or performing other operations in the Notebook. How to resolve this problem?

A: Try running the python3 -m ipykernel install --user command in the Terminal.

Q: Can I add a new Kernel in the Notebook?

A: Yes. Please refer to the following commands:

# Create a new conda environment with Python 3.6
conda create -n py36-test python=3.6
# Activate the new environment
source activate py36-test
# Install ipykernel into the environment
conda install ipykernel
# Register the environment as a Notebook kernel for the current user
python -m ipykernel install --user --name py36-test
# Deactivate the environment when done
conda deactivate

Q: Why does my Notebook instance become slow after using the Notebook for some time?

A: When you open a Notebook instance, Kernel sessions and Terminal sessions are created. These sessions are not closed when you close the Notebook, so that they are available the next time you open it. If they are no longer needed, close these sessions manually.

Q: After installing some packages in the Notebook, package dependency issues may occur. How can I restore the Notebook instance to its initial status?

A: In the Notebook menu, click File > Shut Down to close the Notebook instance. Open the Notebook again to restore it to its initial status.

FAQs for MI Hub

Q: When calling model service APIs, if the request body is too large or the processing time exceeds the limit, a timeout error is reported. How to solve this problem?

A: When deploying a model version, you can set the Timeout value for the model service API (the maximum value is 60,000 ms).

Q: When using MLflow version 1.10.0, a compatibility issue may occur, which prevents the model version from being published successfully. How to solve this problem?

A: MLflow version 1.8 is integrated in MI Hub and MI Lab by default. If you upgrade MLflow to version 1.10.0 or use a newer MLflow version in model development, you must still use MLflow 1.8 artifact files when publishing model versions.
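
For example, here is a minimal sketch of producing MLflow 1.8 artifact files with the default integrated version (the model and output path are hypothetical):

import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a trivial model, then save artifacts in MLflow 1.8 format,
# which MI Hub can publish as a model version
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)
mlflow.sklearn.save_model(model, path="./model_artifacts")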

Q: Model services deployed by MI Hub can be called within the cluster only. How to call the services across clusters?

A: To expose model services, the services must be published through EnOS API Management, which provides authentication and traffic control services. For more information about EnOS APIM, see API Management.

Q: When calling model service APIs, in which cases is authentication required?

A: When calling EAP model service APIs through methods other than the Seldon SDK, use the authentication function. For internal calls with REST or gRPC, authentication is not required.
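
For example, an internal REST call can be as simple as the following sketch (the in-cluster service URL and payload shape are assumptions based on the Seldon prediction protocol):

import requests

# Hypothetical in-cluster service URL; no authentication for internal calls
url = "http://my-model.my-namespace.svc.cluster.local:8000/api/v1.0/predictions"
payload = {"data": {"ndarray": [[1.0, 2.0, 3.0]]}}
resp = requests.post(url, json=payload)
print(resp.json())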

Q: The request time for calling EAP model service APIs is not stable. How to improve the stability of the request time?

A: You can try increasing the memory request when deploying the model, and test calling the model service through Postman.
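
Alternatively, the latency check can be scripted. Here is a minimal sketch, reusing the hypothetical service URL and payload from above:

import time
import requests

url = "http://my-model.my-namespace.svc.cluster.local:8000/api/v1.0/predictions"
payload = {"data": {"ndarray": [[1.0, 2.0, 3.0]]}}

# Send repeated requests and record the latency of each call
latencies = []
for _ in range(20):
    start = time.perf_counter()
    requests.post(url, json=payload)
    latencies.append(time.perf_counter() - start)
print("min=%.3fs max=%.3fs" % (min(latencies), max(latencies)))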

FAQs for MI Pipelines

Q: The workflows in both Enterprise Analytics Platform and Enterprise Data Platform support scheduling. What are the differences?

A: The batch processing workflows of Enterprise Data Platform are for data synchronization and data processing; they support synchronizing and processing structured data and file streams based on Data IDE, Shell, and Python. The workflows are used by data engineers.

The intelligent workflows of Enterprise Analytics Platform are for the lifecycle management of machine learning models, including data preparation, model training, model deployment, and model prediction service. The workflows are used by data scientists.

Q: When workflows are running, how can I control the concurrency in high-concurrency scenarios?

A: Use the following methods to control workflow concurrency by levels:

  • Control the maximum number of simultaneous runs of each workflow by setting the maximum concurrency number at runtime.
  • Control the maximum number of concurrent pods of a single workflow by setting the advanced parameter “maximum pod number”.
  • Control the item concurrency of the ParallelFor operator by setting the concurrency parameter of the operator.
  • Control the concurrency of operators inside the ParallelFor operator by setting its “maximum pod number” parameter.

By setting the above four parameters, you can control the concurrency of a workflow at every level, from runs down to pods.

Q: Can data be transferred between operators of a workflow?

A: Yes, you can use the File operator or the Directory operator for transferring data.

Q: When an operator in a workflow fails, can the workflow rerun from where the error occurred?

A: Yes. You can click Retry on the Running Instance Detail page when a running error occurs. Note that rerunning the workflow works for occasional errors only. If the operator parameter configuration is modified after the error occurs, rerunning will not take effect. Likewise, if the running time exceeds the timeout setting of the workflow, rerunning will not take effect either.

Q: How to monitor the resource usage of a running workflow?

A: You can find the pod name on the Running Instance Detail page. Then, search for the pod by name in Grafana to check and monitor its resource usage.

FAQs for Resource Management

Q: How do the resources requested through Resource Management correspond to the resource instances when deploying models? How to set the Resource Request and Resource Limit when deploying models?

A: The unit of requested resources is 1CU = 1core CPU + 16Gi Memory, which corresponds to the Resource Request in EAP model deployment. The setting of Resource Limit is based on the K8S QoS mechanism, helping users to better control the resource usage. Currently, the ratio of Request and Limit in EAP model deployment is 1:10. If you need higher Pod priority, you can set the Request and Limit to the same value, but you have to bear higher resource costs.