Frequently Asked Questions

Stream Processing

FAQs for Stream Processing are as follows:

Q: When configuring the running mode of a stream processing pipeline, should I choose the Standalone Mode or the Cluster Mode?

A: In the Standalone Mode, the underlying resources cannot be horizontally scaled, so the available running resources are limited. However, the Standalone Mode uses resources efficiently and is suitable for processing low-traffic data. In the Cluster Mode, the underlying resources can be horizontally scaled to meet the resource requirements of stream processing pipelines, so it is suitable for processing high-traffic data.

Q: How many resources should I request for running a stream processing pipeline?

A: You can refer to the Operator Performance Description for the performance metrics of each operator. The required resources can be estimated based on the operators used, the data traffic size, and the running mode configuration of the stream processing pipeline. The recommended method is to simulate the production data flow in a test environment, adjust the running resources of the stream processing pipeline according to the operation monitoring data, and then apply the corresponding configuration to the production environment.

Q: When I start a newly published stream processing pipeline, the startup fails. Why?

A: Several reasons might cause the startup failure of stream processing pipelines. You can troubleshoot with the following steps.

  1. Ensure that your network connection is ready when you perform maintenance operations on the stream processing pipeline.
  2. Ensure that the requested resource quota is enough for the running resource configuration of the stream processing pipeline. You can request more resources or adjust the running resource configuration as needed.
  3. If stream processing system errors are reported, you can try restarting the stream processing pipeline or contact the EnOS operation team.

Q: My stream processing pipeline is started and running, but the calculated output is not generated as expected. Why?

A: The stream processing pipeline is running, but no output data is found on the monitoring page. The situation might be caused by the following reasons.

  1. The configuration of the stream processing pipeline is not correct, such as incorrect measurement point IDs.
  2. The input point data is not uploaded as expected, so no output data is generated.
  3. Required system pipelines are not started and running correctly, which causes data consumption or data output failures.
  4. The output point is not registered in the asset model, so the calculated data cannot be stored normally.

Q: How many stream processing pipelines can be created for an organization?

A: Currently, an organization can have at most 50 stream processing pipelines.

Time Series Data Management

FAQs for Time Series Data Management are as follows:

Q: What preparation work is required before configuring TSDB storage policies?

A: Before configuring TSDB storage policies, you need to request the Time Series Database resource for your organization. Otherwise, the configured TSDB storage policies will not take effect. To request the Time Series Database resource, see Resource Management on EnOS.

Q: When should I configure TSDB storage policies?

A: It is recommended that you configure TSDB storage policies after your devices are connected to the IoT hub and before device data is ingested. Otherwise, the ingested data will not be stored in TSDB by default. If you want to store the data that is processed by the streaming engine, you must configure the TSDB storage policies for the processed data before the stream processing pipelines start running.

Q: Will the TSDB storage policies take effect immediately after the configuration is saved?

A: The storage policies will take effect in about 5 minutes after the configuration is saved.

Q: How many storage policy groups can be created for an organization?

A: Currently, an organization can have at most 5 storage policy groups.

Q: Can attributes of models and measurement points that are associated with TSDB storage policies be modified?

A: When TSDB storage policies are configured, the associated measurement point IDs, measurement point types, and data types cannot be modified. Otherwise, the stored data cannot be retrieved with EnOS TSDB data service APIs.

Q: My devices are connected and have started uploading data to the cloud. Why can't I get data through the data service APIs?

A: After device connection, you need to configure TSDB storage policies for your device measurement points. Otherwise, the ingested data will not be stored in TSDB by default, and you cannot get the data through the data service APIs.

Q: Can data stored in TSDB be deleted?

A: Data stored in TSDB can be deleted with the Data Deletion feature. For more information, see Deleting Data in TSDB.

Q: Can data stored in TSDB be archived?

A: Yes. Data stored in TSDB can be archived with the Data Archiving service.

Data Federation

FAQs for Data Federation service are as follows:

Q: What are the differences between enabling and disabling the cross-source analysis function when starting a data federation channel?

A: Enabling the cross-source analysis function requires more resources than disabling it. Cross-source analysis supports accessing data from multiple data sources with unified SQL statements. With a channel that has cross-source analysis disabled, you can query data from a single data source only (currently MySQL and Hive data sources are supported, using the corresponding query syntax). Besides querying data, you can also download data from Hive data sources with the Federation Download feature.

Q: When configuring a JDBC connection in Tableau to access data in Hive through a read channel with cross-source analysis disabled, there is no response, or an error is reported. How can I solve this issue?

A: Currently, Data Federation does not support JDBC connections from Tableau to access data in Hive through a read channel with cross-source analysis disabled.

Q: Why does a timeout error occur when accessing a Hive data source through JDBC with a read channel that has cross-source analysis disabled?

A: If aggregation or sorting statements such as GROUP BY, COUNT, and ORDER BY are used in the data query, MapReduce jobs will be started in Hive, which leads to minute-level running and response times. It is recommended to access data in Hive through a read channel with cross-source analysis enabled, or to use the Federation Download feature.
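
For illustration, the following Python sketch shows the difference, assuming the read channel is accessed with the jaydebeapi package and the standard Hive JDBC driver; the JDBC URL, credentials, and driver jar path are placeholders that must be taken from the actual connection information of your channel.

import jaydebeapi

# Placeholders: use the JDBC URL, credentials, and driver jar of your read channel.
conn = jaydebeapi.connect(
    "org.apache.hive.jdbc.HiveDriver",                # assumed Hive JDBC driver class
    "jdbc:hive2://<channel-host>:<port>/<database>",  # hypothetical channel URL
    ["<username>", "<password>"],
    "/path/to/hive-jdbc-standalone.jar",
)
cursor = conn.cursor()

# Aggregation and sorting (GROUP BY, COUNT, ORDER BY) start MapReduce jobs in Hive,
# so this call can take minutes to return.
cursor.execute("SELECT dev_id, COUNT(*) FROM device GROUP BY dev_id")

# A plain projection with a LIMIT is usually served without MapReduce and returns faster.
cursor.execute("SELECT dev_id, dev_name FROM device LIMIT 100")

cursor.close()
conn.close()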

Q: Why does an error occur when using a query statement as follows?

SELECT * FROM mysql.test.data LEFT JOIN hive.db.device ON mysql.test.data.dev_id=hive.db.device.dev_id WHERE hive.db.device.dev_id IS NOT NULL

A: A name like hive.db.device.dev_id is too long. It is recommended to use table aliases. The query statement can be changed to:

SELECT * FROM mysql.test.data m LEFT JOIN hive.db.device h ON m.dev_id=h.dev_id WHERE h.dev_id IS NOT NULL

Q: The read channel is started successfully, but a message “Schema does not exist” is reported when querying data. Why?

A: This error might be caused by network connection or component dependency issues that prevent the data source from being added correctly. Try restarting the channel.

Q: When dragging data in Tableau or IDE, an error about invalid SQL is reported. Why?

A: Data Federation is not fully compatible with drag operations in Tableau or IDE because it has special requirements on data source, database, and table names. It is recommended to use custom SQL statements or to enter SQL in the IDE Console to query data.

Data Subscription

FAQs for Data Subscription are as follows:

Q: How many data subscription jobs can be created for an organization?

A: Currently, an organization can have at most 15 data subscription jobs.

Q: How many consumer groups are supported for a data subscription job? How many consumers are supported in a consumer group?

A: The number of consumer groups for a data subscription job is not limited, but a consumer group allows at most 2 consumers to consume the subscribed data at the same time.

Q: How long will subscribed real-time asset data be stored in Kafka topics?

A: By default, subscribed data will be stored in Kafka topics for 3 days. In case the data consumption stops temporarily, you can continue consuming the subscribed data within 3 days after the real-time data is subscribed.
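
As a rough illustration of consuming subscribed data with a consumer group, the following Python sketch uses the kafka-python package; the topic name, broker address, group ID, and any authentication settings are placeholders that depend on the details of your data subscription job.

from kafka import KafkaConsumer

# Placeholders: take the topic, broker address, and credentials from your
# data subscription job details.
consumer = KafkaConsumer(
    "<subscription-topic>",
    bootstrap_servers=["<broker-host>:9092"],
    group_id="my-consumer-group",   # at most 2 consumers per group consume at the same time
    auto_offset_reset="earliest",   # start from the earliest retained message (up to 3 days)
)

for message in consumer:
    # message.value holds the subscribed real-time asset data as bytes
    print(message.value)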

Data Archiving

FAQs for Data Archiving are as follows:

Q: Do data archiving jobs support both automatic and manual modes?

A: The running of data archiving jobs is rule-driven. You need to configure data archiving jobs based on your business needs (such as where to store the archived data, which data to archive, and the archiving cycle). When a data archiving job is started and running, data is archived according to the configuration without human intervention.

Currently, data archiving supports the Real-Time and Offline job types. A real-time archiving job keeps running: once data is generated from the data source, the job archives it according to the configuration automatically. An offline archiving job runs only once: after all the data specified in the configuration is archived, the job stops running.

Q: What will be impacted if the configuration of a running data archiving job is modified?

A: After the data archiving job configuration is modified and submitted, the updated configuration will take effect immediately. The data that has been archived will not be impacted. For example, if the storage path of archived data is changed from /tds/ods/alarm1/ to /tds/ods/alarm2/, the new storage path will take effect immediately after the change is submitted. After about 1-2 minutes, the archived data will be stored in the alarm2 directory. The archived data that has been stored in the alarm1 directory will not be impacted.

Q: How can I query the data that has been archived in the target storage?

A: The Data Archiving service enables archiving data from the data sources to the target storage. It provides a set of archiving job configuration and management tools, but it does not manage the target storage systems or provide query capabilities for the archived data. You need to use the corresponding management tools of the target storage systems to query the data. For example:

  1. If the target storage is EnOS HDFS, you can use the Jupyter Notebook provided by the Enterprise Analytics Platform > MI Lab product to query data stored in HDFS. For more information, see Managing Notebook Instance.
  2. If the target storage is Azure Blob, you can use the client tools provided by the Azure platform to query the data stored in Blob Storage (see the sketch after this list).
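
For the Azure Blob case, the following minimal Python sketch lists archived files with the azure-storage-blob package; the connection string, container name, and path prefix are placeholders for your own storage account and archiving job configuration.

from azure.storage.blob import BlobServiceClient

# Placeholders: use your storage account connection string, the container name,
# and the storage path configured in the data archiving job.
service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("<container-name>")

# List the archived files under the configured storage path.
for blob in container.list_blobs(name_starts_with="<storage-path>/"):
    print(blob.name)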

Q: When a data archiving job is restarted after a running failure, will the job re-archive the data from the moment when the job failed?

A: It depends on the archiving job type:

  1. For real-time data archiving, when the failed job is restarted, it will automatically re-archive the failed data of the last 3 days. If the job has failed for more than 3 days, it can only process the data of the latest 3 days. Therefore, when a data archiving job failure triggers an alert notification through SMS or email, the alert receiver must take action in time to avoid data loss.
  2. For offline data archiving, when the failed job is restarted, it will re-archive all the data again.

Data Synchronization and Batch Processing

FAQs for Data Synchronization and Batch Processing are as follows:

Q: Does the Data Synchronization service support synchronizing both structured data and unstructured data?

A: Yes. The Data Synchronization service supports synchronizing structured data and file streams (unstructured data).

Q: Do Data Synchronization and Batch Processing services support system variables?

A: Yes. The Data Synchronization and Batch Processing services support trigger time and business date variables, other time-related variables, and non-time-related variables to achieve dynamic parameter passing. For detailed information, see Supported System Variables.

Q: Do Data Synchronization and Batch Processing services support resource isolation?

A: Yes. Currently, the resources used by the Data Synchronization and Batch Processing services are dynamically requested on demand. After data synchronization and batch processing jobs are completed, the resources can be released. The requested resources are completely isolated and do not affect each other.

Q: Does the Batch Processing service support distributed operation of multiple tasks?

A: Yes. When configuring the running mode of a batch processing task, you can specify the source of the distribution key to enable distributed operation of multiple tasks for enhancing running efficiency.

Q: Do Data Synchronization and Batch Processing services support alert configuration?

A: Yes. After configuring the alert service for the Data Synchronization and Batch Processing services, alert messages will be sent to the specified receivers through SMS or email when a running exception occurs.

Q: Does the Batch Processing service support calling by external applications?

A: Yes. The Batch Processing service provides REST APIs for integration with external applications.
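
The sketch below only illustrates the general pattern of calling a REST API from an external application in Python with the requests package; the endpoint path, parameters, and authentication shown are hypothetical placeholders, not the actual Batch Processing API definition, so refer to the EnOS API documentation for the real interface.

import requests

# Hypothetical endpoint, parameters, and authentication for illustration only;
# the real API paths and request schema are defined in the EnOS API documentation.
API_BASE = "https://<api-gateway-host>"
headers = {"Authorization": "Bearer <access-token>"}

response = requests.post(
    f"{API_BASE}/batch-processing/workflows/<workflow-id>/trigger",
    headers=headers,
    json={"parameters": {"businessDate": "2024-01-01"}},
    timeout=30,
)
response.raise_for_status()
print(response.json())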

Q: Can data be filtered only at the row level when synchronizing data from MySQL, SQL Server, or Oracle to EnOS Hive? Can I configure column-level data filtering conditions or complex query conditions to filter data in the data source?

A: Besides configuring row-level data filtering conditions in the data source, you can click Switch to SQL and enter data query statements to filter data when configuring the data synchronization job. However, note that the performance of the SQL query statements depends on the performance of the source database, and running the SQL statements might also impact the database performance. It is not recommended to enter overly complicated SQL query statements.

Q: Why does the status of a batch processing workflow remain Running when the running log shows that it is completed?

A: When the last task node of the workflow is completed, the workflow status might remain Running for a short while, because the workflow status is monitored and updated by a separate process and the status change has a short latency.