Codementor Events

How to Sync Up Data from MaxCompute to Greenplum with DataWorks

Published Mar 15, 2019. Last updated Apr 18, 2019.

By Jeffrey Gao, Solutions Architect

Alibaba Cloud DataWorks is a big data platform from Alibaba Cloud. It offers one-stop big data development, data permission management, offline job scheduling, and data integration (including data synchronization), among other features.

In this post, we demonstrate how to use the data synchronization feature of DataWorks to copy data from MaxCompute, Alibaba Cloud's big data computing platform, to Greenplum, a popular massively parallel processing (MPP) database.

DataWorks supports many data source types for synchronization. For the full list, see https://www.alibabacloud.com/help/doc-detail/53008.htm?spm=a2c41.12636940.0.0.750f6569pEjP1m

About Greenplum

Greenplum Database is an open-source, massively parallel data platform. It’s based on PostgreSQL and equipped with the analytical tools needed to draw additional insights from your data. Greenplum’s MPP design automatically parallelizes data and queries across a scale-out, shared-nothing cluster.
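To make "shared-nothing" concrete, here is a toy sketch: each row is assigned to exactly one segment by hashing its distribution key, so every segment can scan its own slice in parallel. The hash function (`zlib.crc32`) and the segment count are assumptions chosen for illustration only; Greenplum uses its own internal hashing.

```python
import zlib


def assign_segment(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment index (toy hash, not Greenplum's)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments):
    """Group rows by the segment each one would be stored on."""
    segments = {i: [] for i in range(num_segments)}
    for row in rows:
        segments[assign_segment(str(row[key_column]), num_segments)].append(row)
    return segments


if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 10} for i in range(1000)]
    segments = distribute(rows, "id", num_segments=4)
    # Rows per segment; each segment could scan its share independently
    print({seg: len(r) for seg, r in segments.items()})
```

Because the same key always hashes to the same segment, lookups and joins on the distribution key can stay local to one segment.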

Syncing MaxCompute to Greenplum with DataWorks

Once the Greenplum instance is ready, we can log in with the pgAdmin tool to manage its data. Before synchronization, the target table is empty.

First, we configure the data sources for both ends of the sync: MaxCompute as the source and Greenplum as the destination. Because Greenplum is based on PostgreSQL, we can register it as a PostgreSQL data source.
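If your DataWorks version asks for the PostgreSQL data source as a JDBC connection string (an assumption; some versions use separate host and port fields instead), the URL follows the standard PostgreSQL JDBC form. A minimal sketch, where the host and database names are hypothetical and 5432 is simply PostgreSQL's default port, so confirm the port your Greenplum instance actually listens on:

```python
from urllib.parse import quote


def greenplum_jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL-style JDBC URL pointing at a Greenplum master node."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    # Greenplum speaks the PostgreSQL protocol, so the PostgreSQL JDBC scheme applies
    return f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"


if __name__ == "__main__":
    # Hypothetical host and database names
    print(greenplum_jdbc_url("gp.example.internal", "analytics"))
```

Running it prints `jdbc:postgresql://gp.example.internal:5432/analytics`.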

Then we set up a data sync task.

In the task configuration, we select the source and destination data sources, along with the corresponding tables.
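Besides the form-based wizard, a sync task can be described as JSON. The exact schema of DataWorks' script mode varies by version, so as a stand-in this sketch uses the reader/writer layout of DataX, Alibaba's open-source offline sync tool, with its `odpsreader` and `postgresqlwriter` plugin names. The project, table, and partition values are hypothetical, and credentials are deliberately left out; treat it as an illustration of what the wizard captures, not as a file DataWorks will accept verbatim.

```python
import json


def build_sync_job(src_project, src_table, src_partition,
                   dst_jdbc_url, dst_table, columns, channels=1):
    """Return a DataX-style job dict copying columns from MaxCompute to Greenplum."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "job": {
            "setting": {"speed": {"channel": channels}},  # parallel read/write channels
            "content": [{
                "reader": {
                    "name": "odpsreader",
                    "parameter": {
                        "project": src_project,
                        "table": src_table,
                        "partition": [src_partition],
                        "column": list(columns),
                        # accessId / accessKey / odpsServer omitted on purpose
                    },
                },
                "writer": {
                    "name": "postgresqlwriter",
                    "parameter": {
                        # Writer columns line up with reader columns by position
                        "column": list(columns),
                        "connection": [{"jdbcUrl": dst_jdbc_url, "table": [dst_table]}],
                        # username / password omitted on purpose
                    },
                },
            }],
        }
    }


if __name__ == "__main__":
    job = build_sync_job("demo_project", "sales", "pt=20190314",
                         "jdbc:postgresql://gp.example.internal:5432/analytics",
                         "public.sales_sync", ["id", "amount", "region"])
    print(json.dumps(job, indent=2))
```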

Then we configure the field and type mappings between the source and destination tables.
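The destination table must exist with compatible column types before the sync runs. The sketch below derives a Greenplum `CREATE TABLE` statement from MaxCompute column types. The type mapping is a simplified assumption covering only unparameterized types, not DataWorks' official conversion table, and the table and column names in the example are hypothetical. `DISTRIBUTED BY` is Greenplum's clause for choosing the distribution key.

```python
import re

# Simplified, illustrative MaxCompute -> Greenplum type mapping (an assumption;
# check DataWorks' own type conversion rules before relying on it)
TYPE_MAP = {
    "BIGINT": "bigint",
    "INT": "integer",
    "SMALLINT": "smallint",
    "TINYINT": "smallint",
    "DOUBLE": "double precision",
    "FLOAT": "real",
    "DECIMAL": "numeric",
    "BOOLEAN": "boolean",
    "STRING": "text",
    "DATETIME": "timestamp",
    "TIMESTAMP": "timestamp",
    "DATE": "date",
}

# Only plain lowercase identifiers, so no quoting is needed in the DDL
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def map_type(mc_type: str) -> str:
    """Translate one MaxCompute type name to a Greenplum type."""
    try:
        return TYPE_MAP[mc_type.strip().upper()]
    except KeyError:
        raise ValueError(f"no mapping for MaxCompute type {mc_type!r}") from None


def target_ddl(table, columns, distributed_by):
    """Build a CREATE TABLE statement from (name, MaxCompute type) pairs."""
    names = [name for name, _ in columns]
    for ident in [table, *names]:
        if not _IDENT.match(ident):
            raise ValueError(f"unsupported identifier: {ident!r}")
    if distributed_by not in names:
        raise ValueError("distribution key must be one of the columns")
    cols = ",\n  ".join(f"{name} {map_type(t)}" for name, t in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n) DISTRIBUTED BY ({distributed_by});"


if __name__ == "__main__":
    print(target_ddl("sales_sync",
                     [("id", "BIGINT"), ("amount", "DOUBLE"), ("region", "STRING")],
                     distributed_by="id"))
```

The example prints a three-column table with `id bigint`, `amount double precision`, and `region text`, distributed by `id`.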

Once configuration is complete, we can run the task and check the Runtime Log for the synchronization status.

We can also log in to the Greenplum instance to confirm that the data has been synchronized.
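A quick sanity check is to compare row counts: the count taken in MaxCompute for the synced partition versus the count in the Greenplum table. This sketch works with any DB-API 2.0 connection. In practice you might pass a PostgreSQL driver connection such as `psycopg2.connect(...)` (an assumption about your tooling); here the standard library's `sqlite3` stands in so the example runs anywhere. Table names are hypothetical.

```python
import re
import sqlite3

# Allow plain or schema-qualified names only, since the name is interpolated into SQL
_TABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def row_count(conn, table: str) -> int:
    """Return COUNT(*) for a table over any DB-API 2.0 connection."""
    if not _TABLE.match(table):
        raise ValueError(f"unsupported table name: {table!r}")
    cur = conn.cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        cur.close()


def counts_match(expected: int, conn, table: str) -> bool:
    """True if the target table holds exactly the expected number of rows."""
    return row_count(conn, table) == expected


if __name__ == "__main__":
    # sqlite3 stands in for Greenplum in this demo
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_sync (id INTEGER, amount REAL)")
    print(row_count(conn, "sales_sync"))  # 0 before the sync
    conn.executemany("INSERT INTO sales_sync VALUES (?, ?)", [(1, 9.5), (2, 3.0)])
    print(counts_match(2, conn, "sales_sync"))  # True
```

Matching counts do not prove the values are identical, but a mismatch is a fast signal that the run was incomplete.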

Finally, if the task should run automatically on a recurring basis, we can configure its scheduling on the Schedule tab.
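With a daily schedule, each run usually copies one day's worth of data. As we understand DataWorks' convention, its business date (`$bizdate`) is the day before the scheduled run, in `yyyymmdd` form, which is why a daily sync often reads a partition such as `pt=${bizdate}`; check the scheduling-parameter documentation for the exact syntax in your version. This sketch only models that date arithmetic under those assumptions. The partition key `pt` is hypothetical, and nothing here calls DataWorks.

```python
from datetime import datetime, timedelta


def bizdate(run_time: datetime) -> str:
    """Business date for a scheduled run: the calendar day before, as yyyymmdd."""
    return (run_time.date() - timedelta(days=1)).strftime("%Y%m%d")


def source_partition(run_time: datetime, key: str = "pt") -> str:
    """Partition spec a daily run would read (hypothetical partition key)."""
    return f"{key}={bizdate(run_time)}"


def next_daily_run(now: datetime, hour: int, minute: int) -> datetime:
    """Next occurrence of a daily HH:MM schedule strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2019, 3, 15, 1, 0)
    run = next_daily_run(now, hour=2, minute=0)
    print(run, source_partition(run))  # 2019-03-15 02:00:00 pt=20190314
```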


Reference: https://www.alibabacloud.com/blog/how-to-sync-up-data-from-maxcompute-to-greenplum-with-dataworks_594549?spm=a2c41.12636940.0.0
