MySQL – merging external data

[adsense id=”0514458240″ width=”468″ height=”60″]

In larger organizations where data is stored in many different databases, there is always a need to exchange data between databases. For instance, an orders databases may need to transfer data to a warehouse database. In most cases the exchange of data is not a simple case of inserting into the secondary database. For instance, we may need to verify the existence of a record in the secondary data and perform an update rather than an insert. Deleting and inserting the new records is a possibility, but if auto_increment values are being used as keys or the secondary database contains more information then we run the risk of corrupting the data integrity of the secondary database or even losing data.

This post provides explains the power of MySQL stored procedures to solve these types of problems.

Imagine we have a set of orders data containing the order# and product# as primary key, we also include the number of items to be shipped. This information needs to be made available in the secondary (warehouse) database. The warehouse database also includes shipping status. The data has been loaded into a temporary table and is ready to be merged into the warehouse database.

CREATE TABLE tmporders (orderid int, productid int, items int);
CREATE TABLE shiporders (orderid int, productid int, items int, status int);
CREATE TABLE logtable(id int auto_increment, description varchar(255), primary key (id));

The status in the ship orders table is either 0 – unshipped or 1 – shipped. If an order has already shipped we cannot update the record so we will log an error in the logtable.

DELIMITER $$
CREATE PROCEDURE ordupdate()
BEGIN
  DECLARE ordid int;
  DECLARE prodid int;
  DECLARE item int;
  DECLARE count int;
  DECLARE shipitems int;
  DECLARE shipstatus int;
  DECLARE done int DEFAULT 0;
  DECLARE cur CURSOR FOR SELECT orderid, productid, items FROM tmporders;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur;
  REPEAT
    FETCH cur INTO ordid, prodid, item;
    -- Now we need to check if record exists in ship orders table
    SELECT COUNT(1) INTO count FROM shiporders WHERE orderid = ordid AND productid = prodid;
    IF count = 0 THEN
      -- We do not have any records so we insert
      INSERT INTO shiporders(orderid, productid, items, status) VALUES (ordid, prodid, item, 0);
    ELSE
      SELECT items, status INTO shipitems, shipstatus FROM shiporders WHERE orderid = ordid AND productid = prodid;
      -- If the number of items are the same then we do not perform any operation
      IF shipitems <> item THEN
        IF shipstatus = 0 THEN
          UPDATE shiporders SET items = item WHERE orderid = ordid AND productid = prodid;
        ELSE
          -- We have a problem the order has already shipped so we log a message
          INSERT INTO logtable (description) VALUES (CONCAT('Order ID: ', 
              ordid, ' and Product ID: ', prodid, 
              ' attempted to update number of items but status is already shipped'));
        END IF;
      END IF;
    END IF;
  UNTIL done END REPEAT;
  CLOSE cur;
END$$
DELIMITER ;

Here is our tests and their respective results.

mysql> select * from tmporders;
+---------+-----------+-------+
| orderid | productid | items |
+---------+-----------+-------+
|       1 |         1 |     5 |
|       1 |         2 |     5 |
|       1 |         3 |     2 |
|       2 |         2 |     6 |
|       2 |         3 |     3 |
|       4 |         1 |     4 |
|       6 |         5 |    10 |
+---------+-----------+-------+
7 rows in set (0.00 sec)
mysql> select * from shiporders;
+---------+-----------+-------+--------+
| orderid | productid | items | status |
+---------+-----------+-------+--------+
|       1 |         1 |     3 |      0 |
|       2 |         2 |     6 |      1 |
|       2 |         3 |     6 |      1 |
+---------+-----------+-------+--------+
3 rows in set (0.00 sec)
mysql> select * from logtable;
Empty set (0.00 sec)
mysql> call ordupdate;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from shiporders;
+---------+-----------+-------+--------+
| orderid | productid | items | status |
+---------+-----------+-------+--------+
|       1 |         1 |     5 |      0 |
|       2 |         2 |     6 |      1 |
|       2 |         3 |     6 |      1 |
|       1 |         2 |     5 |      0 |
|       1 |         3 |     2 |      0 |
|       4 |         1 |     4 |      0 |
|       6 |         5 |    10 |      0 |
+---------+-----------+-------+--------+
7 rows in set (0.00 sec)
mysql> select * from logtable;
+----+-------------------------------------------------------------------------------------------------+
| id | description                                                                                     |
+----+-------------------------------------------------------------------------------------------------+
|  1 | Order ID: 2 and Product ID: 3 attempted to update number of items but status is already shipped |
+----+-------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Depending on the type of table that is being used a transaction may wrap the stored procedure to make sure that in the event of a failure we can rerun the entire code. Otherwise it might be best on each iteration of the loop to remove the item from the temporary table which will allow us to rerun for the remaining items.