Re:Re: auto-inc column as secondary field to compose a primary key in mysql cluster

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re:Re: auto-inc column as secondary field to compose a primary key in mysql cluster

JohnXhark
Hi Mauritz,
Thanks for your answering.
I found this from the doc:
"...an InnoDB table with an AUTO_INCREMENT column requires at least one key where the auto-increment column is the only or leftmost column."

So my index definition will not work for innoDB, and it will not work as expected for MyISAM as you just said. it seems this is a mess. But, specifically, will it work as expected for NDB engine?

The motivation for my design is this:
we have a use case, heavily query of the pattern "select top (n) * from tbl where cityid = some_city and id > id1 and ... order by id", now assuming ndb is used as the engine, if pk is (cityid, id), then only a direct fetching is needed, however, if using schema like you mentioned: pk = id, index1 = (cityid, id), then a nested-loop query will be needed, and lots of network round-trip.

Strectly load balance is ideal, but not the top most concern. Also, I choose the wording "shardingkey" just for convenience, it's just a place holder for the left-most colomn here, so I can still leave the partitioning to mysql cluster.
JX



Sender:Mauritz Sundell <[hidden email]>
Hi John
On 2016-11-16 05:37, JohnXhark wrote:
> I would like to define a primary key like this: [shardingkey, id(auto-inc)], is this possible? Here the id is a secondary field in primary key, but not the primary key by itself.
>
> However, it's not clear if this is legitimate, the document at mysql.com says:
> AUTO_INCREMENT columns. Like other MySQL storage engines, the NDB storage engine can handle a maximum of one AUTO_INCREMENT column per table. However, in the case of a Cluster table with no explicit primary key, an AUTO_INCREMENT column is automatically defined and used as a “hidden” primary key. For this reason, you cannot define a table that has an explicit AUTO_INCREMENT column unless that column is also declared using thePRIMARY KEY option. Attempting to create a table with an AUTO_INCREMENT column that is not the table's primary key, and using the NDB storage engine, fails with an error.
NDB allows one auto_increment column, and it must be part of an unique
key or primary key.
So, yes, you can have a primary key like you mention.
But, it might not behave as you expect, depending of what you expect:)
For example will id be unique with itself, not as for example with
MyISAM, there id will be unique only for a specific sharding key.
Also I'm curious of how you use the sharding key, and why you need it.
Note, that having a non unique partition key like your sharding key,
puts the responsibility for keeping data balanced across partitions and
data nodes will be on the application and not on NDB.
If you have not a proven benefit by using a sharding key, my guess is
that the default partitioning on a primary key and an index on the
sharding key (now no longer a sharding key) might work better.
Best way is probably to test and measure what is best for your needs.
Regards,
Mauritz
>
>
> Thanks,JX
--
MySQL Cluster Mailing List
For list archives: http://lists.mysql.com/cluster
To unsubscribe: <a href="http://lists.mysql.com/clusterBKHB">http://lists.mysql.com/clusterBKHB^TS\\XZ[[\B܈\\]\Έ\˛^\[ K\\B[XܚXN\˛^\[ K\\B
--
MySQL Cluster Mailing List
For list archives: http://lists.mysql.com/cluster
To unsubscribe:    http://lists.mysql.com/cluster
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re:Re: auto-inc column as secondary field to compose a primary key in mysql cluster

Jesper Wisborg Krogh
Hi John,

On 17/11/2016 16:40, JohnXhark wrote:
> Hi Mauritz,
> Thanks for your answering.
> I found this from the doc:
> "...an InnoDB table with an AUTO_INCREMENT column requires at least one key where the auto-increment column is the only or leftmost column."
>
> So my index definition will not work for innoDB, and it will not work as expected for MyISAM as you just said. it seems this is a mess.

Actually I've seen requests for InnoDB to behave like MyISAM, so guess
whether MyISAM behaves as expected depends on the use case. Anyway
that's a side question.

> But, specifically, will it work as expected for NDB engine?

Yes you can do that for NDB:

    mysql> CREATE TABLE sales (
              customer_id int unsigned NOT NULL,
              sales_id int unsigned NOT NULL auto_increment,
              description varchar(40) NOT NULL, value decimal(6,2) NOT NULL,
              PRIMARY KEY (customer_id, sales_id)
            ) ENGINE=ndbcluster PARTITION BY KEY (customer_id) PARTITIONS 8;
    Query OK, 0 rows affected (0.36 sec)

    mysql> INSERT INTO sales (customer_id, description, value) VALUES
    (1, 'towel', 2.55);
    Query OK, 1 row affected (0.02 sec)

    mysql> INSERT INTO sales (customer_id, description, value) VALUES
    (1, 'pencil', 1.43);
    Query OK, 1 row affected (0.00 sec)

    mysql> INSERT INTO sales (customer_id, description, value) VALUES
    (2, 'laptop', 1299.99);
    Query OK, 1 row affected (0.01 sec)

    mysql> INSERT INTO sales (customer_id, description, value) VALUES
    (1, 'knife', 5.99);
    Query OK, 1 row affected (0.00 sec)

    mysql> SELECT * FROM sales ORDER BY customer_id, sales_id;
    +-------------+----------+-------------+---------+
    | customer_id | sales_id | description | value   |
    +-------------+----------+-------------+---------+
    |           1 |        1 | towel       |    2.55 |
    |           1 |        2 | pencil      |    1.43 |
    |           1 |        4 | knife       |    5.99 |
    |           2 |        3 | laptop      | 1299.99 |
    +-------------+----------+-------------+---------+
    4 rows in set (0.01 sec)

Note however there is no requirement from a Cluster perspective for the
partition key to be the first column in the Primary Key. So the
following table would work the same:

    CREATE TABLE sales (
       sales_id int unsigned NOT NULL auto_increment,
       customer_id int unsigned NOT NULL,
       description varchar(40) NOT NULL,
       value decimal(6,2) NOT NULL,
       PRIMARY KEY (sales_id, customer_id)
    ) ENGINE=ndbcluster PARTITION BY KEY (customer_id) PARTITIONS 8;

Does that help?

Best regards,
Jesper Krogh
MySQL Support
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re:Re: auto-inc column as secondary field to compose a primary key in mysql cluster

Mauritz Sundell
In reply to this post by JohnXhark


On 2016-11-17 06:40, JohnXhark wrote:
> Hi Mauritz,
> Thanks for your answering.
> I found this from the doc:
> "...an InnoDB table with an AUTO_INCREMENT column requires at least one key where the auto-increment column is the only or leftmost column."
>
> So my index definition will not work for innoDB, and it will not work as expected for MyISAM as you just said. it seems this is a mess. But, specifically, will it work as expected for NDB engine?
Yes, in the way that in NDB the auto-increment column will be generated
uniquely in itself.

But I'm no fan of auto_increment, it was invented long time ago in a
non-transactional single-threaded environment.
Many properties that is natural to assume in that environment are not a
good match for a transaction, multi-treaded, distributed environment :(
I would recommend to let application assign the id if possible, that
will be much clearer how it works and will also be more portable.

For examples both innodb and ndb support transactions that not myisam do.
* roll back a transaction will create "holes" in auto_increment sequence.
* concurrent transactions can make a row with higher id to be visible
before a row with lower id is visible.  Say two transactions T1, and T2,
there T1 starts first and allocates id, but T2 commits first.
That is if one sees two rows with ids 56 and 58 one can not know if 57
will never show up or will show up soon even if one knows no deletes
have occurred.

Also AUTO_INCREMENT is a serialization point, if one have many
concurrent transactions inserting rows, the transactions will be
serialized when acquire the next auto_increment value.
This is also true for both innodb and ndb, but is even worse for NDB
since reserving next auto_increment value will cause an extra network
round-trip to ndb cluster.

If insert performance is too low or slow, one can configure mysql server
to reserve several auto_increment values at a time from NDB cluster by
setting ndb_autoincrement_prefetch_sz which is defaulted to 1 to some
higher value.  But be aware that this will increase the likelihood for
big holes in id-sequence, and also that low id can show up after much
longer time since if you have 10 mysqld servers reserving say 100 id
values at a time, when server#1 will start inserting with say id=3400
and server#10 will start inserting new rows with id=4300.

I do not know if the above is relevant for you or not.

>
> The motivation for my design is this:
> we have a use case, heavily query of the pattern "select top (n) * from tbl where cityid = some_city and id > id1 and ... order by id", now assuming ndb is used as the engine, if pk is (cityid, id), then only a direct fetching is needed, however, if using schema like you mentioned: pk = id, index1 = (cityid, id), then a nested-loop query will be needed, and lots of network round-trip.
With NDB, how the server calls storage engine is not a direct
translation how the ndbcluster storage engine talks to data cluster on
network.
Partition pruning and statements and scan results are normally batched,
and simple conditions on scans are pushed down to data nodes for a first
filtering.
Reasoning valid for storage engines doing all work within the server
like Innodb, are not always valid for NDB that distribute works among  
many data nodes and also need to respect actions from other mysql
servers too.

Partition on cityid may or may not be the best for you.
It will be the best for throughput (if load is well balanced) since only
one partition on one data node will be contacted, resulting in less
control messages to be sent, especially for small result sets.
For big result set it would probably be better to not partition on
cityid, since when the scan will be executed in parallel on all data
nodes with all theirs LDM threads at same time resulting in faster
response time.  Price is more bytes on network due to more control
message, but essential no more network round trips in serial.

I'm not sure what select top(n) mean, but if you intend to get exactly
one row with the least id bigger than id1 in some_city as with order by
id limit 1, it is probably wise to partition on city_id.

Best thing is to test and measure with target load.


Also if you ever access rows by id without city_id, you should probably
also have add unique key (id).

Mauritz

> Strectly load balance is ideal, but not the top most concern. Also, I choose the wording "shardingkey" just for convenience, it's just a place holder for the left-most colomn here, so I can still leave the partitioning to mysql cluster.
> JX
>
>
>
> Sender:Mauritz Sundell <[hidden email]>
> Hi John
> On 2016-11-16 05:37, JohnXhark wrote:
>> I would like to define a primary key like this: [shardingkey, id(auto-inc)], is this possible? Here the id is a secondary field in primary key, but not the primary key by itself.
>>
>> However, it's not clear if this is legitimate, the document at mysql.com says:
>> AUTO_INCREMENT columns. Like other MySQL storage engines, the NDB storage engine can handle a maximum of one AUTO_INCREMENT column per table. However, in the case of a Cluster table with no explicit primary key, an AUTO_INCREMENT column is automatically defined and used as a “hidden” primary key. For this reason, you cannot define a table that has an explicit AUTO_INCREMENT column unless that column is also declared using thePRIMARY KEY option. Attempting to create a table with an AUTO_INCREMENT column that is not the table's primary key, and using the NDB storage engine, fails with an error.
> NDB allows one auto_increment column, and it must be part of an unique
> key or primary key.
> So, yes, you can have a primary key like you mention.
> But, it might not behave as you expect, depending of what you expect:)
> For example will id be unique with itself, not as for example with
> MyISAM, there id will be unique only for a specific sharding key.
> Also I'm curious of how you use the sharding key, and why you need it.
> Note, that having a non unique partition key like your sharding key,
> puts the responsibility for keeping data balanced across partitions and
> data nodes will be on the application and not on NDB.
> If you have not a proven benefit by using a sharding key, my guess is
> that the default partitioning on a primary key and an index on the
> sharding key (now no longer a sharding key) might work better.
> Best way is probably to test and measure what is best for your needs.
> Regards,
> Mauritz
>>
>> Thanks,JX


--
MySQL Cluster Mailing List
For list archives: http://lists.mysql.com/cluster
To unsubscribe:    http://lists.mysql.com/cluster

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re:Re: auto-inc column as secondary field to compose a primary key in mysql cluster

Mauritz Sundell
In reply to this post by Jesper Wisborg Krogh


On 2016-11-17 09:12, Jesper Wisborg Krogh wrote:
> Hi John,
...
>    mysql> CREATE TABLE sales (
>              customer_id int unsigned NOT NULL,
>              sales_id int unsigned NOT NULL auto_increment,
>              description varchar(40) NOT NULL, value decimal(6,2) NOT
> NULL,
>              PRIMARY KEY (customer_id, sales_id)
>            ) ENGINE=ndbcluster PARTITION BY KEY (customer_id)
> PARTITIONS 8;
Just a small comment on the example.
Use ... )ENGINE=ndbcluster PARTITION BY KEY (customer_id);
That is, do not use explicit partition count: PARTITIONS 8.

Trust NDB to deduce the best number of partitions depending on number of
nodes and threads, this will also ease future changes to NDB cluster.

Mauritz


--
MySQL Cluster Mailing List
For list archives: http://lists.mysql.com/cluster
To unsubscribe:    http://lists.mysql.com/cluster

Loading...