Welcome to the E-Blah Community!
We would like to welcome you to our community and invite you to register an account or login.
Being a registered member is important, as it gives you several advantages over the normal Guest status. After registering you will be able to download files and images, post messages, and access member-only portions of the forum - just to name a few. Registration is quick and simple, and only takes about a minute of your time.

E-Blah Community    Technical Support    Forum Support  ›  MySQL Possible Future Issue
Users Browsing Forum
Yahoo! Bot and 1 Guests

MySQL Possible Future Issue  This thread currently has 175 views. Print
1 Pages 1 Recommend Thread
vol7ron
April 17, 2008, 8:17am Report to Moderator Report to Moderator
E-Blah Member
Posts: 106
Gender: Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
So as anyone else who browses this forum probably knows, I've been pretty active lately in helping to program the new version of Eblah for databases.  Because all my concerns are directed toward the EBlah SQL Beta, I'm always unsure to post here or in the forum dedicated to it.  I wish there was a place discussions of the code.

Anyhow, in an effort to migrate the MySQL code to PostgreSQL, I came across a few problems with SQL syntax.  While some were database-specific like the format of quotes and the use of schemas, this one in particular was more nested on the fact that MySQL didn't conform to SQL standards.

The issue deals with having an aggregate query such as SELECT (*) FROM table;

EBlah is using something like SELECT (*) FROM table ORDER BY column;
This doesn't yield any errors because MySQL picks an arbitrary value for the non-group by attribute.  However, in another database this would definitely yield an error because you must use the GROUP if picking out a specific column.

Therefore the syntax would be SELECT (*) FROM table GROUP BY column ORDER BY column;

Note that the two queries are in no way similar.  One counts all records in a table, returning one count where as the other returns a count for each value in column that is a like.  The solution would be to not add the GROUP BY, but to remove the ORDER BY as it's not necessary in an aggregate function.

As you know, this is a simple thing to understand, but I feel as time goes on this setting in MySQL could change and they could defaultedly conform to SQL standards.  An interesting read to turn on this setting to pop an error is http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html

Around mid-top page they show you how to conform to the "standard."  -- I quoted standard because their argument is that this setting isn't standard, however it is practiced by pretty much every database aside from MySQL.



Increase vol7ron's Rating
Logged Offline
Private Message Private message
Justin
April 17, 2008, 1:02pm Report to Moderator Report to Moderator

The E-Blah Developer
E-Blah Programmer
Posts: 15,075
Gender: Male
Posts Per Day: 6.52
Reputation: 93.40%
Reputation Score: +297 / -21
Time Online: 36 days 23 hours 27 minutes
Location: Tallassee, AL
Age: 22
So it should be:

SELECT (*) FROM table GROUP BY column;

?


I do installs for $25 and upgrades for $20.
Technical support is always free.

  Donate to E-Blah!  

My Websites: Revolution Reality (My Blog)  |  MinistryTalk.com  |  Portfolio

"But you, O Lord, are a compassionate and gracious God, slow to anger, abounding in love and faithfulness." — Psalm 86:15 NIV

Logged Offline
Site Site Private Message Private message Reply: 1 - 6
vol7ron
April 17, 2008, 1:17pm Report to Moderator Report to Moderator
E-Blah Member
Posts: 106
Gender: Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
Quoted from vol7ron
The solution would be to not add the GROUP BY, but to remove the ORDER BY as it's not necessary in an aggregate function.


SELECT COUNT(*) FROM table;     would be the best syntax when getting a count of all the posts/threads
I know it was a lot to read -- that's why I made it small.


There are other caveats in some of the application that hinge on MySQL's user-friendly features.  One I've come to is in dealing with adding new categories and boards.

Some fields like "Order" and "Posts" are set up with a NOT NULL.  When sending a NULL to MySQL, if the field is of type integer, MySQL will put a 0 for you.  This is not good, because what if 0 actually meant something in the program?

So, I've had to add an IF statement to both sections, stating if the "Order" value was NULL, to replace it with a default value that I set to 0.  This way it wasn't MySQL or any other database that would control what that value would be.

In other cases where values are explicitly being sent to the driver (ex.  $sql->set('parent'); $sql->update($FORM{"parent"});  ) but NOT NULL fields are not sent with them (ex. ORDER was not included with PARENT in the previous example), the database will set a default value for you.  So I've also added a default '0' to "Posts" and "Threads" on the BOARDS table.  I also added it to "Order" but that wasn't necessary because we are explicitly sending in "Order" when we create the Board/Category.



Increase vol7ron's Rating
Logged Offline
Private Message Private message Reply: 2 - 6
amosmj
April 21, 2008, 10:20am Report to Moderator Report to Moderator
E-Blah Member
Posts: 10
Posts Per Day: 0.04
Time Online: 2 hours 10 minutes
An unsorted, grouped query SHOULD yield the same number of rows as a sorted, grouped query.
If you group on the Primary Key you SHOULD get an accurate count of records.
I'm supposed to start doing some Postgress work in a few weeks but I haven't so far. That said, most packages have a COUNT versus a COUNT UNIQUE statement that should avoid some of these issues as well.

I would consider SELECT COUNT (Primary Key) FROM Table;
using * asks the system to do too much work. In mySQL there is no difference, in Postgress  there should be. MySQL is shunned by the DB community in much the same way Access(Jet) is. It's not a real database. Yet, both are more accessable and ubiquitous than any single DB system I know of, Go figure.
Logged Offline
Private Message Private message Reply: 3 - 6
vol7ron
April 21, 2008, 5:20pm Report to Moderator Report to Moderator
E-Blah Member
Posts: 106
Gender: Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
Quoted from amosmj
An unsorted, grouped query SHOULD yield the same number of rows as a sorted, grouped query.

The number of rows should be the same but certain functions that manipulate data on time-series will yield different results, especially when using limits.  In this particular case a count is being performed, so ORDER and GROUP is not necessary.


Quoted from amosmj
If you group on the Primary Key you SHOULD get an accurate count of records.

This is bad SQL technique.  Proper use would be to use the DISTINCT or UNIQUE syntax as you later stated.

Quoted from amosmj
I would consider SELECT COUNT (Primary Key) FROM Table;
using * asks the system to do too much work.

Not always true.  In many cases "*" as opposed to the PK has yielded faster results in certain situations rather than the later.  I forget why exactly, but it has to do with how some databases process the * and how it interacts with indexes.

Quoted from amosmj
MySQL is shunned by the DB community in much the same way Access(Jet) is. It's not a real database. Yet, both are more accessable and ubiquitous than any single DB system I know of, Go figure.

I totally agree whole-heartedly.  MySQL is a little better than Jet, though.  Unfortunately, when working with large tables you'll notice the limitations of Access, especially when those tables have over 255 fields.  Before anyone says "any table with over 255 fields is probably a poorly designed database, or not in 3NF" I will say that (s)he has never worked with a large database and understood that 3NF is not always the best measure to stand by.  But I'll give a firm example that many financial systems can still be in 3NF and carry well over 255 fields.

As an update, the PostgreSQL version is almost done.. there are some quirks still needed to be worked out.  For example: NULL values being sent when re-saving Boards and there's a Post/Thread count issue.  If someone gives me a place to upload, I'll put it here for other people to help out if they wish.



Increase vol7ron's Rating
Logged Offline
Private Message Private message Reply: 4 - 6
amosmj
April 22, 2008, 8:17am Report to Moderator Report to Moderator
E-Blah Member
Posts: 10
Posts Per Day: 0.04
Time Online: 2 hours 10 minutes
Quoted from vol7ron


Not always true.  In many cases "*" as opposed to the PK has yielded faster results in certain situations rather than the later.  I forget why exactly, but it has to do with how some databases process the * and how it interacts with indexes.



Interesting,I had not heard that. Presumably that varies by vendor then?

nice work on the conversion to a DB format.
Logged Offline
Private Message Private message Reply: 5 - 6
vol7ron
April 22, 2008, 1:43pm Report to Moderator Report to Moderator
E-Blah Member
Posts: 106
Gender: Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
Yes there was an interesting article I read some time ago that introduced this exact notion.  Most of my readings are Oracle and DB2 based, though, but this could really have been any db vendor.

When in doubt sticking to the PK is not a bad thing and as I recall the "faster" was in a pure comparison and probably not anything too noticeable.

My philosophy, though, is to make things as fast as possible to begin with... make structural tweaks later... and then upgrade the hardware when all else fails.



Increase vol7ron's Rating
Logged Offline
Private Message Private message Reply: 6 - 6
1 Pages 1 Recommend Thread
Print

E-Blah Community    Technical Support    Forum Support  ›  MySQL Possible Future Issue