|
|
vol7ron |
|
|
Posts: 106
Gender:  Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
|
So as anyone else who browses this forum probably knows, I've been pretty active lately in helping to program the new version of Eblah for databases. Because all my concerns are directed toward the EBlah SQL Beta, I'm always unsure to post here or in the forum dedicated to it. I wish there was a place discussions of the code.
Anyhow, in an effort to migrate the MySQL code to PostgreSQL, I came across a few problems with SQL syntax. While some were database-specific like the format of quotes and the use of schemas, this one in particular was more nested on the fact that MySQL didn't conform to SQL standards.
The issue deals with having an aggregate query such as SELECT (*) FROM table;
EBlah is using something like SELECT (*) FROM table ORDER BY column; This doesn't yield any errors because MySQL picks an arbitrary value for the non-group by attribute. However, in another database this would definitely yield an error because you must use the GROUP if picking out a specific column.
Therefore the syntax would be SELECT (*) FROM table GROUP BY column ORDER BY column;
Note that the two queries are in no way similar. One counts all records in a table, returning one count where as the other returns a count for each value in column that is a like. The solution would be to not add the GROUP BY, but to remove the ORDER BY as it's not necessary in an aggregate function.
As you know, this is a simple thing to understand, but I feel as time goes on this setting in MySQL could change and they could defaultedly conform to SQL standards. An interesting read to turn on this setting to pop an error is http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html
Around mid-top page they show you how to conform to the "standard." -- I quoted standard because their argument is that this setting isn't standard, however it is practiced by pretty much every database aside from MySQL. |
| Increase vol7ron's Rating |
|
|
|
|
|
Justin |
|
|
Posts: 15,075
Gender:  Male
Posts Per Day: 6.52
Reputation: 93.40%
Reputation Score: +297 / -21
Time Online: 36 days 23 hours 27 minutes
Location: Tallassee, AL
Age: 22
|
|
|
|
|
|
vol7ron |
|
|
Posts: 106
Gender:  Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
|
The solution would be to not add the GROUP BY, but to remove the ORDER BY as it's not necessary in an aggregate function.
SELECT COUNT(*) FROM table; would be the best syntax when getting a count of all the posts/threads I know it was a lot to read -- that's why I made it small. There are other caveats in some of the application that hinge on MySQL's user-friendly features. One I've come to is in dealing with adding new categories and boards. Some fields like "Order" and "Posts" are set up with a NOT NULL. When sending a NULL to MySQL, if the field is of type integer, MySQL will put a 0 for you. This is not good, because what if 0 actually meant something in the program? So, I've had to add an IF statement to both sections, stating if the "Order" value was NULL, to replace it with a default value that I set to 0. This way it wasn't MySQL or any other database that would control what that value would be. In other cases where values are explicitly being sent to the driver (ex. $sql->set('parent'); $sql->update($FORM{"parent"}); ) but NOT NULL fields are not sent with them (ex. ORDER was not included with PARENT in the previous example), the database will set a default value for you. So I've also added a default '0' to "Posts" and "Threads" on the BOARDS table. I also added it to "Order" but that wasn't necessary because we are explicitly sending in "Order" when we create the Board/Category. |
| Increase vol7ron's Rating |
|
|
|
|
|
amosmj |
|
E-Blah Member  
Posts: 10
Posts Per Day: 0.04
Time Online: 2 hours 10 minutes
|
An unsorted, grouped query SHOULD yield the same number of rows as a sorted, grouped query. If you group on the Primary Key you SHOULD get an accurate count of records. I'm supposed to start doing some Postgress work in a few weeks but I haven't so far. That said, most packages have a COUNT versus a COUNT UNIQUE statement that should avoid some of these issues as well.
I would consider SELECT COUNT (Primary Key) FROM Table; using * asks the system to do too much work. In mySQL there is no difference, in Postgress there should be. MySQL is shunned by the DB community in much the same way Access(Jet) is. It's not a real database. Yet, both are more accessable and ubiquitous than any single DB system I know of, Go figure. |
|
|
|
|
|
vol7ron |
|
|
Posts: 106
Gender:  Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
|
An unsorted, grouped query SHOULD yield the same number of rows as a sorted, grouped query.
The number of rows should be the same but certain functions that manipulate data on time-series will yield different results, especially when using limits. In this particular case a count is being performed, so ORDER and GROUP is not necessary.
If you group on the Primary Key you SHOULD get an accurate count of records.
This is bad SQL technique. Proper use would be to use the DISTINCT or UNIQUE syntax as you later stated.
I would consider SELECT COUNT (Primary Key) FROM Table; using * asks the system to do too much work.
Not always true. In many cases "*" as opposed to the PK has yielded faster results in certain situations rather than the later. I forget why exactly, but it has to do with how some databases process the * and how it interacts with indexes.
MySQL is shunned by the DB community in much the same way Access(Jet) is. It's not a real database. Yet, both are more accessable and ubiquitous than any single DB system I know of, Go figure.
I totally agree whole-heartedly. MySQL is a little better than Jet, though. Unfortunately, when working with large tables you'll notice the limitations of Access, especially when those tables have over 255 fields. Before anyone says "any table with over 255 fields is probably a poorly designed database, or not in 3NF" I will say that (s)he has never worked with a large database and understood that 3NF is not always the best measure to stand by. But I'll give a firm example that many financial systems can still be in 3NF and carry well over 255 fields. As an update, the PostgreSQL version is almost done.. there are some quirks still needed to be worked out. For example: NULL values being sent when re-saving Boards and there's a Post/Thread count issue. If someone gives me a place to upload, I'll put it here for other people to help out if they wish. |
| Increase vol7ron's Rating |
|
|
|
|
|
amosmj |
|
E-Blah Member  
Posts: 10
Posts Per Day: 0.04
Time Online: 2 hours 10 minutes
|
Not always true. In many cases "*" as opposed to the PK has yielded faster results in certain situations rather than the later. I forget why exactly, but it has to do with how some databases process the * and how it interacts with indexes.
Interesting,I had not heard that. Presumably that varies by vendor then? nice work on the conversion to a DB format. |
|
|
|
|
|
vol7ron |
|
|
Posts: 106
Gender:  Male
Posts Per Day: 0.45
Reputation: 100.00%
Reputation Score: +9 / -0
Time Online: 1 days 58 minutes
Location: Washington DC
Age: 25
|
Yes there was an interesting article I read some time ago that introduced this exact notion. Most of my readings are Oracle and DB2 based, though, but this could really have been any db vendor.
When in doubt sticking to the PK is not a bad thing and as I recall the "faster" was in a pure comparison and probably not anything too noticeable.
My philosophy, though, is to make things as fast as possible to begin with... make structural tweaks later... and then upgrade the hardware when all else fails. |
| Increase vol7ron's Rating |
|
|
|
|
|
|