Results 1 to 7 of 7

Thread: optimization of mysql database

  1. #1
    Join Date
    Feb 2005
    Posts
    4

    Unhappy optimization of mysql database

    Hi,

    I have a database with about 450,000 entries (not sure if this is comparitively large or not) which contains the names of albums and films and books with prices and date of being published. This is being regularly updated. I have read though the aticles on optimizing the database and have made some improvements to the query time however there is one particular query that is taking extremely long time to execute (or at least seems that way) and I cant see a way to improve the time taken. The query is:

    SELECT name FROM database WHERE name like '%Name1%Name2%' ORDER BY Pub_date DESC LIMIT 5;

    I am using a key on the name and the pub_date fields. It doesnt want to use the name key, I assume becasue of the % at the beginning. I have only resently added the Pub_date key but the results have been mixed with a increase in speed for queries where there is only '%Name1%' and a significant slowdown in queries with '%Name1%Name2%'. The time for executing is about 10-12 seconds. Is this as good as it gets? The key on the name field is not the full lenght of the text but long enough to be unique. I really need to get this running faster. Is there any way in the setup of the query or the server where I could get this working quicker? If not What kind of speed increase is associated with using Linux instead of Windows? I am using mysql 4.1.9 I am running the server on a relatively old system: dual p3 1ghz with 768mb ram with windows 2000. Although it is old I am hoping that I can get this running faster with out upgrading as I am on a tight budget. Thanks for looking at this.

  2. #2
    Join Date
    Feb 2003
    Posts
    1,048
    The problem isn't your OS or your machine, it's your db design (though MySQL does run faster on Linux than on Windows).

    When you say key, do you mean primary key? Does your table have an identity field or some int field that can be used to uniquely identify each entry? If so, that is what your key field should be. Otherwise, any index you put on this table is going to be mega-slow!!!


    You do want the name field and the pub_date fields to be indexed, but not together. Create a separate index on each field.

    I'm assuming that there are duplicate names in the name field because the same book can be published multiple times and book titles aren't neccessarily unique.

  3. #3
    Join Date
    Feb 2005
    Posts
    4
    I have a primary key on an id field which is just a number which auto increments. Then indexes on the date name fields and they are separate indexes and not together.

    Yes there are duplicates in the database but usually the query is limited to 1 or 5 ordered by the dates.

  4. #4
    Join Date
    Feb 2003
    Posts
    1,048
    It is definitely the name search that is slowing everything down. And it's definitely the pattern matching that's causing it to not use the index.

    Does MySQL have something comparable to SQL Server's full text indexing?

    If not, you can do a version of it on your own by creating a single word lookup table.

    tbLU_Name:
    LUID autonumber primary key
    NameID
    SingleWord


    NameID is the id field from your main table and SingleWord would be every word in your main table, one word per record. You would have a separate index on each field.

    Your query would look like:

    SELECT name
    FROM database As D
    WHERE Exists (Select 1 From tbLU_Name Where SingleWord = 'Name1' And LUID = D.IDField)
    And Exists (Select 1 From tbLU_Name Where SingleWord = 'Name2' And LUID = D.IDField)
    ORDER BY Pub_date DESC LIMIT 5;


    And your system seems fine for the task, but if I was going to change anything about it, I would add more RAM!!!

  5. #5
    Join Date
    Feb 2005
    Posts
    4
    I looked into this and yes MySQL does have full text indexing, although I am not sure how similar it is to SQL server.

    I did create a fulltext index on the name field of my database but I removed the stop word list and set the minimum word length to 1.

    Where a search was:

    SELECT * FROM database WHERE modified like '%Name1%' ORDER BY Pub_date DESC LIMIT 1

    It would now be:

    SELECT * FROM database WHERE MATCH(name) AGAINST('Name1') ORDER BY Pub_date DESC LIMIT 1;

    This is much faster but there is the problem that it doesnt do partial matches. For example a search for 'night' would get only things containg 'night' and if I search for 'night,fire' then the order by pub_date option is ignored and it uses the relevancy instead. It is possible to get around these things to a certain extent by using the boolean mode where the relevancy order is ignored and some partial matching can be done so now the query is:

    SELECT * FROM database WHERE MATCH(name) AGAINST('+night* +fire*' IN BOOLEAN MODE) ORDER BY Pub_date DESC LIMIT 1

    which would give all the stuff beginning with night with words beginning with fire in them. This would be fine for most of the searches that are being performed.

    But this is much slower than the other type of search. It is fine if only one search word is used but 2 or more slows down. I am also finding a large slowdown with counts like:

    SELECT count(*) FROM database WHERE name like '%Name1%'

    is there a better way to do this or am i just doing this wrong? Thanks for your help.

  6. #6
    Join Date
    Feb 2003
    Posts
    1,048
    No, you're doing it right. Text searches are just slower than other searches. Still, it shouldn't be that slow.

  7. #7
    Join Date
    Feb 2005
    Posts
    4
    Hi again,

    I have been working on this and it seems like it may be a problem with the size of the database on the harddisk. I had read that it is better to use a fixed database size rather than varchar so I changed this but some queries seemed to benifit a little whilst others slowed massively. I found that when I went back to varchar the situation improved a little. I think the reason is the difference of the size on the harddisks. The fixed was ~350mb whilst the varible was ~60mb. Althought the variable size is faster it is still unacceptable and I am starting to think about maintaining both heap and MyISAM tables of the same data updating both but just using the heap for select queries. I have done some tests and so far it seems to fix all the problems. But before I start changing all the setup I would like to know if I have missed something in the setup of the database variables which may provide a better solution.
    Thanks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •