Rails + PostgreSQL accent insensitive search 96

For a project we’re currently working on the need for accent insensitive search came up.

I google for a couple of solutions, most fo which required applying a translate() to the records or using to_ascii()  in order to replace accented characters with plain ones. For me this should be a considerable performance hit as the DB had to apply the function to every record before performing the query.

I decided to go with a case insensitive using the operator ‘~*’. Basically, for each character which has accented variations I introduce an expression (ex: a -> (a|à|á|â|ã) resulting in a query like :
select * from my_table where name ~* '(a|à|á|â|ã)ndr(e|é|è|ê)'

This will search for the name “André” in the database.

Since I’m using ruby I created a new method in the String class called accent_insensitive_regexp:

class String

@@ACCENT_INSENSITIVE_REGEXES=[
“(a|á|à|â|ã)”,”(e|é|è|ê)”,”(i|í|ì)”,”(o|ó|ò|ô|õ)”,”(u|ú|ù)”,”(c|ç)”
]

def accent_insensitive_regexp
res=self
@@ACCENT_INSENSITIVE_REGEXES.each {|exp|
res.gsub! Regexp.new(exp), exp
}
res
end

end

Now i can my case and accent insensitive search working by using something like:

User.find :all, :conditions=>[" name ~* '?' ",name.downcase.accent_insensitive_regexp]

Those who’ve tried to use the downcase and upcase methods in Rails will notice that this will not work properly since these methods to not deal with the accented characters.
For dealing with that issue I changed those two methods in the String class to:

alias_method :old_upcase, :upcase
alias_method :old_downcase, :downcase
@@DOWNCASE_ACCENT_CHARS=”çàáèéìíòóùúâêôãõ”
@@UPCASE_ACCENT_CHARS=”ÇÀÁÈÉÌÍÒÓÙÚÂÊÔÃÕ”

def upcase
self.old_upcase.tr(@@DOWNCASE_ACCENT_CHARS,@@UPCASE_ACCENT_CHARS)
end

def downcase
self.old_downcase.tr(@@UPCASE_ACCENT_CHARS,@@DOWNCASE_ACCENT_CHARS)
end

And there you have it. I quick and dirty hack to get this stuff to work properly in RoR. Please notice I haven’t spent much time cleaning the code I just though this could get someone with similar problems started.