Dealing with UTF-8 in App Engine’s bulk loading
We just uploaded our first app to Google App Engine’s servers.
It’s called YouDIG - You Draw I Guess - and we think it’s a fun game to play online.
Since it’s a word-based game, we needed an easy way to upload new Riddles to the online app. So I wrote a very simple bulkload.py that just calls the ImportCSV function from the SDK. The problem was that it doesn’t work with UTF-8 files!
Here’s what I did (I’m still a Python newbie, so feel free to send me your comments/suggestions):
1 - Edited google/appengine/tools/bulkload_client.py:
    def ContentGenerator
        ....
        if rows_written > 0:
            yield rows_written, unicode(content.getvalue(),'utf-8')
    def PostEntities
        ....
        body = urllib.urlencode({
            constants.KIND_PARAM: kind,
            constants.CSV_PARAM: content.encode("utf-8"),
        })
This basically decodes everything to unicode and then encodes it as UTF-8 before sending the POST request.
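To make that decode/encode round trip concrete, here is a minimal standalone sketch. It is written for Python 3, where `urllib.parse.urlencode` takes the place of the Python 2 `urllib.urlencode` used above. The helper `build_post_body` and the parameter names `kind` and `csv` are placeholders made up for this example; they are not the SDK’s constants.

```python
from urllib.parse import urlencode, parse_qs

def build_post_body(kind, raw_csv_bytes):
    """Build a form-encoded POST body from UTF-8 CSV bytes (hypothetical helper)."""
    # Decode once, so the content is text (unicode) inside the program...
    text = raw_csv_bytes.decode('utf-8')
    # ...and encode explicitly to UTF-8 right before it goes on the wire.
    return urlencode({
        'kind': kind,                 # placeholder for constants.KIND_PARAM
        'csv': text.encode('utf-8'),  # placeholder for constants.CSV_PARAM
    })

raw = 'maçã,1,pt,fruit\n'.encode('utf-8')
body = build_post_body('Riddle', raw)

# The non-ASCII bytes are percent-escaped, so the body itself is plain ASCII.
# Parsing it back as UTF-8 recovers the original text.
decoded = parse_qs(body)['csv'][0]
```

The point is only that the text is decoded in one place and encoded in one place, so the server always receives well-formed UTF-8.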
2 - Created my Loaders (similar to the ones described in the docs):
    class RiddleLoader(bulkload.Loader):
        def HandleEntity(self, entity):
            entity['approved']=True
            return entity
        def __init__(self):
            bulkload.Loader.__init__(self, 'Riddle',
                                     [('word', Riddle.lowerCase),('level', str),
                                      ('language', Language.get_key_by_code ),
                                      ('category', Category.get_key_by_sys_name)])

    if __name__ == '__main__':
        mybulkload.main(RiddleLoader())
This is a simple loader that extends bulkload.Loader. “Language.get_key_by_code” and “Category.get_key_by_sys_name” are static functions that let me look up an entity from a string. This way I can import Languages, Categories and Riddles and have the relations set using simple string keys (since I don’t know an entity’s key before it’s saved!).
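As a rough sketch of how those string-keyed converters work, the example below applies one converter per CSV column. Everything in it is a stand-in: the dictionaries replace real datastore queries, the key strings are made up, and I am assuming that the `word` converter simply lowercases the text.

```python
# Stand-ins for datastore lookups; in the real app these would query the
# Language and Category entities. All keys and values here are made up.
LANGUAGE_KEYS = {'pt': 'language-key-pt', 'en': 'language-key-en'}
CATEGORY_KEYS = {'fruit': 'category-key-fruit'}

def get_language_key_by_code(code):
    return LANGUAGE_KEYS[code]

def get_category_key_by_sys_name(sys_name):
    return CATEGORY_KEYS[sys_name]

# (property name, converter) pairs in CSV column order, shaped like the
# list passed to bulkload.Loader.__init__ in the loader above.
PROPERTIES = [
    ('word', lambda s: s.lower()),  # assumed behaviour of Riddle.lowerCase
    ('level', str),
    ('language', get_language_key_by_code),
    ('category', get_category_key_by_sys_name),
]

def row_to_entity(row):
    """Apply each column's converter to the matching CSV cell."""
    if len(row) != len(PROPERTIES):
        raise ValueError('expected %d columns, got %d'
                         % (len(PROPERTIES), len(row)))
    return dict((name, convert(cell))
                for (name, convert), cell in zip(PROPERTIES, row))

entity = row_to_entity(['Banana', '2', 'pt', 'fruit'])
```

A CSV row only ever carries the short codes (`pt`, `fruit`), and the converters turn them into references at load time.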
The main difference from the standard bulk loading is in the main method. It comes from “mybulkload”, my own module, whose “MyBulkLoad” class extends “BulkLoad” so that it can receive UTF-8 CSV POST data.
3 - The mybulkload package:
    # Imports this module needs. They are assumed to match the ones used by
    # the SDK's bulkload/__init__.py, which this code was copied from.
    import StringIO
    import csv
    import httplib
    import sys
    import traceback
    import wsgiref.handlers

    from google.appengine.api import datastore
    from google.appengine.ext import webapp
    from google.appengine.ext.bulkload import BulkLoad, Loader, Validate

    def utf_8_encoder(unicode_csv_data):
        # The csv module only handles byte strings, so feed it UTF-8 lines.
        for line in unicode_csv_data:
            yield line.encode('utf-8')

    class MyBulkLoad(BulkLoad):
        """ A handler for bulk load requests.
        """
        def Load(self, kind, data):
            Validate(kind, basestring)
            Validate(data, basestring)
            output = []
            try:
                loader = Loader.RegisteredLoaders()[kind]
            except KeyError:
                output.append('Error: no Loader defined for kind %s.' % kind)
                return (httplib.BAD_REQUEST, ''.join(output))
            buffer = StringIO.StringIO(data)
            reader = csv.reader(utf_8_encoder(buffer), skipinitialspace=True)
            entities = []
            line_num = 1
            for row in reader:
                try:
                    output.append('\nLoading from line %d...' % line_num)
                    # Decode every cell back to unicode before building the entity.
                    entities.extend(loader.CreateEntity([unicode(cell,'utf-8') for cell in row]))
                    output.append('done.')
                except:
                    exc_info = sys.exc_info()
                    stacktrace = traceback.format_exception(*exc_info)
                    output.append('error:\n%s' % stacktrace)
                    return (httplib.BAD_REQUEST, ''.join(output))
                line_num += 1
            for entity in entities:
                datastore.Put(entity)
            return (httplib.OK, ''.join(output))

    def main(*loaders):
        """Starts bulk upload.
        Raises TypeError if not, at least one Loader instance is given.
        Args:
          loaders: One or more Loader instance.
        """
        if not loaders:
            raise TypeError('Expected at least one argument.')
        for loader in loaders:
            if not isinstance(loader, Loader):
                raise TypeError('Expected a Loader instance; received %r' % loader)
        application = webapp.WSGIApplication([('.*', MyBulkLoad)])
        wsgiref.handlers.CGIHandler().run(application)

    if __name__ == '__main__':
        main()
I just copied this stuff from the __init__.py in google/appengine/ext/bulkload, added the utf_8_encoder function and extended the BulkLoad class, overriding the Load method.
Here’s what I used:
    reader = csv.reader(utf_8_encoder(buffer), skipinitialspace=True)
to encode the lines sent to the CSV reader as UTF-8, and:
    entities.extend(loader.CreateEntity([unicode(cell,'utf-8') for cell in row]))
to decode every cell back to unicode before creating the entities.
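For anyone who wants to try the encode, parse, decode pattern outside of App Engine, here is a small self-contained sketch. The helper `read_unicode_csv` and the sample data are made up for illustration. The Python 2 branch follows the same pattern as the Load method; the Python 3 branch is there only because Python 3’s csv module reads text directly, so the wrapping is not needed.

```python
# -*- coding: utf-8 -*-
import csv
import io
import sys

PY2 = sys.version_info[0] == 2

def utf_8_encoder(unicode_lines):
    # Python 2's csv module only handles byte strings, so feed it UTF-8.
    for line in unicode_lines:
        yield line.encode('utf-8')

def read_unicode_csv(text):
    """Yield rows of unicode cells from CSV text (hypothetical helper)."""
    buf = io.StringIO(text)
    if PY2:
        for row in csv.reader(utf_8_encoder(buf), skipinitialspace=True):
            # Decode each cell back to unicode after parsing.
            yield [cell.decode('utf-8') for cell in row]
    else:
        # Python 3's csv module works on text directly; no wrapping needed.
        for row in csv.reader(buf, skipinitialspace=True):
            yield row

rows = list(read_unicode_csv(u'maçã, 1, pt\ncoração, 2, pt\n'))
```

Either way the caller gets rows of unicode cells, which is what the converters in the Loader expect.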
Perhaps there was an easier way but this is working for me so I hope this can help some of you.
Next I’ll write an entity eraser to bulk delete entities from the AppEngine’s production servers…

Nice tutorial, but it didn’t work for me for some reason.
I am also a newbie in Python and need to bulk load UTF-8,
but I get the following error when trying to put UTF-8 chars:
['Traceback (most recent call last):\n',
 '  File "C:\\Program Files\\Google\\google_appengine\\google\\appengine\\ext\\bulkload\\__init__.py", line 412, in Load\n    entities.extend(loader.CreateEntity([unicode(cell,\'utf-8\') for cell in row]))\n',
 '  File "C:\\Program Files\\Google\\google_appengine\\google\\appengine\\ext\\bulkload\\__init__.py", line 228, in CreateEntity\n    entity[name] = converter(val)\n',
 '  File "C:\\Python25\\lib\\encodings\\cp1255.py", line 12, in encode\n    return codecs.charmap_encode(input,errors,encoding_table)\n',
 "UnicodeEncodeError: 'charmap' codec can't encode character u'\\ufeff' in position 0: character maps to <undefined>\n"]
ERROR 2008-08-29 10:18:30,977 bulkload_client.py] Import failed
I am a newbie at Python and am trying to use your method, but I’m not sure where to put the file created in step 3. I’m getting “name ‘BulkLoad’ is not defined”.
Thanks for your effort in posting this.