Indica et Buddhica :: Home
 


HTML Monier-Williams Lexicon: Use of H-K and UTF-8 translit.

Last modified: 07/11/2008 03:26 AM
HTML Monier-Williams Sanskrit-English Dictionary: Design considerations when using Harvard-Kyoto and UTF-8 transliteration.

[Ex H-Buddhism mailing list]



From:  Richard MAHONEY <r.mahoney@ICONZ.CO.NZ>
Reply-To:  r.mahoney@iconz.co.nz
To:  H-BUDDHISM@H-NET.MSU.EDU
Subject:  Re: HTML Monier-Williams Sanskrit-English Dict. ... [Harvard-Kyoto / UTF-8 translit.] (Mahoney)
Date:  Fri, 11 Jul 2008 10:06:36 +1200


Dear Readers,

I am forwarding my response to Madhav Deshpande. His question over the
use of Harvard-Kyoto and UTF-8 transliteration in the HTML M-W is often
raised.


Best regards,

 Richard MAHONEY


-----Forwarded Message-----
From: Richard MAHONEY <r.mahoney@ICONZ.CO.NZ>
To: INDOLOGY@liverpool.ac.uk
Subject: Re: HTML Monier-Williams Sanskrit-English Dictionary
 (Version 0.3 - Release Candidate 1)
Date: Thu, 10 Jul 2008 16:50:55 +1200

Dear Madhav,

On Wed, 2008-07-09 at 00:03, Deshpande, Madhav wrote:
> Hi Richard,
>
>     Thanks for making available this version of the MW HTML.  As I
>  downloaded and looked at it, I see that the main word entries are in
>  Harvard-Kyoto notation while the names of cited texts appear in full
>  diacritics (utf8?).  Is that by design?


Apologies for the delay in answering you. I've been a little bogged
down. The response to my note was a little greater than anticipated ...

Yes, using H-K rather than UTF-8 for the headwords and embedded Skt
terms was intentional. I've uploaded a screenshot of a typical page
viewed in the Opera web browser. The image is attached at the end of
the following page (`zams-opera.png'):

 HTML Monier-Williams Sanskrit-English Dictionary (Version 0.3 - RC 1)
 http://indica-et-buddhica.org/sections/news/repositorium/html-monier-williams

The primary and secondary headwords for `zaMs', and all the Skt terms
appearing within the body of the definition, are consistently
transliterated using H-K. They are also dark red for emphasis. I've
done this because most users, myself included, appear to prefer to
search for headwords using H-K. That said, I am aware that a good
number of others prefer UTF-8, so to suit them I have written code
that should be reasonably easy to modify.

If you look at the HTML source you will see that all Skt terms are
marked up in this manner:

 <span class="s">zaMs</span>

It shouldn't be too difficult to write a script (Perl?) to convert
anything appearing within spans of this class into UTF-8. If someone
would like to write such a thing then I would be happy to run it over
the final release so that both Harvard-Kyoto and UTF-8 versions are
available for download.
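
A minimal sketch of such a script, in Python rather than Perl, might
look like the following. It is an illustration only, not the script
used for any release. It assumes the standard Harvard-Kyoto to IAST
correspondences, that every Skt term sits in a plain
<span class="s">...</span> with no nested tags or entities inside it,
and that `lR' always stands for vocalic l (H-K itself cannot tell this
apart from l followed by vocalic r). The names hk_to_iast and
convert_html are hypothetical.

```python
import re

# Harvard-Kyoto -> IAST pairs. Only characters that differ are listed.
# Longer sequences come first so "lRR" is not read as "l" + "RR".
HK_TO_IAST = [
    ("lRR", "ḹ"), ("lR", "ḷ"), ("RR", "ṝ"),
    ("A", "ā"), ("I", "ī"), ("U", "ū"), ("R", "ṛ"),
    ("M", "ṃ"), ("H", "ḥ"),
    ("G", "ṅ"), ("J", "ñ"),
    ("T", "ṭ"), ("D", "ḍ"), ("N", "ṇ"),
    ("z", "ś"), ("S", "ṣ"),
]

# Regex alternation tries alternatives left to right, so list order
# gives the multi-character sequences priority.
_HK_PATTERN = re.compile("|".join(re.escape(hk) for hk, _ in HK_TO_IAST))
_HK_LOOKUP = dict(HK_TO_IAST)

# Assumption: spans of class "s" contain text only (no nested markup).
_SPAN = re.compile(r'(<span class="s">)(.*?)(</span>)', re.DOTALL)


def hk_to_iast(text):
    """Convert a Harvard-Kyoto string to IAST with Unicode diacritics."""
    return _HK_PATTERN.sub(lambda m: _HK_LOOKUP[m.group(0)], text)


def convert_html(html):
    """Convert only the text inside <span class="s"> elements."""
    return _SPAN.sub(
        lambda m: m.group(1) + hk_to_iast(m.group(2)) + m.group(3),
        html,
    )


if __name__ == "__main__":
    import sys

    # Read HTML on stdin, write the converted HTML as UTF-8 on stdout.
    sys.stdout.buffer.write(convert_html(sys.stdin.read()).encode("utf-8"))
```

Run as a filter over each HTML file, this would leave everything
outside the marked spans (English definitions, abbreviations such as
MW) untouched, so the H-K and UTF-8 versions would differ only in the
Skt terms themselves.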


Kind regards,

 Richard


--
Richard MAHONEY | internet: http://indica-et-buddhica.org/
Littledene      | telephone/telefax (man.): +64 3 312 1699
Bay Road        | cellular: +64 27 482 9986
OXFORD, NZ      | email: r.mahoney@indica-et-buddhica.org
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Indica et Buddhica: Materials for Indology and Buddhology
Scholia: http://scholia.indica-et-buddhica.org/
Tabulae: http://tabulae.indica-et-buddhica.org/

Attached file: zams-opera.png 124.46 Kb

Copyright © 2005-2008 Indica et Buddhica. All rights reserved.