Philologica - Search Term Help
Last modified: 02/07/2008 10:13 PM
Philologica Indica et Buddhica - Search Term Help [*]

To search texts with Philologica, one must first sign up (free of charge) as a registered user.

Search terms are entered into the Term(s) input field in either the basic or the advanced search form. Searches are case insensitive: e.g., SPROS PA is equivalent to spros pa. One must, however, enter the correct UTF-8 accented characters or diacritics: e.g., please enter prapañca, not prapanca. Alongside or below the main search form is a key listing the most often used diacritics for Romanised Sanskrit. These characters may be cut and pasted into the search input field. Search terms may also be entered using wildcard characters to match patterns: e.g., rūp.* to retrieve rūpam, rūpeṇāpi and so on.
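As a rough illustration of the two rules above (case is ignored, diacritics are not), the following Python sketch compares a literal search term with a word from a text. The function name term_matches is hypothetical and is not part of Philologica; it only mimics the behaviour described here.

```python
def term_matches(term: str, word: str) -> bool:
    """Hypothetical helper: compare a literal search term with a word,
    ignoring case but not diacritics."""
    # casefold() removes case distinctions only; ū and u remain different
    # characters, so an unaccented term does not match an accented word.
    return term.casefold() == word.casefold()


print(term_matches("SPROS", "spros"))  # True: only the case differs
print(term_matches("rupam", "rūpam"))  # False: the diacritic is missing
```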
- Diacritics and Special Characters -
Philologica digital texts are encoded in UTF-8. All search terms must therefore use UTF-8 characters in every search input field, including the Term(s) field and the bibliographic fields (i.e., the Title and Author fields &c.). Philologica provides three ways to input UTF-8:
-- Diacritics for Romanised Sanskrit (only for Term(s) input field) --
-- Special Characters --
- Wildcard Characters and Boolean Operators -
One can include wildcard characters in search terms. This enables one to search for terms that match a pattern. Wildcard characters are available for both full text and bibliographic searching.
-- Full Text Searching --
Philologica supports wildcard characters and Boolean (logical) operators, modelled on UNIX regular expressions, to perform ``pattern matching'' in full text searching. Pattern matching allows a large number of words corresponding to a defined pattern to be identified. Wildcard characters can be useful, for example, in identifying cognates made obscure by affixes and vowel weakening, inconsistencies due to irregular orthography, and variations on account of word inflection, as well as for discovering potential emendations for uncertain readings. The most commonly used regular expression operators (wildcard and Boolean) are listed below.
--- Wildcard Characters ---
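As a rough sketch of how such patterns behave, the Python snippet below applies two of the patterns mentioned in this help page to a small, made-up word list. It assumes that a search term must match a whole word and that the wildcards follow ordinary regular-expression semantics (.* for any run of characters, .? for at most one character). The word list and the function name expand are illustrative only and are not part of Philologica.

```python
import re

# Made-up word list standing in for the words of an indexed text.
WORDS = ["rūpam", "rūpeṇāpi", "rūpa", "bahurūpa", "arūpa"]


def expand(pattern: str, words: list[str]) -> list[str]:
    """Return the words that the whole pattern matches, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [w for w in words if regex.fullmatch(w)]


print(expand("rūp.*", WORDS))   # rūp followed by anything
print(expand("rūpa.?", WORDS))  # rūpa followed by at most one character
```

Under these assumptions the first pattern retrieves rūpam, rūpeṇāpi and rūpa, while the second retrieves only rūpam and rūpa.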
N.B., for a full list of words matching a wildcard search term, go to the advanced search interface, enter the wildcard search term, select the Refined Search Results tab, select the Frequency by Title radio button, and then press `Search'.
--- Boolean (Logical) Operators ---
N.B., wildcard characters and Boolean operators can be combined within the same search: e.g., bahurūp.* | rūpa.? uttamam.
-- Searching for Titles and Authors &c. (Bibliographic Searching) --
When searching for titles, authors and so on (bibliographic searching), one needs only limited support for wildcard characters and Boolean operators. In general, one should only need to enter an uncommon term from the title or the author's name. Please note that only the Boolean operator OR (|) can be used, not AND (space); that the wildcard operator (.*) is unnecessary; and that a title or author's name that contains diacritics must be entered with UTF-8 characters, not postfix modifiers.
- Punctuation Marks and Searching -
In general, it is advisable to avoid using punctuation marks when engaged in full text or bibliographic searching.
-- Full Text Searching --
Punctuation marks must be avoided in full text searching. Many of the symbols often used for punctuation are used by Philologica for postfix modifiers or wildcard characters. The punctuation that must be avoided includes: the comma (,), question mark (?), exclamation mark (!), vertical bar (|), forward (/) and backward (\) slashes, parentheses (( )), braces ({ }), brackets ([ ]), angle brackets (< >), colons (:), and semi-colons (;), as well as quotation marks (` ' "), ampersands (&), the asterisk (*), percentage sign (%), dollar sign ($), and number sign (#). Some punctuation marks are especially problematic and deserve further comment:
--- Apostrophe ---
The Tibetan 'a chuṅ is represented by an apostrophe ('). When searching for a term that includes an apostrophe, one should substitute a wildcard character: e.g., search for tha snyad .*dogs pa or tha snyad .?dogs pa rather than tha snyad 'dogs pa.
--- Hyphen ---
Some texts use hyphens to separate words within compounds. When searching for words within a hyphenated compound, one should omit the hyphen: e.g., search for apāya hetu rather than apāya-hetu.
--- Period ---
The period (.) is not searchable. It serves as a wildcard character.
N.B., as few digital texts are tagged for sentence termination, PhiloLogic relies on punctuation marks in combination with capitalisation to identify sentence termination. This is especially problematic for Indological and Buddhological texts.
-- Searching for Titles and Authors &c. (Bibliographic Searching) --
The following punctuation marks produce a ``No documents found matching specified bibliographic criteria'' message when used in bibliographic search input fields: parentheses (( )), semi-colons (;), colons (:), ampersands (&), apostrophes ('), quotation marks (` ' "), braces ({ }), brackets ([ ]), angle brackets (< >), and the forward slash (/), as well as the dollar sign ($). The following punctuation marks have no adverse effect on a bibliographic search and, if they appear within a string, must be entered: period (.), hyphen (-), question mark (?), exclamation mark (!), and comma (,).
N.B., on the whole, it is perhaps most convenient simply to avoid using punctuation marks when making bibliographic searches. Often all that is needed to find what one wants is to enter an uncommon bibliographic term from either the title or the author's name.

[*] This page is a modified version of the PhiloLogic User Manual: 3. Character Representation for Search Terms.
Copyright © 2005-2008 Richard MAHONEY - OXFORD - N.Z. All rights reserved.