Philologica Indica et Buddhica: Search Term Help. [*]
Search terms are entered into the Term(s) input field in either
the basic or advanced search form. Searches are case insensitive: e.g.,
SPROS PA is equivalent to spros pa. One must, however,
enter the correct utf-8 accented characters or diacritics: e.g., enter
prapańca, not prapanca. Alongside or below the main
search form is a key listing the most often used diacritics for Romanised
Sanskrit. These characters may be cut and pasted into the search input
field. Search terms may also include wildcard characters to match patterns: e.g.,
rūp.* retrieves rūpam, rūpeṇāpi and so
on.
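The two rules above (case is ignored, diacritics are not) can be illustrated with a minimal Python sketch. It assumes that the search engine folds case and nothing else; `normalise_term` is a hypothetical helper, not part of Philologica or PhiloLogic.

```python
def normalise_term(term: str) -> str:
    """Fold case only; diacritics are deliberately left untouched.

    Assumption: this mirrors the behaviour described above, namely that
    searches ignore case but distinguish accented characters.
    """
    return term.casefold()


def same_search_term(a: str, b: str) -> bool:
    """Return True when two inputs would count as the same search term."""
    return normalise_term(a) == normalise_term(b)


print(same_search_term("SPROS PA", "spros pa"))   # True: case is ignored
print(same_search_term("prapańca", "prapanca"))   # False: the diacritic matters
```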
Contents
- Diacritics and Special Characters
  - Diacritics for Romanised Sanskrit
  - Special Characters
- Wildcard Characters and Boolean Operators
  - Full Text Searching: Wildcard Characters; Boolean (Logical) Operators
  - Searching for Titles and Authors &c.
- Punctuation Marks and Searching
  - Full Text Searching: Apostrophe; Hyphen; Period
  - Searching for Titles and Authors &c.
- Diacritics and Special Characters -
Philologica digital texts are encoded in utf-8. All search terms must
therefore use utf-8 characters in all the search input fields, including
the Term(s) and the Bibliographic fields (i.e., the Title and Author
fields &c.).
Philologica provides three ways to input utf-8:
- enter utf-8 characters directly using a suitable keyboard input
layout
- cut and paste utf-8 characters from the key alongside or below the
main search form
- enter utf-8 characters indirectly using the postfix modifiers listed
below (only applies to the Term(s) input field)
-- Diacritics for Romanised Sanskrit (only for Term(s) input field) --
- long a (ā) = a;
- long i (ī) = i;
- long u (ū) = u;
- vocalic r (ṛ) = r;
- vocalic l (ḷ) = l;
- velar n (ṅ) = g;
- palatal n (ń) = j;
- retroflex t (ṭ) = t;
- retroflex d (ḍ) = d;
- retroflex n (ṇ) = n;
- palatal s (ś) = z;
- retroflex s (ṣ) = s;
- anusvara (ṁ) = m;
- visarga (ḥ) = h;
- circumflex = ^ :: e.g., a^ --> â
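The idea of a postfix modifier can be sketched for the circumflex, the one rule in the list above that comes with a worked example (a^ --> â). The following is only an illustration under stated assumptions: `expand_circumflex` is a hypothetical helper, and the use of Unicode NFC composition is an assumption about how such an expansion might be done, not a description of what Philologica does internally.

```python
import re
import unicodedata

COMBINING_CIRCUMFLEX = "\u0302"  # Unicode combining circumflex accent


def expand_circumflex(term: str) -> str:
    """Replace '<letter>^' with the letter carrying a circumflex.

    The letter and the combining accent are composed with NFC so that,
    where a precomposed utf-8 character exists, it is the one returned.
    """
    def compose(match: re.Match) -> str:
        return unicodedata.normalize("NFC", match.group(1) + COMBINING_CIRCUMFLEX)

    return re.sub(r"(\w)\^", compose, term)


print(expand_circumflex("a^"))  # â
```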
-- Special Characters --
- ampersands (&) and many other punctuation marks are not searchable
characters
- mathematical symbols:
  - the equal sign (=) and minus sign (-) will produce a ``No words
    matching specified search term(s)'' message
  - the plus sign (+) is not a searchable character, but, if entered, will
    be ignored
- Wildcard Characters and Boolean Operators -
One can include wildcard characters in search terms. This enables one to
search for terms that match a pattern. Wildcard characters are available
for both full text and bibliographic searching.
-- Full Text Searching --
Philologica
supports wildcard characters and Boolean (logical) operators, which are
modeled on UNIX regular expressions to perform ``pattern matching'' in
full text searching. Pattern matching allows identification of a large
number of words corresponding to a defined pattern. Wildcard characters
can be useful, for example, in identifying cognates made obscure by
affixes and vowel weakening, inconsistencies due to irregular orthography,
and variations on account of word inflection as well as for discovering
potential emendations for uncertain readings. The most commonly used
regular expression operators (wildcard and Boolean) are listed below.
--- Wildcard Characters ---
- . (period) :: matches any single character:
  - śrāvak. :: matches śrāvaka and śrāvako
- .* (period asterisk, "dot-star") :: matches any string of characters:
  - .*rūpāṁ :: matches divyarūpāṁ, bahurūpāṁ, raudrarūpāṁ, ghorarūpāṁ, &c.
  - pur.*ya :: matches puraskṛtya, purāṇasya, puruṣasiṁhasya,
    puruṣottamasya, &c.
  - śrāvak.* :: matches śrāvaka, śrāvako, śrāvakaguṇāḥ,
    śrāvakabhūmayaḥ, śrāvakapratyekabuddhayānaṁ, &c.
- .? (period question mark) :: matches the characters entered, either
  alone or followed by any one additional character in place of the .?:
  - mahāratha.? :: matches mahāratha, mahāratham, mahārathas, and
    mahārathau
- [a-z] (brackets) :: matches a single character within a range:
  - pariśuddh[a-i] :: matches pariśuddha, pariśuddhi, and
    pariśuddhe, but not pariśuddho
N.B., for a full list of words matching a wildcard search term, go to the
advanced search interface, enter the
wildcard search term, select the Refined Search Results tab,
select the Frequency by Title radio button, and then press
`Search'.
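The wildcard patterns listed above behave much like ordinary regular expressions, so their effect can be tried out with a short Python sketch. This is only an approximation: Python's `re` module stands in for the search engine, whole-word matching is assumed (hence `fullmatch`), and the word list is merely a handful of the forms quoted above. `matching_words` is a hypothetical helper.

```python
import re

# A few of the word forms quoted in the examples above.
WORDS = [
    "śrāvaka", "śrāvako", "śrāvakaguṇāḥ",
    "mahāratha", "mahāratham", "mahārathas", "mahārathau",
    "pariśuddha", "pariśuddhi", "pariśuddhe", "pariśuddho",
    "divyarūpāṁ", "puraskṛtya",
]


def matching_words(pattern: str, words=WORDS) -> list:
    """Return the words that the pattern matches in full, ignoring case."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [word for word in words if compiled.fullmatch(word)]


for pattern in ["śrāvak.", "śrāvak.*", "mahāratha.?", "pariśuddh[a-i]", ".*rūpāṁ", "pur.*ya"]:
    print(pattern, "::", ", ".join(matching_words(pattern)))
```

Run against this small list, śrāvak. should retrieve only the two seven-letter forms, whereas śrāvak.* also retrieves the compound śrāvakaguṇāḥ.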
--- Boolean (Logical) Operators ---
- | (vertical bar) :: the OR operator:
  - pariśuddha | pariśuddhi :: matches pariśuddha OR pariśuddhi
- Space :: the AND operator in sentence and paragraph proximity
  searching:
  - yadā vedanāṁ :: matches yadā AND vedanāṁ in sentence and
    paragraph proximity searching
N.B., wildcard characters and Boolean operators can be combined within
the same search: e.g., bahurūp.* | rūpa.? uttamam.
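How OR and AND might combine can be sketched as follows. The sketch rests on two assumptions that this page does not confirm: that the vertical bar binds more tightly than the space, so that a | b c is read as (a OR b) AND c, and that a "sentence" is simply a sequence of whitespace-separated words. `parse_query` and `sentence_matches` are hypothetical helpers, and the sample sentences in the usage lines are invented word sequences, not quotations from any text.

```python
import re


def parse_query(query: str) -> list:
    """Split a query into AND-groups, each a list of OR-alternatives.

    Spaces around '|' are removed first, so "a | b c" becomes the two
    groups ["a", "b"] and ["c"].
    """
    tightened = re.sub(r"\s*\|\s*", "|", query.strip())
    return [group.split("|") for group in tightened.split()]


def sentence_matches(query: str, sentence: str) -> bool:
    """True when every AND-group is satisfied by some word of the sentence."""
    words = sentence.split()
    for alternatives in parse_query(query):
        patterns = [re.compile(alt, re.IGNORECASE) for alt in alternatives]
        if not any(p.fullmatch(word) for p in patterns for word in words):
            return False
    return True


print(parse_query("bahurūp.* | rūpa.? uttamam"))
print(sentence_matches("yadā vedanāṁ", "yadā sukhāṁ vedanāṁ"))  # True
print(sentence_matches("yadā vedanāṁ", "yadā sukhāṁ"))          # False
```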
-- Searching for Titles and Authors &c. (Bibliographic Searching) --
Searching for titles, authors and so on (bibliographic searching)
requires only limited support for wildcard characters and Boolean
operators. In general, one should only need to enter an uncommon term from
the title or author's name. Please note that only the Boolean operator OR
(|) can be used, not AND (space); that the wildcard operator (.*) is
unnecessary; and that a title or author's name that contains diacritics
must be entered with utf-8 characters, not postfix modifiers.
- Punctuation Marks and Searching -
In general, it is advisable to avoid using punctuation marks when engaged
in full text or bibliographic searching.
-- Full Text Searching --
Punctuation marks must be avoided when full text searching. Many of the
symbols often used for punctuation are used by Philologica as
postfix modifiers or wildcard characters.
The punctuation that must be eschewed includes: the comma (,), question
mark (?), exclamation mark (!), vertical bar (|), forward (/) and backward
(\) slashes, parentheses (( )), braces ({ }), brackets ([ ]), angle brackets
(< >), colons (:), and semi-colons (;), as well as quotation marks (` ' "),
ampersands (&), asterisk (*), percentage sign (%), dollar sign ($), and
number sign (#).
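When a phrase is copied directly from a text, a quick check for the punctuation listed above may save a failed search. The following Python sketch is one way to do this; `punctuation_to_avoid` is a hypothetical helper, and the check is meant for plain phrases only, not for terms in which ?, *, |, or brackets are used deliberately as wildcard or Boolean operators.

```python
# The punctuation marks named above as unsuitable in full text searching.
AVOID_IN_FULL_TEXT = set(",?!|/\\(){}[]<>:;`'\"&*%$#")


def punctuation_to_avoid(phrase: str) -> list:
    """Return, in sorted order, the listed punctuation marks found in a phrase."""
    return sorted({ch for ch in phrase if ch in AVOID_IN_FULL_TEXT})


print(punctuation_to_avoid("apāya, hetu!"))  # ['!', ',']
print(punctuation_to_avoid("apāya hetu"))    # []
```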
Some punctuation marks are especially problematic and deserve further
comment:
--- Apostrophe ---
The Tibetan 'a chuṅ is represented by an apostrophe ('). When
searching for a term including an apostrophe one should substitute a
wildcard character: e.g., search for tha snyad .*dogs pa or
tha snyad .?dogs pa rather than tha snyad 'dogs
pa.
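The substitution described above is a simple string replacement, as the following Python sketch shows. `replace_apostrophe` is a hypothetical helper; it handles only the plain apostrophe ('), on the assumption that this is the character used in the search term.

```python
def replace_apostrophe(term: str, wildcard: str = ".?") -> str:
    """Substitute a wildcard for each apostrophe in a search term."""
    return term.replace("'", wildcard)


print(replace_apostrophe("tha snyad 'dogs pa"))        # tha snyad .?dogs pa
print(replace_apostrophe("tha snyad 'dogs pa", ".*"))  # tha snyad .*dogs pa
```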
--- Hyphen ---
Some texts use hyphens to separate words within compounds. When searching
for words within a hyphenated compound one should omit the hyphen: e.g.,
search for apāya hetu rather than apāya-hetu.
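Omitting the hyphen amounts to replacing it with a space, which can be sketched in Python as follows. `split_compound` is a hypothetical helper; it also collapses any doubled spaces that the replacement leaves behind.

```python
def split_compound(term: str) -> str:
    """Replace hyphens with spaces and collapse repeated whitespace."""
    return " ".join(term.replace("-", " ").split())


print(split_compound("apāya-hetu"))  # apāya hetu
```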
--- Period ---
The period (.) is not searchable, because it serves as a wildcard character.
N.B., as few digital texts are tagged for sentence termination, PhiloLogic relies on
punctuation marks in combination with capitalisation to identify sentence
termination. This is especially problematic for Indological and
Buddhological texts.
-- Searching for Titles and Authors &c. (Bibliographic Searching) --
The following punctuation marks produce a ``No documents found matching
specified bibliographic criteria'' message when used in bibliographic
search input fields: parentheses (( )), semi-colons (;), colons (:),
ampersands (&), apostrophes ('), quotation marks (` ' "), braces ({ }),
brackets ([ ]), angle brackets (< >), and the forward slash (/), as
well as the dollar sign ($).
The following punctuation marks have no adverse effect on a
bibliographic search and, if appearing within a string, must be entered:
period (.), hyphen (-), question mark (?), exclamation mark (!), and comma
(,).
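The two groups of punctuation described above can be captured in a short Python check to run on a title or author string before it is entered. `rejected_punctuation` is a hypothetical helper, and the character sets simply transcribe the two lists given in this section.

```python
# Marks that produce the "No documents found" message in bibliographic fields.
REJECTED_IN_BIBLIOGRAPHIC = set("();:&'`\"{}[]<>/$")

# Marks that have no adverse effect and are entered as they stand.
HARMLESS_IN_BIBLIOGRAPHIC = set(".-?!,")


def rejected_punctuation(field: str) -> list:
    """Return, in sorted order, the rejected punctuation marks found in a field."""
    return sorted({ch for ch in field if ch in REJECTED_IN_BIBLIOGRAPHIC})


print(rejected_punctuation("Philologica Indica et Buddhica: Search Term Help"))  # [':']
print(rejected_punctuation("Search Term Help."))                                 # []
```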
N.B., On the whole, it is perhaps most convenient simply to avoid using
punctuation marks when making bibliographic searches. All that is often
needed to find what one wants is to enter an uncommon bibliographic term
from either the title or author's name.
[*] This page is a
modified version of the PhiloLogic User
Manual: 3. Character Representation for Search Terms.