search-help.html 5.9 KB

<html>
<head>
<title>mozilla cross-reference: search help</title>
<link rel='stylesheet' title='' href='style/style.css' type='text/css'>
</head>
<body bgcolor="#ffffff" text="#000000"
link="#0000EE" vlink="#551a8b" alink="#ff0000">
<table bgcolor="#000000" width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td><a href="//www.mozilla.org/"><img
src="//www.mozilla.org/images/mozilla-banner.gif" alt=""
border=0 width=600 height=58></a></td></tr></table>
<p>
<table class=desc>
<tr><td>
<h1 align=center>search help<br>
<font size=3>
for the<br>
<a href="./"><i>mozilla cross-reference</i></a>
</font></h1>
</td></tr></table>
<p>
<blockquote><blockquote>
<i>
This text is derived from the Glimpse manual page.
For more information on glimpse, see the
<a href="http://webglimpse.net/">Glimpse homepage</a>.
</i>
</blockquote></blockquote>
<a name="Patterns"></a><h2>Patterns</h2>
<ul>
Glimpse supports a large variety of patterns, including simple
strings, strings with classes of characters, sets of strings,
wild cards, and regular expressions (see <a href="#Limitations">Limitations</a>).
</ul>
<p> <h3>Strings</h3>
<ul>
Strings are any sequence of characters, including the special symbols
`^' for beginning of line and `$' for end of line. The special
characters `$', `^', `*', `[', `|', `(', `)', `!', and `\', as well
as the meta characters special to glimpse (and agrep), namely `;',
`,', `#', `&gt;', `&lt;', `-', and `.', should be preceded by
`\' if they are to be matched as regular characters. For example,
\^abc\\\\ corresponds to the string ^abc\\, whereas ^abc corresponds
to the string abc at the beginning of a line.
</ul>
<p> <h3>Classes of characters</h3>
<ul>
A list of characters inside [] (in order) corresponds to any character
from the list. For example, [a-ho-z] is any character between a and h
or between o and z. The symbol `^' inside [] complements the list.
For example, [^i-n] denotes any character in the character set except
the characters 'i' through 'n'.
The symbol `^' thus has two meanings, but this is consistent with
egrep.
The symbol `.' (don't care) stands for any symbol (except for the
newline symbol).
</ul>
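<p>
The class syntax above can be tried out with Python's <code>re</code> module,
which uses the same bracket notation for these cases. This is only an
illustrative sketch: it does not call glimpse, and the variable names are
made up for the example.
</p>

```python
import re

# Python's re module stands in for glimpse/egrep here (an assumption for
# illustration); the bracket syntax is the same for these simple classes.
in_ranges = re.compile(r"[a-ho-z]")   # any character in a..h or o..z
not_i_to_n = re.compile(r"[^i-n]")    # any character except i..n
any_char = re.compile(r".")           # any character except newline

# Keep only the characters that each class accepts.
matched = [c for c in "abhijnoz" if in_ranges.fullmatch(c)]
excluded = [c for c in "hijklmno" if not_i_to_n.fullmatch(c)]

print(matched)                         # ['a', 'b', 'h', 'o', 'z']
print(excluded)                        # ['h', 'o']
print(any_char.fullmatch("\n"))        # None: '.' does not match a newline
```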
<p> <h3>Boolean operations</h3>
<ul>
Glimpse
supports an `AND' operation denoted by the symbol `;',
an `OR' operation denoted by the symbol `,',
a limited version of a `NOT' operation (starting at version 4.0B1)
denoted by the symbol `~',
or any combination.
For example, 'pizza;cheeseburger' will output all lines containing
both patterns.
'{political,computer};science' will match 'political science'
or 'science of computers'.
</ul>
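<p>
The AND and OR semantics described above can be modelled on a single line of
text as follows. This is a rough sketch, not glimpse's implementation: the
helper names are hypothetical, and plain substring tests are assumed in place
of glimpse's own matching.
</p>

```python
def contains_all(line, terms):
    """Model of AND (`;`): every term must occur somewhere in the line."""
    return all(term in line for term in terms)


def contains_any(line, terms):
    """Model of OR (`,`): at least one term must occur in the line."""
    return any(term in line for term in terms)


def political_or_computer_and_science(line):
    """Hand-written model of the pattern '{political,computer};science'."""
    return (contains_any(line, ["political", "computer"])
            and contains_all(line, ["science"]))


print(contains_all("pizza with a cheeseburger", ["pizza", "cheeseburger"]))  # True
print(political_or_computer_and_science("political science"))               # True
print(political_or_computer_and_science("science of computers"))            # True
print(political_or_computer_and_science("rocket science"))                  # False
```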
<p><h3>Wild cards</h3>
<ul>
The symbol '#' is used to denote a sequence
of any number (including 0)
of arbitrary characters (see <a href="#Limitations">Limitations</a>).
The symbol # is equivalent to .* in egrep.
In fact, .* will work too, because it is a valid regular expression
(see below), but unless this is part of an actual regular expression,
# will work faster.
(Currently glimpse is experiencing some problems with #.)
</ul>
<p><h3>Combination of exact and approximate matching</h3>
<ul>
Any pattern inside angle brackets &lt;&gt; must match the text exactly even
if the match is with errors. For example, &lt;mathemat&gt;ics matches
mathematical with one error (replacing the last s with an a), but
mathe&lt;matics&gt; does not match mathematical no matter how many errors are
allowed. (This option is buggy at the moment.)
</ul>
<h3>Regular expressions</h3>
<ul>
Since the index is word based, a regular expression must match words
that appear in the index for glimpse to find it. Glimpse first strips
all non-alphabetic characters from the regular expression and
searches the index for all remaining words. It then applies the
regular expression matching algorithm to the files found in the index.
For example, glimpse 'abc.*xyz' will search the index for all files
that contain both 'abc' and 'xyz', and then search directly for
'abc.*xyz' in those files. (If you use glimpse -w 'abc.*xyz', then
'abcxyz' will not be found, because glimpse will think that abc and
xyz need to be matched as whole words.) The syntax of regular
expressions in glimpse is in general the same as that for agrep. The
union operation `|', Kleene closure `*', and parentheses () are all
supported. Currently '+' is not supported. Regular expressions are
currently limited to approximately 30 characters (generally excluding
meta characters). The maximal number of errors
for regular expressions that use '*' or '|' is 4.
</ul>
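<p>
The two-stage lookup described above (index words first, full regular
expression second) can be sketched roughly as below. This is a simplified
model under stated assumptions, not glimpse's actual code: the function names
are hypothetical, a plain dictionary of file contents stands in for the
index, substring tests stand in for the index lookup, and Python's
<code>re</code> module stands in for agrep.
</p>

```python
import re


def index_words(pattern):
    """Strip non-alphabetic characters and return the remaining words."""
    return re.findall(r"[A-Za-z]+", pattern)


def two_stage_search(pattern, files):
    """files: dict of file name -> text (hypothetical stand-in for the index)."""
    words = index_words(pattern)
    # Stage 1: keep only the files that contain every remaining word.
    candidates = [name for name, text in files.items()
                  if all(word in text for word in words)]
    # Stage 2: apply the full regular expression to the candidate files.
    regex = re.compile(pattern)
    return [name for name in candidates if regex.search(files[name])]


files = {
    "a.txt": "abc then xyz",
    "b.txt": "xyz then abc",   # has both words, but in the wrong order
    "c.txt": "abc only",       # filtered out in stage 1
}
words = index_words("abc.*xyz")
hits = two_stage_search("abc.*xyz", files)
print(words)  # ['abc', 'xyz']
print(hits)   # ['a.txt']
```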
<a name="Limitations"></a><h2>Limitations</h2>
<ul>
The index of glimpse is word based. A pattern that contains more than
one word cannot be found in the index. The way glimpse overcomes this
weakness is by splitting any multi-word pattern into its set of words
and looking for all of them in the index.
For example, <i>'linear programming'</i> will first consult the index
to find all files containing both <i>linear</i> and <i>programming</i>,
and then apply agrep to find the combined pattern.
This is usually an effective solution, but it can be slow in
cases where both words are very common but their combination is not.
<p>
As mentioned in the section on <a href="#Patterns">Patterns</a> above, some characters
serve as meta characters for glimpse and need to be
preceded by '\' to search for them. The most common
examples are the characters '.' (which stands for a wild card),
and '*' (the Kleene closure).
So, "glimpse ab.de" will match abcde, but "glimpse ab\.de"
will not, and "glimpse ab*de" will not match ab*de, but
"glimpse ab\*de" will.
The meta character - is translated automatically to a hyphen
unless it appears between [] (in which case it denotes a range of
characters).
<p>
Search patterns are limited to 29 characters.
Lines are limited to 1024 characters.
</ul>
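<p>
The effect of escaping '.' and '*' with a single backslash can be checked
with Python's <code>re</code> module, which treats these two meta characters
the same way. This is an illustration only: it does not run glimpse, and the
variable names are made up for the example.
</p>

```python
import re

# Python's re module stands in for glimpse here (an assumption for
# illustration); '.' and '*' have the same meaning in both.
dot_is_wildcard = re.search(r"ab.de", "abcde") is not None
escaped_dot = re.search(r"ab\.de", "abcde") is not None
star_is_closure = re.search(r"ab*de", "ab*de") is not None
escaped_star = re.search(r"ab\*de", "ab*de") is not None

print(dot_is_wildcard)   # True: '.' matches the 'c'
print(escaped_dot)       # False: '\.' only matches a literal '.'
print(star_is_closure)   # False: 'b*' means zero or more 'b', not a literal '*'
print(escaped_star)      # True: '\*' matches the literal '*'
```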
<p>
<hr>
<address>
<a href="mailto:lxr@linux.no">
Arne Georg Gleditsch and Per Kristian Gjermshus</a>
</address>
</body>
</html>