Better Bibles Blog

Wednesday, July 11, 2007

Fuzzy Searching

I want to thank Lingamish for saying that I was so "dang smart and funny" by letting him know that I may be funny - but I'm not that smart. I do know how to cheat when necessary, so I will reveal one of my little secrets for Lingamish.

For example, it took Lingamish a while to figure out what Iyov's name meant. Peter kept dropping hints until L got it. So, Lingamish, I'll tell you what I did. I trotted over to visit Iyov and say "hello", but really, that name in English letters didn't mean a hill of beans to me. Then I looked at the Hebrew in his first post. It created a sort of fuzzy impression. I thought I might have it. That and Blake and so on.

However, I wanted to greet Iyov without making a fool of myself (it is very important to make that attempt once in a while), so I copied the first five words from Iyov's intro into the Google search box. אִישׁ הָיָה בְאֶרֶץ־עוּץ אִיּוֹב and there it was, 85 hits - try it! You'll never go back to using your own brain. You don't have to know Hebrew - you don't even have to know that it is Hebrew.

But here is the weird thing. Now go up to the banner and copy and paste the same first five words איש היה בארץ־עוץ איוב into Google. You get 297 hits, and they are completely different from the hits in the last search.

So this is a problem: one search has vowel markings and the other has consonants only. There are different software solutions, ways to undertake fuzzy searches and phonetic searches, but the process is neither standardized nor in the public domain. Read Chris Heard's discussion about trying to do a consonantal search in Accordance. I don't have Accordance so I can't comment on its capabilities. But sometimes I just take out a lexicon and flip through the pages - there is nothing like a fuzzy search. This doesn't work to search the whole of scripture in one sitting, of course.
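
To make the difference between the two searches concrete, here is a minimal Python sketch of a consonants-only comparison. It assumes the text is ordinary Unicode Hebrew, where the vowel points are separate combining marks. The function name strip_points is my own invention for illustration; it is not how Google or any Bible software actually does it.

```python
import unicodedata

def strip_points(text: str) -> str:
    """Drop every nonspacing mark (vowel points, dagesh, accents).

    Hypothetical helper for illustration only. Letters, spaces and the
    maqaf (the Hebrew hyphen) are not marks, so they are kept.
    """
    # NFD first, so that any precomposed letter+point is split apart.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

pointed = "אִישׁ הָיָה בְאֶרֶץ־עוּץ אִיּוֹב"   # the string from Iyov's first post
plain = "איש היה בארץ־עוץ איוב"           # the string from the banner

# The raw strings differ, which is why the two searches return different hits.
print(pointed == plain)
# After stripping the points, the two can be compared consonant by consonant.
print(strip_points(pointed) == strip_points(plain))
```

A search engine that applied something like this to both the query and the indexed pages would treat the pointed and unpointed spellings as the same search.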

Google has a different set of issues for every language and resolves them in different ways. French and German, whose accents are obligatory in writing, have a legacy situation in which accents don't count in a search. There are other languages besides Hebrew where diacritic marks are optional in writing but, once written, have to match exactly in a search.

There are languages with orthographies which still have two or more possible strings for each word, leftover effects from legacy systems, and then there are the languages for which different keyboards create different strings for each word. Accents are composed one way on this keyboard and another way on that keyboard. In that case, searches do not overlap: keyboarders, in spite of speaking the same language and reading the same text, live in distinct and parallel digital worlds. This is what it means to speak a minority language. We need to remember that not only do we have access to technology but technology was designed with English in mind, and we access technology through English.
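
Here is a small Python sketch of the two-keyboards problem, using a French word because the effect is easy to see there. One keyboard is assumed to produce é as a single code point, the other as a plain e followed by a combining accent. The helper name same_text is hypothetical. Unicode normalization only reconciles sequences that the standard defines as equivalent, so it will not rescue text typed with legacy font hacks or genuinely different spellings.

```python
import unicodedata

composed = "caf\u00e9"     # é typed as one precomposed character
decomposed = "cafe\u0301"  # e followed by a combining acute accent

def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same normal form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Both look identical on screen, but a naive search sees different strings.
print(composed == decomposed)
# After normalization the two keyboarders are back in the same digital world.
print(same_text(composed, decomposed))
```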

Oops, I forgot this is the BBB, not Abecedaria. Anyway, I hope that Bible software will invest in creating capabilities for lots of alternative searches, fuzzy, phonetic, plene, non-plene (what is that called?) etc. Because technology may be changing fast but the human brain is not.

Update: David Lang has pointed me to his post on the Accordance blog. I was hoping you would comment, David, because I am not familiar with Accordance.

So the important information is that unlike Google, Accordance ignores the vowel markings. Good stuff. My point would be that everyone should automatically familiarize themselves with these details when they begin using a search engine.

Next, David explains the difference between lexical and inflected forms. This is crucial and it is pretty neat to have, because a lexicon only supplies lexical entries, not inflected ones (at least not usually).

Finally, not only can you eliminate the vowel markings but also the point which differentiates sin from shin. Now that is what I am talking about - that's the way it should be. Once again, my point is that people need to inform themselves on searching - what the parameters are. And definitely, posing a question in a forum is THE way to gain information. I was an avid member of two different forums for several years. It can be a lot of fun.
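
As a rough picture of what such layered search parameters might look like, here is a Python sketch with a switch for the sin/shin point. This is my own illustration, assuming standard Unicode Hebrew; the function fold_hebrew and its keep_sin_shin option are made up and say nothing about how Accordance is built.

```python
import unicodedata

SHIN_DOT = "\u05c1"  # the point that marks shin
SIN_DOT = "\u05c2"   # the point that marks sin

def fold_hebrew(text: str, keep_sin_shin: bool = False) -> str:
    """Strip vowel points and accents; optionally keep the sin/shin point.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            # A nonspacing mark: keep it only if it is a sin/shin point
            # and the caller asked to preserve that distinction.
            if keep_sin_shin and ch in (SHIN_DOT, SIN_DOT):
                out.append(ch)
            continue
        out.append(ch)
    return "".join(out)

sham_shin = "\u05e9\u05c1\u05b8\u05dd"  # shin + qamats + final mem
sam_sin = "\u05e9\u05c2\u05b8\u05dd"    # sin + qamats + final mem

# With the point kept, the two words stay distinct.
print(fold_hebrew(sham_shin, keep_sin_shin=True) == fold_hebrew(sam_sin, keep_sin_shin=True))
# With everything stripped, one search finds both.
print(fold_hebrew(sham_shin) == fold_hebrew(sam_sin))
```

The point of the switch is that the searcher, not the software, decides how fuzzy the search should be.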


At Wed Jul 11, 05:39:00 PM, Blogger David Lang said...


With respect to Chris Heard's post about consonantal searching, see my comment on his post, along with this post on the Accordance blog.

