[Mesorah] mis-accentuation

Shlomo Argamon argamon at iit.edu
Thu Dec 21 21:29:09 PST 2017


Interesting - I hadn't heard of this.

Well, a great deal depends on exactly what methods were used, and more
importantly, what textual features were used to predict the nikkud and
t'amim. Certainly, any such method would label based on regularities, and
hermeneutically-based exceptions could not be captured. That issue aside,
there are many potential methodological pitfalls that must be dealt with to
have confidence in the results. (Again, I'm speaking generally, as I don't
know what these researchers did.)

One of the main issues is that of different text styles and genres. If you
build a statistical model based on a book (or books) in one style (say,
prophetic visions) and then use it to label text in a different style (say,
historical narrative), you cannot trust the results. Critically, even if
you test your method on known text, unless that test text is of the same
type (style, genre, etc.) as the unknown text you want to label (the
Chumash in this case), your test accuracy will have no predictable
connection to accuracy on the unknown text. (Parenthetically, "90% hat'ama"
doesn't sound that impressive to me, at least not to rely on. Would you eat
a cookie from a jar that is "90% unpoisoned"?)
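
To make the style-mismatch point concrete, here is a minimal Python sketch.
It is not the researchers' method (which I don't know); it assumes a simple
majority-rule labeler of the kind described in the quoted message, and all
the data, pattern names ("p1", "p2", "p3") and marks ("x", "y") are
hypothetical toy values, not statistics from any real text.

```python
from collections import Counter, defaultdict

def learn_majority_rules(labeled):
    """Map each word pattern to its most frequent mark in the training data."""
    counts = defaultdict(Counter)
    for pattern, mark in labeled:
        counts[pattern][mark] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}

def accuracy(rules, labeled, default):
    """Fraction of items whose predicted mark matches the true mark."""
    correct = sum(rules.get(p, default) == mark for p, mark in labeled)
    return correct / len(labeled)

# Hypothetical toy data: (word pattern, mark) pairs in two styles.
# Style A: pattern p1 almost always takes mark x; p2 always takes y.
style_a_train = [("p1", "x")] * 9 + [("p1", "y")] + [("p2", "y")] * 10
style_a_test = [("p1", "x")] * 9 + [("p1", "y")] + [("p2", "y")] * 10
# Style B: p1 behaves differently, and p3 never occurred in training.
style_b_test = [("p1", "y")] * 6 + [("p1", "x")] * 4 + [("p3", "x")] * 10

rules = learn_majority_rules(style_a_train)
in_style = accuracy(rules, style_a_test, default="y")
out_style = accuracy(rules, style_b_test, default="y")

print(f"same-style test accuracy:      {in_style:.2f}")
print(f"different-style test accuracy: {out_style:.2f}")
```

On this toy data the same-style test looks excellent while the
different-style test does badly, and nothing in the first number warns you
about the second. Note also that the one exceptional "p1"/"y" item in the
same-style test is exactly what the majority rule gets wrong.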

Do you have a reference to a detailed exposition of the method and results?

Shlomo

Shlomo Engelson Argamon
Professor of Computer Science
Director, Master of Data Science
Illinois Institute of Technology
http://about.me/shlomoargamon

On Thu, Dec 21, 2017 at 2:00 PM, Micha Berger <micha at aishdas.org> wrote:

> On Thu, Dec 21, 2017 at 09:22:46PM +0200, David and Esther Bannett wrote:
> :> I thought the chumash was lost from the Keter before it reached Jewish
> :> hands. How does BI have a copy of its contents?
>
> : BI used a computer on the Keter to get statistics. For example, if a
> : certain style of word had a meteg in the vast majority of
> : appearances, they made it a klall and used that form in other places.
> : Thus they reconstituted the parts of the Keter that were missing.
>
> Of course, that guarantees erasing the exceptions that TSBP statements
> were hung on. (Like "Mi kamokha needar baqodesh", with a kaf degushah
> on that iteration, in contrast to the grammatically normal first "Mi
> khamokha ba'eilim...")
>
> : Using their klallim, they took a piece of the Keter and removed all
> : nikkud and t'amim. They then replaced the nikkud and t'amim
> : according to their klallim and compared it with the original.  They
> : had over 90% hat'ama.
>
> I wonder what people who do statistical analysis of texts, like Moshe
> Koppel or Shlomo Argamon-Engleson, think of their work.
>
> So, I CC-ed them.
>
> Tir'u baTov!
> -Micha
>
> --
> Micha Berger             With the "Echad" of the Shema, the Jew crowns
> micha at aishdas.org        G-d as King of the entire cosmos and all four
> http://www.aishdas.org   corners of the world, but sometimes he forgets
> Fax: (270) 514-1507      to include himself.     - Rav Yisrael Salanter
>