Machine learning may hold the key to unlocking ancient scripts

Updated: 2019/10/20 8:57:21 Source: New York Times Chinese website (纽约时报中文网) Author: unknown

The key to cracking long-dead languages?
Broken and scorched black by fire, the dense, wedge-shaped marks etched into the ancient clay tablets are only just visible under the soft light at the British Museum. These tiny signs are the remains of the world’s oldest writing system: cuneiform.

Developed more than 5,000 years ago in Mesopotamia, the land between the Tigris and Euphrates rivers where modern-day Iraq lies, cuneiform captured life in a complex and fascinating civilisation for some three millennia. From furious letters between warring royal siblings to rituals for soothing a fractious baby, the tablets offer a unique insight into a society at the dawn of history.

They chronicle the rise and fall of Akkad, Assyria and Babylonia, the world’s first empires. An estimated half a million of them have been excavated, and more are still buried in the ground.

However, since cuneiform was first deciphered by scholars around 150 years ago, the script has only yielded its secrets to a small group of people who can read it. Some 90% of cuneiform texts remain untranslated.

That could change thanks to a very modern helper: machine translation.

“The influence that Mesopotamia has on our own culture is something that people don’t know much about,” says Émilie Pagé-Perron, a researcher in Assyriology at the University of Toronto. Mesopotamia gave us the wheel, astronomy, the 60-minute hour, maps, the story of the flood and the ark, and the first work of literature, the Epic of Gilgamesh. But its texts are mainly written in Sumerian and Akkadian, languages that relatively few scholars can read.

Pagé-Perron is coordinating a project to machine-translate 69,000 Mesopotamian administrative records from the 21st century BC. One of the aims is to open up the past to new research.

“We have information about so many different aspects of the lives of Mesopotamian people, and we can’t really profit from the expertise of people in different fields like economics or politics, who if they had access to the sources, could help us tremendously to understand those societies better,” says Pagé-Perron.

Apart from the clay tablets, there are also more than 50,000 Mesopotamian engraved seals scattered in collections around the world. For millennia, the people of Mesopotamia used seals made of engraved stone that were pressed into wet clay to mark doors, jars, tablets and other objects. Only some 10% of these have even been catalogued, let alone translated.

“We have more sources from Mesopotamia than we have from Greece, Rome and ancient Egypt together,” says Jacob Dahl, a professor of Assyriology at the University of Oxford. The challenge is finding enough people who can read them.

Pagé-Perron and her team are training algorithms on a sample of 4,000 ancient administrative texts from a digitised database. Each records transactions or deliveries of sheep, reed bundles or beer to a temple or an individual. Originally impressed into the clay with a reed stylus, the texts have already been transliterated into our alphabet by modern scholars. The Sumerian word for big, for example, can be written in cuneiform signs, or it can be written in our alphabet as “gal”.

The wording in these administrative texts is simple: “11 nanny goats for the kitchen on the 15th day”, for example. This makes them particularly suitable for automation. Once these algorithms have learned to translate the sample texts into English, they will then automatically translate the other transliterated tablets.
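
The idea of learning from a translated sample and then applying the result to untranslated lines can be sketched in a few lines of Python. This is a toy illustration only, not the project's actual method: it learns a word-by-word glossary from aligned examples, whereas a real translation system would learn from whole lines. The sample data is invented for the sketch; only "gal" for "big" comes from the article, and the other glosses and the function names `learn_glossary` and `gloss` are assumptions.

```python
from collections import Counter, defaultdict

# Invented toy data, for illustration only. Each pair is a transliterated
# line and a word-by-word English gloss of the same length. Only
# "gal" = "big" is taken from the article; the rest are assumptions.
SAMPLE = [
    ("udu gal", "sheep big"),
    ("kasz gal", "beer big"),
    ("udu lugal", "sheep king"),
]


def learn_glossary(pairs):
    """Count which English word each transliterated token is paired with."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        src_tokens, tgt_tokens = source.split(), target.split()
        if len(src_tokens) != len(tgt_tokens):
            continue  # skip lines that are not aligned word for word
        for src, tgt in zip(src_tokens, tgt_tokens):
            counts[src][tgt] += 1
    # Keep the most frequent gloss for every token seen in the sample.
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}


def gloss(line, glossary):
    """Gloss a new line; unknown tokens are flagged rather than guessed."""
    return " ".join(glossary.get(tok, f"[{tok}?]") for tok in line.split())


glossary = learn_glossary(SAMPLE)
print(gloss("kasz lugal", glossary))
print(gloss("udu zzz", glossary))
```

Flagging unknown tokens instead of guessing keeps the output honest about what the sample did not cover, which matters when most of the corpus has never been translated.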

“The texts we’re working on are not very interesting individually, but they’re extremely interesting if you take them as groups of texts,” says Pagé-Perron, who expects the English versions to be online within the next year. The records give us a picture of day-to-day life in ancient Mesopotamia, of power structures and trading networks, but also of other aspects of its social history, such as the role of female workers. Searchable translations would enable researchers from other areas to explore these rich facets of life in the ancient world.

“These people are so different and so remote from us, but at the same time, they have the same basic problems,” explains Pagé-Perron. “Understanding Mesopotamia is a way of understanding what it means to be human.”

She hopes machine analysis will also clarify certain features of Sumerian that still puzzle modern academics. This extinct language is not related to any modern language but has been preserved in inscriptions written in cuneiform. It may be our last remaining link to even older, unrecorded societies.

“Sumerian is probably the last member of what must have been a large family of languages that goes back thousands and thousands of years,” says Irving Finkel, the curator in charge of the 130,000 cuneiform tablets stored at the British Museum. “Writing appeared in the world just in time to rescue Sumerian… We’re just lucky that we had some ‘microphone’ that picked it up before it went away with all the others.”

Finkel is one of the world’s leading cuneiform experts. In his book-filled office at the British Museum, he explains how the script was slowly deciphered thanks to a multi-lingual inscription about a king, just like the Rosetta Stone that helped researchers make sense of Egyptian hieroglyphs.

“It’s actually rather astonishing how interesting it is when you find a human mind across millennia, where it is like talking to them on the telephone,” he says. “It’s the most exciting thing in the world when you meet one of these people.”

Ancient access

Few of us will ever cradle a 5,000-year-old tablet in our palm. But thanks to advanced imaging techniques, anyone with an internet connection can now access treasures such as the world’s oldest surviving royal library, which is being digitised. It was built in Nineveh by Ashurbanipal, a powerful and book-loving Assyrian king. Some of the surviving tablets from his library are displayed at the British Museum as part of a special exhibition on Ashurbanipal. Although blackened and hardened by fire when Nineveh was sacked in 612 BC, the text they carry can still be read.

New imaging techniques are making the job of working with such ancient, often damaged texts easier. With highly detailed images, it is possible to pick out marks that may be too obscure to see with the naked eye.

Dahl and his colleagues have been digitising tablets and seals stored in collections in Teheran, Paris and Oxford for a project known as the Cuneiform Digital Library Initiative. This vast online database already contains about a third of the world’s cuneiform texts, as well as some undeciphered written languages, such as Proto-Elamite from ancient Iran. Without sprawling digital resources like this, training machines to do translation would not even be possible.

Digitisation is also helping researchers to piece together links between texts scattered in collections around the world. Dahl, along with researchers at the University of Southampton and the University of Paris-Nanterre, has digitised 3D images of about 2,000 stone seals from Mesopotamia. In a pilot project, they then used AI algorithms to examine a group of six tablets and identify matching seal impressions found elsewhere in the world. The algorithm correctly selected a tablet that is currently stored in Italy, and another that is stored in the United States; both had been stamped by the same seal.

Matching seals and impressions has been notoriously difficult in the past, as many are stored thousands of miles apart. Dahl estimates that all seals could be digitised within about five years, which would then make it possible to trace other patterns. There is some indication, for example, that certain types of stone were favoured by women.

“That is the kind of question you could not answer unless you had large numbers of seals imaged in the way we’re doing, and applying techniques like algorithms or machine learning,” Dahl says. He hopes that as artificial intelligence evolves, it will help us unravel the full potential of the rich information contained in collections around the world.

“I want Assyriology, which covers half of human history and a very endangered cultural heritage, to be at the forefront of this.”

Cracking codes

Imaging is also changing research into undeciphered scripts. Humans tend to be better than machines at this type of decipherment, which typically involves small amounts of text, creative mental leaps, and an understanding of how people lived and organised themselves. It also involves a great deal of intellectual flexibility.

Early cuneiform signs, for example, were not even arranged in a linear text, but simply placed together with a box drawn around them. Proto-Elamite is three-dimensional: a shallow impression of a circle has a different meaning than a deeper one. However, technology has helped the decipherment process by providing detailed pictures that can be magnified, shared and compared.

“The crucial problem is first and foremost to get proper images,” says Dahl, who is working on deciphering the mysterious script. “That’s lacking for the first 100 years of study of Proto-Elamite.”

Such advances go beyond the field of Assyriology. Philippa Steele, a senior research fellow at Cambridge University, is an expert in the early writing systems of ancient Crete and Greece. These include ‘Linear A’, an undeciphered script, and ‘Linear B’, which was used to write an ancient form of Greek.

Thanks to techniques that take sophisticated images of ancient tablets that feature these scripts, Steele has discovered new details.

“You can make out features that are very difficult to make out with the naked eye,” she says. “And often those features might correspond to the ways in which the person writing the document interacted with the document. So for Linear B, for example… you can make out erasures. Sometimes you can tell when the person writing the document has worked something out and then written something over the top.”

Pagé-Perron hopes that machines will eventually be able to translate more complex Sumerian tablets, and other languages like Akkadian. “There’s a lot more to discover about ancient cultures,” she says.

Perhaps one day, we will be able to read all of our earliest texts in translation – though many of Mesopotamia’s riddles are likely to outlive us, not least because many missing cuneiform fragments are still in the ground, waiting to be excavated.

The kings of ancient Mesopotamia thought deeply about the past and the future. They revered cuneiform texts from previous eras, and buried special inscriptions recording their names and achievements, promising rewards for a later ruler who would honour them.

In some ways their wish came true. Their battles and conquests may be forgotten by most. But their most powerful invention, writing, has helped humanity develop ideas and technologies over millennia – and now, train machines to learn from the past.

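For the full text, visit the New York Times Chinese website. This article was published on the New York Times Chinese website (http://cn.nytimes.com), and copyright belongs to The New York Times Company. No organisation or individual may reproduce or translate it without permission. Subscribe to the New York Times Chinese website news email: http://nytcn.me/subscription/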
