Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- ChatGPT is evaluated for machine translation.
- Candidate prompts generally work well.
- ChatGPT performs competitively with commercial translation products on high-resource European languages.
- ChatGPT lags significantly behind on low-resource or distant languages.
- ChatGPT does not perform as well as commercial systems on biomedical abstracts or Reddit comments.

Paper Content

Introduction
- ChatGPT is an intelligent chatting machine.
- It is trained to follow instructions and provide detailed responses.
- It can answer follow-up questions, admit mistakes, challenge incorrect premises, and reject inappropriate requests.
- It can perform various natural language processing tasks, including question answering, storytelling, logical reasoning, code debugging, and machine translation.

Evaluation setting
- ChatGPT was compared with 3 commercial translation products.
- Evaluation used Flores-101, the WMT19 Biomedical Translation Task, and the WMT20 Robustness Task.
- 50 sentences were sampled from each test set for evaluation.
- BLEU, ChrF++, and TER were used as metrics.

Translation prompts
- ChatGPT was asked to provide ten concise prompts or templates for machine translation.
- Three candidate prompts were summarized from the results, with an extra added to one of them.
- The three candidate prompts were compared on a Chinese-to-English translation task, with TP3 performing best on all three metrics.

Multilingual translation
- Four languages are evaluated: German, English, Romanian, and Chinese.
- 12 translation directions are tested.
- German-English translation is considered a high-resource task.
- Romanian-English translation is considered a low-resource task.
- ChatGPT performs competitively on German-English translation.
- ChatGPT lags behind on Romanian-English translation.
- Translating between different language families is harder than translating within the same language family.

Translation robustness
- ChatGPT was evaluated on the WMT19 Bio and WMT20 Rob2 and Rob3 test sets....
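
As a rough illustration of the setup summarized above, the sketch below enumerates the 12 translation directions that four languages produce, draws a reproducible 50-sentence sample from a test set, and fills in a translation prompt. The template wording, the function names, and the fixed seed are assumptions made for illustration; they are not the paper's actual prompts or code.

```python
import itertools
import random

LANGUAGES = ["German", "English", "Romanian", "Chinese"]

# Hypothetical template: the paper's own prompt wording is not reproduced here.
PROMPT_TEMPLATE = "Translate the following {src} text into {tgt}:\n{text}"


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(itertools.permutations(languages, 2))


def sample_sentences(sentences, k=50, seed=0):
    """Draw up to k sentences without replacement, reproducibly (seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(sentences, min(k, len(sentences)))


def build_prompt(src, tgt, text):
    """Fill the hypothetical template for one source sentence."""
    return PROMPT_TEMPLATE.format(src=src, tgt=tgt, text=text)


directions = translation_directions(LANGUAGES)
print(len(directions))  # 4 sources x 3 targets each = 12
```

Ordered pairs are used because each language pair is evaluated in both directions, which is why four languages yield 12 directions and not 6.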