[Programming] How should an incoming graduate student ("year 0") get started with NLP?

1. Deep contextualized word representations
Authors: Matthew E. Peters / Mark Neumann / Mohit Iyyer / Matt Gardner / Christopher M. Clark / ... / Luke Zettlemoyer
Abstract: We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.
Full-text link: Full paper - 学术范 (xueshufan.com)
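To make "learned functions of the internal states" a bit more concrete, here is a minimal sketch of the layer-mixing idea: the downstream task learns one scalar per biLM layer, softmax-normalizes them, and takes a scaled weighted sum of the layer vectors for each token. The function names and the toy numbers are my own assumptions for illustration; this is not the authors' code.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mix_layers(layer_states, raw_weights, gamma=1.0):
    """Collapse the per-layer vectors of ONE token into a single vector.

    layer_states: list of L vectors (one per biLM layer), all the same length.
    raw_weights:  L unnormalized scalars, learned by the downstream task.
    gamma:        task-specific scale factor.
    """
    s = softmax(raw_weights)
    dim = len(layer_states[0])
    return [
        gamma * sum(s[j] * layer_states[j][d] for j in range(len(layer_states)))
        for d in range(dim)
    ]

# Toy example: 3 layers with 2-dimensional states (made-up numbers).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = mix_layers(states, [0.0, 0.0, 0.0])      # equal weights: plain average
mostly_first = mix_layers(states, [10.0, -10.0, -10.0])  # dominated by layer 0
print(uniform, mostly_first)
```

With equal raw weights the result is just the average of the layers; a task that benefits more from one layer can shift the weights toward it.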

2. Enriching Word Vectors with Subword Information
Authors: Piotr Bojanowski / Edouard Grave / Armand Joulin / Tomas Mikolov
Abstract: Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models to learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to each character n-gram, words being represented as the sum of these representations. Our method is fast, allowing to train models on large corpora quickly and allows to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
Full-text link: Full paper - 学术范 (xueshufan.com)
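The "bag of character n-grams" idea is easy to try by hand. The sketch below wraps a word in boundary markers `<` and `>`, extracts its n-grams, and sums one vector per n-gram (plus one for the whole marked word). The n-gram range of 3 to 6 follows the paper's setup as I remember it; `ngram_vector` is a hypothetical stand-in that returns fixed pseudo-random vectors instead of trained embeddings, so only the mechanics are meaningful here.

```python
import random
import zlib

def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in '<' and '>' markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

def ngram_vector(gram, dim=4):
    """Hypothetical stand-in for a trained n-gram embedding (deterministic noise)."""
    rng = random.Random(zlib.crc32(gram.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def word_vector(word, dim=4):
    """Sum the vectors of the word's n-grams and of the whole marked word."""
    units = set(char_ngrams(word)) | {f"<{word}>"}
    vec = [0.0] * dim
    for unit in sorted(units):  # sorted for a reproducible summation order
        for d, value in enumerate(ngram_vector(unit, dim)):
            vec[d] += value
    return vec

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("unseenword"))   # works even for a word never seen in training
```

Because the vector is built from subword pieces, a word that never appeared in the training data still gets a representation, which is the point the abstract makes about rare words.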

3. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Authors: Yonghui Wu / Mike Schuster / Zhifeng Chen / Quoc V. Le / Mohammad Norouzi / ... / Jeffrey Dean
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
Full-text link: Full paper - 학术范 (xueshufan.com)
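The length normalization and coverage penalty mentioned at the end of the abstract can be sketched in a few lines. The formulas in the docstrings are the ones I recall from the paper; the default values of `alpha` and `beta` and the toy attention matrices are placeholders I picked for illustration, not tuned settings.

```python
import math

def length_penalty(length, alpha):
    """lp(Y) = ((5 + |Y|) ** alpha) / ((5 + 1) ** alpha)."""
    return ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)

def coverage_penalty(attention, beta):
    """cp(X; Y) = beta * sum_i log(min(sum_j p_ij, 1.0)).

    attention[j][i] is the attention weight of target word j on source word i.
    Weights are assumed strictly positive (as after a softmax), so log is defined.
    """
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        covered = sum(row[i] for row in attention)
        total += math.log(min(covered, 1.0))
    return beta * total

def rescore(log_prob, attention, alpha=0.2, beta=0.2):
    """Beam-search score: log P(Y|X) / lp(Y) + cp(X; Y)."""
    length = len(attention)  # one attention row per target word
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attention, beta)

# Two hypotheses with the same log-probability over a 2-word source.
short_hyp = [[0.5, 0.5], [0.5, 0.5]]        # 2 target words, full coverage
long_hyp = [[0.5, 0.5]] * 4                 # 4 target words, full coverage
lopsided = [[0.9, 0.1], [0.9, 0.1]]         # second source word barely attended
print(rescore(-6.0, short_hyp), rescore(-6.0, long_hyp), rescore(-6.0, lopsided))
```

Without normalization, beam search favors short outputs because every extra word adds a negative log-probability; dividing by `lp(Y)` offsets that, and the coverage term lowers the score of hypotheses that ignore part of the source.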

4. GloVe: Global Vectors for Word Representation
Authors: Jeffrey Pennington / Richard Socher / Christopher D. Manning
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
Full-text link: Full paper - 学术范 (xueshufan.com)
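"Training only on the nonzero elements" becomes clearer once you see how sparse the co-occurrence counts are. The sketch below stores only pairs that actually co-occur, and shows the weighted least-squares term for a single pair. The 1/distance weighting, `x_max = 100` and `alpha = 0.75` are the values I remember from the paper; the window size and the example sentence are my own assumptions.

```python
import math
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Sparse word-word co-occurrence counts; pairs that never occur are not stored."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), i):
            # Pairs that are d words apart contribute 1/d, counted symmetrically.
            weight = 1.0 / (i - j)
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return dict(counts)

def weight_fn(x, x_max=100.0, alpha=0.75):
    """f(x) = (x / x_max) ** alpha below x_max, and 1 above it."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def pair_loss(w_i, w_j, b_i, b_j, x_ij):
    """One term of the objective: f(X_ij) * (w_i . w_j + b_i + b_j - log X_ij) ** 2."""
    dot = sum(a * b for a, b in zip(w_i, w_j))
    return weight_fn(x_ij) * (dot + b_i + b_j - math.log(x_ij)) ** 2

tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens)
print(len(counts), "nonzero entries out of", len(set(tokens)) ** 2, "possible pairs")
```

The loss is summed over the stored entries only, so the cost scales with the number of nonzero counts rather than with the square of the vocabulary size.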

5. Sequence to Sequence Learning with Neural Networks
Authors: Ilya Sutskever / Oriol Vinyals / Quoc V. Le
Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT’14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM’s BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM’s performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Full-text link: Full paper - 学术范 (xueshufan.com)
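The source-reversal trick from the last sentence of the abstract is purely a preprocessing step, so it fits in a few lines. A minimal sketch, with a hypothetical helper name and placeholder tokens:

```python
def make_training_pair(source_tokens, target_tokens):
    """Reverse the source side only; the target keeps its natural order."""
    return list(reversed(source_tokens)), list(target_tokens)

src, tgt = make_training_pair(["a", "b", "c"], ["x", "y", "z"])
print(src, tgt)  # ['c', 'b', 'a'] ['x', 'y', 'z']
```

After reversal, the first source word "a" sits right next to the start of the target, so the first target word has a short path back to the word it most likely depends on.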

6. The Stanford CoreNLP Natural Language Processing Toolkit
Authors: Christopher D. Manning / Mihai Surdeanu / John Bauer / Jenny Finkel / Steven J. Bethard / David McClosky
Abstract: We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.
Full-text link: Full paper - 学术范 (xueshufan.com)

7. Distributed Representations of Words and Phrases and their Compositionality
Authors: Tomas Mikolov / Ilya Sutskever / Kai Chen / Greg Corrado / Jeffrey Dean
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
Full-text link: Full paper - 学术范 (xueshufan.com)
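Two of the extensions named in the abstract reduce to small formulas, sketched below: the probability of discarding a frequent word during subsampling, and the smoothed unigram distribution that negative samples are drawn from. The threshold `t = 1e-5` and the 3/4 power are the values I recall from the paper; the word counts are made up.

```python
import math

def discard_probability(freq, t=1e-5):
    """P(w) = 1 - sqrt(t / f(w)), clipped at 0 for words rarer than the threshold t."""
    return max(0.0, 1.0 - math.sqrt(t / freq))

def noise_distribution(counts, power=0.75):
    """Unigram distribution raised to the 3/4 power, renormalized to sum to 1."""
    weighted = {w: c ** power for w, c in counts.items()}
    total = sum(weighted.values())
    return {w: v / total for w, v in weighted.items()}

counts = {"the": 10000, "cat": 100, "zygote": 1}  # made-up corpus counts
total = sum(counts.values())
for word, c in counts.items():
    print(word, round(discard_probability(c / total), 3))

noise = noise_distribution(counts)
print({w: round(p, 4) for w, p in noise.items()})
```

Very frequent words are dropped most of the time, which speeds up training, and the 3/4 power makes rare words slightly more likely to be picked as negatives than their raw frequency would suggest.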

8. Natural Language Processing (Almost) from Scratch
Authors: Ronan Collobert / Jason Weston / Léon Bottou / Michael Karlen / Koray Kavukcuoglu / ... / Pavel P. Kuksa
Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.
Full-text link: Full paper - 学术范 (xueshufan.com)

9. Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation
Author: David M. W. Powers
Abstract: Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without clear understanding of the biases, and corresponding identification of chance or base case levels of the statistic. Using these measures a system that performs worse in the objective sense of Informedness, can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance. Informedness and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
Full-text link: Full paper - 学术范 (xueshufan.com)
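The claim that a worse system "can appear to perform better" is easy to reproduce with two made-up confusion matrices. In the binary case, Informedness is Recall + Specificity - 1 and Markedness is Precision + Negative Predictive Value - 1, and the Matthews correlation is the geometric mean of the two (up to sign). The counts below are invented for illustration.

```python
import math

def rates(tp, fn, fp, tn):
    """Return (recall, specificity, precision, negative predictive value)."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    npv = tn / (tn + fn)
    return recall, specificity, precision, npv

def informedness(tp, fn, fp, tn):
    recall, specificity, _, _ = rates(tp, fn, fp, tn)
    return recall + specificity - 1.0

def markedness(tp, fn, fp, tn):
    _, _, precision, npv = rates(tp, fn, fp, tn)
    return precision + npv - 1.0

def f1(tp, fn, fp, tn):
    recall, _, precision, _ = rates(tp, fn, fp, tn)
    return 2.0 * precision * recall / (precision + recall)

def matthews(tp, fn, fp, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom

# 50 positives and 50 negatives in both cases (made-up counts).
guesser = dict(tp=45, fn=5, fp=45, tn=5)    # says "positive" 90% of the time, at random
informed = dict(tp=25, fn=25, fp=5, tn=45)  # cautious, but actually discriminates

for name, cm in (("guesser", guesser), ("informed", informed)):
    print(name, round(f1(**cm), 3), round(informedness(**cm), 3), round(markedness(**cm), 3))
```

The guesser gets the higher F1 even though its Informedness and Markedness are both zero, because F1 rewards predicting the positive class often and never looks at the true negatives.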

10. GloVe: Global Vectors for Word Representation
Authors: Jeffrey Pennington / Richard Socher / Christopher D. Manning
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
Full-text link: Full paper - 学术范 (xueshufan.com)


Natural language processing: topic details - 学术范 (xueshufan.com)


1. Christopher D. Manning
Affiliation: Stanford University

For Christopher D. Manning's academic work, see: Scholar's publications - 学术范 (xueshufan.com)

2. Tomas Mikolov
Affiliation: Czech Technical University in Prague

For Tomas Mikolov's academic work, see: Scholar's publications - 学术范 (xueshufan.com)

3. Richard Socher

For Richard Socher's academic work, see: Scholar's publications - 学术范 (xueshufan.com)

4. Ilya Sutskever

For Ilya Sutskever's academic work, see: Scholar's publications - 学术范 (xueshufan.com)

5. Jeffrey Dean

For Jeffrey Dean's academic work, see: Scholar's publications - 学术范 (xueshufan.com)

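<hr/>
Before I was admitted to graduate school by recommendation (保研), I had zero background in deep learning, machine learning, or NLP, and I took a lot of detours figuring out on my own how to get started.
1. Hung-yi Lee (李宏毅), Machine Learning 2022 Spring
   Hung-yi Lee (李宏毅), "Machine Learning" course in Mandarin (2020) - bilibili
   The first link is the latest version and the course's official site, with homework included, but its videos are hosted on YouTube, which many readers may not be able to open. The second is a re-upload of the course on Bilibili; the content is much the same.
2. [Complete] Dive into Deep Learning (动手学深度学习), PyTorch edition
   This is the deep learning video course Mu Li (李沐) released in 2021. Everything in it is explained hands-on, with the code walked through line by line. Follow the course and type out every piece of its code yourself!
Next comes the NLP part. For theory, I recommend going through Stanford CS224n once; I think it is one of the best-taught NLP courses!
3. [Bilingual subtitles] Stanford CS224n, "Natural Language Processing with Deep Learning" (2019), by Chris Manning
   As usual, theory should be followed by practice. Here I recommend NLP-Beginner, a set of introductory NLP exercises by Prof. Xipeng Qiu (邱锡鹏).
4. NLP-Beginner: introductory NLP exercises (自然语言处理入门练习)
   Congratulations: once you have worked through everything above, you have successfully gotten started with NLP!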
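<hr/>
A few more words: once the foundation is in place, how do you take the next step and start doing NLP research?
Here is how I look for papers when I get into a new area. First, check GitHub for a paper list that someone has already compiled. The repo below, by @Gordon Lee, collects paper lists for the vast majority of NLP directions, so start there.
GitHub - Doragd/Awesome-Paper-List: A curated list of repositories in which many NLP/CV/ML papers and related area resources are collected.
So once you have read the papers on the list, how do you find the latest ones?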

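1. The CL section of arXiv. Many papers are posted to arXiv before they are accepted, so it is no exaggeration to say that you can see the latest research results here!
   Computation and Language
2. ACL Anthology, which collects the papers of almost all NLP-related conferences and journals!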
AI Conference Deadlines
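
If you would rather pull the newest cs.CL submissions from a script than from the web page, arXiv has a public query API that returns an Atom feed. The sketch below only builds the request URL; the helper name is my own, and fetching and parsing the feed (for example with `urllib.request` and `xml.etree.ElementTree`) is left out so the snippet runs without network access.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def recent_cs_cl_url(max_results=10):
    """Build an arXiv API URL for the most recently submitted cs.CL papers."""
    params = {
        "search_query": "cat:cs.CL",   # Computation and Language category
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

print(recent_cs_cl_url(5))
```

Opening the printed URL in a browser is a quick way to check what the feed looks like before writing any parsing code.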
