
AI Safety ∩ AI/DL Research

by BellardiaLemonus

    This essay was submitted to the AI Progress Essay Contest, an initiative that focused on the timing and impact of transformative artificial intelligence. You can read the results of the contest and the winning essays here.


    Contribution: In this essay, I contribute a first-pass search of terms relevant to research in AI Safety, Deep Learning (DL), and Artificial Intelligence (AI), using Google Scholar, the LessWrong/EAF API, OpenAlex, and ArXiV [1]. I perform a brief analysis of AI research trends based on term frequency, and find that AI Safety accounts for roughly 0.072% of AI research, and that for every AI Safety search result there are roughly 754 and 392 search results for research involving AI and DL, respectively. Additionally, I forecast the near-term future of select topics in AI Safety and AI/DL research. These forecasts and this preliminary review of research term frequency could be useful for AI risk reduction (1) by detailing which topics in AI Safety might be neglected, (2) by capturing how neglected AI Safety might be as a whole, relative to general research in AI/DL, and (3) by assessing how certain areas within DL that are relevant to AI risk might change over the coming years.

    Linkpost/Footnotes: https://rodeoflagellum.github.io/for_dl_ai_safety/


    Outlook

    Here I examine the landscape of AI Safety and introduce some questions relating to how the general AI/DL research community handles AI risk

    AI safety, as a distinct discipline, is relatively new; the earliest occurrence of "AI Safety" or "Safe AI" I could find in academic literature appears to be from 1995, in M. G. Rodd's publication Safe AI - is this Possible?, but the origins of AI Safety as a field are less obvious [2]. In any case, the field is growing rapidly, both in terms of popularity [3] and funding.

    In 2014, spending on strategic and technical interventions totaled ~1.75 million USD between the Future of Humanity Institute (FHI) and the Machine Intelligence Research Institute (MIRI), two of the field's progenitors, and grew to at least ~9.1 million USD in 2017 (distributed across many new organizations), a ~5.2-fold increase [4].

    More recently, Open Philanthropy donated ~80 million USD across 2019 and 2020 towards reducing risks from AI. Should Open Philanthropy continue to exist, the Metaculus community predicts that funding for AI Safety will continue to increase, with median predictions of 78 million USD and 121 million USD in funding for the years 2025 and 2030, respectively.

    Why so much newfound funding for decreasing risk from AI? Nestled within the wider Deep Learning (DL) and AI research community, the field of AI Safety stems from the concern that AI systems can be deleterious to humanity, in ways minor or grave, presently and in the future. AI systems and their wrath are often themes of science fiction, captured by systems such as HAL 9000 or Prime Intellect. Perhaps the earliest formulation of the threat of AI systems was in Samuel Butler's 1863 essay entitled Darwin among the Machines, in which he wrote [5]:

    The upshot is simply a question of time, but that the time will come when the machines will hold the real supremacy over the world and its inhabitants is what no person of a truly philosophic mind can for a moment question.

    The complexity of the aims and problems within AI Safety is driven in part by the complexity of intelligence and human values. From an informal viewpoint, the landscape of AI Safety can be understood by looking at the organizations and individuals who broadcast concern regarding AI and who generate research oriented around these concerns.

    One such map, created in 2017 by Søren Elverlin [6], attempts to capture the AI Safety community; I believe that it's a safe bet that most people who work in AI Safety will have heard of many of the entities listed in Elverlin's map.

    (to see the image in greater detail, click here: https://aisafety.com/2017/09/26/map-ai-safety-community/)

    From a more formal viewpoint, the Future of Life Institute's AI Safety map [7] has AI Safety as a root node containing five main branches of concern (Validation, Control, Verification, Security, and Foundations) [8].

    (to see the image in greater detail, click here: https://futureoflife.org/landscape/)

    Speculation is diverse regarding the internal form or the embodiment that an extremely dangerous AI system might take, but researchers often frame risk from AI in terms of what an AI system can achieve rather than in terms of how complex it is or whether it's conscious.

    One benchmark for considering the extent of what sophisticated AI systems might "do" is the degree of change they can engender in human civilization. In this vein of thinking, Open Philanthropy defines transformative AI as "AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution" [9].

    As we have seen, AI Safety is a broad field, and transformative AI is but one of many framings for understanding the potential impacts of AI. Given the difficulties of forecasting rare events [10], along with the current consensus that no AGI has ever existed (an empty reference class), predicting the trajectory and impact of transformative AI seems difficult. Nonetheless, it's somewhat plausible that, in the event transformative AI is created, it will stem from the DL research community; presently, 100 Metaculus members assign this possibility (i.e., that artificial general intelligence (AGI) will be based on DL) a median probability of 70%. Moreover, 121 Metaculites believe that the date that "the first [strong and robotic] AGI [is] first developed and demonstrated" will be between 2037 and 2084 (1st quartile - 3rd quartile), with a median prediction of 2052.

    The full extent of the poor outcomes for humanity that could be engendered by transformative AI, or by AI systems generally, is beyond the scope of this essay. However, taking the magnitude of the severity of AI risks as given, these predictions, along with other trajectories researched by the AI Safety community, indicate the need for urgent monitoring and governance of AI systems and computing resources, especially in the DL community.

    Instrumentally speaking, ensuring that those in the general AI/DL research community are lucidly aware of the risks from AI might greatly improve the outcomes for humanity. As such, characterizing the size, influence, and general parameters of AI Safety within the broader context of AI/DL research might be a first step towards achieving this, and could benefit investigations into how many of the bottlenecks in mitigating AI risk relate to funding, interest, or talent.

    So, some questions to address AI Safety's place within the AI/DL community might be:

    • How much has AI safety been researched, relative to the amount of research on AI/DL systems generally?
    • How do the AI Safety community and general AI/DL research community overlap? How much of general AI/DL research, independent of AI Safety, addresses AI risk?
    • How does the amount of interest in, funding in, and participation in the AI Safety community affect progress in AI Safety? (I leave this question untouched).

    In this essay, I attempt to address the first two of these questions.


    Trends in Growth

    Here I look at the intersection of AI Safety and AI/DL search terms

    To begin answering these questions, and to lay some groundwork for future meta-science investigations of AI Safety, I decided that a reasonable path would probably involve querying some of the research APIs and databases I frequently use when reading about AI Safety, or about AI/DL, to see how much work had been done in these areas. There are two main groupings I explored:

    • Group 1: The intersection of AI Safety and DL, and the intersection of AI Safety and AI, and the amount of AI Safety research relative to AI/DL research
    • Group 2: Subfields of AI/DL such as Natural Language Processing (NLP), Reinforcement Learning (RL), Few-Shot Learning (FEW), and Multimodal Learning (MM) that are important to AI Safety [11] (this is handled in the next section)

    For all groups, I searched the years 2000 through 2022, and cut off 2022 in the graphs. Here are the abbreviations: (AR) = ArXiV; (GS) = Google Scholar, all results, no citations; (GS, R) = Google Scholar, reviews only, no citations; and (OA) = OpenAlex. In every search, I looked for exact phrases (e.g., "AI Safety" rather than AI Safety). The raw data for the years 2000 through 2022 can be found in the Linkpost [12].
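    To make the methodology concrete, the sketch below shows the kind of per-year, exact-phrase count I mean, written against the public arXiv export API (the submittedDate filter and the opensearch:totalResults field are assumptions based on the API's documented behavior; the other sources were queried analogously through their own interfaces):

```python
import re
import time
import urllib.parse
import urllib.request

ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_count(phrase: str, year: int) -> int:
    """Count arXiv e-prints containing the exact phrase, submitted in a given year."""
    query = f'all:"{phrase}" AND submittedDate:[{year}01010000 TO {year}12312359]'
    params = urllib.parse.urlencode({"search_query": query, "max_results": 0})
    with urllib.request.urlopen(f"{ARXIV_API}?{params}") as resp:
        feed = resp.read().decode("utf-8")
    # The Atom response reports the total hit count in <opensearch:totalResults>.
    match = re.search(r"<opensearch:totalResults[^>]*>(\d+)<", feed)
    return int(match.group(1)) if match else 0

if __name__ == "__main__":
    for year in range(2000, 2023):
        print(year, arxiv_count("AI safety", year))
        time.sleep(3)  # stay well within the API's rate limits
```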

    Intersection of AI Safety and AI/DL (Group 1)

    Note that "AI Safety + DL" indicates searching "AI Safety", "Deep Learning". I removed the lines for "AI Safety + DL (OA)" and "AI Safety + AI (OA)", as both produced zero results, which is something I don't have an explanation for besides maybe that OpenAlex only searches the title and the abstracts.

    The first noteworthy finding from this search is that AI Safety overlaps more with work on Artificial Intelligence than with work on Deep Learning (sums of 265 > 149 for AI and DL, respectively, in the non-review Google Scholar search), which is not too surprising given that AI is the broader of the two fields. The trend for ArXiV (which may be more useful for detecting research signals in these fields, since Google Scholar produces many non-research results) also reflects this (sums of 51 > 5 for AI and DL, respectively).

    Summing the results over the 23 years,

    • AI Safety ∩ DL (i.e., AI Safety work where the phrase "Deep Learning" is at least mentioned) accounts for 100*(5/67)=7.46% and 100*(149/265)=56.23% of AI Safety work on ArXiV and Google Scholar, respectively (mean: 31.85%).
    • AI Safety ∩ AI accounts for 100*(51/67)=76.12% and 100*(265/265)=100.00% of AI Safety work on ArXiV and Google Scholar, respectively (mean: 88.06%).

    Next, we can look at how much of AI/DL research consists of work on AI Safety.

    AI Safety, Deep Learning, and Artificial Intelligence (Group 1)

    An interesting trend, captured only in the "AI (GS)" and "DL (GS)" series, is the large decrease in search results that begins in 2018. Perhaps this reversal is related to non-research discussions of AI or DL, given that it is not captured in ArXiV's AI and DL results. Given the massive difference between the AI Safety results and the AI/DL results, it is useful to get a "close up" before discussing numbers.

    Close Up of AI Safety Queries (Group 1)

    There is steady growth in the number of results on OpenAlex, ArXiV, and Google Scholar Reviews for AI Safety, but they are completely overshadowed by the results for AI + DL.

    Summing the results over the 23 years,

    • The ratio of results for AI Safety:AI is 1:(50513/67) = 1:~754, 1:(53248/55) = 1:~968, and 1:(604970/265) = 1:~2283 for ArXiV, OpenAlex, and Google Scholar, respectively (mean: 1:~1335).
    • Moreover, AI Safety ∩ AI results accounts for 100*(51/50513) = ~0.10% and 100*(265/604970) = ~0.044% of AI results overall for ArXiV and Google Scholar, respectively (mean: 0.072%).
    • The ratio of results for AI Safety:DL is 1:(26234/67) = 1:~392, 1:(87240/55) = 1:~1586, and 1:(225999/265) = 1:~853 for ArXiV, OpenAlex, and Google Scholar, respectively (mean: 1:~944).
    • Moreover, AI Safety ∩ DL results accounts for 100*(5/26234) = ~0.019% and 100*(149/225999) = ~0.066% of DL results overall for ArXiV and Google Scholar, respectively (mean: 0.0425%).
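    For transparency, these bullet points (and the earlier intersection percentages) reduce to simple arithmetic over the summed 2000-2022 counts; the snippet below reproduces them using only the totals already reported in this essay:

```python
# Summed 2000-2022 search-result counts, as reported above.
counts = {
    "ArXiV":          {"safety": 67,  "safety_and_ai": 51,  "ai": 50513,  "safety_and_dl": 5,   "dl": 26234},
    "OpenAlex":       {"safety": 55,  "ai": 53248,  "dl": 87240},
    "Google Scholar": {"safety": 265, "safety_and_ai": 265, "ai": 604970, "safety_and_dl": 149, "dl": 225999},
}

for source, c in counts.items():
    print(f"{source}: AI Safety:AI = 1:{c['ai'] / c['safety']:.0f}, "
          f"AI Safety:DL = 1:{c['dl'] / c['safety']:.0f}")
    if "safety_and_ai" in c:  # intersections omitted for OpenAlex (zero results)
        print(f"  AI Safety ∩ AI as % of all AI results: {100 * c['safety_and_ai'] / c['ai']:.3f}%")
        print(f"  AI Safety ∩ DL as % of all DL results: {100 * c['safety_and_dl'] / c['dl']:.3f}%")
```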

    So, just looking at ArXiV, a summary statement might be: "Using term frequency as a proxy for neglectedness, AI Safety in DL research is 0.10/0.019 = 5.26 times more neglected than it is in AI research."


    Forecasting

    Here I explore some terms relating specifically to AI Safety and to DL, and comment on forecasts on DL subfield growth

    Fields Related to AI Safety, and AI Safety (Group 2)

    To recap: FEW="Few Shot Learning", MM="Multimodal Learning", RL="Reinforcement Learning", and NLP="Natural Language Processing"; each of these is a subfield of DL, and all are relevant to AI Safety in that major breakthroughs in any of them will likely increase the risks from AI. The search queries I used for these subfields were the same as those used in the Metaculus questions they're associated with (see below).

    The remainder of this essay consists of me synthesizing information to discuss the accuracy of the Metaculus community's current predictions concerning the growth of these subfields. I use the following abbreviations in the graphs below: Cum-Sum for cumulative summation; IQR for Interquartile Range; X%-ile (this should read X%-int) for the X% ARIMA model confidence interval [13]; and BF for Best Fit.
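    For readers who want to reproduce the ARIMA bands, the sketch below shows the sort of automatic fit I mean, using statsmodels; the yearly counts and the (1, 1, 1) order are placeholders for illustration, not the actual data or parameters behind the plots:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Placeholder yearly e-print counts for 2000-2021 (stand-ins for the real search results).
counts = np.array([1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 16, 21, 28, 37,
                   49, 65, 86, 114, 151, 200, 265, 560], dtype=float)

# Fit a simple ARIMA model; the (1, 1, 1) order is an assumption.
fitted = ARIMA(counts, order=(1, 1, 1)).fit()

# Forecast ten further years with an 80% confidence interval
# (analogous to the X%-int bands in the graphs below).
forecast = fitted.get_forecast(steps=10)
print(forecast.predicted_mean)
print(forecast.conf_int(alpha=0.2))
```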

    Before forecasting, I first want to comment on some general trends in the fields related to AI Safety. There is a clear discrepancy between research results for AI Safety and for the subfields of DL, with only Multimodal Learning coming up less frequently in the literature than AI Safety. Since around 2015, the growth for AI seems exponential and for DL seems roughly cubic. When solving for r in the exponential growth formula x(t) = x0 * (1 + r/100)^t, I get r = 51.44901 for the AI results and r = 47.80855 for the DL results [14].
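    Rearranging that formula gives r = 100 * ((x(t)/x0)^(1/t) - 1); the snippet below applies it to a pair of hypothetical endpoint counts (the actual per-year counts are in the Linkpost data):

```python
def growth_rate(x0: float, xt: float, t: float) -> float:
    """Annual percentage growth rate r implied by x(t) = x0 * (1 + r/100)**t."""
    return 100 * ((xt / x0) ** (1 / t) - 1)

# Hypothetical endpoints: 1,000 results in 2015 growing to 18,000 results in 2022 (t = 7 years).
print(growth_rate(1_000, 18_000, 7))  # ~51.1, i.e. roughly 51% growth per year
```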

    Another interesting observation is that NLP is more abundant on ArXiV than DL is, despite NLP often being considered a subfield of DL at large. I was personally surprised that RL wasn't represented more; I thought NLP and RL would have been much closer.

    AI SAFETY

    To date, we have two resolved questions where the Metaculus community forecasted ArXiV search results for AI Safety, Interpretability, or Explainability: one for the interval [2020-12-14, 2021-06-14], and another for the interval [2021-01-14, 2022-01-14].

    E-prints on AI Safety, Interpretability, or Explainability (2020-2021)

    E-prints on AI Safety, Interpretability, or Explainability (2021-2022)

    For the [2020-12-14, 2021-06-14] question, the community's final median prediction was 287 (IQR: 239 - 339), and the question resolved as 260. The community's final median prediction on the [2021-01-14, 2022-01-14] question was 583 (IQR: 495 - 690), and the question resolved as 560.

    In both cases, the final community prediction was very accurate, which is some evidence towards the community being accurate, down the line, on these types of questions.

    The community is currently forecasting the quantity of AI Safety, Interpretability, or Explainability ArXiV results for the intervals [2021-01-01, 2026-12-31] and [2021-02-14, 2031-02-14], with median predictions of 6.6k (IQR: 3.8k - 11k) and 12k (IQR: 6.6k - 19k), respectively. So, the community expects something like cubic growth (looking at the medians for 2026, 2031) in AI Safety over the coming years.

    While I don't have the data to support this, anecdotally speaking, the amount of funding for AI Safety research appears to be increasing tremendously (the earlier Open Philanthropy questions I included support this idea), which might explain some of the community's forecasts, as more funding typically produces more research.

    The simple ARIMA model's results are mostly in line with the community's median forecasts, but the lines of best fit for 2018 to 2021 (reflecting the growth rate from this period) are not, indicating that the community expects the next eight years of AI Safety research to be quite different from the last four.

    Should the community's predictions come to pass, and should growth in AI and DL research remain approximately exponential and cubic, respectively, then in 2031 there will be ~11.5k (IQR: ~6k - ~18.5k) new results on ArXiV for AI Safety (after subtracting the 558 results from 2021). Adding the 1557 total results accumulated from 2000 through the end of 2021 gives around 1557+~11.5k=~13.1k (IQR: ~7.5k - ~20k) total results for AI Safety in 2031. The ratio of results for AI Safety:AI would then be 1:~45 (IQR: 1:~78 - 1:~29), and the ratio for AI Safety:DL would be 1:~21 (IQR: 1:~36 - 1:~13), an improvement over the current ratios of 1:~754 and 1:~392, respectively! [15] Note that I found the values for the AI and DL results in 2031 by using the exponential growth equation with the corresponding values of r mentioned earlier.

    E-prints on AI Safety, Interpretability, or Explainability (2021-2026)

    E-prints on AI Safety, Interpretability, or Explainability (2021-2031)

    NATURAL LANGUAGE PROCESSING

    Thus far, 1 question on ArXiV NLP results has resolved (the interval [2021-01-14, 2022-01-14]).

    E-prints on Natural Language Processing (2021-2022)

    Given that the final community median was 9.6k (IQR: 9.0k - 10k), and that the question resolved at ~8k e-prints, it seems tentatively safe to suggest that the community slightly overestimates NLP progress.

    Currently, the community forecasts a median of 110k (IQR: 85k - 142k) ArXiV e-prints on NLP to be written between [2021-01-14, 2030-01-14].

    The simple ARIMA model and line of best fit for the cumulatively summed results follow along with the community's forecasts, falling between the median and the lower 25% bound, which might actually be closer to the resolution value if the community is similarly incorrect about NLP progress for this timeline (i.e., it seems likely that ~98k might be very close to the resolution value). The ARIMA model for the unsummed e-print count per year should likely not be trusted, and I do not currently understand "what went wrong".

    Should the community's predictions come to pass, Natural Language Processing's representation on ArXiV will grow around 110k/23183=4.74 (IQR: 3.67 - >6.13) times between 2020-12-31 and 2030-01-14, inclusive (23183 NLP papers were published between 2000 and 2020, inclusive).

    E-prints on Natural Language Processing (2021-2030)

    REINFORCEMENT LEARNING

    Akin to the AI Safety questions, we have two resolved questions to inform ourselves on how accurate the community might be on questions involving RL e-prints.

    E-prints on Reinforcement Learning (2020-2021)

    E-prints on Reinforcement Learning (2021-2022)

    For the [2020-12-14, 2021-06-14] ArXiV RL e-print question, the community's final median forecast was 1.7k (IQR: 1.5k - 2.0k), and the question resolved to ~1.6k, indicating a slight overestimation. On the [2021-01-14, 2022-01-14] question, the community's final median forecast was 4.0k (IQR: 3.8k - 4.3k), and the question resolved to ~3.4k, further indicating a tendency towards overestimation.

    On the two open ArXiV RL e-print questions, the community's median predictions are 36k e-prints (IQR: 28k - 48k) and 49k e-prints (IQR: 36k - >50k) published over the intervals [2021-01-14, 2027-01-01] and [2020-12-14, 2031-01-01], respectively.

    The ARIMA models here should be thought of as "the output of an automatic ARIMA model, something that is somewhat interesting and that is perhaps parameterized incorrectly, and that should likely not be trusted here, given its massive deviations".

    The line of best fit for the summed RL results is in accordance with the community's lower bound; I believe the lower bound should actually be slightly lower, given the community's past inaccuracies with RL e-print questions.

    The differences between the predictions of the two open questions on RL e-prints suggest that, in some sense, the community expects the rate of progress in RL to be greater between 2022 and 2027 than between 2027 and 2030. Nothing comes immediately to mind that would explain this minor shift downwards in the growth rate.

    Should the community's predictions come to pass, Reinforcement Learning's representation on ArXiV will grow around 49000/8309=5.90 (IQR: 4.33 - >6.02) times between 2020-12-31 and 2031-01-01, inclusive (8309 RL papers were published between 2000 and 2020, inclusive).

    E-prints on Reinforcement Learning (2021-2036)

    E-prints on Reinforcement Learning (2021-2030)

    FEW SHOT LEARNING

    E-prints on Few Shot Learning (2020-2021)

    E-prints on Few Shot Learning (2021-2022)

    The final community median predictions for these questions were:

    • 747 (IQR: 664 - 846) for [2020-12-14, 2021-06-14], which resolved to 744
    • 1.8k (IQR: 1.6k - 2.0k) for [2021-01-14, 2022-01-14], which resolved to ~1.7k

    This, in my mind, indicates that the community leans towards being fairly accurate on the Few Shot Learning questions.

    Presently, the community predicts 13k (IQR: 7.6k - >20k) FEW e-prints for [2020-01-01, 2027-01-01], which the line of best fit for the cumulatively summed results supports (the ARIMA model estimates much higher, perhaps due to weighting the 2020-2021 difference too greatly).

    Should the community's predictions come to pass, Few Shot Learning's representation on ArXiV will grow around 13000/1517=8.57 (IQR: 5.00 - >13.18) times between 2020-12-31 and 2027-02-14, inclusive (1517 FEW papers were published between 2000 and 2019, inclusive).

    E-prints on Few Shot Learning (2020-2027)

    MULTIMODAL LEARNING

    E-prints on Multimodal Learning (2021-2022)

    The final community median prediction for the above question suggests we should tentatively trust the community on Multimodal Learning questions:

    • 256 (IQR: 210 - 307) for [2021-01-14, 2022-01-14], which resolved to 248

    Looking at the community's median prediction of 10k (IQR: 5.8k - 18k) e-prints published during [2021-02-14, 2031-02-14], my sense is that the upper bound seems too high. The ARIMA model and, to some degree, the line of best fit support this idea, though I am wary of trusting these measures too much.

    Should the community's predictions come to pass, Multimodal Learning's representation on ArXiV will grow around 10000/432=23.14 (IQR: 13.43 - 41.67) times between 2020-12-31 and 2031-02-14, inclusive (432 MM papers were published between 2000 and 2020, inclusive).

    E-prints on Multimodal Learning (2021-2030)

    Categories:
    Artificial Intelligence
    Computing and Math