Ashish Vaswani

Ashish Vaswani is an artificial intelligence researcher and computer scientist who worked at Google Brain for over five years and also co-founded AI start-up Adept. 

Education and early career

Vaswani spent time as a summer intern at both IBM and Google before earning a Ph.D. at the University of Southern California (USC). There, he worked under the supervision of Dr. Liang Huang and Dr. David Chiang.

In a subsequent interview, Chiang said that “Ashish was my first PhD student and one of the very first people to see the potential for deep learning in natural language processing back in 2011.” 

Chiang also referred to Vaswani as a “visionary” who built GPU workstations in his USC office at a time when few people understood their importance in AI and natural language processing. 

Information Sciences Institute at USC

Vaswani then worked as a computer scientist in the Natural Language Group at USC’s Information Sciences Institute (ISI) for two years. 

The ISI is a leading research institute that focuses on interdisciplinary research in domains such as computer science, artificial intelligence, NLP, machine learning, computer vision, and information processing. 

Vaswani was managed by former USC Professor of Computer Science Kevin Knight who, between 2018 and 2022, was employed at Chinese mobility tech company Didi Global as an NLP scientist.

Vaswani recalls his time at USC fondly since he was afforded the freedom to explore new ideas and indulge in his passion for deep learning.

On the school’s website, he said that “Everything I learned at USC shaped how I do my research and how I learn and absorb information. It was a vibrant, tremendous research group pursuing bold ideas, and that’s rare.”

Google Brain

Vaswani then joined Google Brain as a staff research scientist in July 2016. At Google, he co-authored the now famous 2017 paper Attention Is All You Need with a team of researchers that included Niki Parmar, Noam Shazeer, and Lukasz Kaiser. 

The paper, which details research on pure attention-based models such as transformers, has been cited more than 50,000 times and led to significant advances in the capabilities of models such as GPT, Google’s BERT, and Microsoft’s MT-DNN.

Vaswani’s contribution to transformer models and AI

The transformer model is a deep learning architecture that has revolutionized natural language processing tasks such as machine translation, language understanding, and sentiment analysis.

In the above-mentioned paper, Vaswani and his co-authors introduced the transformer as a new paradigm for sequence-to-sequence modeling that does not rely on recurrent neural networks (RNNs) or convolutional neural networks (CNNs). 

Instead, the transformer model’s self-attention mechanism allows it to capture long-range dependencies and process sequences in parallel, which makes it highly efficient and effective for a wide range of language tasks. 
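To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification under stated assumptions: a real transformer first passes each token through learned query, key, and value projections and uses several attention heads, whereas this toy example feeds the same hypothetical two-dimensional token vectors in as queries, keys, and values. The function names are illustrative and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    # Each query is handled independently of the others, which is why
    # the computation can run in parallel across sequence positions.
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors, so any
        # position can draw on any other position, however far away.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Assumption: three toy token vectors, with no learned projections.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to one and shows how much one token attends to every other token. Because no step depends on the result for a previous position, the loop over queries could be replaced by a single matrix multiplication, which is the source of the parallelism described above.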

Transformers have since become a foundational concept in many state-of-the-art AI models and, as we touched on earlier, are now widely referenced and utilized in both academia and industry.


Later, Vaswani referred to the development of ChatGPT as a “clear landmark in the arc of AI” and said that “We’re seeing the beginnings of profound tools for thought that will eventually make us much more capable in the digital world.”

But this was not what Vaswani had in mind while he was working on the transformer architecture. Instead, he wanted to develop a single model that would “consolidate all modalities and exchange information between them, just like the human brain.”

Other research

Vaswani has also researched ways to improve the efficiency and scalability of machine learning models. To that end, he has developed various techniques for model compression, quantization, and acceleration to substantially advance the field of efficient deep learning. 

These techniques enable machine learning models to run faster and consume fewer computational resources, which makes them suitable for deployment in resource-constrained environments like mobile devices.
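As a rough illustration of why quantization saves resources, the sketch below maps floating-point weights to 8-bit integers and back. It is a generic textbook form of symmetric linear quantization, not a method attributed to Vaswani, and the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric linear quantization of floats into the range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale factor maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in quantized]


# Assumption: a handful of made-up weights standing in for a model layer.
weights = [0.52, -1.27, 0.003, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each weight now fits in one byte instead of the four used by a 32-bit float, at the cost of a rounding error of at most half the scale factor per weight.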

What’s more, Vaswani’s work on efficient deep learning has been highly influential and has helped democratize AI by making it more accessible and practical for a wider range of applications.

Adept AI Labs

On April 27, 2022, Vaswani announced on Twitter that “After 5+ wonderful years in Google Brain, working at the forefront of ML alongside inspiring colleagues, I’m excited to share my new adventure. We started Adept with the mission to build the future of human-computer collaboration.”

Among Adept’s co-founders were Google Brain colleagues Niki Parmar and Anmol Gulati, who created Google’s speech recognition model. Also on board were former Google software engineers Fred Bertsch, Max Nye, and Augustus Odena, who built the company’s code generation model.

Vaswani served as Adept’s chief scientist for most of 2022. In September of that year, he posted another tweet announcing a new model called Action Transformer (ACT-1). In a video demonstration of the model on real estate platform Redfin, a user types the prompt “Find me a house in Houston that works for a family of 4. My budget is 600K” and the model populates a list with eligible properties.

Stealth Startup

According to his LinkedIn profile, Vaswani stepped back from Adept, either partially or completely, in November 2022 for undisclosed reasons. 

He then co-founded a company listed as “Stealth Start-up” in December, with details similarly scarce at the time of writing. Like Adept before it emerged in April 2022, the company is likely operating in stealth mode, working on a product or service in secret before a public unveiling at some future date.

Later, tech news website The Information reported that Vaswani had been joined at the new company by co-founder and long-time colleague Niki Parmar. Parmar was also a senior research scientist at Google Brain and co-author of the seminal paper Attention Is All You Need.

Key takeaways

  • Ashish Vaswani is an artificial intelligence researcher and computer scientist who worked at Google Brain for over five years and also co-founded AI start-up Adept.
  • Vaswani worked as a computer scientist in the Natural Language Group at USC’s Information Sciences Institute (ISI) for two years. He then worked at Google Brain for more than five years, where he co-authored the influential paper Attention Is All You Need.
  • Vaswani co-founded Adept with various former colleagues and contacts from Google and elsewhere. According to his LinkedIn profile, Vaswani stepped back from Adept, either partially or completely, in November 2022 for undisclosed reasons.

Connected Business Model Analyses


Generalized AI consists of devices or systems that can handle all sorts of tasks on their own. The extension of generalized AI eventually led to the development of machine learning. As an extension of AI, machine learning (ML) uses algorithms that analyze data to build programs that automate actions. Because those actions are not explicitly programmed, systems can learn and improve with experience. ML explores large sets of data to find common patterns and formulate analytical models through learning.

Deep Learning vs. Machine Learning

Machine learning is a subset of artificial intelligence where algorithms parse data, learn from experience, and make better decisions in the future. Deep learning is a subset of machine learning where numerous algorithms are structured into layers to create artificial neural networks (ANNs). These networks can solve complex problems and allow the machine to train itself to perform a task.


DevOps refers to a set of practices for automating software development processes. The name is a blend of the terms “development” and “operations” and emphasizes how functions integrate across IT teams. DevOps strategies promote seamless building, testing, and deployment of products. The aim is to bridge the gap between development and operations teams and streamline development as a whole.


AIOps is the application of artificial intelligence to IT operations. It has become particularly useful for modern IT management in hybridized, distributed, and dynamic environments. AIOps has become a key operational component of modern digital-based organizations, built around software and algorithms.

Machine Learning Ops

Machine Learning Ops (MLOps) describes a suite of best practices that successfully help a business run artificial intelligence. It consists of the skills, workflows, and processes to create, run, and maintain machine learning models to help various operational processes within organizations.

OpenAI Organizational Structure

OpenAI is an artificial intelligence research laboratory that transitioned into a for-profit organization in 2019. The corporate structure is organized around two entities: OpenAI, Inc., which is a single-member Delaware LLC controlled by OpenAI non-profit, and OpenAI LP, which is a capped, for-profit organization. OpenAI LP is governed by the board of OpenAI, Inc. (the foundation), which acts as a General Partner. At the same time, Limited Partners comprise employees of the LP, some of the board members, and other investors like Reid Hoffman’s charitable foundation, Khosla Ventures, and Microsoft, the leading investor in the LP.

OpenAI Business Model

OpenAI has built the foundational layer of the AI industry. With large generative models like GPT-3 and DALL-E, OpenAI offers API access to businesses that want to develop applications on top of its foundational models, plug these models into their products, and customize them with proprietary data and additional AI features. OpenAI also released ChatGPT, which is built around a freemium model. Microsoft also commercializes OpenAI’s products through its commercial partnership.


OpenAI and Microsoft have formed a commercial partnership. It began in 2016 and was consolidated in 2019, when Microsoft invested a billion dollars. It’s now taking a leap forward, with Microsoft in talks to put $10 billion into this partnership. Microsoft, through OpenAI, is developing its Azure AI Supercomputer while enhancing its Azure Enterprise Platform and integrating OpenAI’s models into its business and consumer products (GitHub, Office, Bing).

Stability AI Business Model

Stability AI is the entity behind Stable Diffusion. Stability makes money from its AI products and from providing AI consulting services to businesses. Stability AI monetizes Stable Diffusion via DreamStudio’s APIs, while also releasing it as open source for anyone to download and use. Stability AI also makes money via enterprise services, where its core development team offers enterprise customers the chance to service, scale, and customize Stable Diffusion or other large generative models to their needs.
