Education and early career
Vaswani spent time as a summer intern at both IBM and Google before earning a Ph.D. at the University of Southern California (USC). There, he worked under the supervision of Dr. Liang Huang and Dr. David Chiang.
In a subsequent interview, Chiang said that “Ashish was my first PhD student and one of the very first people to see the potential for deep learning in natural language processing back in 2011.”
Chiang also referred to Vaswani as a “visionary” who built GPU workstations in his USC office at a time when few people understood their importance in AI and natural language processing.
Information Sciences Institute at USC
Vaswani then worked as a computer scientist in the Natural Language Group at USC’s Information Sciences Institute (ISI) for two years.
The ISI is a leading research institute that focuses on interdisciplinary research in domains such as computer science, artificial intelligence, NLP, machine learning, computer vision, and information processing.
At ISI, Vaswani was supervised by Kevin Knight, a former USC Professor of Computer Science who, between 2018 and 2022, worked as an NLP scientist at the Chinese mobility tech company Didi Global.
Vaswani recalls his time at USC fondly since he was afforded the freedom to explore new ideas and indulge in his passion for deep learning.
On the school’s website, he said that “Everything I learned at USC shaped how I do my research and how I learn and absorb information. It was a vibrant, tremendous research group pursuing bold ideas, and that’s rare.”
Vaswani then joined Google Brain as a staff research scientist in July 2016. At Google, he co-authored the landmark 2017 paper entitled Attention Is All You Need with a team of researchers that included Niki Parmar, Noam Shazeer, and Lukasz Kaiser.
The paper, which introduced the purely attention-based Transformer architecture, has been cited more than 50,000 times and led to significant advances in the capabilities of models such as OpenAI’s GPT, Google’s BERT, and Microsoft’s MT-DNN.
Vaswani’s contribution to transformer models and AI
In that paper, Vaswani and his co-authors introduced the Transformer as a new paradigm for sequence-to-sequence modeling that dispenses with recurrent neural networks (RNNs) and convolutional neural networks (CNNs).
Instead, the transformer model’s self-attention mechanism allows it to capture long-range dependencies and process sequences in parallel, which makes it highly efficient and effective for a wide range of language tasks.
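The core of that mechanism can be illustrated in a few lines. Below is a minimal, single-head sketch of scaled dot-product self-attention (no masking, batching, or multi-head logic); the function name and dimensions are illustrative, not taken from the paper’s reference code. Because the attention scores for all position pairs come out of one matrix multiplication, every token can attend to every other token, however far apart, in a single parallel step.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Minimal single-head scaled dot-product self-attention sketch.

    x: (seq_len, d_model) input embeddings.
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # project inputs to queries/keys/values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                # similarity of every position with every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax: attention distribution
    return weights @ v                             # each output is a weighted mix of all values

# Toy example: a sequence of 4 positions, model width 8, head width 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 4): one d_k-dimensional output per input position
```

Note that nothing in the computation depends on sequence order or distance, which is why the full Transformer adds positional encodings and why long-range dependencies cost no more to model than adjacent ones.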
Transformers have since become a foundational concept in many state-of-the-art AI models and, as we touched on earlier, are now widely referenced and utilized in both academia and industry.
Later, Vaswani described the development of ChatGPT as a “clear landmark in the arc of AI,” adding that “We’re seeing the beginnings of profound tools for thought that will eventually make us much more capable in the digital world.”
But a chatbot was never Vaswani’s goal while he was working on the transformer architecture. Instead, he wanted to develop a single model that would “consolidate all modalities and exchange information between them, just like the human brain.”
Vaswani has also researched ways to improve the efficiency and scalability of machine learning models. To that end, he has developed various techniques for model compression, quantization, and acceleration to substantially advance the field of efficient deep learning.
These techniques enable machine learning models to run faster and consume fewer computational resources, making them suitable for deployment in resource-constrained environments such as mobile devices.
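As one concrete illustration of the kind of technique involved, here is a sketch of symmetric post-training int8 quantization, a common compression method (shown here as a generic example, not Vaswani’s specific approach). Storing weights as 8-bit integers plus one floating-point scale cuts memory use fourfold at the cost of a small, bounded rounding error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: the largest magnitude maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Fake weight matrix standing in for a trained layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)  # 0.25 -> int8 storage is 4x smaller than float32
```

Each reconstructed weight differs from the original by at most half a quantization step (scale / 2), which is why accuracy typically degrades only slightly while memory and bandwidth drop substantially.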
What’s more, Vaswani’s work on efficient deep learning has been highly influential and has helped democratize AI by making it more accessible and practical for a wider range of applications.
Adept AI Labs
On April 27, 2022, Vaswani announced on Twitter that “After 5+ wonderful years in Google Brain, working at the forefront of ML alongside inspiring colleagues, I’m excited to share my new adventure. We started Adept with the mission to build the future of human-computer collaboration.”
Among Adept’s co-founders were Google Brain colleagues Niki Parmar and Anmol Gulati, who created Google’s speech recognition model. Also on board were former Google software engineers Fred Bertsch, Max Nye, and Augustus Odena, who built the company’s code generation model.
Vaswani served as Adept’s chief scientist for most of 2022. In September, he announced in another tweet a new model called Action Transformer (ACT-1). In a video demonstration of the model on the real estate platform Redfin, a user types the prompt “Find me a house in Houston that works for a family of 4. My budget is 600K,” and the model populates a list of matching properties.
According to his LinkedIn profile, Vaswani stepped back from Adept, either partially or entirely, in November 2022 for undisclosed reasons.
He then co-founded a venture listed only as “Stealth Start-up” in December, with details similarly scarce at the time of writing. As with Adept AI before its public emergence in April 2022, the company is likely operating in stealth mode, developing a product or service in secret ahead of an unveiling at some future date.
Later, tech news website The Information reported that Vaswani had been joined at the new company by co-founder and long-time colleague Niki Parmar. Parmar was also a senior research scientist at Google Brain and co-author of the seminal paper Attention Is All You Need.
- Ashish Vaswani is an artificial intelligence researcher and computer scientist who worked at Google Brain for over five years and also co-founded AI start-up Adept.
- Vaswani worked as a computer scientist in the Natural Language Group at USC’s Information Services Institute (ISI) for two years. He then worked at Google Brain for another five years where he co-authored the influential paper Attention Is All You Need.
- Vaswani co-founded Adept with various former colleagues and contacts from Google and elsewhere. According to his LinkedIn profile, he stepped back from Adept, either partially or entirely, in November 2022 for undisclosed reasons.