Announcing Amazon Q – GenAI

By Jay Almers, Senior Solutions Architect

At AWS re:Invent 2023, Amazon announced an exciting new addition to their already-impressive AI portfolio: Amazon Q. What makes this service so intriguing, especially in today’s world of GenAI, or Generative Artificial Intelligence, is the data sources its models are trained on.

I asked one of the most notable GenAI systems available to the general public, ChatGPT (the chat-based generative AI solution developed by OpenAI), to describe the data sources its models were trained on. These sources include a wide selection of books from numerous genres and time periods; websites covering topics like news, science, art, and culture; and forums. It also sourced its data from scientific papers, including technical and scientific information from many different fields and areas of study, as well as “common crawl data” and “other textual sources” comprising newspapers, magazines, articles, and other written content available on the Internet. An important thing to note about ChatGPT and similar solutions is that their responses may not reflect the most accurate or up-to-date information, because the training data has a specific cutoff date; as of this post, that cutoff is April 2023. That’s where Amazon Q enters the conversation.

Image: A first-person response from ChatGPT

So, how is Amazon Q different?

Much like ChatGPT and other GenAI systems out there, Amazon Q takes a starting question (prompt) and generates a response based on the data sources its models have been trained on. Using natural language processing, the AI solution is usually able to interpret and understand the context and subject matter of the question. It also has a sense of tonality and complexity, and it can recognize when it needs further clarification because the initial prompt was too vague. But this is where Amazon Q seemingly splits away from the other GenAI solutions available today. According to AWS, you can use Amazon Q to have conversations, solve problems, generate content, gain insights, and take action by connecting to your company’s information repositories, code, data, and enterprise systems. That last part deserves emphasis because it represents a major shift in how the solution is trained. Rather than relying on data sources derived from the public web, as OpenAI’s models do, Amazon Q can be connected to an organization’s internal documents, code, and proprietary information so that the responses it returns are organization-centric.
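To make that prompt-and-response flow a little more concrete, here is a minimal sketch of asking a question against an Amazon Q Business application using the boto3 `qbusiness` client. The application ID and user ID are placeholders, and the parameter and field names reflect my reading of the API at the time of writing, so treat this as an illustration rather than a definitive integration.

```python
import boto3

# Minimal sketch: send a prompt to an Amazon Q (Business) application.
# Application ID and user ID below are placeholders for your own values.
qbusiness = boto3.client("qbusiness", region_name="us-east-1")

response = qbusiness.chat_sync(
    applicationId="REPLACE-WITH-YOUR-APPLICATION-ID",
    userId="jane.doe@example.com",
    userMessage="Summarize our current parental leave policy.",
)

# The generated answer comes back as a system message, along with
# source attributions pointing back at the connected enterprise content.
print(response["systemMessage"])
for attribution in response.get("sourceAttributions", []):
    print("Source:", attribution.get("title"))
```

The interesting part is the source attributions: because the answer is grounded in your connected repositories, the response can point back to the specific internal documents it drew from.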

Imagine the Human Resources department of your organization needing to sift through countless procedural documents to change a VERY legally problematic occurrence of the word “and” to “or”. Utilizing an AI solution that has been trained on your organization’s documents to do the monotonous task of sifting through that information for you would be a tremendous time saver.
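To appreciate how monotonous that task is today, here is a rough sketch of the manual alternative: a script that crawls a (hypothetical) folder of exported policy documents and flags every line containing the problematic clause, each of which someone then still has to review by hand. The folder path and phrase are illustrative assumptions, not anything prescribed by Amazon Q.

```python
from pathlib import Path

# Hypothetical location of exported HR policy documents (plain-text copies).
POLICY_DIR = Path("hr_policies")
# The legally problematic clause we need to find before changing "and" to "or".
PROBLEM_PHRASE = "vacation and sick leave"

# Walk every document and flag lines containing the clause so a human
# can review each occurrence in context.
for doc in sorted(POLICY_DIR.glob("**/*.txt")):
    text = doc.read_text(encoding="utf-8")
    for line_number, line in enumerate(text.splitlines(), start=1):
        if PROBLEM_PHRASE in line.lower():
            print(f"{doc}:{line_number}: {line.strip()}")
```

An assistant connected to those same documents could instead answer “which policies contain this clause, and in what context?” directly, without anyone exporting files and grepping through them.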

Now imagine you have lost a couple of key software engineers or application developers, and their documentation is not as polished as we’d all like. A solution that can be trained on your own proprietary source code could give you a deeper understanding of that code, and give your current and future developers the starting point they are looking for where the documentation is lacking.

If Amazon Q is connected to our data, how do we keep it secure?

Amazon Q, like every other AWS service, follows the shared responsibility model: AWS is responsible for protecting the infrastructure that runs the AWS Cloud globally, and the customer is responsible for securing and protecting the content that runs within the AWS Cloud. Amazon Q uses “data source connectors” to access and crawl your organization’s data, such as Confluence wikis, GitHub repositories, and Salesforce instances to name a few, for use in its training and responses. These connectors store their credentials in AWS Secrets Manager, and IAM policies control access to those secrets.
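As a rough sketch of that credential handling, the snippet below stores a connector’s credentials in AWS Secrets Manager and attaches an IAM policy that lets only a specific role read that one secret. The secret name, role name, and connector fields are illustrative assumptions, not values Amazon Q requires.

```python
import json
import boto3

secretsmanager = boto3.client("secretsmanager")
iam = boto3.client("iam")

# Store the connector credentials (illustrative fields for a Confluence
# connector) as a secret instead of embedding them in configuration.
secret = secretsmanager.create_secret(
    Name="amazon-q/confluence-connector",          # hypothetical secret name
    SecretString=json.dumps({
        "hostUrl": "https://example.atlassian.net",
        "username": "svc-amazon-q",
        "password": "REPLACE-WITH-API-TOKEN",
    }),
)

# Scope access so only the role assumed by the data source connector
# can read this specific secret.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": secret["ARN"],
    }],
}

iam.put_role_policy(
    RoleName="AmazonQConfluenceDataSourceRole",    # hypothetical role name
    PolicyName="ReadConfluenceConnectorSecret",
    PolicyDocument=json.dumps(policy),
)
```

Scoping the policy to a single secret ARN keeps each connector’s credentials isolated, so a role used by one data source cannot read the credentials of another.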

It is important to note that AWS advises users and organizations to follow general security best practices to avoid unintentional disclosure of sensitive information:

“We strongly recommend that you never put confidential or sensitive information, such as your customers’ email addresses, into tags or free-form text fields such as a Name field. This includes when you work with Amazon Q or other AWS services using the console, API, AWS CLI, or AWS SDKs. Any data that you enter into tags or free-form text fields used for names may be used for billing or diagnostic logs. If you provide a URL to an external server, we strongly recommend that you do not include credentials information in the URL to validate your request to that server.”

We are certainly just getting started in the world of Generative AI, and as with all new technologies, there is still a lot of refinement to be done as use cases and enhancements are developed. Artificial Intelligence is here to stay, and what will set individuals and organizations apart is how they choose to approach it: resist it and be swallowed by the wave, or embrace it and use it as a tool to make better use of our most precious and valuable commodity – time.