Machine learning is constantly evolving and plays a huge role in the global economy, as it allows for quick and automatic analysis of large volumes of data.
To bring machine learning closer to programmers, Amazon currently offers more than 10 machine learning and artificial intelligence services on its AWS platform. With these services, you can start building models quickly and take your business to the next level.
Most of these services are fully managed: you do not need any machine learning experience to use them, because they rely on pre-trained models for working with data. Depending on your business problem, you can choose from pre-trained ML services in areas like computer vision, natural language processing, recommendations, and forecasting. Figure 1 shows a machine learning solution workflow, along with the AWS tools that can be used at each stage.
The most important element in creating ML solutions is data. There are three types of data: structured, semi-structured, and unstructured.
Structured data has a predefined schema. Its elements are addressable and can be stored in a relational database. An example of structured data is a relational database table with numeric and string (text) columns.
Semi-structured data does not reside in relational databases, but it nonetheless has some predefined elements (a schema) that make it easier to analyze. Examples of semi-structured file types are XML, HTML, RDF, and JSON.
Unstructured data is everything else. This data type doesn’t have a predefined structure and is usually stored as a set of files. The most common examples of unstructured data are text documents, photos, video and audio files, and application logs.
Amazon Kinesis ingests data that is generated continuously by various sources, e.g. web and mobile applications. It is a real-time data streaming service that can capture gigabytes of data very quickly. Kinesis offers the following tools:
- Kinesis Video Streams - a tool which can help you stream video from devices to AWS
- Kinesis Data Streams - a tool which can help you collect data such as IT logs, website clicks or financial transactions
- Kinesis Data Firehose - a tool to load streamed data into data stores (e.g. S3, Redshift) or analytics tools
- Kinesis Data Analytics - a tool which processes streamed data in real time with SQL or Java
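To make the ingestion step more concrete, here is a minimal sketch of sending one click event to a Kinesis data stream with boto3. The stream name, the event fields, and the helper names `build_record` and `send_event` are assumptions made for this example. `put_record` is the boto3 Kinesis call, and the client is passed in as an argument so the helpers can be tried without AWS credentials.

```python
import json


def build_record(event: dict, partition_field: str = "user_id") -> dict:
    """Turn an event dict into keyword arguments for Kinesis put_record."""
    return {
        # Kinesis expects the payload as bytes.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": str(event[partition_field]),
    }


def send_event(kinesis_client, stream_name: str, event: dict) -> dict:
    """Send one event; kinesis_client is expected to be boto3.client('kinesis')."""
    return kinesis_client.put_record(StreamName=stream_name, **build_record(event))
```

With real credentials, the call would look like `send_event(boto3.client("kinesis"), "clickstream-demo", {"user_id": 42, "page": "/home"})`, where `clickstream-demo` is a hypothetical stream that already exists.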
Another AWS service that can help with data loading is Glue, a managed extract, transform, and load (ETL) service that runs its jobs on Apache Spark. It can be used to prepare data before it is used for analytics, and it works with both structured and semi-structured data. Glue consists of a Data Catalog, an ETL engine, and a scheduler. The Data Catalog is the most important part of the tool. It stores metadata about the data, which is discovered automatically by crawlers that go through the data sources and detect their schema. The ETL engine can generate Python and Scala code for the ETL process, which helps users who do not program. It can also process data with code provided by the user. The scheduler can monitor jobs, run tasks, and trigger them based on events (e.g. at a specific time every Monday, or when another task completes successfully or fails).
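The two scheduling cases mentioned above (a fixed time every Monday, and a run after another task succeeds) could be expressed with Glue triggers roughly as follows. This is only a sketch: the job names and helper functions are hypothetical, while `create_trigger` and its `Type`, `Schedule`, `Predicate`, and `Actions` parameters come from the boto3 Glue client.

```python
def weekly_trigger_args(job_name: str, hour_utc: int = 8) -> dict:
    """Arguments for a trigger that runs job_name every Monday at hour_utc."""
    return {
        "Name": f"{job_name}-every-monday",
        "Type": "SCHEDULED",
        # AWS cron fields: minute hour day-of-month month day-of-week year (UTC).
        "Schedule": f"cron(0 {hour_utc} ? * MON *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def on_success_trigger_args(first_job: str, next_job: str) -> dict:
    """Arguments for a trigger that starts next_job once first_job succeeds."""
    return {
        "Name": f"{next_job}-after-{first_job}",
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": first_job, "State": "SUCCEEDED"}
            ]
        },
        "Actions": [{"JobName": next_job}],
        "StartOnCreation": True,
    }


def create_trigger(glue_client, trigger_args: dict) -> dict:
    """Create the trigger; glue_client is expected to be boto3.client('glue')."""
    return glue_client.create_trigger(**trigger_args)
```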
Machine Learning Tools
Once we have collected the data we need, we can start building our ML solutions. AWS offers a number of machine learning tools that can process data of various types. Let us now take a look at each of these tools and their main areas of application in business.
SageMaker is most useful for machine learning developers and data scientists. This service is a complete solution that helps take machine learning models from concept to production with minimal effort. Amazon SageMaker has a rich set of tools (Ground Truth, Notebooks, Experiments, Debugger, Model Monitor, Neo) that help with labeling data and with building, optimizing, training, testing, and deploying models. Finding the right algorithm for a given problem manually often requires hours of training and testing. SageMaker has an Autopilot option, which automatically trains and evaluates up to 50 candidate ML models to find the best one for the case at hand. Developers can use it to quickly find a baseline model.
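As a rough illustration, an Autopilot job can be started through the boto3 SageMaker client with `create_auto_ml_job`. In the sketch below, the helper names, the S3 locations, the target column, and the IAM role are placeholders you would supply yourself, and the cap of 50 candidates is simply passed as `MaxCandidates`.

```python
def autopilot_job_args(job_name: str, train_s3_uri: str, target_column: str,
                       output_s3_uri: str, role_arn: str,
                       max_candidates: int = 50) -> dict:
    """Build keyword arguments for SageMaker create_auto_ml_job."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                # Training data (e.g. CSV files) stored under an S3 prefix.
                "DataSource": {
                    "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": train_s3_uri}
                },
                # Column that Autopilot should learn to predict.
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Upper bound on how many candidate models are tried.
        "AutoMLJobConfig": {"CompletionCriteria": {"MaxCandidates": max_candidates}},
        "RoleArn": role_arn,
    }


def start_autopilot(sagemaker_client, **job_settings) -> dict:
    """Start the job; sagemaker_client is expected to be boto3.client('sagemaker')."""
    return sagemaker_client.create_auto_ml_job(**autopilot_job_args(**job_settings))
```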
Personalize is a machine learning service which helps to build recommendation systems. Personalize can process activity streams from applications, e.g. clicks, page views, and purchases, and use them to create personalized recommendations. You can also use additional information about your users, such as age or geographic location. Recommendations can then be retrieved in your application with short API calls. The machine learning technology used in Personalize has been refined over years of use at Amazon.com.
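A short API call of this kind might look like the sketch below. It assumes a Personalize campaign has already been trained and deployed. The campaign ARN and the helper names are hypothetical, and `get_recommendations` is the call exposed by the boto3 `personalize-runtime` client.

```python
def recommended_item_ids(response: dict) -> list:
    """Extract item IDs, in ranked order, from a get_recommendations response."""
    return [item["itemId"] for item in response.get("itemList", [])]


def get_recommendations(runtime_client, campaign_arn: str, user_id, num_results: int = 5) -> list:
    """runtime_client is expected to be boto3.client('personalize-runtime')."""
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),  # the API expects user IDs as strings
        numResults=num_results,
    )
    return recommended_item_ids(response)
```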
Comprehend is a Natural Language Processing (NLP) service which uses machine learning to extract valuable insights from unstructured textual data. This service applies sentiment analysis, part-of-speech extraction, and tokenization to detect key features of text. Comprehend can be helpful in understanding how positive or negative a given text is. Comprehend has an additional tool called Amazon Comprehend Medical, which is dedicated specifically to the medical industry. Amazon Comprehend Medical can analyze medical documentation (like medical records of patients, clinical notes) and extract information about medications, doses and frequencies. Comprehend is a fully managed service.
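For instance, checking how positive or negative a piece of text is could be sketched as follows. The helper names are made up for this example. `detect_sentiment` is the boto3 Comprehend call, and its response carries a `Sentiment` label plus a `SentimentScore` dictionary.

```python
def summarize_sentiment(response: dict) -> tuple:
    """Return the predicted label and the confidence Comprehend assigned to it."""
    label = response["Sentiment"]  # e.g. "POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED"
    # Score keys are capitalized differently from the label ("Positive", ...).
    confidence = response["SentimentScore"][label.capitalize()]
    return label, confidence


def detect_sentiment(comprehend_client, text: str, language: str = "en") -> tuple:
    """comprehend_client is expected to be boto3.client('comprehend')."""
    response = comprehend_client.detect_sentiment(Text=text, LanguageCode=language)
    return summarize_sentiment(response)
```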
Forecast uses machine learning to build time-series prediction models. It can combine historical time series data with additional variables (which you believe may impact forecasts) to build predictive models. It can be used to predict values like stock prices or customer demand for a product. Forecast is also a fully managed service, and can be scaled to business needs.
Lex uses automatic speech recognition (ASR) to convert speech to text, and natural language understanding (NLU) to recognize the intent of the text. This solution enables the user to build conversational bots. Lex can, for example, replace manual customer support with a bot that automatically answers customer queries. Amazon Lex uses the same deep learning technology as Amazon Alexa (Amazon’s virtual assistant AI).
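Once a forecast has been generated, its values can be read back through the `forecastquery` client. The sketch below assumes the forecast was created with the median (p50) quantile and that items are identified by an `item_id` field. The ARN and the helper names are placeholders.

```python
def median_forecast(response: dict) -> list:
    """Return (timestamp, value) pairs for the p50 quantile of a query_forecast response."""
    predictions = response["Forecast"]["Predictions"]
    return [(point["Timestamp"], point["Value"]) for point in predictions.get("p50", [])]


def query_item_forecast(query_client, forecast_arn: str, item_id: str) -> list:
    """query_client is expected to be boto3.client('forecastquery')."""
    response = query_client.query_forecast(
        ForecastArn=forecast_arn,
        Filters={"item_id": item_id},  # restrict the result to one time series
    )
    return median_forecast(response)
```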
Polly is a cloud service that uses deep learning algorithms to convert text to lifelike speech. It currently supports 60 male and female voices across 29 languages, including Japanese, Chinese, Korean and Arabic. Polly can also handle time, dates, units, fractions, and abbreviations. This solution allows the user to create applications that can talk.
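A talking application can start from something as small as the sketch below, which asks Polly for MP3 audio and returns the raw bytes. The helper name is hypothetical and `Joanna` is one of Polly's English voices. `synthesize_speech` and the `AudioStream` field of its response come from the boto3 Polly client.

```python
def synthesize(polly_client, text: str, voice_id: str = "Joanna") -> bytes:
    """Convert text to MP3 audio; polly_client is expected to be boto3.client('polly')."""
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    # AudioStream is a file-like object; read() returns the audio bytes.
    return response["AudioStream"].read()
```

The returned bytes can be written to a file opened in binary mode or streamed directly to a client.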
Fraud Detector is an AWS service that can help identify fraudulent online activities, such as payment fraud or fake accounts. This service is fully managed, so a fraud detection model can be created with just a few clicks.
Textract is a service that can automatically read data from scanned documents. Textract can process millions of pages in a matter of hours and can help in automating document workflows. This service is useful in processing documents like loan applications or medical documentation.
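Reading the text lines from a single scanned page might be sketched as follows. The helper names are assumptions. `detect_document_text` is the synchronous boto3 Textract call, and its response is a list of `Blocks`, of which only the `LINE` blocks are kept here.

```python
def extract_lines(response: dict) -> list:
    """Collect the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def read_document(textract_client, image_bytes: bytes) -> list:
    """textract_client is expected to be boto3.client('textract')."""
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```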
Translate is an AWS machine learning service able to perform language-to-language text translation. It uses deep learning models to deliver more accurate and more natural-sounding translation than traditional statistical algorithms. Translate supports 54 languages (including e.g. Afrikaans, Bulgarian, Estonian), and 2,804 language pairs.
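A single translation request could look like the following sketch. The helper name and the default language codes (English to Estonian) are only illustrative. `translate_text` is the boto3 Translate call.

```python
def translate(translate_client, text: str, source: str = "en", target: str = "et") -> str:
    """Translate text; translate_client is expected to be boto3.client('translate')."""
    response = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source,
        TargetLanguageCode=target,
    )
    return response["TranslatedText"]
```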
Rekognition is a computer vision service that can recognize objects, people, and text in images and videos. Rekognition is able to identify and compare faces, analyze them, and locate facial features such as the mouth, nose, or eyes. It has a module that automatically detects emotions such as happiness, sadness or surprise in facial images. It can also perform user face verification, which confirms the user’s identity by comparing a real-time image with a stored reference image.
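The emotion detection and face verification described here can be sketched with two boto3 Rekognition calls, `detect_faces` and `compare_faces`. The helper names and the 90% similarity threshold are assumptions chosen for this example.

```python
def dominant_emotions(response: dict) -> list:
    """Return the highest-confidence emotion for each face in a detect_faces response."""
    result = []
    for face in response.get("FaceDetails", []):
        emotions = face.get("Emotions", [])
        if emotions:
            best = max(emotions, key=lambda emotion: emotion["Confidence"])
            result.append(best["Type"])  # e.g. "HAPPY", "SAD", "SURPRISED"
    return result


def detect_emotions(rekognition_client, image_bytes: bytes) -> list:
    """rekognition_client is expected to be boto3.client('rekognition')."""
    response = rekognition_client.detect_faces(
        Image={"Bytes": image_bytes},
        Attributes=["ALL"],  # emotions are only returned when all attributes are requested
    )
    return dominant_emotions(response)


def faces_match(rekognition_client, live_bytes: bytes, reference_bytes: bytes,
                threshold: float = 90.0) -> bool:
    """Check whether the stored reference face appears in the live image."""
    response = rekognition_client.compare_faces(
        SourceImage={"Bytes": reference_bytes},
        TargetImage={"Bytes": live_bytes},
        SimilarityThreshold=threshold,
    )
    # Only matches at or above the threshold are listed in FaceMatches.
    return len(response.get("FaceMatches", [])) > 0
```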
Deploy ML Solutions
The most widely used way to deploy models is Amazon SageMaker, which you can use in one of two ways:
- Using SageMaker Hosting Services to set up HTTPS endpoints. In this solution, client applications send requests to HTTPS endpoints to get predictions from deployed models. To use this solution, you must provide a Docker image containing your inference code. If you need to deploy multiple models, you can also use multi-model endpoints.
- Using SageMaker Batch Transform, which helps you to get predictions for an entire dataset. To deploy a model using Batch Transform you need an S3 bucket where the model, datasets and predictions are stored.
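Both options can be sketched with boto3. The first helper calls an already deployed endpoint through the `sagemaker-runtime` client. It assumes the model accepts one CSV row and answers with JSON, which depends entirely on your container. The second builds the arguments for `create_transform_job` on the `sagemaker` client. All names, S3 paths, and the instance type are placeholders.

```python
import json


def predict(runtime_client, endpoint_name: str, features: list):
    """Request one prediction; runtime_client is boto3.client('sagemaker-runtime')."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=",".join(str(value) for value in features),
    )
    # Assumption: the model container returns a JSON document.
    return json.loads(response["Body"].read())


def batch_transform_args(job_name: str, model_name: str,
                         input_s3_uri: str, output_s3_uri: str) -> dict:
    """Build keyword arguments for SageMaker create_transform_job."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
        },
        # Predictions are written back to S3 under this prefix.
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
    }
```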
An alternative deployment option is AWS IoT Greengrass. This service extends AWS to Internet of Things (IoT) devices. With it, devices can collect, filter, and process data, run Lambda functions and Docker containers, and execute predictions based on ML models, even without a cloud connection. When connected to the internet, Greengrass synchronizes all data with cloud services.
As you can see, Amazon Web Services offers a rich set of tools that can help you create impactful machine learning solutions for your business. With AWS ML tools you can add new features to your applications, like face detection, chatbots, speech recognition, and sentiment analysis of social media content. AWS adds new ML services, based on new use cases, every few months, which makes it one of the fastest-growing platforms for creating AI solutions.
Editor's note: This is a guest post from Miquido. Miquido is an award-winning digital product development company that excels at building AI-driven apps and web services. It is a laureate of Deloitte Technology Fast 50 CE and a winner of the UK App Awards 2018. Certified by Google, covered by TIME & Forbes.