I would like to migrate one of my on-premises data ingestion solutions to AWS. The current solution runs on a single on-premises Red Hat Enterprise Linux server. Throughout the day, hundreds of thousands of files, ranging in size from 1 KB to 10 GB, are pushed to the server and processed there. The files arrive encrypted and compressed, so the AWS solution needs to decrypt and then unzip each file before storing the data in the storage location. Processing time per file ranges from a few seconds to 30 minutes. Because of the growing number of files and the wide range of file sizes, the single Linux server is becoming a bottleneck. The number of files is expected to increase 25-fold over the next 12 months, so I am looking for a highly available, scalable, cost-effective, and secure solution for this data ingestion. Any suggestions?
Are you looking to bypass the RHEL server entirely and ingest all data directly into AWS? Or do you want to continue compressing and encrypting on premises before moving the data to AWS?
If you just need to get the data off the RHEL instance and into the cloud, and then decommission the on-premises server, I would use Snowball: https://aws.amazon.com/snowball/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
If you need to keep the RHEL server and ingest data into AWS regularly, I would look into a service like Storage Gateway or Direct Connect: https://aws.amazon.com/cloud-data-migration/
There is also DataSync, but I don’t recall hearing about it when I originally certified back in the day, so I guess it’s relatively new: https://aws.amazon.com/datasync/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
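Whichever transfer service you pick, the decrypt-then-unzip step from the question still has to run somewhere. Here is a rough sketch of how that step could stream a file in chunks so a 10 GB file never has to fit in memory. It is only an illustration under assumptions: the question does not say which encryption or compression format is used, so I assume gzip compression, and `xor_decrypt` is a made-up placeholder (not real encryption) standing in for whatever cipher you actually use. The function name `decrypt_and_decompress` is hypothetical as well.

```python
import gzip
import io
import zlib
from typing import BinaryIO, Callable

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def xor_decrypt(chunk: bytes, key: int = 0x5A) -> bytes:
    """Placeholder cipher for illustration only. NOT real encryption.

    Assumption: the real cipher can be applied chunk by chunk.
    """
    return bytes(b ^ key for b in chunk)


def decrypt_and_decompress(
    src: BinaryIO,
    dst: BinaryIO,
    decrypt: Callable[[bytes], bytes],
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Decrypt, then gunzip, `src` into `dst`; return the bytes written."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip container.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(decrypt(chunk))
        dst.write(data)
        written += len(data)
    tail = decompressor.flush()  # emit anything still buffered
    dst.write(tail)
    return written + len(tail)


if __name__ == "__main__":
    # Round trip on synthetic data: compress, "encrypt", then undo both.
    original = b"sample ingestion payload\n" * 1000
    encrypted = xor_decrypt(gzip.compress(original))
    out = io.BytesIO()
    decrypt_and_decompress(io.BytesIO(encrypted), out, xor_decrypt, chunk_size=64)
    assert out.getvalue() == original
```

Since `src` and `dst` are plain binary file-like objects, the same function should work against local files or any streaming reader and writer, which leaves the choice of storage location open.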