Is it possible to create a new table by joining two existing Athena tables in AWS?
I have two tables, A and B, each with its own schema, whose data comes from an S3 bucket location. I would like to join the tables and create a new table in Athena. Is this possible in AWS Athena?
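For reference, Athena supports CREATE TABLE AS SELECT (CTAS), so a join can be materialized as a new table. A minimal sketch via boto3, assuming hypothetical tables table_a and table_b joined on an id column, with placeholder S3 paths:

import boto3

# Sketch: materialize a join of two Athena tables with CTAS.
# Table names, the join key, and all S3 paths are placeholders.
client = boto3.client('athena')
ctas = """
CREATE TABLE my_database.joined_table
WITH (external_location = 's3://my-bucket/joined_table/', format = 'PARQUET') AS
SELECT a.*, b.extra_col
FROM my_database.table_a a
JOIN my_database.table_b b ON a.id = b.id
"""
client.start_query_execution(
    QueryString=ctas,
    ResultConfiguration={'OutputLocation': 's3://my-bucket/athena-results/'}
)

The new table's data is written to the external_location, so it lives in S3 like any other Athena table.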
See also questions close to this topic
-
SQL query to calculate the OS Type and OS Language (SQL Server)
I did some research and could not find any query to retrieve the OS type and OS language. Can anyone help me with this?
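A possible starting point, as a minimal Python sketch (assuming SQL Server 2008 R2 or later, where the sys.dm_os_windows_info DMV exists; the connection string is a placeholder):

import pyodbc

# Sketch: read the OS release and OS language from SQL Server.
# sys.dm_os_windows_info exposes windows_release and
# os_language_version (a Windows LCID, e.g. 1033 for en-US).
conn = pyodbc.connect('DSN=my_sql_server')  # placeholder connection
row = conn.cursor().execute(
    'SELECT windows_release, os_language_version FROM sys.dm_os_windows_info'
).fetchone()
print('OS release:', row.windows_release)
print('OS language (LCID):', row.os_language_version)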
-
Group by similarity in Oracle SQL
I have a table with several entries that are mostly equal, except for some tokens in them. These are messages returned from a web service and they must be stored this way.
Given this example query:
SELECT entry_error.error_desc, Count(entry_error.id)
FROM entry_message err_entries
FULL OUTER JOIN entry_error ON err_entries.id = entry_error.id
FULL OUTER JOIN error_code ON entry_error.error_code = error_code.error_code
WHERE NOT EXISTS (SELECT id_father
                  FROM entry_message creator
                  WHERE err_entries.id = creator.id_father)
GROUP BY entry_error.error_desc;
I get an output like this:
entry_error.error_desc                            count(entry_error.id)
First Sample Text: 321; Second Sample Text: 123;  1
First Sample Text: 456; Second Sample Text: 654;  1
First Sample Text: 789; Second Sample Text: 987;  1
But I'd like it to be something like:
entry_error.error_desc                            count(entry_error.id)
First Sample Text: {0}; Second Sample Text: {1};  3
Is it possible to do this directly in my query?
EDIT: Notice that the messages are just samples and there are several different ones. They cannot be explicitly written in the query. I need the query to group similar generic messages that are X% similar (using something like UTL_MATCH, for example).
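To illustrate the normalization idea outside the database, here is a minimal Python sketch (the same digit-token replacement could be done in Oracle with REGEXP_REPLACE before grouping; true X%-similarity matching with UTL_MATCH would need more work):

import re
from collections import Counter

# Sketch: collapse variable numeric tokens into a placeholder,
# then count how many messages share each resulting template.
messages = [
    'First Sample Text: 321; Second Sample Text: 123;',
    'First Sample Text: 456; Second Sample Text: 654;',
    'First Sample Text: 789; Second Sample Text: 987;',
]

def normalize(msg):
    return re.sub(r'\d+', '{n}', msg)  # each digit run becomes {n}

for template, count in Counter(normalize(m) for m in messages).items():
    print(template, count)
# -> First Sample Text: {n}; Second Sample Text: {n};  3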
-
Update msAccess SQL with subquery in Where
I have 2 tables: tblAbsence
Name  Start   End      Status
John  4/2/18  4/5/18   Approved
Sue   5/1/18  5/10/18  Denied
and tblManning
Date    Required
4/1/18  3
4/2/18  4
4/3/18  2
I would like to update Status to "Approved" as long as Required during the absence request doesn't exceed the limit (4, for example).
My best effort:
UPDATE tblAbsence
SET tblAbsence.Status = "Approved"
WHERE (SELECT Required
       FROM tblManning
       WHERE tblManning.Date > tblAbsence.Start
         AND tblManning.Date <= tblAbsence.End + #23:59:59#) < 4;
tells me the subquery can return at most one record. My logic/skill capacity can't drink the punch my desire is pouring... my wife tells me the same thing.
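The usual fix is to make the subquery return a single value, e.g. by comparing MAX(Required) over the window. The per-row logic, as a minimal Python sketch over hypothetical in-memory rows:

from datetime import date

# Sketch: approve an absence only if Required never exceeds the
# limit on any day inside the absence window (mirrors comparing
# MAX(Required) from the subquery against the limit).
LIMIT = 4
manning = {date(2018, 4, 1): 3, date(2018, 4, 2): 4, date(2018, 4, 3): 2}
absences = [
    {'Name': 'John', 'Start': date(2018, 4, 2), 'End': date(2018, 4, 5)},
]

for a in absences:
    window = [req for d, req in manning.items() if a['Start'] <= d <= a['End']]
    a['Status'] = 'Approved' if max(window, default=0) <= LIMIT else 'Denied'
    print(a['Name'], a['Status'])
-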
boto3 check if Athena database exists
I'm making a script that creates a database in AWS Athena and then creates tables for that database. Today the DB creation was taking ages, so the tables being created referred to a DB that didn't exist yet. Is there a way to check whether a DB has already been created in Athena using boto3?
This is the part that created the db:
client = boto3.client('athena')
client.start_query_execution(
    QueryString='create database {}'.format('db_name'),
    ResultConfiguration=config
)
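One way to check, as a minimal sketch (list_databases is a newer Athena API call; querying the Glue catalog with get_database is an alternative):

import boto3

def athena_database_exists(name, catalog='AwsDataCatalog'):
    """Return True if an Athena database with this name exists."""
    client = boto3.client('athena')
    token = None
    while True:
        kwargs = {'CatalogName': catalog}
        if token:
            kwargs['NextToken'] = token
        resp = client.list_databases(**kwargs)
        if any(db['Name'] == name for db in resp['DatabaseList']):
            return True
        token = resp.get('NextToken')
        if not token:
            return False

print(athena_database_exists('db_name'))

Note also that start_query_execution is asynchronous, so the script should poll get_query_execution on the CREATE DATABASE query's ID until it reaches SUCCEEDED before creating the tables.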
-
Create web service gate in AWS with AD authentication in Azure
I have a C# Web API service in AWS. I also have AD in Azure.
I need to set up some AWS service in front of the API which will proxy requests to the real API (EC2), but will also validate the auth token, e.g. like Azure API Management with ADFS.
This could be a load balancer, an API gateway, etc.
What service in AWS can be used for this purpose?
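One common pattern is API Gateway in front of the EC2 backend with a Lambda authorizer that validates the Azure AD JWT. A minimal sketch (the tenant ID and audience are placeholders, and the JWKS lookup uses PyJWT's PyJWKClient):

import jwt  # PyJWT >= 2.0

# Placeholders: your Azure AD tenant and the API's application ID URI.
TENANT = '<tenant-id>'
AUDIENCE = 'api://my-api'
jwks = jwt.PyJWKClient(
    f'https://login.microsoftonline.com/{TENANT}/discovery/v2.0/keys')

def handler(event, context):
    """API Gateway TOKEN authorizer: allow the call only if the JWT verifies."""
    token = event['authorizationToken'].replace('Bearer ', '')
    try:
        key = jwks.get_signing_key_from_jwt(token).key
        claims = jwt.decode(token, key, algorithms=['RS256'], audience=AUDIENCE)
        effect = 'Allow'
    except jwt.PyJWTError:
        claims, effect = {}, 'Deny'
    return {
        'principalId': claims.get('sub', 'anonymous'),
        'policyDocument': {
            'Version': '2012-10-17',
            'Statement': [{'Action': 'execute-api:Invoke',
                           'Effect': effect,
                           'Resource': event['methodArn']}],
        },
    }

An Application Load Balancer also supports OIDC authentication natively, which may avoid the custom authorizer entirely.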
-
Using aws cli to elicit "directory structure"/partitions of parquet
I've got a parquet object saved to S3, with directory structure like:
/root/key1=a/key2=1/part-00000-gobbledygook.gz.parquet
/root/key1=a/key2=1/part-00001-gobbledygook.gz.parquet
/root/key1=a/key2=2/part-00000-gobbledygook.gz.parquet
/root/key1=a/key2=2/part-00001-gobbledygook.gz.parquet
/root/key1=b/key2=1/part-00000-gobbledygook.gz.parquet
/root/key1=b/key2=1/part-00001-gobbledygook.gz.parquet
/root/key1=b/key2=2/part-00000-gobbledygook.gz.parquet
/root/key1=b/key2=2/part-00001-gobbledygook.gz.parquet
key1=a/key2=1, etc. determine the partition structure of my parquets. I'd like a flexible/efficient way to learn the unique partitions.
In practice, there may be a large number of parquet files (e.g. 200), so e.g. running
aws s3 ls s3://root/ --recursive
will return 200x too many results, which can potentially slow down the command significantly (even though piping to, e.g., grep would likely solve the problem).
The man page suggests I'm out of luck (it seemed possible --page-size might fit my purposes, but this option doesn't work as I'd expected; e.g. specifying --page-size 1 didn't lead to just returning a single file in the output).
Is running --recursive and piping the output the only option?
aws s3 ls s3://root/ --recursive | grep part-00000
returns the correct number of files, but the first step overcounts significantly.
Alternatively, is there another pre-installed command line interface for reading the partition structure of a parquet object from its s3 path?
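One option that avoids listing every object is S3's delimiter-based listing, which returns only the "directory" prefixes at each level (one request per level). A minimal boto3 sketch; the aws cli equivalent would be aws s3api list-objects-v2 with --delimiter '/':

import boto3

s3 = boto3.client('s3')

def list_partitions(bucket, prefix=''):
    """Recursively yield leaf 'directory' prefixes such as root/key1=a/key2=1/."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter='/')
    subdirs = [p['Prefix'] for p in resp.get('CommonPrefixes', [])]
    if not subdirs:
        yield prefix  # no deeper levels: this is a complete partition
    for sub in subdirs:
        yield from list_partitions(bucket, sub)

# Bucket name and prefix are placeholders for the /root/ path above.
for partition in list_partitions('my-bucket', 'root/'):
    print(partition)
-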
HIVE_CURSOR_ERROR: Unexpected end of input stream
I'm moving data from MySQL to S3 using Data Pipeline, and it creates empty files for a couple of days. I believe this is making my Athena query fail with
"HIVE_CURSOR_ERROR: Unexpected end of input stream".
Below is my script
CREATE EXTERNAL TABLE `test`(
  `col0` bigint,
  `col1` bigint,
  `col2` string,
  `col3` string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://dummy/'
Could you please let me know if there is any option to skip zero-byte S3 files?
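I'm not aware of a table property that makes this text format skip empty objects, so one workaround is to delete the zero-byte files before querying. A minimal boto3 sketch (the bucket and prefix are placeholders):

import boto3

# Sketch: remove zero-byte objects under the table's S3 location
# so Athena never reads an empty input stream.
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='dummy', Prefix=''):
    for obj in page.get('Contents', []):
        if obj['Size'] == 0:
            s3.delete_object(Bucket='dummy', Key=obj['Key'])
            print('deleted empty object:', obj['Key'])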
-
Create an external table in AWS Athena for a CSV
I'm trying to run a query in AWS Athena to create an external table mapping a bunch of files from one of my S3 buckets.
The thing is that every time I run the query I get an error that I'm not able to figure out.
The table:
CREATE EXTERNAL TABLE IF NOT EXISTS cust_cr_booking_orders (
  field1 string,
  field2 int,
  field3 int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ','
)
STORED AS TEXTFILE
LOCATION 's3://My bucket/';
This is the output I retrieve.
Workaround: I have created the table through Glue; however, I want to know what this error means. I checked the bucket permissions and I granted put and get, so that should be enough if Glue is able to get through.
Thanks for your comments, folks.
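To surface the actual failure reason instead of a generic error, the query's StateChangeReason can be inspected after submitting the DDL. A minimal boto3 sketch (the DDL string is abbreviated and the S3 paths are placeholders; note that a LOCATION containing a space, as in 's3://My bucket/', is itself a likely cause of failure):

import time
import boto3

# Sketch: run a DDL statement and print Athena's failure reason, if any.
ddl = ("CREATE EXTERNAL TABLE IF NOT EXISTS cust_cr_booking_orders "
       "(field1 string) LOCATION 's3://my-bucket/data/'")
client = boto3.client('athena')
qid = client.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={'OutputLocation': 's3://my-bucket/athena-results/'},
)['QueryExecutionId']

while True:
    status = client.get_query_execution(
        QueryExecutionId=qid)['QueryExecution']['Status']
    if status['State'] in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)
print(status['State'], status.get('StateChangeReason', ''))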