Why are ClientRequest.* and Storage.* metrics not reported with the builtin ConsoleReporter in DSE Cassandra?
I am trying to get metrics from DSE Cassandra (DSE: 5.1.0, Cassandra: 188.8.131.522) using builtin reporters such as ConsoleReporter. I am able to get all the metrics except those under ClientRequest.* and Storage.*, even though I have reads/writes going to this cluster. The only metric under the ClientRequest.* group is
I tried different reporter configs, but no luck, and I didn't find any JIRA issue related to this either. The behavior is the same with the StatsD reporter.
Here is the reporter config with a wildcard whitelist:
```yaml
outfile: '/tmp/metrics.out'
period: 10
timeunit: 'SECONDS'
predicate:
  color: "white"
  useQualifiedName: true
  patterns:
    - ".*"
```
Both the ClientRequest and Storage metrics are critical for me. Does anybody have any pointers on why I am not getting these metrics? I appreciate any insights on resolving this issue.
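For reference, the same whitelist idea can be expressed directly against the DropWizard API. The sketch below is stand-alone and not Cassandra's actual wiring: the registry and the registered metric name are stand-ins (inside Cassandra the registry would be CassandraMetricsRegistry.Metrics), and it only demonstrates that a ConsoleReporter with a matching filter prints ClientRequest-style names when they exist in the registry:

```java
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;

public class ClientRequestReporterCheck {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in registry; inside Cassandra this would be
        // CassandraMetricsRegistry.Metrics.
        MetricRegistry registry = new MetricRegistry();
        // Hypothetical metric name, mimicking the ClientRequest group.
        registry.timer("org.apache.cassandra.metrics.ClientRequest.Latency.Read");

        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                // Equivalent of a ".*ClientRequest.*" whitelist pattern.
                .filter((name, metric) -> name.contains("ClientRequest"))
                .build();
        reporter.start(10, TimeUnit.SECONDS);
        Thread.sleep(15_000); // let one reporting interval elapse
    }
}
```

If a sketch like this prints the metric while the in-server reporter stays silent, the predicate side is probably fine, and one possibility worth checking is whether the ClientRequest metrics have been registered at all by the time the reporter starts.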
See also questions close to this topic
DataStax Hive Query Performance Issue
I have been noticing performance issues with queries running on DSE through Spark Hive.
```sql
CREATE TABLE tests (
    id text,
    user text,
    aname text,
    iname text,
    gentime bigint,
    snapdate bigint,
    action text,
    acount bigint,
    line bigint,
    PRIMARY KEY ((id, user), aname, iname, gentime, snapdate, action)
) WITH CLUSTERING ORDER BY (aname ASC, iname ASC, gentime ASC, snapdate ASC, action ASC);
```
```sql
SELECT * FROM tests
WHERE id = 'a37'
  AND aname = 'ABC'
  AND iname = 'ABC1'
  AND user IS NOT NULL
  AND gentime = 1520985600000
GROUP BY user, snapdate;
```
```
2018-03-16 05:44:43,404 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 4110.32 sec
2018-03-16 05:44:44,407 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 4110.32 sec
2018-03-16 05:44:45,410 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 4110.32 sec
2018-03-16 05:44:46,412 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4110.32 sec
2018-03-16 05:44:47,416 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4110.32 sec
2018-03-16 05:44:48,420 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4110.32 sec
2018-03-16 05:44:49,426 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4111.27 sec
2018-03-16 05:44:50,428 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4111.27 sec
2018-03-16 05:44:51,431 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4111.27 sec
2018-03-16 05:44:52,434 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4111.27 sec
2018-03-16 05:44:53,437 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4111.27 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 31 seconds 270 msec
Ended Job = job_201801161541_0002
MapReduce Jobs Launched:
Job 0: Map: 167  Reduce: 1  Cumulative CPU: 4111.27 sec  HDFS Read: 0  HDFS Write: 0  SUCCESS
Total MapReduce CPU Time Spent: 0 days 1 hours 8 minutes 31 seconds 270 msec
```
To troubleshoot the reason for so many mappers and the CPU latency problems, I have started reading about DSE configurations. So far, I have found the vnodes configuration in DSE Cassandra, which could be the potential issue. I have also verified the corresponding value in the cassandra.yaml file and yes, it is configured. I don't know the logic behind the configuration. My questions are:
- Why so many mappers? (Job 0: Map: 167)
- Why is one executor being overloaded with multiple tasks?
- Do we need to disable vnodes to solve the problem of multiple mappers? (An alternative knob is sketched right after this list.)
- CassandraSQLContext vs HiveContext: is there any difference in terms of performance?
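As an alternative to disabling vnodes, the number of input partitions (and hence mapper-like tasks) produced by a Cassandra scan can often be reduced by raising the Spark Cassandra Connector's input split size. A hedged sketch in Java: the host and the 512 MB value are illustrative, and the property name assumed is the connector 2.0-era spark.cassandra.input.split.size_in_mb:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SplitSizeTuning {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("split-size-tuning")
                .set("spark.cassandra.connection.host", "<ip>") // placeholder
                // Default is 64 MB; larger splits mean fewer partitions/tasks.
                .set("spark.cassandra.input.split.size_in_mb", "512");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // ... read the tests table through the connector and run the query ...
        }
    }
}
```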
SparkR DataStax Cassandra connection error in RStudio
How do I set up a SparkR session in RStudio that connects to a DSE Analytics cluster (DataStax/Cassandra/Spark)?
I'm running into an error with the following setup.
Cassandra (DataStax DSE 5.1.5 / Spark 2.0.2) via RStudio with SparkR (3.1.1).
```r
library(SparkR, lib.loc = "/usr/share/dse/spark/R/lib")
Sys.setenv(SPARK_HOME = "/usr/share/dse/spark")
.libPaths("/usr/share/dse/spark/R/lib")

mySparkEnvironment <- list(
  spark.cassandra.connection.host = "<ip>",
  spark.cassandra.connection.port = 9042,
  spark.cassandra.auth.username = "<username>",
  spark.cassandra.auth.password = "<password>")

sc <- sparkR.session(master = "spark://<ip>:7077",
                     sparkJars = '/usr/share/dse/spark/lib/spark-cassandra-connector-unshaded_2.11:jar:2.0.7',
                     config = mySparkEnvironment)

cass <- read.df(NULL, source = "org.apache.spark.sql.cassandra",
                keyspace = "mvp", table = "mock_data")
collect(cass)
```

The session launch then fails:

```
> sc <- sparkR.session(master = "local", sparkJars = '/usr/share/dse/spark/lib/spark-cassandra-connector-unshaded_2.11:jar:2.0.7', config = mySparkEnvironment)
Spark package found in SPARK_HOME: /usr/share/dse/spark
Launching java with spark-submit command /usr/share/dse/spark/bin/spark-submit --jars /usr/share/dse/spark/lib/spark-cassandra-connector-unshaded_2.11:jar:2.0.7 sparkr-shell /tmp/RtmpOaDOLh/backend_port7135393f3f83
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 1 more
```
Prevent tombstone creation
I need to perform an insert into a Cassandra table without creating tombstones for any column. I am using a query similar to this:
```sql
INSERT INTO my_table (col1, col2, col3) VALUES (val1, val2, null);
```
where col1, col2 and col3 are all the attributes in my_table. Is there any other solution or workaround to prevent tombstone creation for, say, col3, apart from passing only non-null attributes in the query and letting Cassandra set the remaining attributes to null?
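One common workaround, beyond filtering nulls out of the statement yourself, is to leave the bind variable unset: with native protocol v4 (Cassandra 2.2+), an unset variable is simply not written, so no tombstone is created for it. A sketch with the DataStax Java driver 3.x; the contact point, keyspace, and the assumption that the columns are text are all placeholders:

```java
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class UnsetInsteadOfNull {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("<ip>").build();
             Session session = cluster.connect("<keyspace>")) {
            PreparedStatement ps = session.prepare(
                    "INSERT INTO my_table (col1, col2, col3) VALUES (?, ?, ?)");
            BoundStatement bs = ps.bind()
                    .setString("col1", "val1")
                    .setString("col2", "val2");
            // col3 is deliberately left unset rather than bound to null,
            // so no tombstone is written for it.
            session.execute(bs);
        }
    }
}
```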
Python Cassandra driver installation
I have an installation shell script which installs the Cassandra Python driver like below:
```bash
#6) Install the Cassandra-Driver
echo "5) Install the cassandra-driver"
echo $password | sudo -S pip install cassandra-driver >/dev/null
echo " Installed the cassandra-driver"
```
Now the problem is that, the first time, sudo -S pip install cassandra-driver takes around 15 minutes to install. Is there a better way to install this, such as a local pip repository, or is it possible to package it so that I just need to unzip and run?
Does Cassandra need all the servers to have the same time?
I have a .NET application on a Windows machine and a Cassandra database on a Linux (CentOS) server. The Windows machine's clock sometimes runs a couple of seconds in the past, and when that happens, the delete or update queries do not take effect.
Does Cassandra require all servers to have the same time? Does the Cassandra driver send my query with a timestamp? (I just write simple delete or update queries, with no timestamp or TTL.)
Update: I use the DataStax C# driver.
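Regarding the timestamp part: the write timestamp of a mutation can be set explicitly in CQL, which is one way to observe and control the effect of clock skew. Sketched below with the DataStax Java driver for consistency with the other examples (the C# driver accepts the same CQL); the table and column names are made up:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class ExplicitWriteTimestamp {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("<ip>").build();
             Session session = cluster.connect("<keyspace>")) {
            // CQL write timestamps are microseconds since the epoch.
            long nowMicros = System.currentTimeMillis() * 1000;
            // With an explicit USING TIMESTAMP, the mutation's ordering no
            // longer depends on whichever clock would otherwise supply it.
            session.execute(new SimpleStatement(
                    "DELETE FROM my_table USING TIMESTAMP " + nowMicros
                            + " WHERE col1 = 'val1'"));
        }
    }
}
```

Either way, keeping all machines synchronized via NTP is the usual baseline, since Cassandra's last-write-wins conflict resolution compares these timestamps.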
Cassandra Batch statement - Multiple tables
I want to use a batch statement to delete a row from 3 tables in my database to ensure atomicity. The partition key is going to be the same in all 3 tables. In all the examples that I have read about batch statements, all the queries were for a single table. In my case, is it a good idea to use a batch statement, or should I avoid it?
I'm using Cassandra 3.11.2 and I execute my queries using the C++ driver.
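For illustration, a logged batch spanning three tables that share the same partition key value looks like the sketch below. It uses the DataStax Java driver for consistency with the other examples (the C++ driver has an equivalent batch API); the table and key names are hypothetical:

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class MultiTableBatchDelete {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("<ip>").build();
             Session session = cluster.connect("<keyspace>")) {
            // LOGGED makes the batch atomic: either all three deletes are
            // eventually applied, or none of them are.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.LOGGED);
            batch.add(new SimpleStatement("DELETE FROM table_a WHERE pk = ?", "key1"));
            batch.add(new SimpleStatement("DELETE FROM table_b WHERE pk = ?", "key1"));
            batch.add(new SimpleStatement("DELETE FROM table_c WHERE pk = ?", "key1"));
            session.execute(batch);
        }
    }
}
```

Keeping several denormalized tables in sync like this is the canonical legitimate use of logged batches, as opposed to using batches for bulk-load throughput.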
DropWizard Metrics additional healthcheck registry
We need an additional healthcheck registry while using DropWizard Metrics. By default there is only a single global health check registry. Is it possible to create one more health check registry and an HTTP endpoint for it? Please clarify.
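A minimal sketch, assuming plain DropWizard Metrics: HealthCheckRegistry is an ordinary class rather than an enforced singleton, so a second, independent registry can simply be instantiated. The "database" check below is a placeholder:

```java
import com.codahale.metrics.health.HealthCheck;
import com.codahale.metrics.health.HealthCheckRegistry;

public class SecondHealthCheckRegistry {
    public static void main(String[] args) {
        // Independent of any global/default registry.
        HealthCheckRegistry internalChecks = new HealthCheckRegistry();
        internalChecks.register("database", new HealthCheck() {
            @Override
            protected Result check() {
                // Placeholder: ping the database here.
                return Result.healthy();
            }
        });
        internalChecks.runHealthChecks().forEach((name, result) ->
                System.out.println(name + ": " + result.isHealthy()));
    }
}
```

For the HTTP side, the metrics-servlets module's HealthCheckServlet can be constructed with a specific registry, which should allow a second servlet instance to be mapped to a second path.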
Report custom metrics for microservices
We are using Dropwizard, which includes Codahale's Metrics library. I want to use this library for reporting some metrics from a microservice application. We have a custom metrics reporting engine that processes metrics emitted by our internal applications but expects the metrics to be reported with an HTTP status/counter. What is the best way of making the Metrics library report metrics in a custom format? When we use @Timed on a method we are able to achieve it, but with @Timed we are not able to use our custom metrics (mean/TPS/count).
I could not find anything about custom reporting for @Timed in the Metrics manual or on Google. This also brings me to the question: is this the right thing to do in the first place?
Any suggestions/ideas are welcome. Thanks
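One standard route, sketched below, is a custom ScheduledReporter: on every interval it receives all metrics in the registry, including the Timers that @Timed creates behind the scenes, and can re-emit mean/TPS/count in whatever in-house format the reporting engine expects. The printf line stands in for the call into that engine:

```java
import java.util.SortedMap;
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.*;

// Skeletal custom reporter: reshape registry contents into an in-house format.
public class CustomFormatReporter extends ScheduledReporter {
    public CustomFormatReporter(MetricRegistry registry) {
        super(registry, "custom-format", MetricFilter.ALL,
              TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
    }

    @Override
    public void report(SortedMap<String, Gauge> gauges,
                       SortedMap<String, Counter> counters,
                       SortedMap<String, Histogram> histograms,
                       SortedMap<String, Meter> meters,
                       SortedMap<String, Timer> timers) {
        timers.forEach((name, timer) -> {
            Snapshot snap = timer.getSnapshot();
            // Placeholder: replace with the call into the reporting engine.
            System.out.printf("%s tps=%.2f mean_ms=%.2f count=%d%n",
                    name, timer.getMeanRate(),
                    convertDuration(snap.getMean()), timer.getCount());
        });
    }
}
```

It is started like any built-in reporter: new CustomFormatReporter(registry).start(10, TimeUnit.SECONDS);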
Dropwizard Metrics - Mean / Average Response Time
Recently I integrated my web application (using Spring REST) with Dropwizard Metrics. I am using Timer and Meter as and when required for my APIs. I am able to derive all metrics such as the m1 and m5 rates; however, the average response time (mean) and p99/p999 aren't looking convincing.
My aim is to keep track of the average response time of my API at any point in time, for which I am using the mean. However, the mean doesn't look correct, and I don't see my Graphite graph changing as the response time increases or decreases. The Apache access log shows different response times than the ones on the graph.
I am confused. Any leads on this would be helpful. Thanks!
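One likely explanation is the Timer's default reservoir: it is an exponentially decaying sample biased toward roughly the last five minutes, so the snapshot mean is a weighted historical value rather than the instantaneous response time, and it will lag what the access log shows. If a mean over a fixed recent window is preferred, the reservoir can be swapped at registration time. A sketch with a made-up metric name:

```java
import java.util.concurrent.TimeUnit;

import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.SlidingTimeWindowReservoir;
import com.codahale.metrics.Timer;

public class WindowedApiTimer {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        // Mean/percentiles computed over exactly the last minute of samples,
        // instead of the default exponentially decaying reservoir.
        Timer apiTimer = registry.register("api.requests",
                new Timer(new SlidingTimeWindowReservoir(1, TimeUnit.MINUTES)));

        try (Timer.Context ignored = apiTimer.time()) {
            // ... handle the request ...
        }
        // Snapshot durations are reported in nanoseconds.
        System.out.println("mean (ns): " + apiTimer.getSnapshot().getMean());
    }
}
```

The trade-off is memory: a sliding time window keeps every sample in the window, so it suits moderate request rates better than very hot endpoints.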